Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221 • 8
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls Paper • 2510.00184 • Published Sep 30, 2025 • 16 • 3
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Paper • 2507.07955 • Published Jul 10, 2025 • 26 • 4
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 69 • 26
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 69 • 26
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published Apr 4, 2025 • 17 • 4
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 151 • 12
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 151 • 12