R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9, 2025 • 26
Article The Transformers Library: standardizing model definitions +2 May 15, 2025 • 121
Article You could have designed state of the art positional encoding Nov 25, 2024 • 428
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 202
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Mar 7, 2025 • 89
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 5, 2025 • 26
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 259
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 21
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
Article A failed experiment: Infini-Attention, and why we should keep trying? +1 Aug 14, 2024 • 73
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 7
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 174