SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations • Paper • 2512.14080 • Published 9 days ago • 5
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text • Paper • 2506.05209 • Published Jun 5 • 59
SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published Apr 7 • 202
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published Feb 4 • 252
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations • Paper • 2405.18392 • Published May 28, 2024 • 12