ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 88
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134