COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs (arXiv:2502.17410, published Feb 24, 2025)
LLMs Can Generate a Better Answer by Aggregating Their Own Responses (arXiv:2503.04104, published Mar 6, 2025)
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation (arXiv:2506.18349, published Jun 23, 2025)