Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20 • 120
FastWan Collection models trained with video sparse attention: https://arxiv.org/abs/2505.13389 and distillation • 9 items • Updated 17 days ago • 10
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29 • 23