GlotSuite Collection GlotSuite: Paving the Way for Bringing Generative AI to Underserved Communities • 15 items • Updated 3 minutes ago • 1
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments Paper • 2404.07982 • Published Apr 11, 2024 • 1
Challenging the Evaluator: LLM Sycophancy Under User Rebuttal Paper • 2509.16533 • Published Sep 20, 2025 • 1
Omnilingual MT: Machine Translation for 1,600 Languages Paper • 2603.16309 • Published 26 days ago • 21
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published Mar 11 • 153
Languages identification Collection a variety of pre-trained language identification models • 9 items • Updated Jul 31, 2025 • 2
OLDI and friends Collection This collection groups the datasets that have been featured as part of WMT’s Open Language Data Initiative shared task. • 5 items • Updated 18 days ago • 5
Insights from the ICLR Peer Review and Rebuttal Process Paper • 2511.15462 • Published Nov 19, 2025 • 7
mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 53
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs Paper • 2510.09871 • Published Oct 10, 2025 • 3
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs Paper • 2508.10142 • Published Aug 13, 2025 • 3
Hugging Face Changelog Connect Your MCP Client to the Hugging Face Hub Jun 6, 2025 • 114
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9, 2025 • 793
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 78