SYNTH Collection Fully generalist synthetic dataset and SOTA small reasoners • 3 items • Updated Nov 10, 2025 • 11
SindBERT, the Sailor: Charting the Seas of Turkish NLP Paper • 2510.21364 • Published Oct 24, 2025 • 1
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9