Training datasets for the fineweb2 classifier.
EuroLingua-GPT
🧠 What is EuroLingua-GPT?
EuroLingua-GPT is a multilingual large language model initiative led by Fraunhofer IAIS, AI Sweden, and TU Dresden. It aims to build a state-of-the-art open-source LLM tailored for Europe, covering 37 European languages and beyond.
🎯 Project Goal
- Develop a high-performing multilingual LLM optimized for European languages.
- Collect, curate, and evaluate large-scale multilingual datasets.
- Train and align the model using the latest in transformer and instruction-tuning techniques.
- Openly release the model to support research, innovation, and responsible AI development in Europe.
- Training framework: Modalities (on GitHub)
🗓️ Project Timeline
May 1, 2024 – October 1, 2025
models: 0 (none public yet)
datasets: 10
- Eurolingua/DCLM-200-100k-exact-dedup • 17.7M • 11
- Eurolingua/DCLM-200-100k-unfiltered • 18.9M • 108 • 1
- Eurolingua/HPLT3-198-500k • 328 • 1
- Eurolingua/fw2_edu_scores • 9
- Eurolingua/truthfulqax • 638 • 1
- Eurolingua/gsm8kx • 6.03k • 2
- Eurolingua/hellaswagx • 744 • 1
- Eurolingua/arcx • 1.07k • 1
- Eurolingua/mmlux • 2.01k • 1
- Eurolingua/tokenizer_final_dataset • 6