Dmitrij Gusev
mftrash
AI & ML interests
None yet
Organizations
None yet
Post-training
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 144
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 248
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 117
Models: 0 (none public yet)
Datasets: 0 (none public yet)