-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 153 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
Collections
Collections including paper arxiv:2508.06471
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 509 -
zai-org/GLM-4.6
Text Generation • 357B • Updated • 84.9k • 1.2k -
deepseek-ai/DeepSeek-R1
Text Generation • 685B • Updated • 994k • 13.1k -
deepseek-ai/DeepSeek-V3.2-Exp
Text Generation • Updated • 51.2k • 967
-
gradientai/Llama-3-8B-Instruct-Gradient-1048k
Text Generation • Updated • 9.02k • 680 -
Are Your LLMs Capable of Stable Reasoning?
Paper • 2412.13147 • Published • 93 -
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Paper • 2412.11919 • Published • 36 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 107
-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 19 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
-
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper • 2505.10597 • Published -
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44 -
nvidia/HelpSteer3
Viewer • Updated • 133k • 4.04k • 97 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 99 • 11