Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 9 days ago • 53
TC-Light: Temporally Consistent Relighting for Dynamic Long Videos Paper • 2506.18904 • Published Jun 23, 2025 • 10
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published May 28, 2025 • 48
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining Paper • 2410.00564 • Published Oct 1, 2024 • 1
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Paper • 2504.15275 • Published Apr 21, 2025 • 2
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 131
PURE Collection PRM and fine-tuned LLM used in our PURE GitHub repo: https://github.com/CJReinforce/PURE • 5 items • Updated May 22, 2025 • 2
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31, 2025 • 62