Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context • Paper • 2510.06182 • Published Oct 7, 2025 • 8
ASPO: Asymmetric Importance Sampling Policy Optimization • Paper • 2510.06062 • Published Oct 7, 2025 • 13
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning • Paper • 2510.04081 • Published Oct 5, 2025 • 23
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs • Paper • 2509.24107 • Published Sep 28, 2025 • 78
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning • Paper • 2510.06217 • Published Oct 7, 2025 • 63
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization • Paper • 2510.05342 • Published Oct 6, 2025 • 5