F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking
Paper • 2605.12995
We're the McAuley Lab at UC San Diego with PI Prof. Julian McAuley, focusing on cool machine learning and natural language processing applications!
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning