Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
upvoted a paper (4 days ago): Mixture of Horizons in Action Chunking
upvoted a paper (about 1 month ago): Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
commented on a paper (about 1 month ago): Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
Organizations
None yet