MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct-FT-S2
Text Generation • 8B • 11 • 4
Natural Language Processing, Machine Learning, and Computer Vision
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training