Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
Computer Science > Machine Learning
[Submitted on 22 Feb 2026]
Abstract