Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning

Computer Science > Machine Learning · Submitted on 22 Feb 2026

Research & Labs · February 25, 2026