Recursive Belief Vision Language Model

Submitted to arXiv (Computer Science > Artificial Intelligence) on 24 Feb 2026.

• Current vision-language-action (VLA) models struggle with long-horizon manipulation under partial observability.
• Most existing approaches remain observation-driven, relying on short context windows or repeated queries to vision-language models (VLMs).
• These limitations lead to loss of task progress, action repetition under perceptual aliasing, and high inference latency.
• The authors argue that semantic reasoning alone is not the primary bottleneck in long-horizon manipulation.
• Instead, VLAs lack persistent, action-conditioned state representations and exhibit limited temporal and physical reasoning, which makes them ill-suited for multi-stage control.
• The paper introduces RB-VLA, a belief-centric architecture trained with self-supervised world-model objectives. It maintains a compact latent state that encodes task-relevant history, dynamics, and object interactions.
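The following Python toy sketch illustrates the contrast between observation-driven and belief-centric control. It is not RB-VLA. The drawer task, the `observe`/`simulate` dynamics, and the hand-coded stage counter that stands in for the learned latent belief are all hypothetical assumptions made for illustration. In this task, a closed drawer hides its contents, so the scene after the task finishes looks like the scene before it started (perceptual aliasing). The reactive policy reads only the current frame. The belief policy carries a state that is updated recursively as b_t = f(b_{t-1}, a_{t-1}, o_t).

```python
# Hypothetical toy example; NOT the RB-VLA implementation.
# Contrasts a memoryless (observation-driven) policy with a recursive-belief
# policy on an invented three-stage drawer task with perceptual aliasing.
from typing import Optional

# Stage plan for the hypothetical task; "done" marks completion.
PLAN = ["open_drawer", "insert_object", "close_drawer", "done"]


def observe(drawer_open: bool, inside: bool) -> tuple:
    # A closed drawer hides its contents, so the initial and final scenes
    # yield the same observation (perceptual aliasing).
    return ("open", inside) if drawer_open else ("closed", None)


def simulate(drawer_open: bool, inside: bool, action: str) -> tuple[bool, bool]:
    # Minimal hypothetical dynamics: returns (drawer_open, object_inside).
    if action == "open_drawer":
        return True, inside
    if action == "insert_object" and drawer_open:
        return True, True
    if action == "close_drawer":
        return False, inside
    return drawer_open, inside


def reactive_policy(obs: tuple) -> str:
    # Observation-driven: the action depends only on the current frame.
    if obs[0] == "closed":
        return "open_drawer"
    return "close_drawer" if obs[1] else "insert_object"


def update_belief(belief: int, last_action: Optional[str], obs: tuple) -> int:
    # Recursive update b_t = f(b_{t-1}, a_{t-1}, o_t): advance the stage index
    # only when the current observation confirms the last action's effect.
    # (A hand-coded stand-in for a learned latent belief.)
    if last_action is None:
        return belief
    confirmed = {
        "open_drawer": obs[0] == "open",
        "insert_object": obs == ("open", True),
        "close_drawer": obs[0] == "closed",
    }
    return belief + 1 if confirmed.get(last_action, False) else belief


def belief_policy(belief: int) -> str:
    # The action is read from the belief (task progress), not the raw frame.
    return PLAN[belief]


def rollout(use_belief: bool, max_steps: int = 8) -> list[str]:
    drawer_open, inside = False, False
    belief, last_action, actions = 0, None, []
    for _ in range(max_steps):
        obs = observe(drawer_open, inside)
        if use_belief:
            belief = update_belief(belief, last_action, obs)
            action = belief_policy(belief)
        else:
            action = reactive_policy(obs)
        if action == "done":
            break
        actions.append(action)
        drawer_open, inside = simulate(drawer_open, inside, action)
        last_action = action
    return actions


if __name__ == "__main__":
    print("reactive:", rollout(use_belief=False))
    print("belief:  ", rollout(use_belief=True))
```

With these assumptions, `rollout(use_belief=False)` completes the three stages and then keeps reopening and closing the drawer until the step budget runs out, because the final scene is indistinguishable from the initial one. `rollout(use_belief=True)` stops after `close_drawer`. The belief here is a single integer whose size does not grow with history length. In the paper, the belief is instead a learned latent trained with self-supervised world-model objectives, which this sketch does not attempt to model.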

Sources:

https://arxiv.org/abs/2602.20659 (Latest source article published: 2026-02-25 05:00 UTC)