Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

• arXiv (Computer Science > Robotics), submitted on 22 Feb 2026.
• Hierarchical Vision-Language-Action (VLA) models have rapidly become a dominant paradigm for robotic manipulation.
• They typically comprise a Vision-Language backbone for perception and understanding, together with a generative policy for action generation.
• However, their performance is increasingly bottlenecked by the action generation process.
• (i) Low inference efficiency: a pronounced distributional gap between isotropic noise priors and target action distributions increases the number of denoising steps and the incidence of infeasible samples.
• (ii) Poor robustness.
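
Point (i) refers to a "distributional gap" between an isotropic noise prior and the target action distribution. One way to make that gap concrete is the 2-Wasserstein distance, which has a closed form for Gaussians with diagonal covariance. This yardstick is an assumption chosen for illustration, not necessarily the one used in the paper. The sketch compares an isotropic N(0, I) prior and a hypothetical "informed" prior against a hypothetical target action distribution. The function name `w2_diag_gaussians` and all numbers are illustrative assumptions, and none of this is the paper's proposed method.

```python
import math
from typing import Sequence


def w2_diag_gaussians(
    mu_a: Sequence[float], std_a: Sequence[float],
    mu_b: Sequence[float], std_b: Sequence[float],
) -> float:
    """Closed-form 2-Wasserstein distance between two Gaussians with
    diagonal covariances: W2^2 = ||mu_a - mu_b||^2 + ||std_a - std_b||^2.
    (Hypothetical helper for illustration only.)"""
    if not (len(mu_a) == len(std_a) == len(mu_b) == len(std_b)):
        raise ValueError("all inputs must have the same dimension")
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(std_a, std_b))
    return math.sqrt(mean_term + std_term)


# Hypothetical 3-D target action statistics (illustrative numbers only).
target_mu, target_std = [0.30, -0.10, 0.05], [0.02, 0.02, 0.01]

# Isotropic prior N(0, I), the usual starting point of a diffusion/flow policy.
iso_mu, iso_std = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]

# Hypothetical "informed" prior, e.g. fitted to previously observed actions
# (an assumption for illustration, not the paper's construction).
inf_mu, inf_std = [0.28, -0.08, 0.04], [0.05, 0.05, 0.05]

gap_iso = w2_diag_gaussians(iso_mu, iso_std, target_mu, target_std)
gap_inf = w2_diag_gaussians(inf_mu, inf_std, target_mu, target_std)
print(f"W2(isotropic, target) = {gap_iso:.3f}")
print(f"W2(informed,  target) = {gap_inf:.3f}")
```

Intuitively, a larger distance means a sampler must transport each noise sample further, which is consistent with the abstract's claim that the gap increases denoising steps. The sketch only measures the gap; it does not model the number of denoising steps or the rate of infeasible samples.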

Sources:

https://arxiv.org/abs/2602.20200 (Latest source article published: 2026-02-25 05:00 UTC)