NVIDIA adds Cosmos Policy to its world foundation models

• NVIDIA is continuously expanding its Cosmos world foundation models, or WFMs, to tackle problems in robotics, autonomous vehicle development, and industrial vision AI.
• The company recently introduced Cosmos Policy, its latest research on advancing robot control and planning using Cosmos WFMs.
• Cosmos Policy is a new robot control policy that post-trains the Cosmos Predict-2 world foundation model for manipulation tasks.
• It directly encodes robot actions and future states into the model, achieving state-of-the-art (SOTA) performance on the LIBERO and RoboCasa benchmarks, NVIDIA said.
• The company obtained Cosmos Policy by fine-tuning Cosmos Predict, a WFM trained to predict future frames.
• Instead of introducing new architectural components or separate action modules, Cosmos Policy adapts the pretrained model directly through a single stage of post-training on robot demonstration data.
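One way to picture "directly encoding robot actions into the model" is to write a chunk of action values into an array shaped like one of the video model's latent frames, so that no separate action module is needed. The sketch below is purely illustrative and is not NVIDIA's implementation: the function names `pack_actions` and `unpack_actions`, the normalization range, and the tile-then-average scheme are all assumptions made for this example.

```python
from typing import List, Sequence


def pack_actions(actions: Sequence[float], low: float, high: float,
                 frame_size: int) -> List[float]:
    """Hypothetical helper: write an action chunk into a flat 'latent frame'.

    Assumed scheme: scale each value from [low, high] to [-1, 1], then repeat
    the chunk cyclically until the frame has frame_size entries.
    """
    if not actions or frame_size < len(actions) or high <= low:
        raise ValueError("need a non-empty chunk, a large enough frame, and low < high")
    normed = [2.0 * (a - low) / (high - low) - 1.0 for a in actions]
    return [normed[i % len(normed)] for i in range(frame_size)]


def unpack_actions(frame: Sequence[float], n_actions: int, low: float,
                   high: float) -> List[float]:
    """Hypothetical inverse: average the repeated copies, then undo the scaling."""
    sums = [0.0] * n_actions
    counts = [0] * n_actions
    for i, value in enumerate(frame):
        sums[i % n_actions] += value
        counts[i % n_actions] += 1
    return [low + (s / c + 1.0) * (high - low) / 2.0 for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy 3-value action chunk packed into an 8-slot frame and read back.
    frame = pack_actions([0.5, -0.25, 1.0], low=-1.0, high=1.0, frame_size=8)
    print(frame)
    print(unpack_actions(frame, n_actions=3, low=-1.0, high=1.0))
```

Averaging the repeated copies on the way out is one plausible design choice: if a generative model produces the frame with small errors, the redundancy smooths them out. Whether Cosmos Policy does this is not stated in the article.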

Article Summaries:

NVIDIA has expanded its Cosmos world foundation models (WFMs) by adding Cosmos Policy, a new robot-control policy built on the Cosmos Predict-2 WFM. The policy is created through a single post-training stage on robot demonstration data, without adding new architectural components. By encoding robot actions, states, and success scores as latent video frames and reusing the same diffusion process as video generation, Cosmos Policy jointly learns visuomotor control, world modeling, and value prediction: it predicts action sequences, future observations, and expected returns in one unified framework. NVIDIA reports state-of-the-art results on the LIBERO and RoboCasa manipulation benchmarks. The model can be deployed either as a direct action generator or as a planning policy that evaluates candidate actions.
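The two deployment modes mentioned above can be illustrated with a simple best-of-N loop: sample several candidate actions, score each with a predicted value, and execute the highest-scoring one. This is a minimal sketch under stated assumptions, not the Cosmos Policy API: `plan_best_of_n`, `propose`, and `score` are hypothetical names, and the toy proposer and scorer below merely stand in for a learned policy and value predictor.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]


def plan_best_of_n(
    obs: Sequence[float],
    propose: Callable[[Sequence[float]], Action],
    score: Callable[[Sequence[float], Action], float],
    n_candidates: int = 8,
) -> Tuple[Action, float]:
    """Sample n_candidates actions and return the one with the highest score.

    With n_candidates=1 this reduces to direct action generation: the single
    sampled action is returned without any comparison.
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    candidates = [propose(obs) for _ in range(n_candidates)]
    scored = [(score(obs, action), action) for action in candidates]
    best_value, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_value


def toy_propose(obs: Sequence[float]) -> Action:
    # Stand-in for a learned policy: a noisy guess around the observation.
    return [obs[0] + random.uniform(-0.5, 0.5)]


def toy_score(obs: Sequence[float], action: Action) -> float:
    # Stand-in for a learned value estimate: closer to the observation is better.
    return -abs(action[0] - obs[0])


if __name__ == "__main__":
    random.seed(0)
    action, value = plan_best_of_n([0.5], toy_propose, toy_score, n_candidates=8)
    print(action, value)
```

The trade-off in such a scheme is compute versus quality: each extra candidate costs another forward pass through the model, but gives the value estimate more options to choose from.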

Sources:

https://www.therobotreport.com/nvidia-adds-cosmos-policy-world-foundation-models/ (Latest source article published: 2026-02-19 13:13 UTC)