Deploying Open Source Vision Language Models (VLM) on Jetson

• Deploying Open Source Vision Language Models (VLM) on Jetson Vision-Language Models (VLMs) mark a significant leap in AI by blending visual perception with semantic reasoning. • Moving beyond traditional models constrained by fixed labels, VLMs utilize a joint embedding space to interpret and discuss complex, open-ended environments using natural language. • The rapid evolution of reasoning accuracy and efficiency has made these models ideal for edge devices. • The NVIDIA Jetson family, ranging from the high-performance AGX Thor and AGX Orin to the compact Orin Nano Super is purpose-built to drive accelerated applications for physical AI and robotics, providing the optimized runtime necessary for leading open source models. • In this tutorial, we will demonstrate how to deploy the NVIDIA Cosmos Reason 2B model across the Jetson lineup using the vLLM framework. • We will also guide you through connecting this model to the Live VLM WebUI, enabling a real-time, webcam-based interface for interactive physical AI.

Article Summaries:

NVIDIA has released a tutorial for deploying its open‑source Vision‑Language Model (VLM), Cosmos Reasoning 2B, on Jetson edge devices. The guide shows how to download the FP8‑quantized checkpoint via the NGC CLI, pull the appropriate vLLM Docker image for AGX Thor, AGX Orin, or Orin Nano Super, and launch the model as a volume‑mounted container. It also explains connecting the Live VLM WebUI for real‑time, webcam‑based interaction. Supported Jetson boards require JetPack 6 or 7, an NVMe SSD (≈5 GB for weights, ≈8 GB for the container), and use 0.8 GB GPU memory on Thor/Orin and 0.65 GB on the memory‑constrained Nano. The workflow demonstrates efficient, on‑device VLM inference for robotics and physical AI applications.
NVIDIA demonstrates how to run the open‑source Cosmos Reason 2B vision‑language model on its Jetson edge platform. Using the vLLM inference framework, the tutorial shows downloading the FP8‑quantized checkpoint from NVIDIA NGC, pulling the appropriate Docker image for AGX Thor, AGX Orin, or Orin Nano Super, and launching the model as a container. The setup connects to the Live VLM WebUI for a webcam‑based, real‑time interface. Supported devices require JetPack 6 (Orin) or JetPack 7 (Thor), an NVMe SSD, and 0.8 GB GPU memory for the larger models, with a 256‑token limit on the Nano. This workflow enables edge AI applications that blend visual perception with natural‑language reasoning.

Sources:

https://huggingface.co/blog/nvidia/cosmos-on-jetson (Latest source article published: 2026-02-24 00:00 UTC)