Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints

• Kimi K2.5 is a multimodal vision‑language model trained with Megatron‑LM. • It contains 1 trillion parameters, 384 experts, a single dense layer, and 3.2% activation per token. • MoonViT3d vision tower converts images and video frames into embeddings for visual processing. • Kimi excels at chat, reasoning, coding, math across text, image, and video inputs. • Developers can prototype Kimi K2.5 for free on NVIDIA GPU‑accelerated endpoints via the Developer Program. • Deploy with vLLM or fine‑tune using NVIDIA NeMo for domain‑specific applications.

Article Summaries:

NVIDIA has announced Kimi K2.5, a 1‑trillion‑parameter open vision‑language model (VLM) that supports text, image, and video inputs. Built on the Megatron‑LM framework, Kimi K2.5 uses a mixture‑of‑experts (MoE) architecture with 384 experts and a single dense layer, achieving a 3.2 % activation rate per token. The model features a 262K‑token context window, 61 layers (60 MoE), 64 attention heads, and a 164K‑token vocabulary that includes vision‑specific tokens. NVIDIA offers free prototyping via GPU‑accelerated endpoints on build.nvidia.com and API access through its Developer Program, with support for deployment via vLLM and fine‑tuning through NeMo.

Sources:

https://developer.nvidia.com/blog/build-with-kimi-k2-5-multimodal-vlm-using-nvidia-gpu-accelerated-endpoints/