Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

• Waypoint-1 is Overworld’s real-time interactive video diffusion model, controllable via text, mouse, and keyboard.
• Trained on 10,000 hours of diverse video-game footage with control inputs and captions.
• Uses a frame-causal rectified flow transformer, learning to denoise future frames from past context.
• Latent, compressed-frame model delivers zero-latency camera movement and keyboard control on consumer hardware.
• Generates procedural, interactive worlds frame-by-frame, allowing seamless exploration without pre-built assets.
• Weights available on Hugging Face (Waypoint-1-Small, Medium soon) and demo on Overworld Stream.
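
As a rough illustration of "denoising future frames from past context" one frame at a time, here is a minimal pure-Python sketch of rectified-flow sampling by Euler integration. It is not Overworld's implementation. The names `toy_velocity`, `denoise_next_frame`, and `rollout` are hypothetical, and so are the tiny list-of-floats "latent frames" and the single-number "action". The learned transformer is replaced by an oracle velocity that pulls the sample toward "last frame shifted by the action", and the time convention (t = 0 is noise, t = 1 is data) is an assumption.

```python
import random


def toy_velocity(x_t, t, context, action):
    """Hypothetical stand-in for the learned transformer.

    A real model would predict the velocity from past frames and control
    inputs. Here the "true" next frame is assumed to be the last context
    frame shifted by the action, and the velocity points straight at it.
    """
    target = [value + action for value in context[-1]]
    # On a straight path x_t = (1 - t) * noise + t * target, the velocity
    # toward the target is (target - x_t) / (1 - t).
    return [(tgt - x) / (1.0 - t) for tgt, x in zip(target, x_t)]


def denoise_next_frame(context, action, steps=4, rng=None):
    """Integrate dx/dt = v(x, t) from noise (t=0) to a frame (t=1)."""
    rng = rng or random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in context[-1]]  # start from pure noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = toy_velocity(x, t, context, action)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


def rollout(first_frame, actions, steps=4):
    """Generate frames one at a time, feeding each result back as context."""
    frames = [list(first_frame)]
    rng = random.Random(0)
    for action in actions:
        frames.append(denoise_next_frame(frames, action, steps, rng))
    return frames


frames = rollout([0.0, 1.0], actions=[0.5, -0.25, 1.0])
for index, frame in enumerate(frames):
    print(index, [round(value, 6) for value in frame])
```

Because the toy velocity is exact, a handful of Euler steps is enough to land on the target. A learned model only approximates the velocity field, so the number of steps becomes a trade-off between quality and speed.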

Article Summaries:

Waypoint-1: Real-time Interactive Video Diffusion from Overworld

Waypoint-1 Weights on the Hub
- Waypoint-1-Small
- Waypoint-1-Medium (Coming Soon!)

Try Out The Model
Overworld Stream: https://overworld.stream

What is Waypoint-1?
Waypoint-1 is Overworld’s real-time interactive video diffusion model, controllable and prompted via text, mouse, and keyboard. You can give the model some frames, run it, and have it create a world you can step into and interact with. The backbone of the model is a frame-causal rectified flow transformer trained on 10,000 hours of diverse video game footage pa
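
One common reading of "frame-causal" is block-causal attention: tokens attend freely within their own frame and to all earlier frames, but never to later ones. The sketch below builds such a mask in plain Python. The function name `frame_causal_mask` and the token layout (a fixed number of tokens per frame, stored frame after frame) are assumptions made for illustration. The source does not describe Waypoint-1's actual attention pattern.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Return mask[q][k] == True when query token q may attend to key token k.

    Assumed layout: tokens are stored frame after frame, so token i
    belongs to frame i // tokens_per_frame.
    """
    total = num_frames * tokens_per_frame
    frame_of = [i // tokens_per_frame for i in range(total)]
    # A query sees every key whose frame is the same as or earlier than its own.
    return [[frame_of[k] <= frame_of[q] for k in range(total)]
            for q in range(total)]


mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# Prints:
# xx....
# xx....
# xxxx..
# xxxx..
# xxxxxx
# xxxxxx
```

Unlike a token-level causal mask, the blocks on the diagonal are fully filled in, so all tokens of a frame can see one another while later frames stay hidden.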

Sources:

https://huggingface.co/blog/waypoint-1