Your guide to Provisioned Throughput (PT) on Vertex AI

Senior Product Manager, Vertex AI

• When AI agents make thousands of decisions a day, consistent performance isn’t just a technical detail - it’s a business requirement.
• Provisioned Throughput (PT) solves this by giving you reserved resources that guarantee capacity and predictable performance.
• To help you scale, we are updating PT on Vertex AI with three key improvements:
  • Model diversity: Run the right model for the right job.
  • Multimodal innovation: Process text, images, and video seamlessly.

Article Summaries:

Google’s Vertex AI has expanded its Provisioned Throughput (PT) offering to deliver predictable compute for AI agents. The update introduces three key improvements: (1) Model diversity - PT now covers a broader portfolio, including Anthropic models (private preview) and popular open‑source models such as Llama 4 and Qwen3, all managed from a single console. (2) Multimodal innovation - dedicated PT is available for Gemini 3, Nano Banana, Gemini Live API, and video models Veo 3/3.1, enabling reliable real‑time audio, video, and image processing. (3) Operational flexibility - new 1‑week PT terms and proactive capacity planning tools let teams align compute with short‑term business spikes.

Sources:

https://cloud.google.com/blog/products/ai-machine-learning/provisioned-throughput-on-vertex-ai/ (Latest source article published: 2026-02-18 17:00 UTC)