AWS News Blog | Announcing Amazon SageMaker Inference for custom Amazon Nova models

  • Since we launched Amazon Nova customization in Amazon SageMaker AI at AWS NY Summit 2025, customers have been asking for the same capabilities with Amazon Nova that they get when they customize open-weights models in Amazon SageMaker Inference.
  • They also wanted more control and flexibility in custom model inference over the instance types, auto-scaling policies, context length, and concurrency settings that production workloads demand.
  • Today, we’re announcing the general availability of custom Nova model support in Amazon SageMaker Inference, a production-grade, configurable, and cost-efficient managed inference service to deploy and scale full-rank customized Nova models.
  • You can now experience an end-to-end customization journey: train Nova Micro, Nova Lite, and Nova 2 Lite models with reasoning capabilities using Amazon SageMaker Training Jobs or Amazon HyperPod, then seamlessly deploy them with the managed inference infrastructure of Amazon SageMaker AI.
  • With Amazon SageMaker Inference for custom Nova models, you can reduce inference cost through optimized GPU utilization using Amazon Elastic Compute Cloud (Amazon EC2) G5 and G6 instances over P5 instances, auto-scaling based on 5-minute usage patterns, and configurable inference parameters.
  • This feature enables deployment of customized Nova models with continued pre-training, supervised fine-tuning, or reinforcement fine-tuning for your use cas

Article Summaries:

  • AWS has announced the general availability of Amazon SageMaker Inference for custom Amazon Nova models, extending the platform’s production‑grade inference capabilities to customized Nova Micro, Nova Lite, and Nova 2 Lite models. The service lets customers deploy Nova models they have customized (through continued pre‑training, supervised fine‑tuning, or reinforcement fine‑tuning) with configurable parameters such as context length, concurrency, and batch size, and it supports auto‑scaling based on 5‑minute usage patterns. It offers cost‑efficient GPU utilization on EC2 G5 and G6 instances, with optional P5 instances for higher‑performance workloads. Deployment can be done via SageMaker Studio or the SageMaker AI SDK, and endpoints can be provisioned on a range of instance types for each model tier.
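
As a rough illustration of the auto‑scaling point above, here is a minimal Python sketch that builds the request parameters for a target‑tracking scaling policy on a SageMaker endpoint variant. It assumes a custom Nova endpoint can be scaled through Application Auto Scaling in the same way as other SageMaker endpoints, which the article excerpt does not confirm. The helper name `build_autoscaling_requests`, the endpoint name, the capacity limits, the target value, and the 300‑second cooldowns (chosen only to echo the 5‑minute window) are all hypothetical.

```python
from typing import Any, Dict, Tuple


def build_autoscaling_requests(
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    min_instances: int = 1,
    max_instances: int = 4,
    target_invocations_per_instance: float = 50.0,
    cooldown_seconds: int = 300,  # assumption: 5 minutes, echoing the article
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (scalable-target, scaling-policy) parameter dicts.

    Hypothetical helper: it only assembles dictionaries and makes no AWS calls.
    """
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")

    # Fields shared by both Application Auto Scaling requests.
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }

    scalable_target = {
        **common,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }

    scaling_policy = {
        **common,
        "PolicyName": f"{endpoint_name}-target-tracking",  # illustrative name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": cooldown_seconds,
            "ScaleOutCooldown": cooldown_seconds,
        },
    }
    return scalable_target, scaling_policy


# With boto3 installed and credentials configured, the dicts could be used as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
target, policy = build_autoscaling_requests("my-custom-nova-endpoint")
```

Keeping the parameter construction separate from the AWS calls makes the configuration easy to inspect or unit-test before anything is applied to a live endpoint.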

Sources: