By Piyush Srivastava and Karnik Modi

• Character.ai, a leading AI entertainment platform with about 20 million users worldwide, wanted to optimize GPU performance and lower inference costs for its application, which requires low-latency performance at large scale.
• The company approached DigitalOcean and AMD to achieve this goal.
• Working closely together, the Character.ai, AMD, and DigitalOcean teams optimized AMD Instinct™ MI300X and MI325X GPU platforms, resulting in a 2x improvement in production inference throughput.
• In optimized configurations, DigitalOcean delivered high request density per node while maintaining exceptional p90 responsiveness for the initial token and sustained token-generation throughput, outperforming prior deployments on generic, non-optimized GPU infrastructure.
• These gains were achieved through platform-level optimizations, including parallelization strategies for large Mixture-of-Experts models, efficient FP8 execution paths, optimized kernels with AITER, topology-aware GPU allocation, and production-ready Kubernetes orchestration through DigitalOcean Kubernetes (DOKS).
• Together, these capabilities allowed Character.ai to scale inference predictably without increasing operational burden.

Article Summaries:

  • Character.ai, an AI entertainment platform with roughly 20 million users, partnered with DigitalOcean and AMD to double its production inference throughput. By jointly optimizing AMD Instinct MI300X/MI325X GPUs (implementing FP8 execution, advanced parallelization for Mixture-of-Experts models, topology-aware allocation, and Kubernetes-based orchestration), DigitalOcean achieved twice the requests-per-second rate of prior generic GPU deployments while maintaining strict latency targets. The Qwen3-235B Instruct FP8 model saw a 2× QPS gain on an 8-GPU server, enabling Character.ai to scale inference predictably. The collaboration culminated in a multi-year, eight-figure GPU infrastructure agreement with DigitalOcean.

Sources: