Written by Sky Kistler.

Our goal was straightforward: host Kafka on Kubernetes via Strimzi and deprecate our existing EC2-backed Kafka clusters, which together comprised 500+ brokers serving tens of millions of messages per second and storing over a petabyte of live topic data. The motivation was equally straightforward. Our EC2 brokers were managed with a cumbersome mix of Terraform, Puppet, and custom AWS CLI tooling. Rotations and interventions were orchestrated directly from operator laptops. It worked, but it was slow, error-prone, and increasingly expensive, especially as the number and size of our clusters grew exponentially.
Article Summaries:
- Reddit has migrated its petabyte-scale Kafka fleet from EC2-based brokers to Kubernetes using Strimzi, a CNCF project. The move aimed to reduce operational toil, lower costs, and enable declarative management of a fleet serving tens of millions of messages per second. Key constraints forced a zero-downtime, metadata-preserving strategy: Kafka had to remain fully available, client offsets could not be rewritten, and clients were hard-coded to specific broker IPs. Consequently, new Kubernetes-hosted brokers were added to the existing cluster, allowing legacy EC2 nodes to coexist temporarily. The migration demonstrates how large, critical Kafka deployments can be transitioned to container orchestration while maintaining continuous service.
Sources: