Liang Mou | Staff Software Engineer, Logging Platform
Yisheng Zhou | Software Engineer II, Logging Platform
Elizabeth (Vi) Nguyen | Software Engineer I, Logging Platform
Owen Zhang | Senior Software Engineer, Logging Platform

Introduction

As Pinterest has grown, the demand for a robust, real-time, and cost-effective database ingestion platform has become increasingly urgent. Our data ecosystem powers a diverse set of use cases - from analytics and machine learning to product features and business intelligence - all of which depend on timely and reliable data.

However, our legacy ingestion landscape was built on batch-oriented workflows and a patchwork of database dump solutions, each developed and maintained by different teams. This fragmentation made it difficult to deliver the performance, reliability, and agility required by modern data workloads.

In this blog series, we’ll share our journey in building Pinterest’s next-generation database ingestion framework. In this first part, we’ll discuss the legacy challenges we faced, the architectural principles that shaped our new solution, and the key optimizations that enabled us to achieve significant improvements in latency, efficiency, and compliance.
Article Summaries:
- Pinterest has launched a unified, change-data-capture (CDC) ingestion framework to replace its fragmented, batch-based legacy pipelines. Built on Debezium/TiCDC, Kafka, Flink, Spark and Iceberg, the new system pulls database changes from MySQL, TiDB and KVStore in under a second, delivering them to downstream stores within minutes rather than hours. It processes only changed rows, supports row-level deletions, and guarantees at-least-once delivery, cutting compute and storage waste and improving compliance. The architecture is generic, scalable to petabyte-scale workloads, and config-driven, enabling rapid onboarding of thousands of pipelines across Pinterest’s data ecosystem.
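
To make the combination of changed-row processing, row-level deletions, and at-least-once delivery concrete, here is a minimal sketch in plain Python. It is not Pinterest's implementation, which per the summary runs on Flink, Spark, and Iceberg. The `ChangeEvent` type, its field names, and the per-key sequence number used to skip redelivered events are all assumptions made for illustration; the summary does not say how duplicates are handled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change. Field names are hypothetical, not a real CDC schema."""
    op: str                     # "upsert" or "delete"
    key: str                    # primary key of the changed row
    seq: int                    # assumed increasing position in the source change log
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(table: Dict[str, dict],
                  last_seq: Dict[str, int],
                  events: Iterable[ChangeEvent]) -> int:
    """Apply events to `table` in place and return how many were applied.

    An event is skipped when its seq is not newer than the last one applied
    for the same key, so a redelivered batch leaves the table unchanged.
    """
    applied = 0
    for ev in events:
        if ev.seq <= last_seq.get(ev.key, -1):
            continue  # duplicate or stale delivery under at-least-once
        if ev.op == "delete":
            table.pop(ev.key, None)  # row-level delete, no full-table rewrite
        elif ev.op == "upsert":
            table[ev.key] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
        last_seq[ev.key] = ev.seq
        applied += 1
    return applied
```

Applying the same batch twice touches only the changed keys the first time and is a no-op the second time. That idempotence is what lets an at-least-once pipeline still produce a correct target table.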
Sources: