• Datadog handles millions of logs per second, requiring instant config updates across thousands of containers.
• User-defined log parsing rules are applied immediately, demanding a lightning-fast propagation mechanism in a multi-tenant environment.
• “Context data” refers to per-tenant configuration that must be reliably distributed to every workload container.
• Traditional CRUD interfaces hide the complexity of ensuring all containers see changes without lag or inconsistency.
• The engineering team built a custom internal system to deliver config updates with sub-second latency and high availability.
• This approach guarantees that log processing, data scanning, and quota settings stay in sync across the entire Datadog backend.
Article Summaries:
- Datadog’s log-processing platform must handle millions of logs per second across thousands of containers, so any user-defined configuration change (such as a new log-parsing rule) has to reach every container almost instantly. To meet that requirement, the company built an internal system that propagates per-tenant “context data” with low latency and high reliability in a multi-tenant, distributed environment where failures are inevitable. Simple approaches, such as serving every read from a single relational database, fall short at this scale. The new architecture delivers configuration updates to all workload containers quickly and tolerates faults along the way.
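
The summary does not describe how the system works internally, so the following is only an illustrative sketch of the general idea of pushing versioned, per-tenant context data to container-local caches. It is not Datadog's implementation. Every name here (`ContextUpdate`, `ContainerCache`, `ContextPublisher`) is hypothetical, and the "network" is simulated with in-process method calls.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContextUpdate:
    """One per-tenant configuration snapshot (hypothetical shape)."""
    tenant_id: str
    version: int  # increases by one on every publish for this tenant
    data: dict


class ContainerCache:
    """Local view of context data held by one workload container (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[str, ContextUpdate] = {}

    def apply(self, update: ContextUpdate) -> bool:
        """Store the update unless an equal or newer version is already cached."""
        current = self._entries.get(update.tenant_id)
        if current is not None and current.version >= update.version:
            return False  # stale or duplicate delivery, ignore it
        self._entries[update.tenant_id] = update
        return True

    def get(self, tenant_id: str) -> dict:
        """Return a copy of the tenant's config, or an empty dict if unknown."""
        entry = self._entries.get(tenant_id)
        return dict(entry.data) if entry else {}


class ContextPublisher:
    """Assigns versions and pushes each update to every subscribed cache."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._subscribers: List[ContainerCache] = []

    def subscribe(self, cache: ContainerCache) -> None:
        self._subscribers.append(cache)

    def publish(self, tenant_id: str, data: dict) -> ContextUpdate:
        version = self._versions.get(tenant_id, 0) + 1
        self._versions[tenant_id] = version
        update = ContextUpdate(tenant_id, version, dict(data))
        for cache in self._subscribers:
            cache.apply(update)  # a real system would send this over the network
        return update
```

In this sketch, containers read from their local cache rather than querying a central database on every log, and the version check makes redelivered or out-of-order updates harmless. Whether the real system uses push, versioning, or local caches in this way is an assumption.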
Sources: