Improving MySQL® Cluster Uptime: Designing Advanced Detection, Mitigation, and Consensus with Group Replication
2 December 2025 / Global

Introduction
• At Uber, engineers rely on MySQL® for applications that need relational databases.
• MySQL is the preferred choice for use cases that require ACID transactions and relational data modeling with a SQL interface.
• We support over 2,600 MySQL clusters.
• MySQL clusters follow the topology of a single primary and multiple-replica model.
• The replication is by default asynchronous, where the replica nodes poll the binlogs from the primary server.
• Only the primary node is used to serve write requests, and the read requests are served from replicas in a round-robin fashion with region affinity.
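The routing rule in the last point (writes to the primary, reads spread round-robin over replicas in the caller's region) can be illustrated with a minimal Python sketch. This is not Uber's routing layer: the `Node` and `Router` classes, the host names, the region labels, and the fallback to all replicas when a region has none are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Node:
    host: str                 # hypothetical host name
    region: str               # hypothetical region label
    is_primary: bool = False


class Router:
    """Toy router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, nodes):
        self.primary = next(n for n in nodes if n.is_primary)
        self.replicas = [n for n in nodes if not n.is_primary]
        self._cycles = {}  # one round-robin iterator per client region

    def route_write(self):
        # Only the primary serves writes.
        return self.primary

    def route_read(self, client_region):
        if client_region not in self._cycles:
            local = [n for n in self.replicas if n.region == client_region]
            # Assumption: fall back to all replicas if the region has none.
            self._cycles[client_region] = cycle(local or self.replicas)
        return next(self._cycles[client_region])


nodes = [
    Node("db-1", "us-east", is_primary=True),
    Node("db-2", "us-east"),
    Node("db-3", "us-east"),
    Node("db-4", "eu-west"),
]
router = Router(nodes)
reads = [router.route_read("us-east").host for _ in range(3)]
print(router.route_write().host, reads)
```

Here, reads from `us-east` alternate between the two local replicas and never reach the primary or the `eu-west` replica. A real router would also need to handle replica lag and unhealthy nodes, which this sketch ignores.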
Article Summaries:
- Uber’s engineering team has redesigned its MySQL cluster architecture to improve uptime. The company, which supports over 2,600 clusters, previously relied on a single‑primary, asynchronous‑replica model with external tools (HC and MEPA) for failure detection and primary promotion. These tools introduced delays, sometimes exceeding the 120‑second SLA, and were vulnerable to their own failures. Uber now uses MySQL Group Replication (MGR) with a three‑node consensus group based on Paxos, so the database itself elects a new primary automatically and quickly. This decentralized approach removes the external dependencies, shortens mean time to detect and mean time to resolve, and makes high availability more reliable.
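Why a three-node group is enough to survive a single node failure follows from the majority rule that Paxos-style consensus relies on. The short Python sketch below works through that arithmetic. The function names are hypothetical and it models only the quorum calculation, not MGR's actual election or failure-detection logic.

```python
def quorum_size(group_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """How many members can fail while a majority remains."""
    return group_size - quorum_size(group_size)


def has_quorum(group_size: int, reachable: int) -> bool:
    """True if the reachable members can still agree on a new primary."""
    return reachable >= quorum_size(group_size)


for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

For a three-node group the majority is two, so the group can lose one member (for example the primary) and the remaining two can still agree on a new primary. If two members are lost, the survivor has no majority and cannot make progress on its own.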
Sources: