How we completely rearchitected Mussel, our storage engine for derived data, and lessons learned from the migration from Mussel V1 to V2.

By Shravan Gaonkar, Chandramouli Rangarajan, Yanhan Zhang

Airbnb’s core key-value store, internally known as Mussel, bridges offline and online workloads, providing highly scalable bulk load capabilities combined with single-digit millisecond reads.

Since first writing about Mussel in a 2022 blog post, we have completely deprecated the storage backend of the original system (what we now call Mussel v1) and replaced it with a NewSQL backend, which we refer to as Mussel v2. Mussel v2 has been running successfully in production for a year, and we wanted to share why we undertook this rearchitecture, what the challenges were, and what benefits we got from it.

Why rearchitect

Mussel v1 reliably supported Airbnb for years, but new requirements (real-time fraud checks, instant personalization, dynamic pricing, and massive data volumes) demand a platform that combines real-time streaming with bulk ingestion, all while being easy to manage.

Article Summaries:

  • Airbnb’s core key‑value store, Mussel, has been upgraded from its original version (V1) to a new, cloud‑native platform (V2). The migration was driven by growing needs for real‑time fraud checks, instant personalization, dynamic pricing, and massive data volumes that V1’s static hash partitioning and manual Chef‑based scaling could not efficiently support. V2 replaces the tightly coupled, protocol‑specific design with a stateless, Kubernetes‑managed Dispatcher that translates API calls, handles retries, and supports dual‑write migration modes. Reads are simplified through logical tables, while writes are persisted to Kafka for durability and ordered replay. The new architecture delivers dynamic range sharding, predictable sub‑25 ms p99 latency, flexible consistency options, and reduced operational overhead, enabling easier scaling, cost transparency, and automated rollouts.
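The contrast between static hash partitioning and dynamic range sharding mentioned in the summary can be illustrated with a minimal sketch. This is not Mussel's implementation: `static_hash_shard`, `RangeShardMap`, and the example keys are hypothetical names chosen for illustration, and a real system would also move data and coordinate the split across replicas. The sketch only shows why a range map can split a hot range in place, whereas a hash-modulo scheme ties every key's placement to the total shard count.

```python
import bisect
import hashlib


def static_hash_shard(key: str, num_shards: int) -> int:
    """Static hash partitioning: placement depends on the total shard count,
    so changing num_shards can remap most keys."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class RangeShardMap:
    """Hypothetical range-shard map: each shard owns keys in [start, next_start)."""

    def __init__(self) -> None:
        # Sorted range start keys; "" is the lowest possible string key,
        # so the initial single shard covers the whole key space.
        self.starts = [""]
        self.shard_ids = [0]
        self._next_id = 1

    def shard_for(self, key: str) -> int:
        # Find the last range whose start key is <= key.
        idx = bisect.bisect_right(self.starts, key) - 1
        return self.shard_ids[idx]

    def split(self, split_key: str) -> int:
        """Split the range containing split_key; keys >= split_key (within that
        range) move to a new shard. Other ranges are untouched."""
        idx = bisect.bisect_right(self.starts, split_key)
        if self.starts[idx - 1] == split_key:
            raise ValueError("a range already starts at split_key")
        new_id = self._next_id
        self._next_id += 1
        self.starts.insert(idx, split_key)
        self.shard_ids.insert(idx, new_id)
        return new_id


if __name__ == "__main__":
    shards = RangeShardMap()
    shards.split("m")  # keys >= "m" now live on a new shard
    for k in ("listing:42", "pricing:1", "user:7"):
        print(k, "-> range shard", shards.shard_for(k),
              "| hash shard (of 4)", static_hash_shard(k, 4))
```

In this sketch, a split only affects keys inside the range being divided, which is the property that makes range sharding amenable to automated rebalancing; the hash-modulo function is included purely as a point of comparison.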

Sources: