Database Federation: Decentralized and ACL-Compliant Hive™ Databases

19 February / Global

Introduction
• One of Uber's data warehouses powering the Delivery business outgrew its original design.
• More than 16,000 Hive datasets and 10 petabytes of data from multiple business domains lived inside a single, monolithic database, owned and operated by a centralized Delivery Data Solutions team.
• This one-big-bucket setup once simplified onboarding and discovery. Growth in both data scale and organization size turned the same design into a liability.
• The monolithic design had many limitations:
• Shared-Fate Outages: Metadata corruption or resource spikes started by one team could cascade across the entire database, disrupting unrelated tier-1 workloads and critical business use cases.
• Resource Contention and Noisy Neighbors: Unbounded, ad-hoc datasets and uneven dataset-count growth also competed for the same Metastore, Apache HDFS™, and compute quotas, degrading query latency for everyone.

Article Summaries:

Uber has re-engineered one of its data warehouses powering the Delivery business. It broke a single 10-petabyte Hive database holding more than 16,000 datasets into a federated system of smaller, domain-specific databases. Several problems caused by the monolithic design drove the move:
• recurring outages
• resource contention
• operational bottlenecks
• governance gaps and overly permissive access controls

The migration was carried out with zero downtime. The resulting architecture enforces ACLs more tightly and improves observability, so teams can manage their datasets independently. The integrity of machine-learning pipelines, merchant analytics, and compliance reporting is preserved. The initiative aims to improve scalability, security, and operational efficiency across Uber's Delivery business.
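The source does not describe Uber's tooling, so the following is only a minimal Python sketch of the federation idea, built on stated assumptions:
• Each dataset in the monolithic database carries exactly one owning-domain tag.
• Each domain gets its own database with its own reader ACL.

All names are hypothetical illustrations, not Uber APIs. That includes `Dataset`, `DomainDatabase`, `plan_federation`, `can_read`, the `<domain>_db` naming scheme, and the example domains and groups.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str          # table name inside the monolithic database
    owner_domain: str  # hypothetical ownership tag, e.g. "merchant_analytics"


@dataclass
class DomainDatabase:
    name: str
    tables: list[str] = field(default_factory=list)
    # Hypothetical ACL model: groups allowed to read any table in this database.
    readers: set[str] = field(default_factory=set)


def plan_federation(datasets, readers_by_domain):
    """Group datasets from one monolithic database into one database per owning domain."""
    dbs: dict[str, DomainDatabase] = {}
    for ds in datasets:
        db_name = f"{ds.owner_domain}_db"  # hypothetical naming scheme
        if db_name not in dbs:
            dbs[db_name] = DomainDatabase(
                db_name, readers=set(readers_by_domain.get(ds.owner_domain, ()))
            )
        dbs[db_name].tables.append(ds.name)
    return dbs


def can_read(dbs, group, qualified_table):
    """Allow a read only if the table exists and the group is on that database's ACL."""
    db_name, _, table = qualified_table.partition(".")
    db = dbs.get(db_name)
    return db is not None and table in db.tables and group in db.readers


# Usage with made-up datasets and groups.
datasets = [
    Dataset("orders_daily", "merchant_analytics"),
    Dataset("eta_features", "ml_platform"),
    Dataset("tax_report", "compliance"),
]
readers = {
    "merchant_analytics": {"merchant-analysts"},
    "ml_platform": {"ml-eng"},
    "compliance": {"finance-audit"},
}
dbs = plan_federation(datasets, readers)
print(sorted(dbs))  # ['compliance_db', 'merchant_analytics_db', 'ml_platform_db']
print(can_read(dbs, "ml-eng", "ml_platform_db.eta_features"))  # True
print(can_read(dbs, "ml-eng", "compliance_db.tax_report"))     # False
```

Scoping the reader set to each domain database means a grant on one domain's data no longer implies access to every dataset in a shared database. It also confines each team's tables to a database of its own.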

Sources:

https://www.uber.com/blog/database-federation/ (Latest source article published: 2026-02-19 14:30 UTC)