I/O Observability for Uber's Massive Petabyte-Scale Data Lake

13 November 2025 / Global

Introduction

• As Uber’s data infrastructure evolves toward a hybrid cloud architecture, understanding data access patterns across our platform is more critical than ever.
• Data I/O (input/output) observability plays a crucial role in the journey to CloudLake, Uber’s hybrid cloud architecture.
• As part of the CloudLake migration, Uber is expanding its compute and storage capacity in the cloud while gradually decommissioning on-prem capacity.
• This migration raises a new set of problems.
• First, the network link between service providers is a bottleneck.
• Second, the goal is to colocate workloads with the datasets they read so that they execute efficiently, but many workloads are experimental and have no fixed read pattern, which makes placement difficult.

Sources:

https://www.uber.com/blog/i-o-observability-for-ubers-massive-petabyte-scale-data-lake/