How Uber Indexes Streaming Data with Pull-Based Ingestion in OpenSearch™

16 December 2025 / Global

Introduction

At Uber, our business operates in real time. Whether you’re hailing a ride, ordering from a restaurant, or tracking a delivery, search is the critical starting point. Our search platform powers these experiences at a massive scale, indexing everything from restaurant menus and destinations to the live locations of drivers and couriers. Given its central role, our search platform must meet stringent demands for performance, scalability, and data freshness. To achieve this, its architecture was built on two foundational principles: a pull-based ingestion model and an active-active deployment. The pull-based model, built on Apache Kafka®, decouples data producers from the search cluster, allowing our platform to ingest data at its own pace for greater reliability.

Article Summaries:

Uber has upgraded its global search platform by adopting a pull-based ingestion model in which OpenSearch reads data from Uber's cross-replicated Apache Kafka infrastructure. This design decouples data producers from the search cluster, allowing the system to absorb traffic spikes, prioritize critical updates, and replay indexing requests without client-side backpressure logic. Coupled with an active-active, multi-region deployment, the architecture keeps indices fresh and highly available for real-time services such as rides, food delivery, and logistics. Uber’s work on integrating pull-based indexing into OpenSearch also supports the migration of its in-house search systems to the open-source platform.
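
To make the pull-based idea concrete, here is a minimal sketch in Python. It is not Uber's or OpenSearch's implementation: `PartitionLog` is an in-memory stand-in for a single Kafka partition, and `PullIndexer`, `poll_once`, `lag`, and `rewind` are hypothetical names chosen for illustration. It shows only the two properties described above: the indexer consumes at its own pace while a burst of writes waits in the log, and indexing can be replayed by rewinding the consumer offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """In-memory stand-in for one Kafka partition (illustrative only)."""
    records: list = field(default_factory=list)

    def append(self, doc: dict) -> int:
        # Producers only append; they never wait on the indexer.
        self.records.append(doc)
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset: int, max_records: int) -> list:
        # Return up to max_records records starting at the given offset.
        return self.records[offset:offset + max_records]


@dataclass
class PullIndexer:
    """Hypothetical indexer that pulls batches from the log at its own pace."""
    log: PartitionLog
    batch_size: int = 2
    offset: int = 0                       # next offset to consume
    index: dict = field(default_factory=dict)

    def poll_once(self) -> int:
        # Pull at most batch_size records and upsert them by document id.
        batch = self.log.read(self.offset, self.batch_size)
        for doc in batch:
            self.index[doc["id"]] = doc
        self.offset += len(batch)
        return len(batch)

    def lag(self) -> int:
        # Records written to the log but not yet indexed.
        return len(self.log.records) - self.offset

    def drain(self) -> None:
        # Keep polling until the log has no unread records.
        while self.poll_once():
            pass

    def rewind(self, offset: int) -> None:
        # Replay: move the consumer position back so records are re-read.
        self.offset = offset


if __name__ == "__main__":
    log = PartitionLog()
    for i in range(5):                    # a burst of five writes
        log.append({"id": f"doc-{i}", "value": i})

    indexer = PullIndexer(log, batch_size=2)
    indexer.poll_once()
    print("lag after one poll:", indexer.lag())

    indexer.drain()
    print("indexed documents:", len(indexer.index))

    indexer.index.clear()                 # simulate losing the index
    indexer.rewind(0)
    indexer.drain()
    print("indexed after replay:", len(indexer.index))
```

In this sketch the burst is absorbed by the log rather than by the producer: `append` always succeeds immediately, and the backlog shows up only as consumer lag. A real deployment would read from Kafka through a client library and write to OpenSearch, neither of which is modelled here.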

Sources:

https://www.uber.com/blog/how-uber-indexes-streaming-data-with-pull-based-ingestion-in-opensearch/