Powering Billion-Scale Vector Search with OpenSearch

Introduction:

At Uber, our systems handle massive amounts of data daily, from ridesharing to delivery. We’ve traditionally used keyword-based search with Apache Lucene™. However, we needed to move beyond simple keyword matching to semantic search to understand the meaning behind searches. To achieve this, we adopted Amazon® OpenSearch as our vector search engine. Its scalability, performance, and flexibility were key factors in our decision. This blog post explores our journey of evaluating and implementing OpenSearch for large-scale vector search, focusing specifically on the infrastructure challenges and solutions we encountered.

Article Summaries:

Uber announced it has adopted Amazon OpenSearch to power billion‑scale vector search across its services. The company’s existing search platform, built on Apache Lucene, supported only the HNSW algorithm for vector search, which limited algorithm choice, lacked GPU support, and struggled with the high‑dimensional vectors needed for semantic retrieval. OpenSearch offers multiple approximate nearest neighbor (ANN) algorithms, native FAISS integration for future GPU acceleration, and greater scalability. In a 2024 prototype, Uber indexed over 1.5 billion items, each represented by a 400‑dimensional embedding, using batch ingestion. The move marks a strategic partnership with Amazon and deeper engagement with the open‑source community.
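To give a sense of the scale in the summary above, here is a minimal sketch that estimates the raw size of the prototype's vectors and builds an OpenSearch k-NN index body for 400-dimensional embeddings. It rests on several assumptions that the article does not state: vectors stored as 32-bit floats, a hypothetical field name `embedding`, and HNSW on the FAISS engine with L2 distance. It illustrates the k-NN plugin's mapping format and is not Uber's actual configuration.

```python
import json


def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Size of the raw vectors alone, excluding graph links, replicas, and metadata.

    bytes_per_value=4 assumes float32 storage (an assumption, not from the article).
    """
    return num_vectors * dimension * bytes_per_value


def knn_index_body(field: str, dimension: int) -> dict:
    """Build an index-creation body for the OpenSearch k-NN plugin.

    The field name, algorithm, engine, and distance metric are illustrative choices.
    """
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",      # one of the ANN algorithms OpenSearch supports
                        "engine": "faiss",   # FAISS engine, as mentioned in the summary
                        "space_type": "l2",  # assumed distance metric
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Figures from the summary: 1.5 billion items, 400 dimensions each.
    size = raw_vector_bytes(1_500_000_000, 400)
    print(f"Raw float32 vectors: {size / 1e12:.1f} TB")
    print(json.dumps(knn_index_body("embedding", 400), indent=2))
```

Under the float32 assumption, the vectors alone come to about 2.4 TB before any index overhead or replication, which suggests why sharding and ingestion strategy matter at this scale. The dictionary returned by `knn_index_body` could be sent as the request body when creating an index.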

Sources:

https://www.uber.com/blog/powering-billion-scale-vector-search-with-opensearch/