Choosing a vector database for ANN search at Reddit

Written by Chris Fournier.

In 2024, Reddit teams used a variety of solutions to perform approximate nearest neighbour (ANN) vector search: Google’s Vertex AI Vector Search and, experimentally, Apache Solr’s ANN vector search for some larger datasets, and Facebook’s FAISS library for smaller datasets (hosted in vertically scaled side-cars). More and more teams at Reddit wanted a broadly supported ANN vector search solution that was cost effective, had the search features they desired, and could scale to Reddit-sized data. To meet this need, in 2025 we sought out the ideal vector database for teams at Reddit. This post describes the process we used to select the best vector database for Reddit’s needs today.

Article Summaries:

In 2024, Reddit’s engineering teams used a mix of ANN search solutions (Google Vertex AI, Apache Solr, and Facebook’s FAISS) to meet diverse data-size and feature needs. In 2025, the company sought a single, cost-effective vector database that could scale to its billion-vector workloads and support advanced filtering and hybrid search. The selection process had three steps: gathering functional and non-functional requirements from interested teams, qualitatively scoring candidate systems against those requirements, and quantitatively benchmarking the top contenders. The post outlines Reddit’s criteria and evaluation steps, and stresses the importance of fitting internal culture and existing tool preferences; it does not declare a universal best solution.
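
The qualitative scoring step described above can be illustrated with a small weighted scoring matrix. This is a minimal sketch, not Reddit's actual method: the requirement names, weights, candidate names, and scores below are all hypothetical placeholders, and the 0 to 3 scale is an assumption chosen for illustration.

```python
# Hypothetical requirement weights (higher means more important).
# These names and values are illustrative, not taken from the post.
WEIGHTS = {
    "scales_to_billions": 3,
    "filtering": 2,
    "hybrid_search": 2,
    "cost": 2,
    "team_familiarity": 1,
}

# Hypothetical qualitative scores per candidate on an assumed 0-3 scale.
CANDIDATES = {
    "candidate_a": {"scales_to_billions": 3, "filtering": 2, "hybrid_search": 1,
                    "cost": 2, "team_familiarity": 1},
    "candidate_b": {"scales_to_billions": 2, "filtering": 3, "hybrid_search": 3,
                    "cost": 1, "team_familiarity": 2},
    "candidate_c": {"scales_to_billions": 1, "filtering": 1, "hybrid_search": 0,
                    "cost": 3, "team_familiarity": 3},
}


def weighted_score(scores, weights):
    """Sum of weight * score over all requirements; a missing score counts as 0."""
    return sum(weight * scores.get(req, 0) for req, weight in weights.items())


def shortlist(candidates, weights, top_n=2):
    """Return the names of the top_n candidates by weighted score, best first."""
    ranked = sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    for name in shortlist(CANDIDATES, WEIGHTS):
        print(name, weighted_score(CANDIDATES[name], WEIGHTS))
```

In a process like the one summarized here, a shortlist of this kind would only decide which systems go on to quantitative benchmarking; the weights themselves would come from the requirements gathered from interested teams.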

Sources:

https://www.reddit.com/r/RedditEng/comments/1ozxnjc/choosing_a_vector_database_for_ann_search_at/