Vector similarity search (VSS) is widely used in recommender systems, information retrieval, and chatbots. Filtered VSS is a variant in which the similarity search is combined with traditional SQL predicates. In recent years, many database systems have been extended with VSS capabilities through loosely coupled vector index integrations. We argue that these indexes should instead be designed with the capabilities of the data system in mind. In this talk, we present PDXearch, a DuckDB extension for lightweight but fast (filtered) vector similarity search. Our approach is tailored to analytical database systems. It incorporates a state-of-the-art partition-based index and integrates tightly with DuckDB for fast predicate evaluation and morsel-driven parallelism. Compared to DuckDB’s official HNSW index, our index is at least 30x faster to construct, uses 30-50% less memory, and achieves competitive (filtered) search performance. Furthermore, our design opens the door to high-performance continuous ingestion and out-of-core processing. We aim to bring the community a fast, lightweight, and portable VSS solution.
Simon van Noort is an MSc student at VU Amsterdam and UvA. He is part of CWI’s Database Architectures group, where he works on the PDXearch extension with Leonardo Kuffo and Peter Boncz. He previously interned at Google, Datadog, and Onramper.