The Dutch Seminar
on Data Systems Design

An initiative to bring together research groups working on data systems in Dutch universities and research institutes.

Fridays 4–5 pm
monthly

We hold monthly talks on Fridays from 3:30 PM to 5 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to come together, foster collaborations between its members, and bring in high-quality international speakers. We invite all researchers working on related topics, especially PhD students, to join the events. It is an excellent opportunity to receive early feedback from researchers in your field.

Upcoming talks

March 25th, 2026 from 16:00 to 16:30 (Europe/Amsterdam / CET)

34th Edition Seminar

The 34th Edition seminar of DSDSD will feature a talk by
Simon van Noort

Mar 25, 2026

Efficient Filtered Vector Similarity Search in Analytical Databases

Simon van Noort

Vector similarity search (VSS) is widely used in recommender systems, information retrieval, and chatbots. Filtered VSS is a variant in which the similarity search is combined with traditional SQL predicates. Over the past years, many database systems have been extended with VSS capabilities through weakly coupled vector index integrations. We claim these indexes should instead be designed with the capabilities of the data system in mind. In this talk, we propose PDXearch, a DuckDB extension for lightweight but fast (filtered) vector similarity search. Our approach is tailored for analytical database systems. It incorporates a state-of-the-art partition-based index and integrates tightly with DuckDB for fast predicate evaluation and morsel-driven parallelism. Compared to DuckDB’s official HNSW index, our index is at least 30x faster to construct, uses 30-50% less memory, and achieves competitive (filtered) search performance. Furthermore, our design opens the door to high-performance continuous ingestion and out-of-core processing. We aim to bring the community a fast, lightweight, and portable VSS solution.

Simon van Noort is an MSc student at VU Amsterdam and UvA. He is part of CWI’s Database Architectures group, working on the PDXearch extension with Leonardo Kuffo and Peter Boncz. Formerly, he was an intern at Google, Datadog, and Onramper.

Past talks

Jan 09, 2026

Towards Sanity in Query Languages

Viktor Leis

Why SQL is broken, why that is a problem, and how we can get to a better world.

Viktor Leis is a professor in the Computer Science Department at TUM. His research revolves around designing cost-efficient data systems for the cloud and includes core database systems topics such as query processing, query optimization, transaction processing, index structures, and storage.

Jan 09, 2026

Flexible I/O for Database Management Systems with xNVMe

Pınar Tözün

To make the diverse I/O storage paths (e.g., libaio, io_uring, and SPDK) more accessible to users, Samsung created xNVMe. This talk will focus on our experience with integrating xNVMe into DuckDB as a new filesystem extension and demonstrate what this integration enables for DuckDB out of the box.

Pınar Tözün is an Associate Professor and the Head of the Data, Systems, and Robotics Section at the IT University of Copenhagen (ITU). Her research focuses on resource-aware machine learning, performance characterization of data-intensive systems, and the scalability and efficiency of data-intensive systems on modern hardware.

Jan 09, 2026

SIMD-accelerated data processing

Daniel Lemire

Hardware capabilities have advanced dramatically, with PCIe bandwidth doubling roughly every three years, reaching 32 GB/s per channel in PCIe 7.0, high-bandwidth memory delivering hundreds of GB/s, and modern CPUs featuring wider SIMD units capable of processing dozens of bytes per instruction. Yet many software tasks, including JSON parsing, remain CPU-bound and far slower than these interfaces allow. This presentation explores how SIMD instructions enable gigabyte-per-second throughput in real-world data processing. Focusing on the simdjson library, we examine its design for fast structural scanning, on-demand parsing, and minification, along with recent optimizations leveraging C++26 compile-time reflection for efficient serialization and vectorized string escaping. We extend the discussion to related challenges in Unicode validation and correction (as deployed in browsers) and high-speed Base64 encoding/decoding in upcoming JavaScript standards. Through benchmarks on platforms, we demonstrate how these techniques harness modern hardware to deliver orders-of-magnitude speedups, powering systems from Node.js and ClickHouse to web browsers worldwide.

Daniel Lemire is a computer science professor at the University of Quebec (TELUQ). He is among the 1000 most followed programmers in the world on GitHub. His work is found in many standard libraries (.NET, Rust, GCC/glibc++, LLVM/libc, Go, Node.js, etc.) and in the major Web browsers (Safari, Chrome, etc.). His research interests include high-performance programming. He is @lemire on X, and he blogs weekly at https://lemire.me/blog
