34th Edition Seminar
The 34th Edition seminar of DSDSD will feature talks by
Simon van Noort
If you’d like to receive messages about upcoming talks, please subscribe to the mailing list by sending an email to dsdsd-list-subscribe@cwi.nl
We hold monthly talks on Fridays from 3:30 PM to 5 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to come together, foster collaborations between its members, and bring in high-quality international speakers. We invite all researchers working on related topics, and PhD students in particular, to join the events. It is an excellent opportunity to receive early feedback from researchers in your field.
Vector similarity search (VSS) is widely used in recommender systems, information retrieval, and chatbots. Filtered VSS is a variant in which the similarity search is combined with traditional SQL predicates. In recent years, many database systems have been extended with VSS capabilities through weakly coupled vector index integrations. We claim these indexes should instead be designed with the capabilities of the data system in mind. In this talk, we propose PDXearch, a DuckDB extension for lightweight but fast (filtered) vector similarity search. Our approach is tailored for analytical database systems. It incorporates a state-of-the-art partition-based index and integrates tightly with DuckDB for fast predicate evaluation and morsel-driven parallelism. Compared to DuckDB’s official HNSW index, our index is at least 30x faster to construct, uses 30-50% less memory, and achieves competitive (filtered) search performance. Furthermore, our design opens the door to high-performance continuous ingestion and out-of-core processing. We aim to bring the community a fast, lightweight, and portable VSS solution.
Simon van Noort is an MSc student at VU Amsterdam & UvA. He is part of CWI’s Database Architectures group, working on the PDXearch extension with Leonardo Kuffo and Peter Boncz. He was previously an intern at Google, Datadog, and Onramper.
Why SQL is broken, why that is a problem, and how we can get to a better world.
Viktor Leis is a professor in the Computer Science Department at TUM. His research revolves around designing cost-efficient data systems for the cloud and includes core database systems topics such as query processing, query optimization, transaction processing, index structures, and storage.
To make the diverse I/O storage paths (e.g., libaio, io_uring, and SPDK) more accessible to users, Samsung created xNVMe. This talk will focus on our experience with integrating xNVMe into DuckDB as a new filesystem extension and demonstrate what this integration enables for DuckDB out of the box.
Pınar Tözün is an Associate Professor and Head of the Data, Systems, and Robotics Section at the IT University of Copenhagen (ITU). Her research focuses on resource-aware machine learning, performance characterization of data-intensive systems, and the scalability and efficiency of such systems on modern hardware.
Hardware capabilities have advanced dramatically, with PCIe bandwidth doubling roughly every three years, reaching 32 GB/s per channel in PCIe 7.0, high-bandwidth memory delivering hundreds of GB/s, and modern CPUs featuring wider SIMD units capable of processing dozens of bytes per instruction. Yet many software tasks, including JSON parsing, remain CPU-bound and far slower than these interfaces allow. This presentation explores how SIMD instructions enable gigabyte-per-second throughput in real-world data processing. Focusing on the simdjson library, we examine its design for fast structural scanning, on-demand parsing, and minification, along with recent optimizations leveraging C++26 compile-time reflection for efficient serialization and vectorized string escaping. We extend the discussion to related challenges in Unicode validation and correction (as deployed in browsers) and high-speed Base64 encoding/decoding in upcoming JavaScript standards. Through benchmarks on platforms, we demonstrate how these techniques harness modern hardware to deliver orders-of-magnitude speedups, powering systems from Node.js and ClickHouse to web browsers worldwide.
Daniel Lemire is a computer science professor at the University of Quebec (TELUQ). He is among the 1000 most followed programmers in the world on GitHub. His work is found in many standard libraries (.NET, Rust, GCC/glibc++, LLVM/libc, Go, Node.js, etc.) and in the major Web browsers (Safari, Chrome, etc.). His research interests include high-performance programming. He is @lemire on X, and he blogs weekly at https://lemire.me/blog