Sorting is one of the most well studied problems in Computer Science. Research in this area forms the basis of sorting in database systems but focuses mostly on sorting large arrays. Sorting is more complex for relational data as many different types need to be supported, as well as NULL values. There can also be multiple order clauses. Besides sorting the order clause (key) columns, all other selected (payload) columns need to be reordered as well. In this talk, I will discuss how this leads to many interesting implementation decisions that drastically impact end-to-end performance. These decisions have guided the implementation of DuckDB’s new sorting operator. Experimental results show that DuckDB’s sorting implementation is highly competitive with other relational systems in-memory, and gracefully degrades into external sorting when the input size exceeds the limit of available memory.
I am a second year PhD student at the Database Architectures group at CWI. My research focuses on out-of-core query processing and low-level optimization. I am involved with DuckDB, which is where I implement my research. Before starting my PhD I completed a BSc in Computer Science and a MSc in Data Science at Radboud University Nijmegen.