Many SQL databases do not focus on efficient LIST-type support. Scalar functions and aggregations on LIST values often require additional unnesting steps or loading normalized data. However, nested input formats such as JSON are widespread in analytics. Efficient operations directly on these input formats can leverage the potential of SQL engines while increasing the system’s ease of use. However, using this potential is not trivial, as the LIST type’s underlying storage format and operations have to synergize with the relational execution model. DuckDB is a high-performance relational database system for analytics. In this talk, I’ll showcase DuckDB’s internal design choices to support LISTs efficiently and highlight our support of Python-style list comprehension directly in the SQL dialect.
I studied Computer Science from 2016 to 2022 in Ilmenau, Germany. After my Bachelor’s, I got the opportunity for a four-month internship at the CWI in Amsterdam, where I worked on adaptive expression reordering in DuckDB. In 2022, after finishing my studies, I returned to Amsterdam to work for DuckDB Labs as a software engineer.