The Dutch Seminar
on Data Systems Design

An initiative to bring together research groups working on data systems in Dutch universities and research institutes.

Fridays 4–5:30 pm
bi-weekly

We hold bi-weekly talks on Fridays from 4:00 PM to 5:30 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to come together, foster collaborations between its members, and bring in high-quality international speakers. We invite all researchers working on related topics, and PhD students in particular, to join the events. It is an excellent opportunity to receive early feedback from researchers in your field.

Past talks

May 13, 2022

Algorithms for Relational Knowledge Graphs

Martin Bravenboer

RelationalAI is the next-generation database system for new intelligent data applications based on relational knowledge graphs. RelationalAI complements the modern data stack by allowing data applications to be implemented relationally and declaratively, leveraging knowledge/semantics for reasoning, graph analytics, relational machine learning, and mathematical optimization workloads. As a relational, cloud-native system, RelationalAI fits naturally in the modern data stack, providing virtually infinite compute and storage capacity, versioning, and a fully managed system. RelationalAI supports the workload of data applications with an expressive relational language (called Rel), novel join algorithms and JIT compilation suitable for complex computational workloads, semantic optimization that leverages knowledge to optimize application logic, and incrementality of the entire system for both data (IVM) and code (live programming). The system utilizes immutable data structures, versioning, parallelism, distribution, and out-of-core memory management to support state-of-the-art workload isolation and scalability for simple as well as complex business logic. In our experience, RelationalAI’s expressive, relational, and declarative language leads to a 10-100x reduction in code for complex business domains. Applications are developed faster and with superior quality, by bringing non-technical domain experts into the process and by automating away complex programming tasks. We discuss the core innovations that underpin the RelationalAI system: an expressive relational language, worst-case optimal join algorithms, semantic optimization, just-in-time compilation, schema discovery and evolution, incrementality, and immutability.

Martin Bravenboer is VP Engineering at RelationalAI where he leads the development of the RelationalAI system. Before RelationalAI, he was CTO at LogicBlox. As a postdoctoral researcher with Prof. Yannis Smaragdakis, he developed the Doop framework for declarative and precise points-to analysis that uses the LogicBlox system. Martin obtained his PhD at Utrecht University in the area of language design and compiler construction.

May 13, 2022

The LDBC Social Network Benchmark - Business Intelligence workload

Gábor Szárnyas

Graph data management techniques are employed in several domains such as finance and enterprise knowledge representation for evaluating graph pattern matching and path finding queries on large data sets. Supporting such queries efficiently yields a number of unique requirements, including the need for a concise query language and graph-aware query optimization techniques. The goal of the Linked Data Benchmark Council (LDBC) is to design standard benchmarks which capture representative categories of graph data management problems, making the performance of systems comparable and facilitating competition among vendors. This talk describes the Business Intelligence workload, a graph OLAP benchmark with global graph queries that use pattern matching, path finding, and aggregation operations. The workload is executed on a dynamic social network graph updated in daily batches of inserts and deletes. We discuss the design process of the benchmark and present its first stable version.

Gábor Szárnyas is a post-doctoral researcher. He obtained his PhD in software engineering in 2019, focusing on the intersection of object-oriented graph models and property graphs. He currently works on efficient graph processing techniques, including formulating graph algorithms in the language of linear algebra (GraphBLAS), implementing graph query engines (SQL/PGQ), and designing graph benchmarks. He serves on the steering committee of the Linked Data Benchmark Council.

Apr 29, 2022

Glidesort - Efficient In-Memory Adaptive Stable Sorting on Modern Hardware

Orson Peters

Sorting is one of the most common algorithms used in programming, and virtually every standard library contains a routine for it. Despite also being one of the oldest problems out there, surprisingly large improvements are still being found. Some of these are fundamental novelties, and others are optimizations matching the changing performance landscape in modern hardware.

In this talk we present Glidesort, a general-purpose in-memory stable comparison sort. It is fully adaptive both to pre-sorted runs in the data, similar to Timsort, and to low-cardinality inputs, similar to Pattern-defeating Quicksort, making it to our knowledge the first practical stable sorting algorithm fully adaptive in both measures. Glidesort achieves a 3x speedup over Rust’s standard library Timsort routine on sorting random 32-bit integers, with the speedup breaking the order-of-magnitude barrier for realistic low-cardinality distributions. It achieves this without the use of SIMD, processor-specific intrinsics, or assumptions about the type being sorted: it is a fully generic sort taking an arbitrary comparison operator.

Using Glidesort as the motivating example, we discuss the principles of efficient stable in-memory partitioning and merging on modern hardware. Particular attention is paid to eliminating branches and interleaving independent parallel loops to make efficient use of modern deeply pipelined superscalar processors. The lessons learned here are widely applicable to efficient data processing outside of sorting.

Orson Peters is a first-year PhD student in the Database Architecture group at CWI Amsterdam. His research interests are broad and span low-level optimization, compression, information theory, cryptography, (parallel) data structures, string processing, and more. Sorting is a particular interest: in 2015 he published pdqsort, which is now the default unstable sorting algorithm in Rust and Go. His alma mater is Leiden University, where he did his BSc and MSc in Computer Science, specializing in Artificial Intelligence.
