33rd Edition Seminar
The 33rd Edition of the DSDSD seminar will feature talks by
Daniel Lemire
Pınar Tözün
Viktor Leis
If you’d like to receive messages about upcoming talks, please subscribe to the list
by sending an email to dsdsd-list-subscribe@cwi.nl.
We hold monthly talks on Fridays from 3:30 PM to 5 PM CET, for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch data systems community to come together, foster collaborations between its members, and bring in high-quality international speakers. We invite all researchers working on related topics, especially PhD students, to join the events. It is an excellent opportunity to receive early feedback from researchers in your field.
To make the diverse storage I/O paths (e.g., libaio, io_uring, and SPDK) more accessible to users, Samsung created xNVMe. This talk will focus on our experience integrating xNVMe into DuckDB as a new filesystem extension and demonstrate what this integration enables for DuckDB out of the box.
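The extension described in the talk plugs into DuckDB's filesystem layer. As a rough illustration of that pluggable-filesystem idea (not the xNVMe extension itself), the sketch below uses DuckDB's Python API, which can register an fsspec filesystem; an in-memory filesystem stands in for an xNVMe-backed one, and the file name is made up.

```python
# A minimal sketch of DuckDB's pluggable-filesystem hook, not the xNVMe
# extension itself: an fsspec in-memory filesystem stands in for an
# xNVMe-backed one.
import duckdb
import fsspec

fs = fsspec.filesystem("memory")  # stand-in for an xNVMe-backed filesystem
with fs.open("demo.csv", "w") as f:
    f.write("id,value\n1,10\n2,20\n")

con = duckdb.connect()
con.register_filesystem(fs)  # DuckDB now routes memory:// paths through fs
print(con.sql("SELECT sum(value) FROM read_csv('memory://demo.csv')").fetchall())
```

Once a filesystem is registered, queries address it purely through its path prefix, which is what lets a new I/O backend work with existing DuckDB functionality "out of the box".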
Pınar Tözün is an Associate Professor and the Head of the Data, Systems, and Robotics Section at the IT University of Copenhagen (ITU). Her research focuses on resource-aware machine learning, performance characterization of data-intensive systems, and the scalability and efficiency of data-intensive systems on modern hardware.
Why SQL is broken, why that is a problem, and how we can get to a better world.
Viktor Leis is a professor in the Computer Science Department at TUM. His research revolves around designing cost-efficient data systems for the cloud and includes core database systems topics such as query processing, query optimization, transaction processing, index structures, and storage.
Transfer learning is an effective technique for tuning a deep learning model when training data or computational resources are limited. Instead of training a new model from scratch, the parameters of an existing “base model” are adjusted for a new task. The accuracy of such a fine-tuned model depends on choosing an appropriate base model. Model search automates the selection of such a base model by evaluating the suitability of candidate models for a specific task. This entails inference with each candidate model on task-specific data. With thousands of models available through model stores, the computational cost of model search is a major bottleneck for efficient transfer learning. In this work, we present Alsatian, a novel model search system. Based on the observation that many candidate models overlap to a significant extent, and on a careful bottleneck analysis, we propose optimization techniques that are applicable to many model search frameworks: (i) splitting models into individual blocks that can be shared across models, (ii) caching intermediate inference results and model blocks, and (iii) selecting a search order over models that maximizes sharing of cached results. In our evaluation on state-of-the-art deep learning models from computer vision and natural language processing, we show that Alsatian outperforms baselines by up to ~14×.
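As a toy illustration of optimizations (i)–(iii), and not Alsatian's actual implementation, the sketch below represents each candidate model as a sequence of named blocks, caches the intermediate result of every block prefix, and visits models in an order that keeps shared prefixes adjacent; the block names and string-based “inference” are made up.

```python
# Toy sketch of model search with block-level sharing; strings stand in for
# real inference, and lexicographic order is a simple stand-in for a
# sharing-maximizing search order.
def run_block(block, x):
    return f"{block}({x})"  # stand-in for inference through one model block

def search(models, data="task_data"):
    cache = {}  # (ii): intermediate results, keyed by the shared block prefix
    for model in sorted(models):  # (iii): keep shared prefixes adjacent
        x = data
        for i, block in enumerate(model):
            prefix = tuple(model[: i + 1])  # (i): models split into shareable blocks
            if prefix in cache:
                x = cache[prefix]  # reuse the cached intermediate result
            else:
                x = run_block(block, x)
                cache[prefix] = x
        yield model, x  # score this candidate on its final output here

models = [["resnet.stem", "resnet.stage1", "head_a"],
          ["resnet.stem", "resnet.stage1", "head_b"]]
for model, out in search(models):
    print(model[-1], "->", out)
```

In this example the second model reuses the cached result of the shared “resnet.stem → resnet.stage1” prefix, so only its head runs; that is the source of the savings when thousands of candidates overlap.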
Nils is a PhD student in the Database Group at the Hasso Plattner Institute (HPI) in Potsdam, under the supervision of Tilmann Rabl. His research focuses on ML systems, particularly ML model management and search. In addition to his research, he contributes to the lecture on big data systems, leads seminars on ML systems, and supervises master’s theses. Before starting his PhD, he earned a master’s degree in IT-Systems Engineering from HPI and a bachelor’s degree in Computer Science from the University of Hamburg. As part of his studies, he completed a six-month internship at SAP Labs France in Sophia Antipolis and spent a semester at ETH Zurich.
The Compute Express Link (CXL) standard enables new forms of memory management and access across devices and servers. Based on PCIe, it provides cache-coherent access to remote memory. This widens the design space for database systems by expanding the available memory beyond the memory local to the CPU. Efficiently utilizing CXL-attached memory requires data systems to make conscious decisions about data placement and management. In this work, we provide an in-depth analysis of database operation performance with data interleaved across multiple CXL memory devices. We evaluate memory access performance for basic access patterns and the performance impact of placing data across multiple CXL memory devices for in-memory column scans and in-memory B+tree operations.
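On Linux, CXL memory expanders typically appear as CPU-less NUMA nodes, so interleaving can be requested with standard NUMA tooling (e.g., numactl --interleave). The sketch below only illustrates the placement decision itself, not real CXL performance: pages of a column are spread round-robin across devices, so a sequential scan alternates between them. Page size and device count are made-up parameters.

```python
# Schematic round-robin interleaving of a column's pages across CXL devices;
# purely illustrative, no actual device memory is involved.
PAGE = 4      # values per "page"; real pages would be e.g. 4 KiB
DEVICES = 3   # number of CXL memory devices to interleave across

column = list(range(20))
pages = [column[i:i + PAGE] for i in range(0, len(column), PAGE)]
placement = {p: p % DEVICES for p in range(len(pages))}  # round-robin

# A sequential column scan then alternates across devices page by page,
# spreading bandwidth demand instead of saturating a single device.
total = 0
for p, page in enumerate(pages):
    print(f"page {p} -> device {placement[p]}")
    total += sum(page)
print("scan result:", total)
```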
Marcel Weisgut is a PhD student at the Hasso Plattner Institute, specializing in data management utilizing modern hardware in the Data Engineering Systems group led by Tilmann Rabl. He received his master’s degree from HPI in 2021, focusing on in-memory data management in Hasso Plattner’s research group. During his master’s studies, he contributed to the columnar open-source in-memory database system Hyrise and interned with the SAP HANA development team at SAP Labs Korea. His current research focuses on utilizing memory attached to a CPU via the cache-coherent interconnect Compute Express Link (CXL) for database systems.
Deep learning training is an expensive process that relies heavily on GPUs. However, not all model training saturates the resources of a single GPU, and the problem is exacerbated with each new GPU generation offering more hardware resources. In this talk, we will first investigate methods to share GPU resources across model training jobs by collocating these jobs on the same GPU to improve hardware utilization. Then, we will explore work-sharing opportunities in the data pipelines of model training, furthering the benefits of collocated training.
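As a toy illustration of the second idea, the sketch below has one preprocessing thread produce each batch once and fan it out to two collocated training jobs, instead of each job running its own identical input pipeline; the queue-based setup and batch contents are made up and merely schematic, not the system from the talk.

```python
# Toy sketch of work sharing in the input pipeline: preprocessing runs once
# per batch and the result is shared by two collocated training jobs.
import queue
import threading

def preprocess(n_batches, consumers):
    for i in range(n_batches):
        batch = [x * 2 for x in range(4)]  # stand-in for decode/augment work, done once
        for q in consumers:                # fan the shared batch out to every job
            q.put((i, batch))
    for q in consumers:
        q.put(None)                        # signal the end of the epoch

def train_job(name, q):
    while (item := q.get()) is not None:
        step, batch = item
        print(f"{name}: step {step} on batch {batch}")  # stand-in for a GPU step

q1, q2 = queue.Queue(), queue.Queue()
producer = threading.Thread(target=preprocess, args=(3, [q1, q2]))
jobs = [threading.Thread(target=train_job, args=(name, q))
        for name, q in (("job-A", q1), ("job-B", q2))]
producer.start()
for j in jobs:
    j.start()
producer.join()
for j in jobs:
    j.join()
```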
Pınar Tözün is an Associate Professor at the IT University of Copenhagen. Before ITU, she was a research staff member at IBM Almaden Research Center. Prior to joining IBM, she received her PhD from EPFL. Her thesis received an ACM SIGMOD Jim Gray Doctoral Dissertation Award Honorable Mention in 2016. Her research focuses on resource-aware machine learning, performance characterization of data-intensive systems, and scalability and efficiency of data-intensive systems on modern hardware.