Integrating Columnar Techniques and Factorization into GraphflowDB

Semih Salihoglu (University of Waterloo)

In contemporary usage, graph database management systems (GDBMSs) are systems that adopt the property graph model. They often power applications such as fraud detection and recommendations, which require very fast joins of records that represent many-to-many relationships, often beyond the performance that existing relational systems provide. In this talk, I will give an overview of GraphflowDB, an in-memory GDBMS we are developing at the University of Waterloo. I will first make a case for adapting the storage, compression, and query processing techniques of columnar relational database management systems (RDBMSs) to architect GDBMSs. I will then give an overview of the system’s query processor, which optimizes the traditional block-based (a.k.a. vectorized) query processors of columnar RDBMSs for many-to-many joins. Specifically, the query processor integrates factorization to avoid data copies under many-to-many joins, and it exploits the existing list-based physical data storage to avoid copying lists into intermediate tuples.
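To make the idea of factorization concrete, here is a minimal Python sketch. It is not GraphflowDB code: the toy graph, the dictionary-based adjacency lists, and all function names are assumptions made for illustration. It contrasts a flat two-hop join result (a -> b -> c), which repeats values in every output tuple, with a factorized result that keeps one entry per middle node and points at the adjacency lists already in storage.

```python
from itertools import product

# Hypothetical toy graph, stored as adjacency lists keyed by the middle node b.
in_adj = {"b1": ["a1", "a2"], "b2": ["a3", "a4"]}    # a -> b edges
out_adj = {"b1": ["c1", "c2", "c3"], "b2": ["c4"]}   # b -> c edges


def middle_nodes(in_adj, out_adj):
    """Nodes that have both incoming and outgoing edges."""
    return sorted(set(in_adj) & set(out_adj))


def flat_two_hop(in_adj, out_adj):
    """Flat result: one (a, b, c) tuple per matching path."""
    return [
        (a, b, c)
        for b in middle_nodes(in_adj, out_adj)
        for a, c in product(in_adj[b], out_adj[b])
    ]


def factorized_two_hop(in_adj, out_adj):
    """Factorized result: one (a_list, b, c_list) entry per middle node.

    The lists are references to the stored adjacency lists, not copies.
    """
    return [(in_adj[b], b, out_adj[b]) for b in middle_nodes(in_adj, out_adj)]


def count_paths(factorized):
    """Count matching paths without expanding the Cartesian products."""
    return sum(len(a_list) * len(c_list) for a_list, _, c_list in factorized)


def expand(factorized):
    """Turn the factorized result back into flat tuples, only when needed."""
    return [
        (a, b, c)
        for a_list, b, c_list in factorized
        for a, c in product(a_list, c_list)
    ]


flat = flat_two_hop(in_adj, out_adj)
fact = factorized_two_hop(in_adj, out_adj)
```

In this sketch the flat result holds one tuple per path, while the factorized result holds one entry per middle node, and an aggregate such as a path count can be computed directly on the factorized form.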

Semih Salihoglu is an Assistant Professor at the University of Waterloo. His research focuses on developing systems for managing, querying, and running analytics on graph-structured data. His main ongoing systems projects include Graphflow (http://graphflow.io/), a new graph database management system that integrates novel storage, indexing, and query processing techniques, and GraphSurge (https://github.com/dsg-uwaterloo/graphsurge), a system designed to run batch computations over multiple graph views with significant computation sharing. He holds a PhD from Stanford University and is a recipient of the 2018 VLDB best paper award.
