Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

Semih Salihoglu (UWaterloo)

This is a two part talk. The main part of the talk discusses a class of cardinality estimation techniques we refer to as optimistic estimators that store statistics about input relations and small-size joins. These estimators use these statistics in formulas that make independence and uniformity assumptions to make an estimate for a query. We focus on complex join queries and observe that for many queries there are multiple formulas to make an estimate and no obvious choice for which formula to use nor any clear advice from prior literature that have implemented these estimators. We show that these estimators can be modeled as different heuristics that pick bottom-to-top paths in a new framework called cardinality estimation graphs (CEGs). Using the framework, we can describe a suite of possible heuristics to make estimates and empirically evaluate which heuristic performs better on several large query benchmarks. For example, we show that on acyclic queries picking “pessimistic” paths/formulas that produce larger estimates are generally more accurate. We then show that CEGs can also model the novel pessimistic estimators that use linear programs based on worst-case query output bounds. This is done by changing the edge weights in the CEG of optimistic estimators to maximum degree weights. We therefore present a very intuitive and arguably much simpler interpretation for pessimistic estimators than prior work on pessimistic estimators.

In the second and shorter part of the talk, I will briefly discuss the vision of the Kùzu graph database management system that is being developed in my group as a state-of-art system that implements state of the art query processing and storage techniques for managing large graph databases.

Semih Salihoğlu is an Associate Professor and a David R. Cheriton Faculty Fellow at University of Waterloo. His research focuses on developing systems for managing, querying, or doing analytics on graph-structured data. His main on-going systems project is Kùzu, which is a new graph database management system that integrates novel storage, indexing and query processing techniques. He holds a PhD from Stanford University and is a recipient of the VLDB 2018 Best Paper, the VLDB 2022 Best Experiments and Analysis Paper, and a 2023 SIGMOD Research Highlights awards.