Efficient detection of multivariate correlations in static and streaming data

Jens d’Hondt

Correlation analysis is an invaluable tool in many domains for understanding data and extracting salient insights. Most work to date focuses on detecting high pairwise correlations. A generalization of this problem, with known applications but no known efficient solutions, is the discovery of strong multivariate correlations, i.e., finding sets of vectors (typically 3 to 5) that exhibit a strong dependence when considered together. In this presentation, we propose algorithms for detecting multivariate correlations in static and streaming data. Our algorithms, which rely on novel theoretical results, support four different correlation measures and allow for additional constraints. Our extensive experimental evaluation examines the properties of our solution and demonstrates that our algorithms outperform the state of the art, typically by two orders of magnitude. Supporting material is available at: https://correlationdetective.com/

Jens d’Hondt is a PhD candidate in the Database group at the Eindhoven University of Technology, supervised by dr. Odysseas Papapetrou. He currently leads the Correlation Detective project, which aims to build a generic system for multivariate similarity search on large datasets. The project started in September 2021 and has since led to a publication at VLDB’22 in Sydney.