Ad-hoc panel on data discovery

Discovering datasets for data scientists: how can we enable the discovery of relevant datasets, as well as the relationships among datasets, in a huge data repository? Dataset discovery is the basis on which we can build data augmentation, data integration, and better, more accurate ML models. There is currently a multitude of approaches to the problem: classic methods such as automated schema matching, semantics-based methods that build knowledge graphs, human annotation of relationships, and mining query logs to find related information. Still, all of these methods need to be adapted on a case-by-case basis.

Main panel questions: Are we trying to solve “people” and company-culture problems with technical solutions? Is data discovery “automatable”, or should we focus on human-in-the-loop solutions? How can (existing) database research contribute to the landscape?

The panel will be moderated by Asterios Katsifodimos (TU Delft).