Data Stations: Combining Data, Compute, and Market Forces

Raul Castro Fernandez (University of Chicago)

In this talk, I will present preliminary work on Data Stations, a new architecture to facilitate data sharing within and across organizations. Data Stations depart from modern data lakes in that both the data and derived data products, such as machine learning models, are sealed and cannot be directly seen, accessed, or downloaded by anyone. Data Stations do not deliver data to users; instead, users bring their questions to the data. This inversion of the usual relationship between data and compute mitigates many of the security risks otherwise associated with sharing and working with sensitive data. In the talk, I will motivate the need for data sharing, present the Data Station architecture, explain how it helps with data sharing, and finish with some of the technical challenges we are currently addressing. I will also situate this work within the broader research agenda on the economics of data (including data markets) that my group is pursuing at the University of Chicago.

I am an assistant professor in Computer Science at The University of Chicago. I am interested in all kinds of data problems, and for the last two years I have been thinking a lot about the economics of data, data markets, and data sharing architectures. Before UChicago, I was a postdoc at MIT, working with Sam Madden and Michael Stonebraker on data discovery and integration. Before that, I completed my PhD at Imperial College London, working with Peter Pietzuch on distributed and data processing problems, including stream processing.