Leveraging Generative AI for Data Processing

Immanuel Trummer

The year 2022 has been marked by several breakthrough results in generative AI, culminating in the rise of tools like ChatGPT that can solve a variety of language-related tasks without specialized training. In this talk, I outline novel opportunities that these advances enable in data management. I discuss several recent research projects that exploit advanced language processing for tasks such as parsing a database manual to support automated tuning or mining data for patterns described in natural language. Finally, I discuss our recent and ongoing research on synthesizing code for SQL processing in general-purpose programming languages while enabling customization via natural language commands.

Immanuel Trummer is an assistant professor of computer science at Cornell University. His research covers various aspects of large-scale data management with the goal of making data analysis more efficient and more user-friendly. His publications were selected for “Best of VLDB”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as a CACM Research Highlight. He is a recipient of the Google Faculty Research Award and an alumnus of the German National Academic Foundation.