Taking a Peek under the Hood of Snowflake's Metadata Management

Max Heimel

This talk provides an overview of Snowflake’s architecture, which was designed to efficiently support complex analytical workloads in the cloud. By following the lifecycle of micro-partitions, it explains pruning, zero-copy cloning, and instant time travel. Pruning speeds up query processing by filtering out unnecessary micro-partitions during query compilation. Zero-copy cloning creates logical copies of the data without duplicating physical storage. Instant time travel lets the user query data “as of” a time in the past, even if the current state of the data has changed. We also describe how we use cloud resources to automatically reorganize (“cluster”) micro-partitions in the background in order to achieve consistent query performance without affecting running customer queries.
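The three techniques named above can be illustrated with a toy model. The following is a minimal sketch and not Snowflake's implementation: it assumes that micro-partitions are immutable files, that the metadata keeps a min/max range for a single integer column, and that a table is a list of timestamped versions. The names `MicroPartition`, `prune`, `as_of`, and `clone` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    # Hypothetical metadata record: an immutable file plus the
    # min/max values of one column stored in that file.
    file_name: str
    min_value: int
    max_value: int


def prune(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


def as_of(versions, timestamp):
    """Time travel: partitions of the latest version committed at or before timestamp."""
    eligible = [v for v in versions if v[0] <= timestamp]
    if not eligible:
        raise ValueError("no table version exists at that time")
    return max(eligible, key=lambda v: v[0])[1]


def clone(versions):
    """Zero-copy clone: copy the version list only; partition objects are shared."""
    return list(versions)


p1 = MicroPartition("part-001", 1, 100)
p2 = MicroPartition("part-002", 101, 200)
p3 = MicroPartition("part-003", 150, 300)

# Each version is (commit_time, partitions). Adding p3 creates a new
# version instead of modifying the old one (assumption of this sketch).
versions = [(10, (p1, p2)), (20, (p1, p2, p3))]

current = as_of(versions, 25)
scanned = prune(current, 120, 160)  # p1 is skipped using metadata alone
past = as_of(versions, 15)          # table state before p3 was added
cloned = clone(versions)            # new metadata, same underlying files
```

In this model all three features reduce to metadata operations: pruning compares value ranges, time travel selects an older version, and cloning copies references rather than files. That is only possible because the sketch never modifies a partition in place.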

Max Heimel holds a PhD in Computer Science from the Database and Information Management Group at TU Berlin. He joined Snowflake in 2015 and works as a Software Engineer in the areas of query execution and query optimization. Before joining Snowflake, Max worked at IBM and completed several internships at Google.