Decoupling Compute and Storage for Stream Processing Systems: Benefits, Limitations, and Insights

Yingjun Wu

Stream processing is an essential part of modern data infrastructure, but building an efficient and scalable stream processing system is challenging. Decoupling compute from storage has become an effective way to address these challenges.

In this talk, we discuss the benefits and limitations of a decoupled compute and storage architecture in stream processing systems. We find that, while decoupling compute and storage enables near-unlimited scalability, it can also introduce data consistency problems and high latency. This is especially true for complex continuous queries that must manage extremely large internal states. We then present a tiered storage mechanism that addresses these challenges. It combines high-performance and low-cost storage tiers to minimize data movement between the compute and storage layers while keeping processing efficient. Finally, we present experimental results that demonstrate the balance between performance and cost-efficiency that our approach achieves.
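To illustrate the general idea of tiering operator state, here is a minimal sketch. It assumes a key-value state store with a bounded in-memory hot tier that uses LRU eviction, plus a plain dict standing in for low-cost remote object storage. The class `TieredStateStore` and its counters are hypothetical and are not RisingWave's implementation. The counters make data movement between the tiers visible: frequently accessed keys stay hot, and only cold keys are written back or fetched remotely.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredStateStore:
    """Hypothetical two-tier key-value state store (illustrative sketch only).

    hot:  bounded in-memory LRU tier, standing in for fast local memory/disk
    cold: unbounded dict, standing in for low-cost remote object storage
    """

    def __init__(self, hot_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self.hot: "OrderedDict[str, int]" = OrderedDict()
        self.cold: Dict[str, int] = {}
        self.cold_reads = 0   # number of fetches from the remote tier
        self.cold_writes = 0  # number of evictions written to the remote tier

    def get(self, key: str) -> Optional[int]:
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        if key in self.cold:
            self.cold_reads += 1
            value = self.cold.pop(key)
            self._put_hot(key, value)  # promote the key back to the hot tier
            return value
        return None

    def put(self, key: str, value: int) -> None:
        self.cold.pop(key, None)  # drop any stale copy in the cold tier
        self._put_hot(key, value)

    def _put_hot(self, key: str, value: int) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Evict least recently used entries to the cold tier when over capacity.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value
            self.cold_writes += 1


# Usage: a streaming per-key count, the kind of state a continuous query keeps.
store = TieredStateStore(hot_capacity=2)
for key in ["a", "b", "a", "c", "a", "b"]:
    store.put(key, (store.get(key) or 0) + 1)
```

After this stream, the hot keys `a` and `b` sit in memory and `c` has been evicted to the cold tier. Only one remote read occurred, when `b` was promoted back. A real system would add persistence, batching of remote I/O, and consistency guarantees, none of which this sketch models.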

Yingjun Wu is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before founding the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. He received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has worked on stream processing and database systems for over a decade.