Databases are becoming commodity, what's next? Chat with Chris from Materialized View
Mar 19, 2024
Chris Riccomini, formerly a distinguished engineer at WePay and co-creator of Apache Samza, discusses the evolution of data systems, the rise of Apache Arrow, and future trends in database systems. Topics include domain-specific databases, object storage for edge systems, developer ergonomics, and the shift towards specialized Postgres-compatible databases.
Consolidation around Postgres-compatible databases for specialized needs such as vector search or GIS is likely.
A shift towards Spark or Flink for data integration could reshape that landscape.
Deep dives
Evolution of Data Systems: Postgres as Central Element
The future of data systems could see consolidation around Postgres-compatible databases that serve specialized needs such as vector search or GIS. These databases are likely to be built on object storage, which simplifies operations. The Postgres and MySQL wire protocols are expected to act as the unifying layer between them, so the database landscape may shrink slightly as workloads fold into fewer systems.
Data Integration: Rise of Spark and Flink
On the data integration front, Spark and Flink are expected to play a more prominent role, handling ETL and the maintenance of data lakehouses. Using Spark or Flink to read and transform raw files directly could reshape the data integration landscape.
Edge Computing and Ergonomics: Challenges Ahead
Where edge computing meets client-side applications, developers face significant challenges, particularly around ergonomics such as implementing and working with CRDTs. Hyper-specific vertical databases tailored to low-connectivity or reactive scenarios are expected to thrive in the edge space. The key open problem is balancing solutions that application engineers find easy to use with efficient data operations at the edge.
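CRDTs (conflict-free replicated data types) let each replica accept writes independently, for example on a device with poor connectivity, and still converge once replicas sync. As a minimal illustration, not tied to any particular edge database, here is a grow-only counter (G-Counter) in Python; the class and the replica names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    phone, laptop = GCounter("phone"), GCounter("laptop")
    phone.increment(2)   # edits made while the devices are offline
    laptop.increment(3)
    phone.merge(laptop)  # sync in either direction, in any order
    laptop.merge(phone)
    print(phone.value(), laptop.value())  # 5 5
```

Even this simplest CRDT supports only increments; decrements, deletions and collaborative text need richer types, which is where much of the ergonomic burden on application engineers comes from.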
Embracing Object Storage for Durability and Replication
Object storage is gaining momentum as a primary data store because it offers high durability and multi-region replication with minimal effort. Combined with lower-latency offerings such as S3 Express, its scalability makes object storage an increasingly common default. Because the object store already provides durability and replication, database systems built on it have less need to implement complex consensus algorithms of their own.
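The consensus point can be made concrete with a small sketch. It assumes an object store that offers an atomic put-if-absent (create-only) write; `InMemoryObjectStore` and `SegmentLog` below are hypothetical stand-ins, not the S3 API. Writers append log segments by claiming the next sequence-numbered key, and the store's atomic create decides which writer wins a race, so the writers never run a consensus protocol among themselves. Durability and replication are assumed to come from the real store.

```python
from __future__ import annotations

import threading


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store offering put-if-absent."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str, data: bytes) -> bool:
        # The store, not the writers, arbitrates races:
        # only the first put for a given key succeeds.
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = data
            return True

    def get(self, key: str) -> bytes | None:
        with self._lock:
            return self._objects.get(key)


class SegmentLog:
    """Append-only log whose segments are objects named by sequence number."""

    def __init__(self, store: InMemoryObjectStore, prefix: str) -> None:
        self.store = store
        self.prefix = prefix
        self._next = 0  # local hint only; the store is the source of truth

    def _key(self, seq: int) -> str:
        return f"{self.prefix}/{seq:020d}"

    def append(self, data: bytes) -> int:
        seq = self._next
        # Try to claim a slot; a writer that loses a race moves to the next one.
        while not self.store.put_if_absent(self._key(seq), data):
            seq += 1
        self._next = seq + 1
        return seq

    def read_all(self) -> list[bytes]:
        out: list[bytes] = []
        seq = 0
        while (data := self.store.get(self._key(seq))) is not None:
            out.append(data)
            seq += 1
        return out


if __name__ == "__main__":
    store = InMemoryObjectStore()
    a = SegmentLog(store, "wal")
    b = SegmentLog(store, "wal")  # a second writer sharing the same prefix
    print(a.append(b"first"))     # 0
    print(b.append(b"second"))    # 1: slot 0 is taken, so b claims the next one
    print(a.read_all())           # [b'first', b'second']
```

A production design would also need fencing for stale writers, recovery from partially failed uploads, and a check of which conditional-write primitives the chosen object store actually provides.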