Semih Salihoğlu, a key contributor to the Kuzu project, discusses the design of the Kuzu graph database. He explains Kuzu's columnar storage and why it reads less data than traditional row-based systems, and how vectorized query processing speeds up analytic queries. Salihoğlu also covers the challenge of many-to-many relationships and how Kuzu's join algorithms make complex queries faster and less resource-intensive.
01:03:35
ANECDOTE
Bauplan Use Case
Bauplan, a function-as-a-service company, uses Kuzu to optimize serverless applications.
They generate a graph of the computation, minimize data transfer, and debug within seconds using Kuzu.
ADVICE
Kuzu for Enhanced Performance
Consider Kuzu if existing graph databases lack performance.
Kuzu offers improved features and performance for graph data management.
INSIGHT
Scalability Challenges
A user survey revealed that scalability is a major challenge with graph databases.
Existing technology struggles with large datasets and complex traversals.
Kuzu is an embedded graph database that implements Cypher as a library.
It can be easily integrated into various environments—from scripts and Android apps to serverless platforms.
Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with performance and scalability.
Key Architectural Decisions:
Columnar Storage:
Kuzu stores node and relationship properties in separate, contiguous columns. This design reduces I/O by allowing queries to scan only the needed columns, unlike row-based systems (e.g., Neo4j) that read full records even when only a subset of properties is required.
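As a rough illustration of the I/O argument, the following Python sketch compares the two layouts on a tiny in-memory example. It is not Kuzu's storage code: the sample data, the function names, and the "values touched" counter are all assumptions made up for this illustration.

```python
# Illustrative only: a toy model of row vs. column layout, not Kuzu's storage engine.

# Row-oriented layout: all properties of a node sit together in one record.
rows = [
    {"id": 0, "name": "Ada", "age": 36, "city": "London"},
    {"id": 1, "name": "Bob", "age": 41, "city": "Toronto"},
    {"id": 2, "name": "Cem", "age": 28, "city": "Izmir"},
]

# Column-oriented layout: one contiguous list per property.
columns = {
    "id": [0, 1, 2],
    "name": ["Ada", "Bob", "Cem"],
    "age": [36, 41, 28],
    "city": ["London", "Toronto", "Izmir"],
}


def avg_age_rows(records):
    """Average age over row storage; each whole record counts as read."""
    values_touched = 0
    total = 0
    for record in records:
        values_touched += len(record)  # every property of the record is fetched
        total += record["age"]
    return total / len(records), values_touched


def avg_age_columns(cols):
    """Average age over column storage; only the 'age' column is read."""
    ages = cols["age"]
    return sum(ages) / len(ages), len(ages)


row_avg, row_touched = avg_age_rows(rows)
col_avg, col_touched = avg_age_columns(columns)
print(row_avg, row_touched)
print(col_avg, col_touched)
```

Both functions compute the same average, but the row version touches every property of every record while the column version touches only the ages.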
Efficient Join Indexing with CSR:
The join index is maintained using a Compressed Sparse Row (CSR) format. By sorting and compressing relationship data, Kuzu ensures that adjacent node relationships are stored contiguously, minimizing random I/O and speeding up traversals.
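The CSR layout itself can be sketched in a few lines of Python. This is the textbook form of the structure, without the compression step mentioned above, and it makes no claim about Kuzu's on-disk format; `build_csr` and `neighbors` are hypothetical helper names.

```python
# Textbook CSR adjacency sketch; names and layout are illustrative assumptions.

def build_csr(num_nodes, edges):
    """Build (offsets, targets) from a list of (src, dst) pairs.

    targets holds all destination ids sorted by source node, and
    offsets[v]:offsets[v + 1] is the slice of targets belonging to node v.
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count outgoing edges per source
    for v in range(num_nodes):
        offsets[v + 1] += offsets[v]   # prefix sum turns counts into offsets
    targets = [dst for _, dst in sorted(edges)]
    return offsets, targets


def neighbors(offsets, targets, v):
    """Return the neighbors of v as one contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


edges = [(0, 1), (2, 0), (0, 2), (1, 2)]
offsets, targets = build_csr(3, edges)
print(offsets, targets)
print(neighbors(offsets, targets, 0))
```

Because each node's neighbors occupy one contiguous slice, a traversal step is a single sequential read rather than a series of pointer lookups.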
Vectorized Query Processing:
Instead of processing one tuple at a time, Kuzu processes blocks (vectors) of tuples. This block-based (or vectorized) approach reduces function-call overhead and improves cache locality, boosting performance for analytic queries.
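The difference in control flow can be sketched as follows. This toy Python example only counts operator invocations to show where the overhead goes; it is not Kuzu's query processor, and `VECTOR_SIZE` and the function names are assumptions chosen for illustration.

```python
# Toy comparison of tuple-at-a-time vs. block-at-a-time processing.
# The invocation counters model per-call overhead; they are not real timings.

VECTOR_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to follow


def filter_sum_tuple_at_a_time(values, threshold):
    """Filter and sum with one operator invocation per tuple."""
    invocations = 0
    total = 0
    for value in values:
        invocations += 1
        if value > threshold:
            total += value
    return total, invocations


def filter_sum_vectorized(values, threshold, vector_size=VECTOR_SIZE):
    """Filter and sum with one operator invocation per block of tuples."""
    invocations = 0
    total = 0
    for start in range(0, len(values), vector_size):
        block = values[start:start + vector_size]
        invocations += 1
        # The tight loop over the block is where cache locality pays off.
        total += sum(value for value in block if value > threshold)
    return total, invocations


values = list(range(10))
print(filter_sum_tuple_at_a_time(values, 4))
print(filter_sum_vectorized(values, 4))
```

Both versions return the same sum, but the vectorized one invokes the operator once per block instead of once per tuple.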
Factorization and ASP Join:
For many-to-many queries that can generate enormous intermediate results, Kuzu uses factorization to represent data compactly. Its ASP join algorithm integrates factorization, sequential scanning, and sideways information passing to avoid unnecessary full scans and materializations.
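To make the idea of factorization concrete, here is a small Python sketch of a two-hop pattern (a)->(b)->(c). It illustrates only the compact representation, not the ASP join algorithm or sideways information passing, and the data and function names are invented for this example.

```python
# Illustrative sketch of a factorized intermediate result; not Kuzu's implementation.

def flat_two_hop(incoming, outgoing):
    """Materialize every (a, b, c) path through each middle node b."""
    return [
        (a, b, c)
        for b in incoming
        for a in incoming[b]
        for c in outgoing.get(b, [])
    ]


def factorized_two_hop(incoming, outgoing):
    """Keep, per middle node b, the two neighbor lists unexpanded.

    Each entry stands for the Cartesian product {a's} x {b} x {c's}.
    """
    return {b: (incoming[b], outgoing[b]) for b in incoming if b in outgoing}


def count_paths(factorized):
    """Count paths without ever expanding the Cartesian products."""
    return sum(len(a_side) * len(c_side) for a_side, c_side in factorized.values())


incoming = {"b": ["a1", "a2", "a3"]}        # nodes with an edge into b
outgoing = {"b": ["c1", "c2", "c3", "c4"]}  # nodes b has an edge to

flat = flat_two_hop(incoming, outgoing)
factorized = factorized_two_hop(incoming, outgoing)
stored_values = sum(len(a_side) + len(c_side) for a_side, c_side in factorized.values())
print(len(flat), stored_values, count_paths(factorized))
```

The flat result grows with the product of the two neighbor-list sizes, while the factorized form grows with their sum, which is why aggregates such as a path count can be computed without materializing every tuple.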
Kuzu is optimized for read-heavy, analytic workloads. While batched writes are efficient, the system is less tuned for high-frequency, small transactions. Upcoming features include:
A WebAssembly (Wasm) version for running in browsers.
Enhanced vector and full-text search indices.
Built-in graph data science algorithms for tasks like PageRank and centrality analysis.
Kuzu can be a powerful backend for AI applications in several ways:
Knowledge Graphs:
Store and query complex relationships between entities to support natural language understanding, semantic search, and reasoning tasks.
Graph Data Science:
Run built-in graph algorithms (like PageRank, centrality, or community detection) that help uncover patterns and insights, which can improve recommendation systems, fraud detection, and other AI-driven analyses.
Retrieval-Augmented Generation (RAG):
Integrate with large language models by retrieving relevant, structured graph data. Fast query processing, together with Kuzu’s vector search indices, makes it well suited to augmenting AI responses with contextual information.
Graph Embeddings & ML Pipelines:
Serve as the foundation for generating graph embeddings, which are used in downstream machine learning tasks—such as clustering, classification, or link prediction—to enhance model performance.
00:00 Introduction to Graph Databases
00:18 Introducing Kuzu: A Modern Graph Database
01:48 Use Cases and Applications of Kuzu
03:03 Kuzu's Research Origins and Scalability
06:18 Columnar Storage vs. Row-Oriented Storage
10:27 Query Processing Techniques in Kuzu
22:22 Compressed Sparse Row (CSR) Storage
27:25 Vectorization in Graph Databases
31:24 Optimizing Query Processors with Vectorization
33:25 Common Wisdom in Graph Databases
35:13 Introducing ASP Join in Kuzu
35:55 Factorization and Efficient Query Processing
39:49 Challenges and Solutions in Graph Databases
45:26 Write Path Optimization in Kuzu
54:10 Future Developments in Kuzu
57:51 Key Takeaways and Final Thoughts