#044 Graphs Aren't Just For Specialists Anymore

Feb 28, 2025

Semih Salihoğlu, a key contributor to the Kuzu project, discusses the future of graph databases. He explains Kuzu's columnar storage design and why it is more efficient than traditional row-based systems. The discussion also covers vectorized query processing, which improves performance for analytic queries. Salihoğlu then explains the challenge of many-to-many relationships and how Kuzu's join algorithms make complex queries faster and less resource-intensive.

ANECDOTE

Bauplan Use Case

Bauplan, a function-as-a-service company, uses Kuzu to optimize serverless applications.
They generate a graph of the computation, minimize data transfer, and debug within seconds using Kuzu.

ADVICE

Kuzu for Enhanced Performance

Consider Kuzu if existing graph databases lack performance.
Kuzu offers improved features and performance for graph data management.

INSIGHT

Scalability Challenges

A user survey revealed that scalability is a major challenge with graph databases.
Existing technology struggles with large datasets and complex traversals.

Kuzu is an embedded graph database that implements Cypher as a library.

It can be easily integrated into various environments—from scripts and Android apps to serverless platforms.

Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with performance and scalability.

Key Architectural Decisions:

Columnar Storage:
Kuzu stores node and relationship properties in separate, contiguous columns. This design reduces I/O by allowing queries to scan only the needed columns, unlike row-based systems (e.g., Neo4j) that read full records even when only a subset of properties is required.
Efficient Join Indexing with CSR:
The join index is maintained in a Compressed Sparse Row (CSR) format. By sorting and compressing relationship data, Kuzu stores each node's relationships contiguously, which minimizes random I/O and speeds up traversals.
Vectorized Query Processing:
Instead of processing one tuple at a time, Kuzu processes blocks (vectors) of tuples. This block-based (or vectorized) approach reduces function-call overhead and improves cache locality, boosting performance for analytic queries.
Factorization and ASP Join:
For many-to-many queries that can generate enormous intermediate results, Kuzu uses factorization to represent data compactly. Its ASP join algorithm integrates factorization, sequential scanning, and sideways information passing to avoid unnecessary full scans and materializations.
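
To make the CSR and factorization ideas above more concrete, here is a minimal sketch in plain Python. It is not Kuzu's implementation: the function names (`build_csr`, `adjacent`), the toy edge list, and the list-based layout are all assumptions made for illustration. It builds a forward and a backward CSR index, then answers a two-hop query (a)->(b)->(c) both as flat tuples and in a factorized form that keeps, for each middle node, one list of predecessors and one list of successors instead of their full Cartesian product.

```python
from itertools import product


def build_csr(num_nodes, edges):
    """Build a CSR index (offsets, neighbors) from a directed edge list.

    The neighbors of node v live in neighbors[offsets[v]:offsets[v + 1]],
    so each node's adjacency list is one contiguous slice.
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1              # count out-degree per source
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]       # prefix sum turns counts into offsets
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()           # next free slot for each source
    for src, dst in sorted(edges):         # sorting keeps each slice ordered
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def adjacent(csr, node):
    """Return the contiguous adjacency slice for one node."""
    offsets, neighbors = csr
    return neighbors[offsets[node]:offsets[node + 1]]


# Hypothetical toy graph: nodes 0, 1, 3 point to node 2; node 2 points to 4, 5, 6.
edges = [(0, 2), (1, 2), (3, 2), (2, 4), (2, 5), (2, 6)]
num_nodes = 7
fwd = build_csr(num_nodes, edges)                          # outgoing edges
bwd = build_csr(num_nodes, [(d, s) for s, d in edges])     # incoming edges

# Flat result: every (a, b, c) path is materialized as its own tuple.
flat = [
    (a, b, c)
    for b in range(num_nodes)
    for a, c in product(adjacent(bwd, b), adjacent(fwd, b))
]

# Factorized result: per middle node b, keep the two lists unexpanded.
factorized = {
    b: (adjacent(bwd, b), adjacent(fwd, b))
    for b in range(num_nodes)
    if adjacent(bwd, b) and adjacent(fwd, b)
}

# The path count follows from the list lengths; no tuple is materialized.
path_count = sum(len(ins) * len(outs) for ins, outs in factorized.values())

flat_values = 3 * len(flat)
factorized_values = sum(1 + len(i) + len(o) for i, o in factorized.values())
print(path_count, flat_values, factorized_values)
```

In this toy graph the flat form stores one tuple per path, while the factorized form stores only the middle node plus its two neighbor lists. The gap widens quickly as the degree of the middle node grows, which is the many-to-many blow-up the episode describes.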

Kuzu is optimized for read-heavy, analytic workloads. While batched writes are efficient, the system is less tuned for high-frequency, small transactions.

Upcoming features include:

A WebAssembly (Wasm) version for running in browsers.
Enhanced vector and full-text search indices.
Built-in graph data science algorithms for tasks like PageRank and centrality analysis.

Kuzu can be a powerful backend for AI applications in several ways:

Knowledge Graphs:
Store and query complex relationships between entities to support natural language understanding, semantic search, and reasoning tasks.
Graph Data Science:
Run built-in graph algorithms (like PageRank, centrality, or community detection) that help uncover patterns and insights, which can improve recommendation systems, fraud detection, and other AI-driven analyses.
Retrieval-Augmented Generation (RAG):
Integrate with large language models by efficiently retrieving relevant, structured graph data. Kuzu's vector search capabilities and fast query processing make it well suited to augmenting AI responses with contextual information.
Graph Embeddings & ML Pipelines:
Serve as the foundation for generating graph embeddings, which are used in downstream machine learning tasks—such as clustering, classification, or link prediction—to enhance model performance.
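
As a rough illustration of the kind of graph algorithm mentioned under Graph Data Science, here is a small PageRank sketch using power iteration in plain Python. It does not use Kuzu's API or reflect how Kuzu implements its built-in algorithms. The `pagerank` function, the damping factor of 0.85, the fixed iteration count, and the four-node graph are assumptions chosen for the example.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of successors.

    Rank held by nodes without outgoing edges is redistributed uniformly,
    so the ranks keep summing to 1 after every iteration.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        # Every node starts with the teleport share plus its dangling share.
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                # Each node splits its damped rank evenly among its successors.
                nxt[w] += damping * rank[v] / len(out_edges[v])
        rank = nxt
    return rank


# Hypothetical graph: "c" is linked from three nodes, "d" from none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

The node with the most incoming links ends up with the highest score, and the node with no incoming links keeps only the baseline teleport share.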

Semih Salihoğlu:

Nicolay Gerold:

00:00 Introduction to Graph Databases
00:18 Introducing Kuzu: A Modern Graph Database
01:48 Use Cases and Applications of Kuzu
03:03 Kuzu's Research Origins and Scalability
06:18 Columnar Storage vs. Row-Oriented Storage
10:27 Query Processing Techniques in Kuzu
22:22 Compressed Sparse Row (CSR) Storage
27:25 Vectorization in Graph Databases
31:24 Optimizing Query Processors with Vectorization
33:25 Common Wisdom in Graph Databases
35:13 Introducing ASP Join in Kuzu
35:55 Factorization and Efficient Query Processing
39:49 Challenges and Solutions in Graph Databases
45:26 Write Path Optimization in Kuzu
54:10 Future Developments in Kuzu
57:51 Key Takeaways and Final Thoughts