Modern search is broken. There are too many pieces that are glued together.
Each piece works well alone.
Together, they often become a mess.
When you glue these systems together, you create:
I recently built a system where we had query-specific post-filters but still had to deliver a fixed number of results to the user.
Often, the query had to be run multiple times to reach the required number of results.
So we had unpredictable latency. A high load on the backend, where some queries hammered the database 10+ times. A relevance cliff, where results 1-6 looked great, but the later ones were poor matches.
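To make the failure mode concrete, here is a minimal sketch of the over-fetch-and-retry loop that post-filtering tends to force on you. It is a toy, not the system described above: the corpus, the `in_stock` filter, the doubling strategy, and every function name are assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]


def search_with_post_filter(
    fetch_top: Callable[[int], List[Doc]],
    predicate: Callable[[Doc], bool],
    k: int,
    max_rounds: int = 10,
) -> Tuple[List[Doc], int]:
    """Fetch the top `limit` hits, filter afterwards, and retry with a
    larger limit until k results survive (hypothetical retry strategy)."""
    limit = k
    rounds = 0
    kept: List[Doc] = []
    while rounds < max_rounds:
        rounds += 1
        candidates = fetch_top(limit)          # one backend round trip
        kept = [d for d in candidates if predicate(d)]
        # Stop when we have enough, or the backend has nothing more to give.
        if len(kept) >= k or len(candidates) < limit:
            break
        limit *= 2                             # over-fetch and try again
    return kept[:k], rounds


# Toy corpus, already sorted by relevance; only every 50th doc passes the filter.
corpus: List[Doc] = [
    {"id": i, "score": 1.0 - i / 1000, "in_stock": i % 50 == 0}
    for i in range(1000)
]

results, rounds = search_with_post_filter(
    fetch_top=lambda limit: corpus[:limit],
    predicate=lambda d: bool(d["in_stock"]),
    k=10,
)
print(rounds)                                   # backend calls needed
print([d["id"] for d in results])               # how deep the results reach
```

With this toy data the loop needs 7 round trips to fill 10 slots, and the last result sits at rank 450 of the unfiltered ranking. That is the unpredictable latency, the backend load, and the relevance cliff in one small example.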
Today on How AI Is Built, we are talking to Marek Galovic from TopK.
We talk about how they built a new search database with modern components, starting from the question: "How would search work if we built it today?"
Cloud storage is cheap. Compute is fast. Memory is plentiful.
One system that handles vectors, text, and filters together - not three systems duct-taped into one.
One pass handles everything:
Vector search + Text search + Filters → Single sorted result
Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency.
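As a rough illustration of the single-pass idea (not TopK's actual engine or API), the sketch below applies the filter, computes a vector score and a text score, and keeps a top-k heap, all in one loop over the documents. The brute-force scan, the term-overlap score standing in for a real text ranker such as BM25, the `alpha` weighting, and all names are assumptions chosen for illustration.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, object]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query_terms: Sequence[str], text: str) -> float:
    """Fraction of query terms present in the text (toy stand-in for BM25)."""
    words = set(text.lower().split())
    return sum(1 for t in query_terms if t in words) / len(query_terms)


def single_pass_search(
    docs: List[Doc],
    query_vec: Sequence[float],
    query_terms: Sequence[str],
    predicate: Callable[[Doc], bool],
    k: int,
    alpha: float = 0.5,  # hypothetical weight between vector and text score
) -> List[Tuple[float, int]]:
    heap: List[Tuple[float, int]] = []  # min-heap holding the best k so far
    for doc in docs:
        if not predicate(doc):          # filter inside the scan, not after it
            continue
        score = alpha * cosine(query_vec, doc["vec"]) + (1 - alpha) * keyword_score(
            query_terms, doc["text"]
        )
        item = (score, doc["id"])
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)   # single sorted result, best first


docs: List[Doc] = [
    {"id": 1, "vec": [1.0, 0.0], "text": "rust search engine", "lang": "en"},
    {"id": 2, "vec": [0.0, 1.0], "text": "cooking pasta", "lang": "en"},
    {"id": 3, "vec": [1.0, 0.0], "text": "rust search engine", "lang": "de"},
    {"id": 4, "vec": [1.0, 1.0], "text": "search tips", "lang": "en"},
]

hits = single_pass_search(
    docs,
    query_vec=[1.0, 0.0],
    query_terms=["rust", "search"],
    predicate=lambda d: d["lang"] == "en",
    k=2,
)
print([doc_id for _, doc_id in hits])
```

Because the filter runs inside the scan, the result always contains the best k matching documents after one pass, so there is nothing to retry.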
The goal is to do search in 5 lines of code.
Marek Galovic:
Nicolay Gerold:
00:00 Introduction to TopK and Snowflake Comparison
00:35 Architectural Patterns and Custom Formats
01:30 Query Execution Engine Explained
02:56 Distributed Systems and Rust
04:12 Query Execution Process
06:56 Custom File Formats for Search
11:45 Handling Distributed Queries
16:28 Consistency Models and Use Cases
26:47 Exploring Database Versioning and Snapshots
27:27 Performance Benchmarks: Rust vs. C/C++
29:02 Scaling and Latency in Large Datasets
29:39 GPU Acceleration and Use Cases
31:04 Optimizing Search Relevance and Hybrid Search
34:39 Advanced Search Features and Custom Scoring
38:43 Future Directions and Research in AI
47:11 Takeaways for Building AI Applications