How AI Is Built

Nicolay Gerold

Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

Episodes

Mentioned books

Mar 13, 2025 • 53min

#046 Building a Search Database From First Principles

Modern search is broken. There are too many pieces that are glued together.Vector databases for semantic searchText engines for keywordsRerankers to fix the resultsLLMs to understand queriesMetadata filters for precisionEach piece works well alone.Together, they often become a mess.When you glue these systems together, you create:Data Consistency Gaps Your vector store knows about documents your text engine doesn't. Which is right?Timing Mismatches New content appears in one system before another. Users see different results depending on which path their query takes.Complexity Explosion Every new component doubles your integration points. Three components means three connections. Five means ten.Performance Bottlenecks Each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.Brittle Chains When one system fails, your entire search breaks. More pieces mean more breaking points.I recently built a system where we had query specific post-filters but the requirement to deliver a fixed number of results to the user.A lot of times, the query had to be run multiple times to achieve the desired amount.So we had an unpredictable latency. A high load on the backend, where some queries hammered the database 10+ times. A relevance cliff, where results 1-6 look great, but the later ones were poor matches.Today on How AI Is Built, we are talking to Marek Galovic from TopK.We talk about how they built a new search database with modern components. "How would search work if we built it today?”Cloud storage is cheap. Compute is fast. Memory is plentiful.One system that handles vectors, text, and filters together - not three systems duct-taped into one.One pass handles everything:Vector search + Text search + Filters → Single sorted result Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency.The goal is to do search in 5 lines of code.Marek Galovic:LinkedInWebsiteTopK WebsiteTopK DocsNicolay Gerold:⁠LinkedIn⁠⁠X (Twitter)00:00 Introduction to TopK and Snowflake Comparison00:35 Architectural Patterns and Custom Formats01:30 Query Execution Engine Explained02:56 Distributed Systems and Rust04:12 Query Execution Process06:56 Custom File Formats for Search11:45 Handling Distributed Queries16:28 Consistency Models and Use Cases26:47 Exploring Database Versioning and Snapshots27:27 Performance Benchmarks: Rust vs. C/C++29:02 Scaling and Latency in Large Datasets29:39 GPU Acceleration and Use Cases31:04 Optimizing Search Relevance and Hybrid Search34:39 Advanced Search Features and Custom Scoring38:43 Future Directions and Research in AI47:11 Takeaways for Building AI Applications

Mar 6, 2025 • 1h 3min

#045 RAG As Two Things - Prompt Engineering and Search

In this discussion, John Berryman, an expert who transitioned from aerospace engineering to search and machine learning, explores the dual nature of retrieval-augmented generation (RAG). He emphasizes separating search from prompt engineering for optimal performance. Berryman shares insights on effective prompting strategies using familiar structures, testing human evaluations, and managing token limits. He dives into the differences between chat and completion models and highlights practical techniques for tackling AI applications and workflows. It's a deep dive into enhancing interactions with AI!

Feb 28, 2025 • 1h 4min

#044 Graphs Aren't Just For Specialists Anymore

Semih Salihoğlu, a key contributor to the Kuzu project, dives into the future of graph databases. He elaborates on Kuzu's columnar storage design, emphasizing its efficiency over traditional row-based systems. Discussion highlights include innovative vectorized query processing that boosts performance and enhances analytics. Salihoğlu also explains the challenge of many-to-many relationships and Kuzu's unique approaches to join algorithms, making complex queries faster and less resource-intensive. Overall, this conversation unveils exciting advancements in data management for modern applications.

Feb 20, 2025 • 1h 11min

#043 Knowledge Graphs Won't Fix Bad Data

Juan Sequeda, a Principal Scientist at data.world and an authority on knowledge graphs, shares his insights on improving data quality. He discusses the importance of integrating technical and business metadata to create a 'brain' for AI applications. Sequeda explains how traditional silos hinder effective data management and emphasizes the need for collaboration in startups. He also addresses the balance between automation and human oversight in knowledge graphs and outlines strategies for defining robust entities and relationships, ensuring accurate data connections.

Feb 13, 2025 • 1h 34min

#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs

Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph.Time is a critical—but often ignored—dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally, preserve versions over time.Three Types of Data:Observations:Definition: Measurable, verifiable recordings (e.g., “the hat reads ‘Sunday Running Club’”).Characteristics: Require supporting evidence and may be updated as new data becomes available.Assertions:Definition: Subjective interpretations (e.g., “the hat is greenish”).Characteristics: Involve human judgment and come with confidence levels; they may change over time.Facts:Definition: Immutable, verified information that remains constant.Characteristics: Rare in dynamic environments because most data evolves; serve as the “bedrock” of trust.By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).Integrating Temporal Data into Knowledge Graphs:Challenge:Traditional knowledge graphs and schemas (e.g., schema.org) rarely integrate time beyond basic metadata. Long documents may only provide a single timestamp, leaving the context of internal details untracked.Solution:Attach detailed temporal metadata (such as creation, update, and deletion timestamps) during data ingestion. Use versioning to maintain historical context. This allows systems to:Assess whether data is current or stale.Detect conflicts when updates occur.Employ Bayesian methods to adjust trust metrics as more information accumulates.Key Takeaways:Focus on Specialization:Build tools that do one thing well. For example, design a simple yet extensible knowledge graph rather than relying on overly complex ontologies.Integrate Temporal Metadata:Always timestamp data operations and version records. This is key to understanding data freshness and evolution.Adopt Robust Infrastructure:Use scalable, proven technologies to connect specialized modules via APIs. This reduces maintenance overhead compared to systems overloaded with connectors and extra features.Leverage Bayesian Updates:Start with initial trust metrics based on observed data and refine them as new evidence arrives.Mind the Big Picture:Avoid working in isolated silos. Emphasize a holistic system design that maintains in situ context and promotes collaboration across teams.Daniel DavisCognitive CoreTrustGraphYouTubeLinkedInDiscordNicolay Gerold:⁠LinkedIn⁠⁠X (Twitter)00:00 Introduction to Temporal Dimensions in Data 00:53 Timestamping and Versioning Data 01:35 Introducing Daniel Davis and Temporal RAG 01:58 Three Buckets of Data: Observations, Assertions, and Facts 03:22 Dynamic Data and Data Freshness 05:14 Challenges in Integrating Time in Knowledge Graphs 09:41 Defining Observations, Assertions, and Facts 12:57 The Role of Time in Data Trustworthiness 46:58 Chasing White Whales in AI 47:58 The Problem with Feature Overload 48:43 Connector Maintenance Challenges 50:02 The Swiss Army Knife Analogy 51:16 API Meshes and Glue Code 54:14 The Importance of Software Infrastructure 01:00:10 The Need for Specialized Tools 01:13:25 Outro and Future Plans

Feb 6, 2025 • 1h 34min

#041 Context Engineering, How Knowledge Graphs Help LLMs Reason

Robert Caulk, who leads Emergent Methods and has over 1,000 academic citations, dives into the fascinating world of knowledge graphs and their integration with large language models (LLMs). He discusses how these graphs help AI systems connect complex data relationships, enhancing reasoning accuracy. The conversation also touches on the challenges of multilingual entity extraction and the need for context engineering to improve AI-generated content. Additionally, Caulk shares insights into upcoming features for real-time event tracking and the future of project management tools.

Jan 31, 2025 • 52min

#040 Vector Database Quantization, Product, Binary, and Scalar

Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together, dives into the fascinating world of vector database quantization. He explains how quantization can drastically reduce storage costs, likening it to image compression. Zain discusses three quantization methods: binary, product, and scalar, each with unique trade-offs in precision and efficiency. He also addresses the speed and memory usage challenges of managing vector data, and hints at exciting future applications, including brain-computer interfaces.

Jan 23, 2025 • 53min

#039 Local-First Search, How to Push Search To End-Devices

Alex Garcia, a developer passionate about making vector search practical, discusses his creation, SQLiteVec. He emphasizes its lightweight design and how it simplifies local AI applications. The conversation reveals the efficiency of SQLiteVec's brute force searches, with impressive performance metrics at scale. Garcia also dives into challenges like data synchronization and fine-tuning embedding models. His insights on binary quantization and future innovations in local search highlight the evolution of user-friendly machine learning tools.

Jan 9, 2025 • 1h 14min

#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It

Trey Grainger, author of 'AI-Powered Search' and an expert in search systems, joins the conversation to unravel the complexities of retrieval and generation in AI. He presents the concept of 'GARRAG,' where retrieval and generation enhance each other. Trey dives into the importance of user context, discussing how behavior signals improve search personalization. He shares insights on moving from simple vector similarity to advanced models and offers practical advice for engineers on choosing effective tools, promoting a structured, modular approach for better search results.

Jan 3, 2025 • 49min

#037 Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces

Brandon Smith, a research engineer at Chroma known for his extensive work on chunking techniques for retrieval-augmented generation systems, shares his insights on optimizing semantic search. He discusses the common misconceptions surrounding chunk sizes and overlap, highlighting the challenges of maintaining context in dense content. Smith criticizes existing strategies, such as OpenAI's 800-token chunks, and emphasizes the importance of coherent parsing. He also introduces innovative approaches to enhance contextual integrity in document processing, paving the way for improved information retrieval.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner