
How AI Is Built
Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.
Latest episodes

May 20, 2025 • 1h 3min
#049 BAML: The Programming Language That Turns LLMs into Predictable Functions
In this discussion, Vaibhav Gupta, co-founder of Boundary, dives into BAML, a programming language designed to streamline AI pipelines. He emphasizes treating large language model (LLM) calls as typed functions, which enhances reliability and simplifies error handling. The podcast explores concepts like Schema-Aligned Parsing and the drawbacks of traditional JSON constraints. Vaibhav also discusses the importance of simplicity in programming and how BAML facilitates better interactions between technical and non-technical users, ensuring robust AI solutions.
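For a flavor of the core idea, here is a minimal Python sketch of an LLM call treated as a typed function. This is not BAML syntax (listen to the episode for that); `Invoice`, `call_llm`, and `extract_invoice` are hypothetical names. The point is that a schema sits at the boundary, so bad outputs fail loudly in one place.

```python
# Illustrative sketch only: BAML has its own syntax; this shows the
# "LLM call as a typed function" idea in plain Python with Pydantic.
# `call_llm` is a hypothetical stand-in for any chat-completion client.
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor: str
    total_usd: float
    line_items: list[str]


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider; returns raw text."""
    raise NotImplementedError


def extract_invoice(document: str) -> Invoice:
    """A typed function: text in, validated Invoice out, or a clear error."""
    raw = call_llm(
        f"Extract the invoice fields from:\n{document}\n"
        f"Respond with JSON matching this schema: {Invoice.model_json_schema()}"
    )
    try:
        return Invoice.model_validate_json(raw)
    except ValidationError as err:
        # Error handling lives at the boundary, not scattered through the app.
        raise ValueError(f"LLM output failed schema validation: {err}") from err
```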

May 20, 2025 • 1h 13min
#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions
Dive into the fascinating world of AI with insights on treating large language models as predictable functions. Discover the importance of clear contracts for input and output to enhance reliability. The discussion also covers effective prompt engineering, including the benefits of simplicity and innovative symbol tuning techniques. Uncover the concept of Schema-Aligned Parsing to manage diverse data formats seamlessly. Plus, learn how to keep humans sharp in a field where outputs are often already correct!

May 13, 2025 • 7min
#048 TAKEAWAYS Why Your AI Agents Need Permission to Act, Not Just Read
The discussion centers on the necessity of human oversight in AI workflows. It reveals how AI can reach 90% accuracy but still falter in trust-sensitive tasks. The innovative approach involves adding a human approval layer for crucial actions. Dexter Horthy shares insights from his '12-factor agents', a set of guiding principles for building reliable AI. They also explore how training can pull LLMs toward mediocrity and the essential infrastructure for effective human-in-the-loop systems.

May 11, 2025 • 57min
#048 Why Your AI Agents Need Permission to Act, Not Just Read
Dexter Horthy, the Founder of Human Layer, discusses the importance of integrating human approval into AI actions to enhance trust and utility. He shares insights from his '12-factor agents' framework, emphasizing that AI should request permission before executing critical tasks. The conversation delves into the limitations of current AI capabilities, the challenges of managing human-in-the-loop systems, and the need for robust context engineering. Dexter's approach aims to strike a balance between automation and human oversight, revolutionizing how AI can operate in real-world scenarios.
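As a rough illustration of the pattern discussed here (not Human Layer's actual API; all names below are hypothetical), a critical action can be wrapped so the agent must obtain human sign-off before it executes:

```python
# A minimal sketch of the approval-gate pattern: reads pass through,
# while writes and other critical actions block on a human decision.
from typing import Callable


def request_human_approval(action: str, details: dict) -> bool:
    """Hypothetical channel (Slack, email, CLI) where a human approves or denies."""
    answer = input(f"Approve {action} with {details}? [y/N] ")
    return answer.strip().lower() == "y"


def requires_approval(func: Callable) -> Callable:
    """Decorator that gates an agent tool behind explicit human sign-off."""
    def wrapper(*args, **kwargs):
        if not request_human_approval(func.__name__, {"args": args, "kwargs": kwargs}):
            return f"{func.__name__} denied by human reviewer"
        return func(*args, **kwargs)
    return wrapper


@requires_approval
def send_refund(customer_id: str, amount_usd: float) -> str:
    # The agent can *propose* this action, but it only runs after approval.
    return f"refunded {customer_id} ${amount_usd:.2f}"
```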

Mar 27, 2025 • 57min
#047 Architecting Information for Search, Humans, and Artificial Intelligence
Jorge Arango, an expert in information architecture, shares insights on aligning systems with user mental models. He emphasizes that effective designs bridge user understanding and system data, creating learnable interfaces. Jorge discusses how contextual organization simplifies decision-making, tackling the paradox of choice. He also highlights the importance of progressive disclosure to accommodate users of varying expertise, and examines the transformative impact of large language models on search experiences.

Mar 13, 2025 • 53min
#046 Building a Search Database From First Principles
Modern search is broken. There are too many pieces glued together:

- Vector databases for semantic search
- Text engines for keywords
- Rerankers to fix the results
- LLMs to understand queries
- Metadata filters for precision

Each piece works well alone. Together, they often become a mess. When you glue these systems together, you create:

- Data Consistency Gaps: your vector store knows about documents your text engine doesn't. Which is right?
- Timing Mismatches: new content appears in one system before another. Users see different results depending on which path their query takes.
- Complexity Explosion: every new component multiplies your integration points. Three components means three connections; five means ten.
- Performance Bottlenecks: each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.
- Brittle Chains: when one system fails, your entire search breaks. More pieces mean more breaking points.

I recently built a system with query-specific post-filters but a requirement to deliver a fixed number of results to the user. A lot of the time, the query had to be run multiple times to reach the desired count. So we had unpredictable latency, high load on the backend (some queries hammered the database 10+ times), and a relevance cliff: results 1-6 looked great, but the later ones were poor matches.

Today on How AI Is Built, we are talking to Marek Galovic from TopK. We talk about how they built a new search database with modern components. "How would search work if we built it today?" Cloud storage is cheap. Compute is fast. Memory is plentiful. One system that handles vectors, text, and filters together, not three systems duct-taped into one. One pass handles everything:

Vector search + Text search + Filters → Single sorted result
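To make the contrast concrete, here is a hypothetical sketch (not TopK's actual API; see the TopK Docs linked below for the real interface) of a single request carrying all three clauses:

```python
# Hypothetical sketch, not TopK's actual API: the point is one
# declarative request instead of three glued systems.
from dataclasses import dataclass, field


@dataclass
class HybridQuery:
    keywords: str                                # BM25/text clause
    embedding: list[float]                       # semantic clause
    filters: dict = field(default_factory=dict)  # metadata predicates
    limit: int = 10


# One engine evaluates all three clauses in a single pass and returns
# one sorted list: no cross-system sync, no duct tape, one latency hop.
query = HybridQuery(
    keywords="rust search engine",
    embedding=[0.12, -0.03, 0.88],  # toy stand-in for a real query embedding
    filters={"lang": "en", "year": {"gte": 2023}},
)
```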
Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency. The goal is to do search in 5 lines of code.

Marek Galovic:
- LinkedIn
- Website
- TopK Website
- TopK Docs

Nicolay Gerold:
- LinkedIn
- X (Twitter)

00:00 Introduction to TopK and Snowflake Comparison
00:35 Architectural Patterns and Custom Formats
01:30 Query Execution Engine Explained
02:56 Distributed Systems and Rust
04:12 Query Execution Process
06:56 Custom File Formats for Search
11:45 Handling Distributed Queries
16:28 Consistency Models and Use Cases
26:47 Exploring Database Versioning and Snapshots
27:27 Performance Benchmarks: Rust vs. C/C++
29:02 Scaling and Latency in Large Datasets
29:39 GPU Acceleration and Use Cases
31:04 Optimizing Search Relevance and Hybrid Search
34:39 Advanced Search Features and Custom Scoring
38:43 Future Directions and Research in AI
47:11 Takeaways for Building AI Applications

Mar 6, 2025 • 1h 3min
#045 RAG As Two Things - Prompt Engineering and Search
In this discussion, John Berryman, an expert who transitioned from aerospace engineering to search and machine learning, explores the dual nature of retrieval-augmented generation (RAG). He emphasizes separating search from prompt engineering for optimal performance. Berryman shares insights on effective prompting strategies using familiar structures, testing with human evaluations, and managing token limits. He dives into the differences between chat and completion models and highlights practical techniques for tackling AI applications and workflows. It's a deep dive into enhancing interactions with AI!
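A minimal sketch of that separation, with a toy `search` retriever and a `complete` LLM client as hypothetical stand-ins: each stage can be evaluated and tuned on its own.

```python
# "RAG is two things": retrieval and prompting are separate stages you
# can test independently. `search` and `complete` are illustrative stand-ins.


def search(query: str, k: int = 5) -> list[str]:
    """Stage 1 -- search: swap in any retriever (BM25, vectors, hybrid)."""
    corpus = [
        "RAG couples retrieval with generation.",
        "Token limits bound how much context fits in a prompt.",
    ]
    return [p for p in corpus if any(w in p.lower() for w in query.lower().split())][:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Stage 2 -- prompt engineering: a familiar document shape with
    clearly delimited sources."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below. Cite by [number].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query: str, complete) -> str:
    passages = search(query)                 # tune recall/precision here
    prompt = build_prompt(query, passages)   # tune wording/structure here
    return complete(prompt)                  # `complete` is your LLM client
```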

Feb 28, 2025 • 1h 4min
#044 Graphs Aren't Just For Specialists Anymore
Semih Salihoğlu, a key contributor to the Kuzu project, dives into the future of graph databases. He elaborates on Kuzu's columnar storage design, emphasizing its efficiency over traditional row-based systems. Discussion highlights include innovative vectorized query processing that boosts performance and enhances analytics. Salihoğlu also explains the challenge of many-to-many relationships and Kuzu's unique approaches to join algorithms, making complex queries faster and less resource-intensive. Overall, this conversation unveils exciting advancements in data management for modern applications.
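For readers who want to try it, here is a small usage sketch with Kuzu's Python API (double-check details against the current Kuzu docs; the schema and data are invented for illustration):

```python
# Kuzu stores node and relationship tables column-by-column, which is
# what enables the vectorized query processing discussed in the episode.
import kuzu

db = kuzu.Database("./demo_db")
conn = kuzu.Connection(db)

conn.execute("CREATE NODE TABLE User(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE Follows(FROM User TO User)")
conn.execute("CREATE (:User {name: 'alice'})")
conn.execute("CREATE (:User {name: 'bob'})")
conn.execute(
    "MATCH (a:User), (b:User) WHERE a.name = 'alice' AND b.name = 'bob' "
    "CREATE (a)-[:Follows]->(b)"
)

# A many-to-many traversal: the kind of multi-hop query Kuzu's join
# algorithms are built for.
result = conn.execute("MATCH (a:User)-[:Follows]->(b:User) RETURN a.name, b.name")
while result.has_next():
    print(result.get_next())
```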

Feb 20, 2025 • 1h 11min
#043 Knowledge Graphs Won't Fix Bad Data
Juan Sequeda, a Principal Scientist at data.world and an authority on knowledge graphs, shares his insights on improving data quality. He discusses the importance of integrating technical and business metadata to create a 'brain' for AI applications. Sequeda explains how traditional silos hinder effective data management and emphasizes the need for collaboration in startups. He also addresses the balance between automation and human oversight in knowledge graphs and outlines strategies for defining robust entities and relationships, ensuring accurate data connections.

Feb 13, 2025 • 1h 34min
#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs
Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph.Time is a critical—but often ignored—dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally, preserve versions over time.Three Types of Data:Observations:Definition: Measurable, verifiable recordings (e.g., “the hat reads ‘Sunday Running Club’”).Characteristics: Require supporting evidence and may be updated as new data becomes available.Assertions:Definition: Subjective interpretations (e.g., “the hat is greenish”).Characteristics: Involve human judgment and come with confidence levels; they may change over time.Facts:Definition: Immutable, verified information that remains constant.Characteristics: Rare in dynamic environments because most data evolves; serve as the “bedrock” of trust.By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).Integrating Temporal Data into Knowledge Graphs:Challenge:Traditional knowledge graphs and schemas (e.g., schema.org) rarely integrate time beyond basic metadata. Long documents may only provide a single timestamp, leaving the context of internal details untracked.Solution:Attach detailed temporal metadata (such as creation, update, and deletion timestamps) during data ingestion. Use versioning to maintain historical context. This allows systems to:Assess whether data is current or stale.Detect conflicts when updates occur.Employ Bayesian methods to adjust trust metrics as more information accumulates.Key Takeaways:Focus on Specialization:Build tools that do one thing well. For example, design a simple yet extensible knowledge graph rather than relying on overly complex ontologies.Integrate Temporal Metadata:Always timestamp data operations and version records. This is key to understanding data freshness and evolution.Adopt Robust Infrastructure:Use scalable, proven technologies to connect specialized modules via APIs. This reduces maintenance overhead compared to systems overloaded with connectors and extra features.Leverage Bayesian Updates:Start with initial trust metrics based on observed data and refine them as new evidence arrives.Mind the Big Picture:Avoid working in isolated silos. Emphasize a holistic system design that maintains in situ context and promotes collaboration across teams.Daniel DavisCognitive CoreTrustGraphYouTubeLinkedInDiscordNicolay Gerold:LinkedInX (Twitter)00:00 Introduction to Temporal Dimensions in Data 00:53 Timestamping and Versioning Data 01:35 Introducing Daniel Davis and Temporal RAG 01:58 Three Buckets of Data: Observations, Assertions, and Facts 03:22 Dynamic Data and Data Freshness 05:14 Challenges in Integrating Time in Knowledge Graphs 09:41 Defining Observations, Assertions, and Facts 12:57 The Role of Time in Data Trustworthiness 46:58 Chasing White Whales in AI 47:58 The Problem with Feature Overload 48:43 Connector Maintenance Challenges 50:02 The Swiss Army Knife Analogy 51:16 API Meshes and Glue Code 54:14 The Importance of Software Infrastructure 01:00:10 The Need for Specialized Tools 01:13:25 Outro and Future Plans