How AI Is Built

Latest episodes

Mar 27, 2025 • 57min

Architecting Information for Search, Humans, and Artificial Intelligence | S2 E30

Jorge Arango, an expert in information architecture, shares insights on aligning systems with user mental models. He emphasizes that effective designs bridge user understanding and system data, creating learnable interfaces. Jorge discusses how contextual organization simplifies decision-making, tackling the paradox of choice. He also highlights the importance of progressive disclosure to accommodate users of varying expertise, and examines the transformative impact of large language models on search experiences.
Mar 13, 2025 • 53min

Search in 5 lines of code. Building a search database from first principles | S2 E29

Modern search is broken. There are too many pieces glued together:

- Vector databases for semantic search
- Text engines for keywords
- Rerankers to fix the results
- LLMs to understand queries
- Metadata filters for precision

Each piece works well alone. Together, they often become a mess. When you glue these systems together, you create:

- Data consistency gaps: your vector store knows about documents your text engine doesn't. Which is right?
- Timing mismatches: new content appears in one system before another, so users see different results depending on which path their query takes.
- Complexity explosion: integration points grow quadratically with the number of components. Three components means three connections; five means ten.
- Performance bottlenecks: each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.
- Brittle chains: when one system fails, your entire search breaks. More pieces mean more breaking points.

I recently built a system with query-specific post-filters but a requirement to deliver a fixed number of results to the user. The query often had to be run multiple times to reach that number. The result: unpredictable latency, high backend load (some queries hammered the database 10+ times), and a relevance cliff where results 1-6 looked great but the later ones were poor matches.

Today on How AI Is Built, we are talking to Marek Galovic from TopK about how they built a new search database with modern components, starting from the question: "How would search work if we built it today?" Cloud storage is cheap. Compute is fast. Memory is plentiful. So build one system that handles vectors, text, and filters together, not three systems duct-taped into one. One pass handles everything:

Vector search + Text search + Filters → Single sorted result

Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency. The goal is to do search in 5 lines of code; a toy sketch of the single-pass idea follows these notes.

Marek Galovic: LinkedIn | Website | TopK Website | TopK Docs
Nicolay Gerold: LinkedIn | X (Twitter)

00:00 Introduction to TopK and Snowflake Comparison
00:35 Architectural Patterns and Custom Formats
01:30 Query Execution Engine Explained
02:56 Distributed Systems and Rust
04:12 Query Execution Process
06:56 Custom File Formats for Search
11:45 Handling Distributed Queries
16:28 Consistency Models and Use Cases
26:47 Exploring Database Versioning and Snapshots
27:27 Performance Benchmarks: Rust vs. C/C++
29:02 Scaling and Latency in Large Datasets
29:39 GPU Acceleration and Use Cases
31:04 Optimizing Search Relevance and Hybrid Search
34:39 Advanced Search Features and Custom Scoring
38:43 Future Directions and Research in AI
47:11 Takeaways for Building AI Applications
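To make "one pass handles everything" concrete, here is a toy, in-memory sketch of the pattern: metadata filtering, vector similarity, and keyword matching evaluated together in a single scan that yields one sorted result list. It is a generic illustration of single-pass hybrid search, not TopK's actual API or engine.

```python
# Toy single-pass hybrid search: filter, score (vector + keyword), sort.
# One scan over the corpus, one sorted result list; no second system.
import numpy as np

docs = [
    {"id": "a", "text": "rust search engine kernels", "lang": "en",
     "vec": np.array([0.9, 0.1, 0.0])},
    {"id": "b", "text": "vector databases for semantic search", "lang": "en",
     "vec": np.array([0.2, 0.9, 0.1])},
    {"id": "c", "text": "moteur de recherche", "lang": "fr",
     "vec": np.array([0.4, 0.4, 0.8])},
]

def hybrid_search(query_vec, query_terms, lang, top_k=2, alpha=0.7):
    """Single pass: inline metadata filter, fused vector/keyword score."""
    results = []
    for doc in docs:
        if doc["lang"] != lang:  # metadata filter applied inline, not in a separate system
            continue
        cos = float(query_vec @ doc["vec"]) / (
            np.linalg.norm(query_vec) * np.linalg.norm(doc["vec"]))
        kw = len(set(query_terms) & set(doc["text"].split())) / len(query_terms)
        results.append((alpha * cos + (1 - alpha) * kw, doc["id"]))
    return sorted(results, reverse=True)[:top_k]  # one sorted result list

print(hybrid_search(np.array([1.0, 0.2, 0.0]), ["rust", "search"], "en"))
```

A real engine pushes this loop down into columnar storage with SIMD kernels, but the shape of the computation (filter, score, merge, all in one pass) is the point.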
Mar 6, 2025 • 1h 3min

RAG is two things. Prompt Engineering and Search. Keep it Separate | S2 E28

In this discussion, John Berryman, an expert who transitioned from aerospace engineering to search and machine learning, explores the dual nature of retrieval-augmented generation (RAG). He emphasizes separating search from prompt engineering for optimal performance. Berryman shares insights on effective prompting strategies using familiar structures, testing human evaluations, and managing token limits. He dives into the differences between chat and completion models and highlights practical techniques for tackling AI applications and workflows. It's a deep dive into enhancing interactions with AI!
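A minimal sketch of the separation Berryman argues for: retrieval and prompt assembly as two independent, separately testable stages. The function names and prompt template here are illustrative assumptions, not taken from the episode.

```python
# Sketch: keep search and prompt engineering as separate RAG stages so
# each can be evaluated on its own (search with recall/precision metrics,
# prompting with evals over fixed retrieved sets). Names are illustrative.
from typing import Callable

def retrieve(query: str, search_fn: Callable[[str], list[str]], k: int = 3) -> list[str]:
    """Search stage: only responsible for returning relevant passages."""
    return search_fn(query)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Prompt-engineering stage: only responsible for formatting context."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {query}"

fake_search = lambda q: ["Passage on RAG.", "Passage on search.", "Passage on prompts."]
print(build_prompt("What is RAG?", retrieve("What is RAG?", fake_search)))
```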
Feb 28, 2025 • 1h 4min

Graphs aren't just for specialists anymore. They are one import away | S2 E27

Semih Salihoğlu, a key contributor to the Kuzu project, dives into the future of graph databases. He elaborates on Kuzu's columnar storage design, emphasizing its efficiency over traditional row-based systems. Discussion highlights include innovative vectorized query processing that boosts performance and enhances analytics. Salihoğlu also explains the challenge of many-to-many relationships and Kuzu's unique approaches to join algorithms, making complex queries faster and less resource-intensive. Overall, this conversation unveils exciting advancements in data management for modern applications.
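The episode title's claim, "one import away", is literal for embedded graph databases. Here is a minimal sketch based on Kuzu's documented Python API (exact details may vary by version):

```python
# Minimal embedded-graph example: no server, just an import and a path.
# Based on Kuzu's documented Python API; details may vary by version.
import kuzu

db = kuzu.Database("./demo_db")
conn = kuzu.Connection(db)

conn.execute("CREATE NODE TABLE Person(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE Follows(FROM Person TO Person)")
conn.execute("CREATE (:Person {name: 'Alice'})")
conn.execute("CREATE (:Person {name: 'Bob'})")
conn.execute(
    "MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) "
    "CREATE (a)-[:Follows]->(b)"
)

result = conn.execute("MATCH (a:Person)-[:Follows]->(b:Person) RETURN a.name, b.name")
while result.has_next():
    print(result.get_next())  # ['Alice', 'Bob']
```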
Feb 20, 2025 • 1h 11min

Knowledge Graphs Won't Fix Bad Data | S2 E26

Juan Sequeda, a Principal Scientist at data.world and an authority on knowledge graphs, shares his insights on improving data quality. He discusses the importance of integrating technical and business metadata to create a 'brain' for AI applications. Sequeda explains how traditional silos hinder effective data management and emphasizes the need for collaboration in startups. He also addresses the balance between automation and human oversight in knowledge graphs and outlines strategies for defining robust entities and relationships, ensuring accurate data connections.
Feb 13, 2025 • 1h 34min

Temporal RAG: Embracing Time for Smarter, Reliable Knowledge Graphs | S2 E25

Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems, from aerospace to cybersecurity. Now he is working on "Temporal RAG" in TrustGraph.

Time is a critical, but often ignored, dimension in data. Whether it's threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally preserve versions over time.

Three types of data:

- Observations: measurable, verifiable recordings (e.g., "the hat reads 'Sunday Running Club'"). They require supporting evidence and may be updated as new data becomes available.
- Assertions: subjective interpretations (e.g., "the hat is greenish"). They involve human judgment, come with confidence levels, and may change over time.
- Facts: immutable, verified information that remains constant. Facts are rare in dynamic environments because most data evolves; they serve as the "bedrock" of trust.

By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).

Integrating temporal data into knowledge graphs:

- Challenge: traditional knowledge graphs and schemas (e.g., schema.org) rarely integrate time beyond basic metadata. Long documents may only provide a single timestamp, leaving the context of internal details untracked.
- Solution: attach detailed temporal metadata (such as creation, update, and deletion timestamps) during data ingestion, and use versioning to maintain historical context. This allows systems to assess whether data is current or stale, detect conflicts when updates occur, and employ Bayesian methods to adjust trust metrics as more information accumulates.

Key takeaways:

- Focus on specialization: build tools that do one thing well. For example, design a simple yet extensible knowledge graph rather than relying on overly complex ontologies.
- Integrate temporal metadata: always timestamp data operations and version records. This is key to understanding data freshness and evolution.
- Adopt robust infrastructure: use scalable, proven technologies to connect specialized modules via APIs. This reduces maintenance overhead compared to systems overloaded with connectors and extra features.
- Leverage Bayesian updates: start with initial trust metrics based on observed data and refine them as new evidence arrives (see the sketch after these notes).
- Mind the big picture: avoid working in isolated silos. Emphasize a holistic system design that maintains in situ context and promotes collaboration across teams.

Daniel Davis: Cognitive Core | TrustGraph | YouTube | LinkedIn | Discord
Nicolay Gerold: LinkedIn | X (Twitter)

00:00 Introduction to Temporal Dimensions in Data
00:53 Timestamping and Versioning Data
01:35 Introducing Daniel Davis and Temporal RAG
01:58 Three Buckets of Data: Observations, Assertions, and Facts
03:22 Dynamic Data and Data Freshness
05:14 Challenges in Integrating Time in Knowledge Graphs
09:41 Defining Observations, Assertions, and Facts
12:57 The Role of Time in Data Trustworthiness
46:58 Chasing White Whales in AI
47:58 The Problem with Feature Overload
48:43 Connector Maintenance Challenges
50:02 The Swiss Army Knife Analogy
51:16 API Meshes and Glue Code
54:14 The Importance of Software Infrastructure
01:00:10 The Need for Specialized Tools
01:13:25 Outro and Future Plans
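As a concrete illustration of the Bayesian trust updates described above, here is a small sketch using a Beta-Bernoulli model: each piece of evidence that confirms or contradicts a statement shifts its trust estimate, and every update refreshes the timestamp. This is a generic textbook construction under assumed names, not TrustGraph's actual implementation.

```python
# Sketch: Bayesian trust metric for a statement in a knowledge graph.
# A Beta(alpha, beta) prior over "this statement is reliable" is updated
# as evidence arrives. Generic illustration, not TrustGraph's code.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TrustedStatement:
    text: str
    alpha: float = 1.0  # pseudo-count of confirmations (uniform prior)
    beta: float = 1.0   # pseudo-count of contradictions
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def observe(self, confirms: bool) -> None:
        """Fold one piece of evidence into the trust estimate."""
        if confirms:
            self.alpha += 1
        else:
            self.beta += 1
        self.updated_at = datetime.now(timezone.utc)  # track freshness

    @property
    def trust(self) -> float:
        """Posterior mean probability that the statement is reliable."""
        return self.alpha / (self.alpha + self.beta)

s = TrustedStatement("The hat reads 'Sunday Running Club'")
for evidence in (True, True, True, False):
    s.observe(evidence)
print(f"{s.trust:.2f}")  # 0.67 after three confirmations, one contradiction
```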
Feb 6, 2025 • 1h 34min

Context is King: How Knowledge Graphs Help LLMs Reason | S2 E24

Robert Caulk, who leads Emergent Methods and has over 1,000 academic citations, dives into the fascinating world of knowledge graphs and their integration with large language models (LLMs). He discusses how these graphs help AI systems connect complex data relationships, enhancing reasoning accuracy. The conversation also touches on the challenges of multilingual entity extraction and the need for context engineering to improve AI-generated content. Additionally, Caulk shares insights into upcoming features for real-time event tracking and the future of project management tools.
Jan 31, 2025 • 52min

Inside Vector Database Quantization: Product, Binary, and Scalar | S2 E23

Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together, dives into the fascinating world of vector database quantization. He explains how quantization can drastically reduce storage costs, likening it to image compression. Zain discusses three quantization methods: binary, product, and scalar, each with unique trade-offs in precision and efficiency. He also addresses the speed and memory usage challenges of managing vector data, and hints at exciting future applications, including brain-computer interfaces.
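To ground two of the three methods Zain compares, here is a small numpy sketch of binary and scalar quantization on a toy vector (product quantization needs a trained codebook, so it is left out). This is a generic illustration of the trade-off, not any particular database's implementation.

```python
# Sketch: binary and scalar quantization of a toy float32 embedding.
# Generic numpy illustration, not a vector database's internals.
import numpy as np

vec = np.array([0.12, -0.53, 0.98, -0.07], dtype=np.float32)  # 32 bits/dim

# Binary quantization: keep only the sign bit (32x smaller, coarsest).
binary = (vec > 0).astype(np.uint8)  # -> [1, 0, 1, 0], 1 bit/dim

# Scalar quantization: map each float32 into an 8-bit bucket (4x smaller).
lo, hi = vec.min(), vec.max()
scalar = np.round((vec - lo) / (hi - lo) * 255).astype(np.uint8)

# Reconstructing shows the precision lost in exchange for memory savings.
restored = scalar.astype(np.float32) / 255 * (hi - lo) + lo
print(binary, scalar, np.abs(vec - restored).max())
```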
Jan 23, 2025 • 53min

Local-First Search: How to Push Search To End-Devices | S2 E22

Alex Garcia, a developer passionate about making vector search practical, discusses his creation, SQLiteVec. He emphasizes its lightweight design and how it simplifies local AI applications. The conversation reveals the efficiency of SQLiteVec's brute force searches, with impressive performance metrics at scale. Garcia also dives into challenges like data synchronization and fine-tuning embedding models. His insights on binary quantization and future innovations in local search highlight the evolution of user-friendly machine learning tools.
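For a sense of how lightweight the local-first approach is, here is a minimal example based on sqlite-vec's documented quickstart (package and function names come from its docs; the exact API may vary by version):

```python
# Minimal local vector search with sqlite-vec: a virtual table inside a
# plain SQLite database, queried with SQL. Based on the documented
# quickstart; API details may vary by version.
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
items = {1: [0.1, 0.2, 0.3, 0.4], 2: [0.9, 0.8, 0.7, 0.6]}
for rowid, emb in items.items():
    db.execute("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
               (rowid, serialize_float32(emb)))

rows = db.execute(
    "SELECT rowid, distance FROM vec_items "
    "WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)  # nearest rows first; brute-force scan is fast at local scale
```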
Jan 9, 2025 • 1h 14min

AI-Powered Search: Context Is King, But Your RAG System Ignores Two-Thirds of It | S2 E21

Trey Grainger, author of 'AI-Powered Search' and an expert in search systems, joins the conversation to unravel the complexities of retrieval and generation in AI. He presents the concept of 'GARRAG,' where retrieval and generation enhance each other. Trey dives into the importance of user context, discussing how behavior signals improve search personalization. He shares insights on moving from simple vector similarity to advanced models and offers practical advice for engineers on choosing effective tools, promoting a structured, modular approach for better search results.
