
How AI Is Built

Latest episodes

Dec 13, 2024 • 46min

A Search System That Learns As You Use It (Agentic RAG) | S2 E18

Stephen Batifol, an expert in Agentic RAG and advanced search technology, dives into the future of search systems. He discusses how modern retrieval-augmented generation (RAG) systems smartly match queries to the most suitable tools, utilizing a mix of methods. Batifol emphasizes the importance of metadata and modular design in creating effective search workflows. The conversation touches on adaptive AI capabilities for query refinement and the significance of user feedback in improving system accuracy. He also addresses the challenges of ambiguity in user queries, highlighting the need for innovative filtering techniques.
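The query-to-tool matching described above can be sketched as a toy router: pick a retrieval tool based on cues in the query. This is an illustrative sketch, not the approach discussed in the episode; real agentic RAG systems typically use an LLM or a trained classifier for this step, and the tool names here are hypothetical.

```python
# A toy query router in the spirit of agentic RAG: choose a retrieval
# tool from simple lexical cues. Tool names are illustrative.
def route_query(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("latest", "today", "news")):
        return "web_search"      # freshness cues -> live web retrieval
    if any(w in q for w in ("table", "sum", "average", "count")):
        return "sql_tool"        # aggregation cues -> structured data
    return "vector_store"        # default: semantic search over documents

print(route_query("What is the average order value per region?"))
print(route_query("latest updates on the merger"))
print(route_query("How does our refund policy work?"))
```

In practice the router's decision would also draw on metadata about each tool and on user feedback, which is exactly where the episode argues the leverage is.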
Dec 5, 2024 • 47min

Rethinking Search Inside Postgres, From Lexemes to BM25

Many companies use Elastic or OpenSearch and use 10% of the capacity. They have to build ETL pipelines, keep data normalized, and worry about race conditions. All in all, the moment you want to do search on top of your transactional data, you are forced to build a distributed system. Not anymore. ParadeDB is building an open-source PostgreSQL extension to enable search within your database.

Today, I am talking to Philippe Noël, the founder and CEO of ParadeDB. We talk about how they built it, how they integrate into the Postgres query engine, and how you can build search on top of Postgres.

Key Insights:
- Search is changing. We're moving from separate search clusters to search inside databases: simpler architecture, stronger guarantees, and lower costs up to a certain scale.
- Most search engines force you to duplicate data. ParadeDB doesn't. You keep data normalized and join at query time. It hooks deep into Postgres's query planner; rather than bolting search on, it lets Postgres optimize search queries alongside SQL ones.
- Search indices can work with ACID. ParadeDB's BM25 index keeps Lucene-style components (term frequency, normalization) but adds Postgres metadata for transactions. Search + ACID is possible.
- Two storage types matter: inverted indices for text, columnar "fast fields" for analytics. Pick the right one or queries get slow. Integers now default to columnar to prevent common mistakes.
- Mixing query engines looks tempting but fails. The team tried using DuckDB and DataFusion inside Postgres. Both were fast but broke ACID compliance. They had to rebuild features natively.

Philippe Noël: LinkedIn | Bluesky | ParadeDB
Nicolay Gerold: LinkedIn | X (Twitter) | Bluesky

00:00 Introduction to ParadeDB
00:53 Building ParadeDB with Rust
01:43 Integrating Search in Postgres
03:04 ParadeDB vs. Elastic
05:48 Technical Deep Dive: Postgres Integration
07:27 Challenges and Solutions
09:35 Transactional Safety and Performance
11:06 Composable Data Systems
15:26 Columnar Storage and Analytics
20:54 Case Study: Alibaba Cloud
21:57 Data Warehouse Context
23:24 Custom Indexing with BM25
24:01 Postgres Indexing Overview
24:17 Fast Fields and Columnar Format
24:52 Lucene Inspiration and Data Storage
26:06 Setting Up and Managing Indexes
27:43 Query Building and Complex Searches
30:21 Scaling and Sharding Strategies
35:27 Query Optimization and Common Mistakes
38:39 Future Developments and Integrations
39:24 Building a Full-Fledged Search Application
42:53 Challenges and Advantages of Using ParadeDB
46:43 Final Thoughts and Recommendations
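The two storage types from the key insights — inverted indices for text lookups, columnar "fast fields" for aggregations — can be illustrated with a minimal sketch. This is plain Python to show the access patterns, not ParadeDB's implementation; the row data is made up.

```python
from collections import defaultdict

# Inverted index: term -> set of row ids. Good at answering
# "which rows contain this term?" without scanning every row.
def build_inverted_index(rows):
    index = defaultdict(set)
    for row_id, text in rows:
        for term in text.lower().split():
            index[term].add(row_id)
    return index

rows = [(1, "open source search"), (2, "postgres search engine"), (3, "columnar analytics")]

# Columnar "fast field": one contiguous array per column. Good at
# aggregations, since only that column is read.
ratings = [4.5, 3.0, 5.0]

index = build_inverted_index(rows)
matching = index["search"]            # text lookup via the inverted index
avg = sum(ratings) / len(ratings)     # aggregate over the columnar field
print(sorted(matching), avg)
```

Using the wrong structure for a query — e.g., aggregating by walking an inverted index — is exactly the kind of slow-query mistake the episode warns about, and why integers now default to columnar storage.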
Nov 28, 2024 • 51min

RAG's Biggest Problems & How to Fix It (ft. Synthetic Data) | S2 E16

Saahil Ognawala, Head of Product at Jina AI and expert in RAG systems, dives deep into the complexities of retrieval augmented generation. He reveals why RAG systems often falter in production and how strategic testing and synthetic data can enhance performance. The conversation covers the vital role of user intent, evaluation metrics, and the balancing act between real and synthetic data. Saahil also emphasizes the importance of continuous user feedback and the need for robust evaluation frameworks to fine-tune AI models effectively.
Nov 21, 2024 • 47min

From Ambiguous to AI-Ready: Improving Documentation Quality for RAG Systems | S2 E15

Max Buckley, a Google expert in LLM experimentation, dives into the hidden dangers of poor documentation in RAG systems. He explains how even one ambiguous sentence can skew an entire knowledge base. Max emphasizes the challenge of identifying such "documentation poisons" and discusses the importance of multiple feedback loops for quality control. He highlights unique linguistic ecosystems in large organizations and shares insights on enhancing documentation clarity and consistency to improve AI outputs.
Nov 15, 2024 • 54min

BM25 is the workhorse of search; vectors are its visionary cousin | S2 E14

David Tippett, a search engineer at GitHub with expertise in BM25 and OpenSearch, delves into the efficiency of BM25 versus vector search for information retrieval. He explains how BM25 refines search by factoring in user expectations and adapting to diverse queries. The conversation highlights the challenges of vector search at scale, particularly with GitHub's massive dataset. David emphasizes that understanding user intent is crucial for optimizing search results, as it surpasses merely chasing cutting-edge technology.
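The workhorse the episode title refers to is compact enough to show in full. Below is a minimal sketch of classic BM25 scoring (Okapi BM25 with the standard smoothed IDF); the documents are made up for illustration, and production engines like OpenSearch add many refinements on top of this core formula.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the classic BM25
    formula: IDF-weighted term frequency, saturated by k1 and
    normalized by document length via b."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "postgres search with bm25 ranking",
    "vector search at scale",
    "bm25 is the workhorse of search",
]
print(bm25_scores("bm25 search", docs))
```

Note how the length normalization term rewards documents that match without being padded — one of the ways BM25 bakes in reasonable defaults about user expectations.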
Nov 7, 2024 • 36min

Vector Search at Scale: Why One Size Doesn't Fit All | S2 E13

Join Charles Xie, founder and CEO of Zilliz and pioneer behind the Milvus vector database, as he unpacks the complexities of scaling vector search systems. He discusses why vector search slows down at scale and introduces a multi-tier storage strategy that optimizes performance. Charles reveals innovative solutions like real-time search buffers and GPU acceleration to handle massive queries efficiently. He also dives into the future of search technology, including self-learning indices and hybrid search methods that promise to elevate data retrieval.
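The real-time search buffer idea can be sketched as a two-tier lookup: freshly inserted vectors live in a small in-memory buffer that is brute-force scanned, older vectors sit in a sealed index, and results are merged at query time. This is a toy illustration of the concept, not Milvus's implementation; names and data are made up.

```python
import heapq

def l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, buffer, sealed_index, k=2):
    """Scan both tiers (brute force here for clarity), then keep
    the k nearest vectors overall."""
    candidates = [(l2(query, v), vid) for vid, v in buffer + sealed_index]
    return [vid for _, vid in heapq.nsmallest(k, candidates)]

sealed = [("old1", [0.0, 0.0]), ("old2", [5.0, 5.0])]   # indexed long ago
buffer = [("new1", [0.9, 1.1])]                          # just inserted
print(search([1.0, 1.0], buffer, sealed))
```

The payoff of the tiered design is that new data is searchable immediately, without waiting for an expensive index rebuild over the sealed tier.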
Oct 31, 2024 • 55min

Search Systems at Scale: Avoiding Local Maxima and Other Engineering Lessons | S2 E12

Stuart Cam and Russ Cam, seasoned search infrastructure experts from Elastic and Canva, dive into the complexities of modern search systems. They discuss the integration of traditional text search with vector capabilities for better outcomes. The conversation emphasizes the importance of systematic relevancy testing and avoiding local maxima traps, where improving one query can harm others. They also explore the critical balance needed between performance, cost, and indexing strategies, including practical insights into architecting effective search pipelines.
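The local-maxima trap — a ranking tweak that lifts the average while silently degrading individual queries — is exactly what systematic relevancy testing catches. A minimal sketch of such a regression check, with made-up per-query scores:

```python
def regression_report(before, after):
    """Compare per-query relevancy scores before and after a ranking
    change. Returns the average delta and the queries that regressed,
    even if the overall average improved."""
    regressions = {q: (before[q], after[q])
                   for q in before if after[q] < before[q]}
    avg_delta = sum(after[q] - before[q] for q in before) / len(before)
    return avg_delta, regressions

before = {"q1": 0.60, "q2": 0.70, "q3": 0.80}
after = {"q1": 0.90, "q2": 0.72, "q3": 0.65}
avg, regs = regression_report(before, after)
print(round(avg, 4), sorted(regs))   # average improved, but q3 got worse
```

Gating changes on "no per-query regressions" rather than "average went up" is what keeps one query's gain from becoming another's loss.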
Oct 25, 2024 • 49min

Training Multi-Modal AI: Inside the Jina CLIP Embedding Model | S2 E11

Today we are talking to Michael Günther, a senior machine learning scientist at Jina, about his work on Jina CLIP.

Some key points:
- Uni-modal embeddings convert a single type of input (text, images, audio) into vectors.
- Multimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., searching images with text).
- Multimodal models can potentially learn richer representations of the world, including concepts that are difficult or impossible to put into words.

Types of Text-Image Models:
- CLIP-like models: separate vision and text transformer models; each tower maps inputs to a shared vector space; optimized for efficient retrieval.
- Vision-language models: process image patches as tokens; use a transformer architecture to combine image and text information; better suited for complex document matching.
- Hybrid models: combine separate encoders with additional transformer components; allow for more complex interactions between modalities. Example: Google's Magic Lens model.

Training Insights from Jina CLIP:
Key learnings:
- Freezing the text encoder during training can significantly hinder performance.
- Short image captions limit the model's ability to learn rich text representations.
- Large batch sizes are crucial for training embedding models effectively.

Training process, a three-stage approach:
- Stage 1: training on image captions and text pairs.
- Stage 2: adding longer image captions.
- Stage 3: including triplet data with hard negatives.

Practical Considerations:
Similarity scales:
- Different modalities can produce different similarity value scales.
- Important to consider when combining multiple embedding types.
- Can affect threshold-based filtering.

Model selection:
- Evaluate models based on relevant benchmarks.
- Consider the domain similarity between training data and intended use case.
- Assess computational requirements and efficiency needs.

Future Directions:
Areas for development:
- More comprehensive benchmarks for multimodal tasks.
- Better support for semi-structured data.
- Improved handling of non-photographic images.

Upcoming developments at Jina AI:
- Multilingual support for Jina ColBERT.
- A new version of the text embedding models.
- Focus on complex multimodal search applications.

Practical Applications:
E-commerce:
- Product search and recommendations.
- Combined text-image embeddings for better results.
- Synthetic data generation for fine-tuning.

Fine-tuning strategies:
- Using click data and query logs.
- Generative pseudo-labeling for creating training data.
- Domain-specific adaptations.

Key Takeaways for Engineers:
- Be aware of similarity value scales and their implications.
- Establish quantitative evaluation metrics before optimization.
- Consider model limitations (e.g., image resolution, text length).
- Use performance optimizations like flash attention and activation checkpointing.
- Universal embedding models might not be optimal for specific use cases.

Michael Günther: LinkedIn | X (Twitter) | Jina AI | New Multilingual Embedding Model
Nicolay Gerold: LinkedIn | X (Twitter)

00:00 Introduction to Uni-modal and Multimodal Embeddings
00:16 Exploring Multimodal Embeddings and Their Applications
01:06 Training Multimodal Embedding Models
02:21 Challenges and Solutions in Embedding Models
07:29 Advanced Techniques and Future Directions
29:19 Understanding Model Interference in Search Specialization
30:17 Fine-Tuning Jina CLIP for E-Commerce
32:18 Synthetic Data Generation and Pseudo-Labeling
33:36 Challenges and Learnings in Embedding Models
40:52 Future Directions and Takeaways
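The point about similarity scales — different modalities producing differently scaled similarity values — can be sketched with a simple z-score normalization before combining scores. This is an illustrative sketch, not Jina's method; the similarity values are made up.

```python
import statistics

def zscore(xs):
    """Rescale a list of similarity scores to zero mean, unit variance,
    so scores from different modalities become comparable."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

# Raw cosine similarities often occupy different bands per modality:
text_sims = [0.82, 0.79, 0.75, 0.71]    # text-to-text comparisons
image_sims = [0.31, 0.28, 0.22, 0.19]   # text-to-image comparisons

# After normalization, a single threshold or weighted sum across
# modalities becomes meaningful.
combined = [0.5 * t + 0.5 * i for t, i in zip(zscore(text_sims), zscore(image_sims))]
print(combined)
```

Without this step, a fixed threshold tuned on text-to-text scores would filter out nearly every text-to-image match — the threshold-based filtering pitfall flagged above.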
Oct 23, 2024 • 45min

Building the database for AI, Multi-modal AI, Multi-modal Storage | S2 E10

Chang She, CEO of LanceDB and co-creator of the Pandas library, shares insights on building LanceDB for AI data management. He discusses how LanceDB tackles data bottlenecks and speeds up machine learning experiments with unstructured data. The conversation dives into the decision to use Rust for enhanced performance, achieving up to 1,000 times faster results than Parquet. Chang also explores multimodal AI's challenges, future applications of LanceDB in recommendation systems, and the vision for more composable data infrastructures.
Oct 10, 2024 • 47min

Numbers, categories, locations, images, text. How to embed the world? | S2 E9

Mór Kapronczay, Head of ML at Superlinked, unpacks the nuances of embeddings beyond just text. He emphasizes that traditional text embeddings fall short, especially with complex data. Mór introduces multi-modal embeddings that integrate various data types, improving search relevance and user experiences. He also discusses challenges in embedding numerical data, suggesting innovative methods like logarithmic transformations. The conversation delves into balancing speed and accuracy in vector searches, highlighting the dynamic nature of real-time data prioritization.
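The logarithmic-transformation idea for numerical data can be sketched as follows: map a skewed attribute such as price into log space before scaling it into an embedding dimension, so that equal ratios (10 → 100, 100 → 1000) land equal distances apart. A toy illustration, not Superlinked's API; the bounds are assumptions.

```python
import math

def embed_number(x, lo=1.0, hi=1_000_000.0):
    """Map a positive number into [0, 1] on a log scale, clamped to
    [lo, hi], so ratios rather than absolute differences determine
    distance in the embedding dimension."""
    x = min(max(x, lo), hi)
    return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))

# Equal ratios produce equal distances in the embedding dimension:
d1 = embed_number(100) - embed_number(10)
d2 = embed_number(1000) - embed_number(100)
print(round(d1, 6), round(d2, 6))
```

With a plain linear scaling, the 10 → 100 gap would be a hundredth of the 1,000 → 100,000 gap, and nearest-neighbor search over prices would be dominated by the largest values.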
