How AI Is Built

Nicolay Gerold
11 snips
Jun 7, 2024 • 40min

#011 Mastering Vector Databases, Product & Binary Quantization, Multi-Vector Search

Expert Zain Hassan from Weaviate discusses vector databases, quantization techniques, and multi-vector search capabilities. They explore the future of multimodal search, brain-computer interfaces, and EEG foundation models. Learn how vector databases handle text, image, audio, and video data efficiently.
12 snips
May 31, 2024 • 46min

#010 Building Robust AI and Data Systems, Data Architecture, Data Quality, Data Storage

Data architect Anjan Banerjee discusses building complex AI and data systems, explaining data architecture with Lego analogies. Topics include selecting data tools, using Airflow for orchestration, incorporating AI for data processing, and analyzing Snowflake vs. Databricks solutions. The podcast also covers automating data integration for comprehensive customer views.
5 snips
May 24, 2024 • 28min

#009 Modern Data Infrastructure for Analytics and AI, Lakehouses, Open Source Data Stack

Jorrit Sandbrink, a data engineer, discusses lakehouse architecture, which blends the data warehouse and the data lake; key components such as Delta Lake and Apache Spark; optimizations through partitioning strategies; and data ingestion with DLT. The podcast emphasizes open-source solutions, considerations in choosing tools, and the evolving data landscape.
12 snips
May 20, 2024 • 37min

#008 Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models

Kirk Marple, CEO of Graphlit, discusses using knowledge graphs for enhanced information retrieval, a hybrid data model that creates virtual entities, entity extraction using Azure Cognitive Services, a metadata-first approach for better data indexing, and challenges in knowledge graph development.
15 snips
May 17, 2024 • 38min

#007 Navigating the Modern Data Stack, Choosing the Right OSS Tools, From Problem to Requirements to Architecture

Data engineering expert Nicolay Gerold and software-defined assets expert Jon Erich Kemi Warghed discuss selecting the right tools, implementing data governance, and the concept of software-defined assets. They highlight the importance of data governance, open source tooling, agile data platforms, and software-defined assets like Dagster for simplifying data orchestration and creating business value.
6 snips
May 10, 2024 • 33min

#006 Data Orchestration Tools, Choosing the right one for your needs

John Wessel, founder of Agreeable Data, discusses the evolution of data orchestration tools, the popularity of Apache Airflow, and the challenges of choosing the right orchestrator. They also explore the components of a data orchestrator, the role of AI in data orchestration, managing orchestrators, monitoring, and the future of orchestration tools.
7 snips
May 3, 2024 • 30min

#005 Building Reliable LLM Applications, Production-Ready RAG, Data-Driven Evals

Creators of Ragas, Shahul and Jithin, discuss challenges in building LLM applications, emphasizing the importance of evaluation, data quality, and continuous RAG evolution. Practical takeaways include starting with a solid testing strategy and embracing synthetic data to automate test data set creation.
Apr 29, 2024 • 22min

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2

Weston Pace discusses Lance v2, the new file format behind the LanceDB vector database, which rethinks columnar storage for multimodal datasets. Goals include null value support, multimodal data handling, and optimal search performance. Lance v2 allows efficient storage of large data without excessive memory use. The episode also covers the benefits of Arrow integration and of custom encodings written in Python for experimentation.
9 snips
Apr 26, 2024 • 32min

#004 AI with Supabase, Postgres Configuration, Real-Time Processing, and more

Christopher Williams, Solutions Architect at Supabase, discusses optimizing Postgres for AI, the core components powering real-time solutions, vector search with pgvector, and Supabase's upcoming features. Topics include setting up Postgres for AI, real-time processing, Postgres extensions, and the Supabase roadmap.
7 snips
Apr 19, 2024 • 36min

#003 AI Inside Your Database, Real-Time AI, Declarative ML/AI

Learn how SuperDuperDB simplifies AI integration into databases, enabling real-time computation for instant data updates. Explore the benefits of embeddings and classifications, future plans for AI-powered databases, and the framework for configuring AI workflows. Discover the challenges in computing embeddings, handling text chunks, declarative machine learning, real-time feature calculation, and advancements in model deployment.
