How AI Is Built

Nicolay Gerold
May 20, 2024 • 37min

#008 Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models

Kirk Marple, CEO of Graphlit, discusses using knowledge graphs to improve information retrieval, a hybrid data model that creates virtual entities, entity extraction with Azure Cognitive Services, a metadata-first approach to better data indexing, and the challenges of developing knowledge graphs.
May 17, 2024 • 38min

#007 Navigating the Modern Data Stack, Choosing the Right OSS Tools, From Problem to Requirements to Architecture

Data engineering expert Nicolay Gerold and software-defined assets expert Jon Erich Kemi Warghed discuss how to select the right tools and how to implement data governance. They highlight the importance of data governance, open-source tooling, and agile data platforms, and explain how software-defined assets, as used in Dagster, simplify data orchestration and create business value.
May 10, 2024 • 33min

#006 Data Orchestration Tools, Choosing the right one for your needs

John Wessel, founder of Agreeable Data, discusses the evolution of data orchestration tools, the popularity of Apache Airflow, and the challenges of choosing the right orchestrator. They also explore the components of a data orchestrator, the role of AI in data orchestration, managing orchestrators, monitoring, and the future of orchestration tools.
May 3, 2024 • 30min

#005 Building Reliable LLM Applications, Production-Ready RAG, Data-Driven Evals

Shahul and Jithin, the creators of Ragas, discuss the challenges of building LLM applications, emphasizing evaluation, data quality, and the continuous evolution of RAG systems. Practical takeaways include starting with a solid testing strategy and using synthetic data to automate the creation of test datasets.
Apr 29, 2024 • 22min

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2

Weston Pace discusses Lance v2, the new columnar file format behind the LanceDB vector database, designed for multimodal datasets. Its goals include support for null values, handling of multimodal data, and fast search and lookups. Lance v2 can store large values efficiently without consuming excessive memory. The episode also covers the benefits of Arrow integration and of writing custom encodings in Python for experimentation.
Apr 26, 2024 • 32min

#004 AI with Supabase, Postgres Configuration, Real-Time Processing, and more

Christopher Williams, Solutions Architect at Supabase, discusses setting up and optimizing Postgres for AI, the core components that power Supabase's real-time processing, vector search with pgvector and other Postgres extensions, and Supabase's roadmap.
Apr 19, 2024 • 36min

#003 AI Inside Your Database, Real-Time AI, Declarative ML/AI

Learn how SuperDuperDB simplifies AI integration into databases, enabling real-time computation for instant data updates. Explore the benefits of embeddings and classifications, future plans for AI-powered databases, and the framework for configuring AI workflows. Discover the challenges in computing embeddings, handling text chunks, declarative machine learning, real-time feature calculation, and advancements in model deployment.
Apr 17, 2024 • 14min

Supabase acquires OrioleDB, A New Database Engine for PostgreSQL | changelog 1

Supabase acquired OrioleDB, a new storage engine for PostgreSQL. Oriole uses an UNDO log for efficient updates and reduced storage. It also offers performance features such as data compression, plus easy integration with data lakes. The episode discusses the benefits of OrioleDB for high-throughput databases and potential new use cases.
Apr 12, 2024 • 37min

#002 AI Powered Data Transformation, Combining gen & trad AI, Semantic Validation

Antonio Bustamante, a serial entrepreneur, talks about building bem.ai, a data tool for AI and software. Topics include the challenges of integrating semi-structured data, using LLMs for data transformation, the reliability of data infrastructure, and interoperability layers between systems.
Apr 5, 2024 • 34min

#001 Multimodal AI, Storing 1 Billion Vectors, Building Data Infrastructure at LanceDB

Explore how LanceDB, a database for AI built with Rust, supports multimodal AI and billion-scale vector search. Learn about its performance advantage over Parquet, embedding the internet, and making data easier to work with for AI engineers. The episode closes with the future of LanceDB across the AI lifecycle and some surprising use cases, including faster experimentation and model database enhancements.