
Vanishing Gradients

Episode 32: Building Reliable and Robust ML/AI Pipelines

Jul 27, 2024
Join Shreya Shankar, a UC Berkeley researcher specializing in human-centered data management systems, as she navigates the exciting world of large language models (LLMs). Discover her insights on the shift from traditional machine learning to LLMs and why data quality issues often matter more than algorithmic ones. Shreya shares her innovative SPaDE framework for improving AI evaluations and emphasizes the need for human oversight in AI development. Plus, explore the future of low-code tools and the fascinating concept of 'Habsburg AI' in recursive training processes.
Duration: 01:15:10

Episode guests

Shreya Shankar

Podcast summary created with Snipd AI

Quick takeaways

  • Shreya emphasizes that many challenges in ML stem from data management rather than algorithmic issues, highlighting the need for robust data preparation.
  • Data flywheels are crucial for enhancing LLM applications: continually evaluate and iterate based on production data and human feedback (a minimal sketch of such a loop follows below).
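
To make the data-flywheel takeaway concrete, here is a minimal, hypothetical sketch of such a loop in Python: sample production traces, fold human-labeled ones into an evaluation suite, and re-score the pipeline before the next iteration. The class and function names are illustrative assumptions, not anything specific from the episode or a particular library.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical sketch of a data flywheel for an LLM application:
# production traces -> human feedback -> growing evaluation suite -> re-scored pipeline.

@dataclass
class Trace:
    prompt: str
    output: str
    human_label: Optional[str] = None  # e.g. "good" / "bad", filled in by a reviewer


@dataclass
class EvalSuite:
    examples: list = field(default_factory=list)

    def add(self, trace: Trace) -> None:
        # Labeled production traces become regression examples for future pipeline versions.
        self.examples.append(trace)

    def score(self, pipeline: Callable[[str], str]) -> float:
        # Fraction of "good"-labeled examples the current pipeline still reproduces.
        good = [t for t in self.examples if t.human_label == "good"]
        if not good:
            return 0.0
        return sum(pipeline(t.prompt) == t.output for t in good) / len(good)


def flywheel_iteration(pipeline: Callable[[str], str],
                       production_traces: list,
                       suite: EvalSuite) -> float:
    # 1. Humans review a sample of production traces (simulated here by pre-filled labels).
    labeled = [t for t in production_traces if t.human_label is not None]
    # 2. Labeled traces are folded back into the evaluation suite.
    for trace in labeled:
        suite.add(trace)
    # 3. The pipeline is re-scored against the growing suite before the next iteration.
    return suite.score(pipeline)
```

In practice the pipeline would be an LLM call and the comparison would be a fuzzier check than string equality, but the loop structure is the same: production data and human feedback keep feeding the evaluation set.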

Deep dives

Exploring the Groundwork of AI Pipelines

The podcast discusses the significance of building reliable AI pipelines, emphasizing that many challenges in machine learning stem from data management issues rather than purely algorithmic ones. Shreya Shankar highlights her experience as an ML engineer, noting that most of her job involved engineering and assessing data quality, while actual model training was minimal. This insight underscores the need for a strong focus on data preparation and engineering to deploy machine learning models successfully. The discussion advocates for broader recognition of data-centric AI approaches that prioritize data quality and management over model training alone.
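
As one illustrative example of what prioritizing data quality can look like in practice, the sketch below runs a few simple assertions over a pipeline's input/output records before they are used for training or evaluation. The specific checks, field names, and thresholds are assumptions for illustration only, not Shreya's SPaDE implementation.

```python
# Illustrative data-quality assertions over an LLM pipeline's records.
# The field names ("prompt", "output") and thresholds are hypothetical.

def check_record(record: dict) -> list[str]:
    """Return a list of data-quality problems found in a single record."""
    problems = []
    if not record.get("prompt", "").strip():
        problems.append("empty prompt")
    if len(record.get("prompt", "")) > 8000:
        problems.append("prompt exceeds length budget")
    if "output" in record and not str(record["output"]).strip():
        problems.append("empty model output")
    return problems


def audit(records: list[dict]) -> dict[str, int]:
    """Count how often each problem occurs across a batch of records."""
    counts: dict[str, int] = {}
    for record in records:
        for problem in check_record(record):
            counts[problem] = counts.get(problem, 0) + 1
    return counts


if __name__ == "__main__":
    batch = [
        {"prompt": "Summarize this report.", "output": "The report says..."},
        {"prompt": "", "output": "..."},
    ]
    print(audit(batch))  # e.g. {'empty prompt': 1}
```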
