
Data Brew by Databricks
Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.
Latest episodes

Jun 17, 2021 • 36min
Data Brew Season 2 Episode 6: AutoML
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Erin LeDell shares valuable insight on AutoML, what problems are best solved by it, its current limitations, and her thoughts on the future of AutoML. We also discuss founding and growing the Women in Machine Learning and Data Science (WiMLDS) non-profit. See more at databricks.com/data-brew

Jun 10, 2021 • 33min
Data Brew Season 2 Episode 5: ML Applications
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Good machine learning starts with high-quality data. Irina Malkova shares her experience managing and ensuring high-fidelity data and developing custom metrics to satisfy business needs, and discusses how to improve internal decision-making processes. See more at databricks.com/data-brew

May 13, 2021 • 33min
Data Brew Season 2 Episode 4: Hyperparameter and Neural Architecture Search
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Liam Li is a leading researcher in the fields of hyperparameter optimization and neural architecture search, and is the author of the seminal Hyperband paper. In this session, Liam discusses the evolution of hyperparameter optimization techniques and illustrates how every data scientist can benefit from neural architecture search. See more at databricks.com/data-brew

May 5, 2021 • 31min
Data Brew Season 2 Episode 3: Infrastructure for ML
For our second season of Data Brew, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. Adam Oliner discusses how to design your infrastructure to support ML (from integration tests to glue code), the importance of iteration, and centralized vs. decentralized data science teams. He provides valuable advice for companies investing in ML and crucial lessons he’s learned from founding two companies. See more at databricks.com/data-brew

Apr 28, 2021 • 26min
Data Brew Season 2 Episode 2: Data Ethics
This episode covers data ethics, fair lending practices, adversarial debiasing, responsible AI, the power of SHAP for explaining models, and various figures in the field of data ethics.

Apr 22, 2021 • 31min
Data Brew Season 2 Episode 1: ML in Production
For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow and the Lakehouse architecture for reproducible ML, and his current research in this field. See more at databricks.com/data-brew

Feb 18, 2021 • 40min
Data Brew Season 1 Episode 6: Journey of Big Data
The speakers discuss their personal journeys into big data; the advantages of structured APIs and structured streaming; the importance of structured data and a passion for learning; challenges and ethical issues in data management; and the difficulties of conducting landline telephone polls, along with their motivation for writing.

Jan 6, 2021 • 36min
Data Brew Season 1 Episode 5: Combining Machine Learning and MLflow with your Lakehouse
This episode discusses how Quby leverages ML to extract value from its data lake in the energy industry. The conversation covers using energy data to create data-driven services, the challenges of clustering algorithms, less intrusive monitoring methods, data transformation for privacy compliance, and obtaining permission from users.

Dec 22, 2020 • 29min
Data Brew Season 1 Episode 4: BI on Data Lakes - Making it Real for Retail
In this session, we discuss the lessons learned with Lara Minor, Senior Enterprise Data Manager at Columbia Sportswear, on how her team achieved a 70% reduction in pipeline creation time. This reduced ETL workload times from four hours on the previous data warehouses to minutes, enabling near-real-time analytics. Her team migrated from multiple legacy data warehouses, run by individual lines of business, to a single scalable, reliable, performant data lake. See more at databricks.com/data-brew

Dec 6, 2020 • 26min
Data Brew Season 1 Episode 3: Demystifying Delta Lake
In this episode, Michael Armbrust, the creator of Spark SQL, discusses the conception and evolution of Delta Lake, efficient querying and troubleshooting slow queries, optimizing performance and query speed, understanding partitioning and Z-ordering, and exciting features for data ingestion and schema handling in Delta Lake.