
Data Brew by Databricks
Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.
Latest episodes

Nov 21, 2024 • 39min
Mixed Attention & LLM Context | Data Brew | Episode 35
Shashank Rajput, a Research Scientist specializing in large language models at Mosaic and Databricks, dives into innovative techniques like Retrieval Augmented Generation (RAG) to boost LLM efficiency. He discusses how RAG improves LLM accuracy using external documents. The conversation covers the evolution of attention mechanisms, particularly mixed strategies. They also explore the Mamba architecture, showcasing its speed and memory management compared to traditional transformers, highlighting practical applications and efficiency trade-offs.

Oct 14, 2024 • 43min
Kumo AI & Relational Deep Learning | Data Brew | Episode 34
In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering. Highlights include:
- How RDL enhances predictive modeling.
- Applications in fraud detection and recommendation systems.
- The use of graph neural networks to simplify complex data structures.

Jul 21, 2023 • 39min
LLMs: Internals, Hallucinations, and Applications | Data Brew | Episode 33
Databricks experts discuss best practices for prompt engineering with large language models, risks and mitigation strategies for hallucinations, criteria for choosing and fine-tuning language models, and strategies for keeping up with LLMs in the rapidly evolving field.

Jun 29, 2023 • 33min
Demonstrate–Search–Predict Framework | Data Brew | Episode 32
We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed Omar Khattab, Computer Science Ph.D. student at Stanford and creator of DSP (the Demonstrate–Search–Predict framework), to discuss DSP, common applications, and the future of NLP.

Jun 8, 2023 • 35min
Generative AI Risks | Data Brew | Episode 31
We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed Yaron Singer, CEO of Robust Intelligence, Professor of Computer Science at Harvard University, and guest of Data Brew Season 3 (our first repeat guest!). We discussed generative AI, the trends toward embracing LLMs, and how the surface area for vulnerabilities in generative AI is much bigger.

Jun 1, 2023 • 43min
John Snow Labs & SparkNLP | Data Brew | Episode 30
We are back, and we will dive into LLMs, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed David Talby, CTO at John Snow Labs, which helps healthcare and life science companies put AI to good use. David's interests include natural language processing, applied artificial intelligence in healthcare, and responsible AI.

Jun 9, 2022 • 36min
Data Brew Season 4 Episode 6: Professional Athletes
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Shayna Powless and Eli Ankou, professional cyclist for L39ion of Los Angeles and defensive tackle for the Buffalo Bills, respectively, provide valuable insight on how professional athletes leverage data to improve their performance and how they combine their passion for sports with the Dreamcatcher Foundation. See more at databricks.com/data-brew

May 5, 2022 • 35min
Data Brew Season 4 Episode 5: Public Health: Education, Access, and Policy
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Matt Willis, Marin County Public Health Officer, shares the three pillars of public health (education, access, and policy) and the critical role data plays in addressing the COVID-19 pandemic and the opioid epidemic. See more at databricks.com/data-brew

Apr 14, 2022 • 36min
Data Brew Season 4 Episode 4: 1283 Days of Running (and Counting)
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Running the length of the US every year, Alexandra Matthiesen shares her motivational secrets for running 1,283 consecutive days (and counting!) and redefining physical and mental limits. See more at databricks.com/data-brew

Mar 31, 2022 • 41min
Data Brew Season 4 Episode 3: Last Man Standing
For our fourth season, we focus on connected health and how data & AI augment and improve our daily health. While we’re at it, we’ll be enjoying our morning brew. Winner of the infamous Last Man Standing race (running 246 miles in 59 hours), Guillaume merges the world of competitive long-distance running with data science to push the boundaries of body and mind. See more at databricks.com/data-brew