
Data Brew by Databricks

Latest episodes

Apr 24, 2025 • 32min

Benchmarking Domain Intelligence | Data Brew | Episode 45

In this episode, Pallavi Koppol, Research Scientist at Databricks, explores the importance of domain-specific intelligence in large language models (LLMs). She discusses how enterprises need models tailored to their unique jargon, data, and tasks rather than relying solely on general benchmarks.

Highlights include:
- Why benchmarking LLMs for domain-specific tasks is critical for enterprise AI.
- An introduction to the Databricks Intelligence Benchmarking Suite (DIBS).
- Evaluating models on real-world applications like RAG, text-to-JSON, and function calling (a minimal scoring sketch follows this list).
- The evolving landscape of open-source vs. closed-source LLMs.
- How industry and academia can collaborate to improve AI benchmarking.
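The text-to-JSON and function-calling evaluations mentioned above reduce to a simple question: does the model's output parse, and does it match the expected structure? Below is a minimal sketch of that idea only, not the actual DIBS harness; the `expected_keys` schema and example response are made up for illustration.

```python
import json

def score_text_to_json(model_output: str, expected_keys: set[str]) -> float:
    """Return 1.0 if the output is valid JSON containing all expected keys, else 0.0.

    A toy stand-in for a text-to-JSON benchmark metric; real suites apply
    richer schema and value checks than this.
    """
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # unparseable output scores zero
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if expected_keys.issubset(parsed.keys()) else 0.0

# Example: grading a single model response against a hypothetical schema
print(score_text_to_json('{"customer_id": 42, "intent": "refund"}',
                         {"customer_id", "intent"}))  # -> 1.0
```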
Apr 17, 2025 • 36min

SWE-bench & SWE-agent | Data Brew | Episode 44

In this episode, Kilian Lieret, Research Software Engineer, and Carlos Jimenez, Computer Science PhD Candidate at Princeton University, discuss SWE-bench and SWE-agent, two groundbreaking tools for evaluating and enhancing AI in software engineering.

Highlights include:
- SWE-bench: A benchmark for assessing AI models on real-world coding tasks (the core evaluation loop is sketched after this list).
- Addressing data leakage concerns in GitHub-sourced benchmarks.
- SWE-agent: An AI-driven system for navigating and solving coding challenges.
- Overcoming agent limitations, such as getting stuck in loops.
- The future of AI-powered code reviews and automation in software engineering.
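SWE-bench's central idea is to apply a model-generated patch to a real repository and check whether the project's own tests pass. The sketch below illustrates that loop only; it is not the benchmark's actual harness, and the paths and test command are placeholders.

```python
import subprocess

def evaluate_patch(repo_dir: str, patch_file: str, test_cmd: list[str]) -> bool:
    """Apply a candidate patch and report whether the repository's tests pass.

    Illustrative only: the real SWE-bench harness pins environments, selects
    specific FAIL_TO_PASS / PASS_TO_PASS tests, and sandboxes execution.
    """
    # Apply the model-generated patch to a checkout of the repository.
    applied = subprocess.run(["git", "apply", patch_file], cwd=repo_dir)
    if applied.returncode != 0:
        return False  # patch does not even apply cleanly

    # Run the project's own tests; a zero exit code counts as resolved.
    result = subprocess.run(test_cmd, cwd=repo_dir)
    return result.returncode == 0

# Hypothetical usage:
# resolved = evaluate_patch("/tmp/some-repo", "candidate.patch", ["pytest", "-x"])
```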
Apr 10, 2025 • 38min

Enterprise AI: Research to Product | Data Brew | Episode 43

Dipendra Kumar, a Staff Research Scientist at Databricks, focuses on AI applications in enterprises, while Alnur Ali, a Staff Software Engineer, tackles the engineering challenges of deploying AI. They dive into the struggles of messy data, security, and scalability in enterprise AI. The duo discusses how QuickFix improves coding assistance through user feedback. They emphasize the collaboration between research and engineering and explore how generative AI is reshaping programming, highlighting the need for human oversight to enhance productivity.
Apr 7, 2025 • 42min

Multimodal AI | Data Brew | Episode 42

Chang She, CEO and Co-founder of LanceDB, shares his incredible journey from helping create the pandas library to developing a cutting-edge database for multimodal AI. He dives into the limitations of traditional data storage systems and explains how LanceDB efficiently handles diverse data types like images and videos. The conversation highlights the growing significance of multimodal AI in enterprises, future trends, and even offers spicy predictions for the AI landscape in 2025. Get ready for some intriguing insights!
Mar 27, 2025 • 41min

Age of Agents | Data Brew | Episode 41

Michele Catasta, President of Replit and an advocate for accessible coding, discusses the revolution AI-driven agents are bringing to software development. He explains the key differences between AI agents and traditional copilots like GitHub Copilot. Michele highlights how these innovations empower non-programmers to build applications and the challenges of ensuring software quality. He shares insights on the potential for AI to reshape entrepreneurship and hints at 2025 being a pivotal year for AI agents in the industry.
Mar 20, 2025 • 40min

Reward Models | Data Brew | Episode 40

Brandon Cui, a Research Scientist at MosaicML and Databricks, specializes in AI model optimization and leads RLHF efforts. In this discussion, he unveils how synthetic data and RLHF can fine-tune models for better outcomes. He explores techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) that enhance model responses. Brandon also emphasizes the critical role of reward models in boosting performance in coding, math, and reasoning tasks, while highlighting the necessity of human oversight in AI training.
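Of the techniques mentioned, DPO is the easiest to show concretely: it scores a preferred and a rejected response under both the trained policy and a frozen reference model, then widens the gap between them. Below is a minimal PyTorch sketch of the standard DPO loss; the tensor names and example values are assumptions for illustration, not code from the episode.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each input is the summed log-probability of a response under the policy
    or the frozen reference model; beta controls deviation from the reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical batch of two preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -9.2]))
```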
Feb 20, 2025 • 45min

Retrieval, rerankers, and RAG tips and tricks | Data Brew | Episode 39

Andrew Drozdov, a research scientist at Databricks specializing in Retrieval Augmented Generation (RAG), dives deep into enhancing AI models. He discusses overcoming LLM limitations by integrating relevant external information and optimizing document chunking and query generation. The conversation also highlights the significance of embeddings and fine-tuning techniques for retrieval systems. Additionally, Andrew shares insights on improving search results with re-ranking strategies and the application of RAG methods in enterprise AI for better domain-specific outcomes.
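The retrieve-then-rerank pattern discussed here is straightforward to sketch: a cheap embedding search narrows the corpus to a handful of candidates, and a more expensive scorer reorders them. In the sketch below, `embed` and `cross_encoder_score` are placeholders for whatever models you plug in; they are assumptions for illustration, not APIs named in the episode.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int) -> np.ndarray:
    """First-stage retrieval: indices of the k most similar documents."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def retrieve_and_rerank(query, docs, embed, cross_encoder_score, k=20, final_n=5):
    """Two-stage RAG retrieval: cheap vector search, then an expensive re-ranker."""
    doc_vecs = np.stack([embed(d) for d in docs])      # embed() is a placeholder
    candidates = cosine_top_k(embed(query), doc_vecs, k)
    # Second stage: score each (query, doc) pair with a cross-encoder-style model.
    scored = sorted(candidates,
                    key=lambda i: cross_encoder_score(query, docs[i]),
                    reverse=True)
    return [docs[i] for i in scored[:final_n]]
```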
Feb 4, 2025 • 42min

The Power of Synthetic Data | Data Brew | Episode 38

In this engaging discussion, Yev Meyer, Chief Scientist at Gretel AI with a background in computational neuroscience, dives into the transformative power of synthetic data in AI and ML. He explains how synthetic data can enhance model training, improve data access, and uphold privacy standards. The conversation also touches on ethical considerations, the challenges of data licensing, and the role of differential privacy in protecting personal information. Yev predicts a future where synthetic data reshapes model learning, paving the way for innovative applications.
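Differential privacy, which the conversation touches on, is typically implemented by calibrating noise to a query's sensitivity. Below is a minimal sketch of the classic Laplace mechanism for a single count; it is illustrative only, and synthetic-data systems combine this idea with model training rather than applying it to one statistic at a time.

```python
import numpy as np

def laplace_count(true_count: float, sensitivity: float = 1.0,
                  epsilon: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Noise scale is sensitivity / epsilon: a smaller epsilon means stronger
    privacy and a noisier answer.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: a privacy-preserving answer to "how many users clicked?"
print(laplace_count(1042, sensitivity=1.0, epsilon=0.5))
```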
Jan 22, 2025 • 37min

Secret to Production AI: Tools & Infrastructure | Data Brew | Episode 37

Julia Neagu, CEO and co-founder of Quotient AI and former leader of the GitHub Copilot data team, dives into the intricacies of deploying Generative AI. She shares insights on merging reinforcement learning with unsupervised methods for real-time optimization. The discussion touches on reducing biases in machine learning and implementing fairness in AI systems. Julia also highlights the importance of human-in-the-loop evaluations and effective AI pipeline management, emphasizing lessons learned from large-scale deployments.
Jan 10, 2025 • 41min

Mixture of Memory Experts (MoME) | Data Brew | Episode 36

Sharon Zhou, Co-founder and CEO of Lamini AI, specializes in optimizing AI models for better performance. She shares insights on the intriguing blend of determinism and probabilism in managing unstructured data. Zhou discusses proprietary techniques, including memory tuning to counteract model inaccuracies. The conversation also touches on lessons learned from AI deployment, specifically drawing from experiences like those of GitHub Copilot. Expect a mix of technical wisdom and personal anecdotes about entrepreneurship and societal pressures in the tech world.
