Data Brew by Databricks

Latest episodes

Mar 27, 2025 • 41min

Age of Agents | Data Brew | Episode 41

Michele Catasta, President of Replit and an advocate for accessible coding, discusses the revolution AI-driven agents are bringing to software development. He explains the key differences between AI agents and traditional copilots like GitHub Copilot. Michele highlights how these innovations empower non-programmers to build applications and the challenges of ensuring software quality. He shares insights on the potential for AI to reshape entrepreneurship and hints at 2025 being a pivotal year for AI agents in the industry.
Mar 20, 2025 • 40min

Reward Models | Data Brew | Episode 40

Brandon Cui, a Research Scientist at MosaicML and Databricks, specializes in AI model optimization and leads RLHF efforts. In this discussion, he explains how synthetic data and RLHF can fine-tune models for better outcomes. He explores techniques like Proximal Policy Optimization and Direct Preference Optimization that enhance model responses. Brandon also emphasizes the critical role of reward models in boosting performance in coding, math, and reasoning tasks, while highlighting the necessity of human oversight in AI training.
Feb 20, 2025 • 45min

Retrieval, rerankers, and RAG tips and tricks | Data Brew | Episode 39

Andrew Drozdov, a research scientist at Databricks specializing in Retrieval Augmented Generation (RAG), dives deep into enhancing AI models. He discusses overcoming LLM limitations by integrating relevant external information and optimizing document chunking and query generation. The conversation also highlights the significance of embeddings and fine-tuning techniques for retrieval systems. Additionally, Andrew shares insights on improving search results with re-ranking strategies and the application of RAG methods in enterprise AI for better domain-specific outcomes.
Feb 4, 2025 • 42min

The Power of Synthetic Data | Data Brew | Episode 38

In this engaging discussion, Yev Meyer, Chief Scientist at Gretel AI with a background in computational neuroscience, dives into the transformative power of synthetic data in AI and ML. He explains how synthetic data can enhance model training, improve data access, and uphold privacy standards. The conversation also touches on ethical considerations, the challenges of data licensing, and the role of differential privacy in protecting personal information. Yev predicts a future where synthetic data reshapes model learning, paving the way for innovative applications.
Jan 22, 2025 • 37min

Secret to Production AI: Tools & Infrastructure | Data Brew | Episode 37

Julia Neagu, CEO and co-founder of Quotient AI and former leader of the GitHub Copilot data team, dives into the intricacies of deploying Generative AI. She shares insights on merging reinforcement learning with unsupervised methods for real-time optimization. The discussion touches on reducing biases in machine learning and implementing fairness in AI systems. Julia also highlights the importance of human-in-the-loop evaluations and effective AI pipeline management, emphasizing lessons learned from large-scale deployments.
Jan 10, 2025 • 41min

Mixture of Memory Experts (MoME) | Data Brew | Episode 36

Sharon Zhou, Co-founder and CEO of Lamini AI, specializes in optimizing AI models for better performance. She shares insights on the intriguing blend of determinism and probabilism in managing unstructured data. Zhou discusses proprietary techniques, including memory tuning to counteract model inaccuracies. The conversation also touches on lessons learned from AI deployment, specifically drawing from experiences like those of GitHub Copilot. Expect a mix of technical wisdom and personal anecdotes about entrepreneurship and societal pressures in the tech world.
Nov 21, 2024 • 39min

Mixed Attention & LLM Context | Data Brew | Episode 35

Shashank Rajput, a Research Scientist specializing in large language models at Mosaic and Databricks, dives into innovative techniques like Retrieval Augmented Generation (RAG) to boost LLM efficiency. He discusses how RAG improves LLM accuracy using external documents. The conversation covers the evolution of attention mechanisms, particularly mixed strategies. They also explore the Mamba architecture, showcasing its speed and memory management compared to traditional transformers, highlighting practical applications and efficiency trade-offs.
Oct 14, 2024 • 43min

Kumo AI & Relational Deep Learning | Data Brew | Episode 34

In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering. Highlights include:
- How RDL enhances predictive modeling.
- Applications in fraud detection and recommendation systems.
- The use of graph neural networks to simplify complex data structures.
Jul 21, 2023 • 39min

LLMs: Internals, Hallucinations, and Applications | Data Brew | Episode 33

Databricks experts discuss best practices for prompt engineering with large language models, risks and mitigation strategies for hallucinations, criteria for choosing and fine-tuning language models, and strategies for keeping up with LLMs in the rapidly evolving field.
Jun 29, 2023 • 33min

Demonstrate–Search–Predict Framework | Data Brew | Episode 32

We will dive into LLMs for our fifth season, from understanding the internals to the risks of using them and everything in between. While we’re at it, we’ll be enjoying our morning brew. In this session, we interviewed Omar Khattab, Computer Science Ph.D. student at Stanford and creator of DSP (Demonstrate–Search–Predict Framework), to discuss DSP, common applications, and the future of NLP.
