
Gradient Dissent: Conversations on AI

Latest episodes

May 4, 2023 • 57min

How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

On this episode, we’re joined by Stella Biderman, Executive Director at EleutherAI and Lead Scientist and Mathematician at Booz Allen Hamilton. EleutherAI is a grassroots collective that enables open-source AI research and focuses on the development and interpretability of large language models (LLMs).

We discuss:
- How EleutherAI got its start and where it's headed.
- The similarities and differences between various LLMs.
- How to decide which model to use for your desired outcome.
- The benefits and challenges of reinforcement learning from human feedback.
- Details around pre-training and fine-tuning LLMs.
- Which types of GPUs are best when training LLMs.
- What separates EleutherAI from other companies training LLMs.
- Details around mechanistic interpretability.
- Why understanding what and how LLMs memorize is important.
- The importance of giving researchers and the public access to LLMs.

Stella Biderman - https://www.linkedin.com/in/stellabiderman/
EleutherAI - https://www.linkedin.com/company/eleutherai/

Resources:
- https://www.eleuther.ai/

Thanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.

#OCR #DeepLearning #AI #Modeling #ML
Apr 20, 2023 • 52min

Scaling LLMs and Accelerating Adoption with Aidan Gomez at Cohere

On this episode, we’re joined by Aidan Gomez, Co-Founder and CEO at Cohere. Cohere develops and releases a range of innovative AI-powered tools and solutions for a variety of NLP use cases.

We discuss:
- What “attention” means in the context of ML.
- Aidan’s role in the “Attention Is All You Need” paper.
- What state-space models (SSMs) are, and how they could be an alternative to transformers.
- What it means for an ML architecture to saturate compute.
- Details around data constraints for when LLMs scale.
- Challenges of measuring LLM performance.
- How Cohere is positioned within the LLM development space.
- Insights around scaling down an LLM into a more domain-specific one.
- Concerns around synthetic content and AI changing public discourse.
- The importance of raising money at healthy milestones for AI development.

Aidan Gomez - https://www.linkedin.com/in/aidangomez/
Cohere - https://www.linkedin.com/company/cohere-ai/

Thanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.

Resources:
- https://cohere.ai/
- “Attention Is All You Need”
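The “attention” this episode opens with is the scaled dot-product operation from “Attention Is All You Need”. As a rough sketch of that one equation, softmax(QKᵀ/√d_k)V — illustrative NumPy only (single head, no masking or learned projections; not code from the episode):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the core of a transformer layer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # each output row is a weighted average of value rows

# Three tokens with d_k = 4: output has one mixed vector per query.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

With all-zero queries and keys, the softmax is uniform and every output is just the mean of the value vectors, which is a handy sanity check.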
Apr 4, 2023 • 1h 2min

Neural Network Pruning and Training with Jonathan Frankle at MosaicML

Jonathan Frankle, Chief Scientist at MosaicML and Assistant Professor of Computer Science at Harvard University, joins us on this episode. With comprehensive infrastructure and software tools, MosaicML aims to help businesses train complex machine learning models using their own proprietary data.

We discuss:
- Details of Jonathan’s Ph.D. dissertation, which explores his “Lottery Ticket Hypothesis.”
- The role of neural network pruning and how it impacts the performance of ML models.
- Why transformers will be the go-to way to train NLP models for the foreseeable future.
- Why the process of speeding up neural net learning is both scientific and artisanal.
- What MosaicML does, and how it approaches working with clients.
- The challenges of developing AGI.
- Details around ML training policy and ethics.
- Why data brings the magic to customized ML models.
- The many use cases for companies looking to build customized AI models.

Jonathan Frankle - https://www.linkedin.com/in/jfrankle/

Resources:
- https://mosaicml.com/
- “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”

Thanks for listening to the Gradient Dissent podcast, brought to you by Weights & Biases. If you enjoyed this episode, please leave a review to help get the word out about the show. And be sure to subscribe so you never miss another insightful conversation.
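The pruning discussed here is commonly done by weight magnitude: zero out the smallest-magnitude weights, then (in the Lottery Ticket procedure) rewind the survivors to their original initialization and retrain. A minimal sketch of the pruning step itself — illustrative only, not code from the episode or from MosaicML:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of weights.
    Returns the pruned weights and the boolean mask of survivors."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Toy 2x3 weight matrix: pruning at 50% keeps the three largest magnitudes.
W = np.array([[0.9, -0.05, 0.3],
              [-0.8, 0.02, -0.4]])
pruned, mask = magnitude_prune(W, sparsity=0.5)
print(mask.sum())  # 3 weights survive
```

In iterative magnitude pruning this step is repeated over several train-prune-rewind rounds rather than applied once.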
Mar 3, 2023 • 55min

Shreya Shankar — Operationalizing Machine Learning

About This Episode

Shreya Shankar is a computer scientist, PhD student in databases at UC Berkeley, and co-author of "Operationalizing Machine Learning: An Interview Study", an ethnographic interview study with 18 machine learning engineers across a variety of industries on their experience deploying and maintaining ML pipelines in production.

Shreya explains the high-level findings of "Operationalizing Machine Learning": variables that indicate a successful deployment (velocity, validation, and versioning), common pain points, and a grouping of the MLOps tool stack into four layers. Shreya and Lukas also discuss examples of data challenges in production, Jupyter Notebooks, and reproducibility.

Show notes (transcript and links): http://wandb.me/gd-shreya

---
💬 Host: Lukas Biewald

---
Subscribe and listen to Gradient Dissent today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
Feb 2, 2023 • 1h 16min

Sarah Catanzaro — Remembering the Lessons of the Last AI Renaissance

Sarah Catanzaro is a General Partner at Amplify Partners and one of the leading investors in AI and ML. Her investments include RunwayML, OctoML, and Gantry.

Sarah and Lukas discuss lessons learned from the "AI renaissance" of the mid-2010s and compare the general perception of ML back then to now. Sarah also provides insights from her perspective as an investor, from selling into tech-forward companies vs. traditional enterprises, to the current state of MLOps/developer tools, to large language models and hype bubbles.

Show notes (transcript and links): http://wandb.me/gd-sarah-catanzaro

---
⏳ Timestamps:
0:00 Intro
1:10 Lessons learned from previous AI hype cycles
11:46 Maintaining technical knowledge as an investor
19:05 Selling into tech-forward companies vs. traditional enterprises
25:09 Building point solutions vs. end-to-end platforms
36:27 LLMs, new tooling, and commoditization
44:39 Failing fast and how startups can compete with large cloud vendors
52:31 The gap between research and industry, and vice versa
1:00:01 Advice for ML practitioners during hype bubbles
1:03:17 Sarah's thoughts on Rust and bottlenecks in deployment
1:11:23 The importance of aligning technology with people
1:15:58 Outro

---
📝 Links
📍 "Operationalizing Machine Learning: An Interview Study" (Shankar et al., 2022), an interview study on deploying and maintaining ML production pipelines: https://arxiv.org/abs/2209.09125

---
Connect with Sarah:
📍 Sarah on Twitter: https://twitter.com/sarahcat21
📍 Sarah's Amplify Partners profile: https://www.amplifypartners.com/investment-team/sarah-catanzaro

---
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan

---
Subscribe and listen to Gradient Dissent today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
Jan 19, 2023 • 40min

Cristóbal Valenzuela — The Next Generation of Content Creation and AI

Cristóbal Valenzuela is co-founder and CEO of Runway ML, a startup that's building the future of AI-powered content creation tools. Runway's research areas include diffusion systems for image generation.

Cris gives a demo of Runway's video editing platform. Then, he shares how his interest in combining technology with creativity led to Runway, and where he thinks the world of computation and content might be headed next. Cris and Lukas also discuss Runway's tech stack and research.

Show notes (transcript and links): http://wandb.me/gd-cristobal-valenzuela

---
⏳ Timestamps:
0:00 Intro
1:06 How Runway uses ML to improve video editing
6:04 A demo of Runway’s video editing capabilities
13:36 How Cris entered the machine learning space
18:55 Cris’ thoughts on the future of ML for creative use cases
28:46 Runway’s tech stack
32:38 Creativity, and keeping humans in the loop
36:15 The potential of audio generation and new mental models
40:01 Outro

---
🎥 Runway's AI Film Festival is accepting submissions through January 23! 🎥
They are looking for art and artists at the forefront of AI filmmaking. Submissions should be between 1-10 minutes long, and a core component of the film should include generative content.
📍 https://aiff.runwayml.com/

---
📝 Links
📍 "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022), the research paper behind Stable Diffusion: https://research.runwayml.com/publications/high-resolution-image-synthesis-with-latent-diffusion-models
📍 Lexman Artificial, a 100% AI-generated podcast: https://twitter.com/lexman_ai

---
Connect with Cris and Runway:
📍 Cris on Twitter: https://twitter.com/c_valenzuelab
📍 Runway on Twitter: https://twitter.com/runwayml
📍 Careers at Runway: https://runwayml.com/careers/

---
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan

---
Subscribe and listen to Gradient Dissent today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
Jan 5, 2023 • 1h 13min

Jeremy Howard — The Simple but Profound Insight Behind Diffusion

Jeremy Howard is a co-founder of fast.ai, the non-profit research group behind the popular massive open online course "Practical Deep Learning for Coders" and the open-source deep learning library "fastai".

Jeremy is also a co-founder of #Masks4All, a global volunteer organization founded in March 2020 that advocated for the public adoption of homemade face masks in order to help slow the spread of COVID-19. His Washington Post article "Simple DIY masks could help flatten the curve. We should all wear them in public." went viral in late March/early April 2020, and is associated with the U.S. CDC's change in guidance a few days later to recommend wearing masks in public.

In this episode, Jeremy explains how diffusion works and how individuals with limited compute budgets can engage meaningfully with large, state-of-the-art models. Then, as our first-ever repeat guest on Gradient Dissent, Jeremy revisits a previous conversation with Lukas on Python vs. Julia for machine learning.

Finally, Jeremy shares his perspective on the early days of COVID-19, and what his experience as one of the earliest and most high-profile advocates for widespread mask-wearing was like.

Show notes (transcript and links): http://wandb.me/gd-jeremy-howard-2

---
⏳ Timestamps:
0:00 Intro
1:06 Diffusion and generative models
14:40 Engaging with large models meaningfully
20:30 Jeremy's thoughts on Stable Diffusion and OpenAI
26:38 Prompt engineering and large language models
32:00 Revisiting Julia vs. Python
40:22 Jeremy's science advocacy during early COVID days
1:01:03 Researching how to improve children's education
1:07:43 The importance of executive buy-in
1:11:34 Outro
1:12:02 Bonus: Weights & Biases

---
📝 Links
📍 Jeremy's previous Gradient Dissent episode (8/25/2022): http://wandb.me/gd-jeremy-howard
📍 "Simple DIY masks could help flatten the curve. We should all wear them in public.", Jeremy's viral Washington Post article: https://www.washingtonpost.com/outlook/2020/03/28/masks-all-coronavirus/
📍 "An evidence review of face masks against COVID-19" (Howard et al., 2021), one of the first peer-reviewed papers on the effectiveness of wearing masks: https://www.pnas.org/doi/10.1073/pnas.2014564118
📍 Jeremy's Twitter thread summary of "An evidence review of face masks against COVID-19": https://twitter.com/jeremyphoward/status/1348771993949151232
📍 Read more about Jeremy's mask-wearing advocacy: https://www.smh.com.au/world/north-america/australian-expat-s-push-for-universal-mask-wearing-catches-fire-in-the-us-20200401-p54fu2.html

---
Connect with Jeremy and fast.ai:
📍 Jeremy on Twitter: https://twitter.com/jeremyphoward
📍 fast.ai on Twitter: https://twitter.com/FastDotAI
📍 Jeremy on LinkedIn: https://www.linkedin.com/in/howardjeremy/

---
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan
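The simple insight behind diffusion that the episode title refers to rests on a closed form: noising a clean input for t steps can be sampled in one shot, and the model is trained to predict the noise that was added. A toy sketch of that forward process — illustrative only, not from the episode; the DDPM-style linear beta schedule is an assumption:

```python
import numpy as np

# Linear beta schedule: the variance added at each of T noising steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retained after t steps

def q_sample(x0, t, eps):
    """Closed-form forward diffusion:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8,))        # stand-in for a clean data point
eps = rng.normal(size=(8,))       # the Gaussian noise to be predicted
x_early = q_sample(x0, 10, eps)   # still close to the data
x_late = q_sample(x0, T - 1, eps) # almost pure noise
```

Training then amounts to regressing eps from (x_t, t); sampling runs the learned denoiser backwards from pure noise.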
Dec 22, 2022 • 53min

Jerome Pesenti — Large Language Models, PyTorch, and Meta

Jerome Pesenti is the former VP of AI at Meta, a tech conglomerate that includes Facebook, WhatsApp, and Instagram, and one of the most exciting places where AI research is happening today.

Jerome shares his thoughts on Transformer-based large language models, and why he's excited by the progress but skeptical of the term "AGI". Then, he discusses some of the practical applications of ML at Meta (recommender systems and moderation!) and dives into the story behind Meta's development of PyTorch. Jerome and Lukas also chat about Jerome's time at IBM Watson and in drug discovery.

Show notes (transcript and links): http://wandb.me/gd-jerome-pesenti

---
⏳ Timestamps:
0:00 Intro
0:28 Jerome's thoughts on large language models
12:53 AI applications and challenges at Meta
18:41 The story behind developing PyTorch
26:40 Jerome's experience at IBM Watson
28:53 Drug discovery, AI, and changing the game
36:10 The potential of education and AI
40:10 Meta and AR/VR interfaces
43:43 Why NVIDIA is such a powerhouse
47:08 Jerome's advice to people starting their careers
48:50 Going back to coding, and the challenges of scaling
52:11 Outro

---
Connect with Jerome:
📍 Jerome on Twitter: https://twitter.com/an_open_mind
📍 Jerome on LinkedIn: https://www.linkedin.com/in/jpesenti/

---
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan, Lavanya Shukla

---
Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
Dec 1, 2022 • 1h

D. Sculley — Technical Debt, Trade-offs, and Kaggle

D. Sculley is CEO of Kaggle, the beloved and well-known data science and machine learning community.

D. discusses his influential 2014 paper "Machine Learning: The High Interest Credit Card of Technical Debt" and what the current challenges of deploying models in the real world are now, in 2022. Then, D. and Lukas chat about why Kaggle is like a rain forest, and about Kaggle's historic, current, and potential future roles in the broader machine learning community.

Show notes (transcript and links): http://wandb.me/gd-d-sculley

---
⏳ Timestamps:
0:00 Intro
1:02 Machine learning and technical debt
11:18 MLOps, increased stakes, and realistic expectations
19:12 Evaluating models methodically
25:32 Kaggle's role in the ML world
33:34 Kaggle competitions, datasets, and notebooks
38:49 Why Kaggle is like a rain forest
44:25 Possible future directions for Kaggle
46:50 Healthy competitions and self-growth
48:44 Kaggle's relevance in a compute-heavy future
53:49 AutoML vs. human judgment
56:06 After a model goes into production
1:00:00 Outro

---
Connect with D. and Kaggle:
📍 D. on LinkedIn: https://www.linkedin.com/in/d-sculley-90467310/
📍 Kaggle on Twitter: https://twitter.com/kaggle

---
Links:
📍 "Machine Learning: The High Interest Credit Card of Technical Debt" (Sculley et al., 2014): https://research.google/pubs/pub43146/

---
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan, Anish Shah, Lavanya Shukla

---
Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
Nov 15, 2022 • 1h 10min

Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next

Emad Mostaque is CEO and co-founder of Stability AI, a startup and network of decentralized developer communities building open AI tools. Stability AI is the company behind Stable Diffusion, the well-known, open-source, text-to-image generation model.

Emad shares the story and mission behind Stability AI (unlocking humanity's potential with open AI technology), and explains how Stability's role as a community catalyst and compute provider might evolve as the company grows. Then, Emad and Lukas discuss what the future might hold in store: big models vs. "optimal" models, better datasets, and more decentralization.

-
🎶 Special note: This week’s theme music was composed by Weights & Biases’ own Justin Tenuto with help from Harmonai’s Dance Diffusion.

-
Show notes (transcript and links): http://wandb.me/gd-emad-mostaque

-
💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Angelica Pan, Lavanya Shukla, Anish Shah

-
Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify
