
Super Data Science: ML & AI Podcast with Jon Krohn
The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast. As the quantity of data on our planet doubles every couple of years, and with this trend set to continue for decades to come, there's an unprecedented opportunity for you to make a meaningful impact in your lifetime. In conversation with the biggest names in the data science industry, Jon cuts through hype to fuel that professional impact. Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy. We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship: everything you need to crush it with data science.
Latest episodes

Jul 19, 2024 • 24min
802: In Case You Missed It in June 2024
How to grab investor interest with your AI startup idea, revisiting algorithms, and helping practitioners ensure AI safety with regulatory frameworks and beyond: This month, you missed a whole bunch of great interviews. But don’t worry, Jon Krohn is here to recap all the best bits for you!
Additional materials: www.superdatascience.com/802
Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jul 16, 2024 • 1h 17min
801: Merged LLMs Are Smaller And More Capable, with Arcee AI's Mark McQuade and Charles Goddard
Merged LLMs are the future, and in this episode Jon Krohn explores why with Mark McQuade and Charles Goddard from Arcee AI. Learn how to combine multiple LLMs without increasing model size, train more efficiently, and compare different expert-based approaches. Discover how smaller models can outperform larger ones and how open-source projects can deliver big enterprise wins. This episode is packed with must-know insights for data scientists and ML engineers. Don’t miss out!
In this episode you will learn:
• Explanation of Charles' job title: Chief of Frontier Research [03:31]
• Model merging technology for combining multiple LLMs without increasing size [04:43]
• Using MergeKit for model merging [14:49]
• Evolutionary model merging using evolutionary algorithms [22:55]
• Commercial applications and success stories [28:10]
• Comparison of Mixture of Experts (MoE) vs. Mixture of Agents [37:57]
• Spectrum Project for efficient training by targeting specific modules [54:28]
• Future of Small Language Models (SLMs) and their advantages [01:01:22]
Additional materials: www.superdatascience.com/801

Jul 12, 2024 • 44min
800: A Transformative Century of Technological Progress, with Annie P.
The SuperDataScience Podcast is celebrating its 800th episode! Host Jon Krohn speaks to his grandmother, Annie, about growing up at a time when so many technologies we take for granted today had yet to be developed. Listen in to hear how Annie experienced technological change over her 94 years and how she and her family fared in 1940s Ukraine with no electricity or running water.
Additional materials: www.superdatascience.com/800

Jul 9, 2024 • 1h 46min
799: AGI Could Be Near: Dystopian and Utopian Implications, with Dr. Andrey Kurenkov
No-code games with GenAI, the creative possibilities of LLMs, and our proximity to AGI: In this episode, Jon Krohn talks to Andrey Kurenkov about what turned him from an AGI skeptic into someone who believes AGI could be near. You’ll also hear about his wildly popular podcast “Last Week in AI” and how the NVIDIA-backed startup Astrocade is helping video game enthusiasts create their own games through generative AI. A must-listen!
This episode is brought to you by AWS Inferentia and AWS Trainium.
In this episode you will learn:
• All about The Gradient and Last Week in AI [10:42]
• All about Astrocade and Andrey’s role at the startup [24:35]
• Balancing UX and creative control at Astrocade [42:00]
• The creative possibilities of LLMs [1:04:15]
• The rapid emergence of AGI [1:10:31]
Additional materials: www.superdatascience.com/799

Jul 5, 2024 • 15min
798: Claude 3.5 Sonnet: Frontier Capabilities & Slick New "Artifacts" UI
Claude 3.5 Sonnet, Anthropic’s newest model, is making waves in the AI community. This mid-size model outshines the larger Claude 3 Opus in tasks like code generation, content creation, and document summarization, and it runs twice as fast as Opus. In this episode of The Super Data Science Podcast, Jon Krohn discusses its top-notch performance across benchmarks like MMLU, GPQA, and HumanEval, along with its improved vision capabilities. Plus, learn about the new Artifacts UI feature, which makes managing generated content easier by displaying outputs side-by-side with inputs. Tune in to find out why Claude 3.5 Sonnet is setting new standards in AI.
Additional materials: www.superdatascience.com/798

Jul 2, 2024 • 1h 10min
797: Deep Learning Classics and Trends, with Dr. Rosanne Liu
Dr. Rosanne Liu, Research Scientist at Google DeepMind and co-founder of the ML Collective, shares her journey and the ML Collective’s mission to democratize AI research. She explains her pioneering work on the intrinsic dimension of deep learning models and the advantages of curiosity-driven research. Jon and Dr. Liu also explore the complexities of understanding powerful AI models, the specifics of character-aware text encoding, and the significant impact of diversity, equity, and inclusion in the ML community. With publications at NeurIPS, ICLR, and ICML and in Science, Dr. Liu offers her expertise and vision for the future of machine learning.
In this episode you will learn:
• How the ML Collective came about [03:31]
• The concept of a failure CV [16:12]
• ML Collective research topics [19:03]
• How Dr. Liu's work on the “intrinsic dimension” of deep learning models inspired the now-standard LoRA approach to fine-tuning LLMs [21:28]
• The pros and cons of curiosity-driven vs. goal-driven ML research [29:08]
• Discussion on Dr. Liu's research and papers [33:17]
• Character-aware vs. character-blind text encoding [54:59]
• The positive impacts of diversity, equity, and inclusion in the ML community [57:51]
Additional materials: www.superdatascience.com/797

Jun 28, 2024 • 43min
796: Earth's Coming Population Collapse and How AI Can Help, with Simon Kuestenmacher
Want to feel optimistic about your day? In this Friday episode, Simon Kuestenmacher talks to Jon Krohn about demography: what it is, why it’s so important, and why its forecasts should give us reason to hope for a better future. In an increasingly globalized world, and with aging populations in the countries with the biggest GDPs, demographic insight is more valuable than ever.
Additional materials: www.superdatascience.com/796

Jun 25, 2024 • 1h 8min
795: Fast-Evolving Data and AI Regulatory Frameworks, with Dr. Gina Guillaume-Joseph
Gina Guillaume-Joseph talks to Jon Krohn about the data and regulatory frameworks set to transform the AI industry and why they matter to anyone working with data. This episode offers a solid path to understanding AI regulation’s past, present, and future. Gina walks listeners through the AI Bill of Rights, the NIST AI Risk Management Framework, and the MITRE ATLAS threat model.
This episode is brought to you by AWS Inferentia and AWS Trainium, by Crawlbase, the ultimate data crawling platform, and by Babbel, the science-backed language-learning platform.
In this episode you will learn:
• What “responsible AI” means [08:14]
• Why the federal government should be behind AI regulation [12:22]
• The US vs EU on AI regulation [18:46]
• About the AI Bill of Rights [26:14]
• About MITRE and MITRE ATLAS [37:19]
• What a systems engineer does [54:11]
Additional materials: www.superdatascience.com/795

Jun 21, 2024 • 11min
794: Exciting (and Frightening!) Trends in Open-Source AI
Trends in open-source AI: Join Jon Krohn and a panel of data science icons as they discuss the most exciting and concerning developments in open-source AI. Hear insights from Drew Conway, Jared Lander, Emily Zabor, and JD Long on the transformative potential of AI and its future impact.
Additional materials: www.superdatascience.com/794

Jun 18, 2024 • 1h 33min
793: Bayesian Methods and Applications, with Alexandre Andorra
Bayesian methods take the spotlight in this episode with Alex Andorra, co-founder of PyMC Labs, and Jon Krohn. Learn how Bayesian techniques handle tough problems, make the most of prior knowledge, and work wonders with limited data. Alex and Jon break down essentials like the PyMC, PyStan, and NumPyro libraries, show how to boost model efficiency with PyTensor, and talk about using ArviZ for top-notch diagnostics and visualizations. Plus, get into advanced modeling with Gaussian processes.
This episode is brought to you by Crawlbase, the ultimate data crawling platform.
In this episode you will learn:
• Practical introduction to Bayesian statistics [04:54]
• Definition and significance of epistemology [17:52]
• Explanation of PyMC and Monte Carlo methods [27:57]
• How to get started with Bayesian modeling and PyMC [34:26]
• PyMC Labs and its consulting services [50:50]
• ArviZ for post-modeling diagnostics and visualization [01:02:23]
• Gaussian processes and their applications [01:09:02]
Additional materials: www.superdatascience.com/793