
Super Data Science: ML & AI Podcast with Jon Krohn
The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast. As the quantity of data on our planet doubles every couple of years and with this trend set to continue for decades to come, there's an unprecedented opportunity for you to make a meaningful impact in your lifetime. In conversation with the biggest names in the data science industry, Jon cuts through hype to fuel that professional impact. Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy. We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship: everything you need to crush it with data science.
Latest episodes

Jun 30, 2023 • 8min
692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU
Join Jon as he guides listeners through SpQR, a cutting-edge, lossless LLM weight compression technique built on quantization. In this week's episode, Jon walks through the four steps behind the method.
Additional materials: www.superdatascience.com/692
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

Jun 27, 2023 • 1h 35min
691: A.I. Accelerators: Hardware Specialized for Deep Learning
GPUs vs CPUs, chip design and the importance of chips in AI research: This highly technical episode is for anyone who wants to learn what goes into chip development and how to get into the competitive industry of accelerator design. With advice from expert guest Ron Diamant, Senior Principal Engineer at AWS, you’ll get a breakdown of the need-to-know technical terms, what chip engineers need to think about during the design phase and what the future holds for processing hardware.
This episode is brought to you by Posit, the open-source data science company, by the AWS Insiders Podcast, and by WithFeeling.ai, the company bringing humanity into AI.
In this episode you will learn:
• What CPUs and GPUs are [05:29]
• The differences between accelerators used for deep learning [14:31]
• Trainium and Inferentia: AWS's A.I. Accelerators [22:10]
• If model optimizations will lead to lower demand for hardware to process them [43:14]
• How a chip designer goes about production [48:34]
• Breaking down the technical terminology for chips (accelerator interconnect, dynamic execution, collective communications) [55:29]
• The importance of AWS Neuron, a software development kit [1:15:42]
• How Ron got his foot in the door with chip design [1:26:40]
Additional materials: www.superdatascience.com/691

Jun 23, 2023 • 26min
690: How to Catch and Fix Harmful Generative A.I. Outputs
Krishna Gade, the founder and CEO of Fiddler.AI, discusses the challenges faced by Large Language Models (LLMs) in Generative AI, including inaccuracies, biases, and privacy risks. He emphasizes the importance of monitoring to build trust in AI and highlights Fiddler's explainability algorithms and pre-built bias detection tools as vital solutions.
Additional materials: www.superdatascience.com/690

Jun 20, 2023 • 1h 18min
689: Observing LLMs in Production to Automatically Catch Issues
Arize's Amber Roberts and Xander Song join Jon Krohn this week to share insights into ML Observability, drift detection, retraining strategies, and the crucial task of ensuring fairness and ethical considerations in AI development.
This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by Anaconda, the world's most popular Python distribution.
In this episode you will learn:
• What ML Observability is [05:07]
• What drift is [08:18]
• The different kinds of model drift [15:31]
• How frequently production models should be retrained [25:15]
• Arize's open-source product, Phoenix [30:49]
• How ML Observability relates to discovering model biases [50:30]
• Arize case studies [57:13]
• What a developer advocate is [1:04:51]
Additional materials: www.superdatascience.com/689

Jun 16, 2023 • 14min
688: Six Reasons Why Building LLM Products Is Tricky
Prompt injection, prompt engineering, context windows, and more: In this week’s Five-Minute Friday, Jon explains why anyone looking to build their own product leveraging LLMs should stop to consider these and three more issues before jumping in. Phillip Carter first outlined these six issues in his article “All the Hard Stuff Nobody Talks About when Building Products with LLMs”.
Additional materials: www.superdatascience.com/688

Jun 13, 2023 • 1h 47min
687: Generative Deep Learning, with David Foster
Autoencoders, transformers, latent space: Learn the elements of generative AI and hear what data scientist David Foster has to say about the potential for generative AI in music, as well as the role that world models play in blending generative AI with reinforcement learning.
This episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into AI.
In this episode you will learn:
• Generative modeling vs discriminative modeling [04:21]
• Generative AI for Music [13:12]
• On the threats of AI [23:15]
• Autoencoders Explained [38:36]
• Noise in Generative AI [48:11]
• What CLIP models are (Contrastive Language-Image Pre-training) [54:07]
• What World Models are [1:00:40]
• What a Transformer is [1:11:14]
• How to use transformers for music generation [1:19:50]
Additional materials: www.superdatascience.com/687

Jun 9, 2023 • 30min
686: Open-Source "Responsible A.I." Tools, with Ruth Yakubu
Microsoft’s Ruth Yakubu joins Jon Krohn to discuss Responsible AI principles and the open-source Responsible AI Toolbox, which lets users assess their models for fairness, inclusiveness, privacy, explainability, accountability, and reliability before deployment.
Additional materials: www.superdatascience.com/686

Jun 6, 2023 • 1h 6min
685: Tools for Building Real-Time Machine Learning Applications, with Richmond Alake
Richmond Alake, a Machine Learning Architect at Slalom Build, sits down with Jon to share real-time ML insights, tools and career experiences for a high-energy, high-impact episode. From his work at Slalom Build to his two AI startups, discover the software choices, ML tools, and front-end development techniques used by a leader in the field.
This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by WithFeeling.ai, the company bringing humanity into AI.
In this episode you will learn:
• What is a Machine Learning Architect? [03:09]
• Richmond's startups [12:07]
• Why Richmond started a podcast [29:51]
• Richmond's new course on feature stores [38:05]
• Why Richmond produces data science content [43:25]
• Why All Data Scientists Should Write [51:30]
Additional materials: www.superdatascience.com/685

Jun 2, 2023 • 6min
684: Get More Language Context out of your LLM
Open-source LLMs, FlashAttention and generative AI terminology: Host Jon Krohn gives us the lift we need to explore the next big steps in generative AI. Listen to the specific way in which Stanford University’s “exact attention” algorithm, FlashAttention, could become a competitor for GPT-4’s capabilities.
Additional materials: www.superdatascience.com/684

May 30, 2023 • 1h 21min
683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
Monitoring malicious, user-generated content; contextual AI; adapting to novel evasion attempts: Matar Haller speaks to Jon Krohn about the challenges of identifying, analyzing and flagging malicious information online. In this episode, Matar explains how contextual AI and a “database of evil” can help resolve the multiple challenges of blocking dangerous content across a range of media, even those that are live-streamed.
This episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into AI.
In this episode you will learn:
• How ActiveFence helps its customers to moderate platform content [05:36]
• How ActiveFence finds extreme social media users trying to evade detection [16:32]
• How to monitor live-streaming content and analyze it for dangerous material [29:13]
• The technologies ActiveFence uses to run its platform [35:54]
• Matar’s experience of the Insight Fellows Program (Data Science Fellowship) [40:28]
• Leadership opportunities for women in STEM [1:00:41]
• Israel’s R&D edge for AI [1:13:19]
Additional materials: www.superdatascience.com/683