
Super Data Science: ML & AI Podcast with Jon Krohn
The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast. As the quantity of data on our planet doubles every couple of years and with this trend set to continue for decades to come, there's an unprecedented opportunity for you to make a meaningful impact in your lifetime. In conversation with the biggest names in the data science industry, Jon cuts through hype to fuel that professional impact.
Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy.
We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship: everything you need to crush it with data science.
Latest episodes

May 30, 2023 • 1h 21min
683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
Monitoring malicious, user-generated content; contextual AI; adapting to novel evasion attempts: Matar Haller speaks to Jon Krohn about the challenges of identifying, analyzing and flagging malicious information online. In this episode, Matar explains how contextual AI and a “database of evil” can help resolve the multiple challenges of blocking dangerous content across a range of media, even those that are live-streamed.
This episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into AI. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• How ActiveFence helps its customers to moderate platform content [05:36]
• How ActiveFence finds extreme social media users trying to evade detection [16:32]
• How to monitor live-streaming content and analyze it for dangerous material [29:13]
• The technologies ActiveFence uses to run its platform [35:54]
• Matar’s experience of the Insight Fellows Program (Data Science Fellowship) [40:28]
• Leadership opportunities for women in STEM [1:00:41]
• Israel’s R&D edge for AI [1:13:19]
Additional materials: www.superdatascience.com/683

May 26, 2023 • 28min
682: Business Intelligence Tools, with Mico Yuk
In this week's episode, Mico Yuk, host of 'Analytics on Fire', joins Jon Krohn to share her effective business intelligence and analytics framework, BIDS, for persuading key decision makers. She crowns one "power" tool as the analytics king and discusses emerging tools that could challenge its dominance. Tune in for unapologetic insights on future and current BI trends and happenings from the world of BI and analytics.
Additional materials: www.superdatascience.com/682

May 23, 2023 • 1h 12min
681: XGBoost: The Ultimate Classifier, with Matt Harrison
Unlock the power of XGBoost by learning how to fine-tune its hyperparameters and discover its optimal modeling situations. This and more, when best-selling author and leading Python consultant Matt Harrison teams up with Jon Krohn for yet another jam-packed technical episode! Are you ready to upgrade your data science toolkit in just one hour? Tune in now!
This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution.
In this episode you will learn:
• Matt's book ‘Effective XGBoost’ [07:05]
• What XGBoost is [09:09]
• XGBoost's key model hyperparameters [19:01]
• XGBoost's secret sauce [29:57]
• When to use XGBoost [34:45]
• When not to use XGBoost [41:42]
• Matt’s recommended Python libraries [47:36]
• Matt's production tips [57:57]
Additional materials: www.superdatascience.com/681

May 19, 2023 • 30min
680: Automating Industrial Machines with Data Science and the Internet of Things (IoT)
Industrial machinery’s dependence on data science, tech stacks to build IoT platforms, and transitioning from data science to product: This week’s Friday episode with Allegra Alessi explores the minutiae of product ownership for the Internet of Things at packaging company Bobst. Join host Jon Krohn and his guest as they unpack how the IoT is leading factory production.
Additional materials: www.superdatascience.com/680

May 16, 2023 • 1h 34min
679: The A.I. and Machine Learning Landscape, with investor George Mathew
Generative AI, MLOps, and making smart investments in AI: This week’s episode is critical listening for AI investors and generative AI creators. AI investor George Mathew talks with host Jon Krohn about the emerging generative AI stack, the critical elements of MLOps to ensure a scalable model, and the tools developers can use for a saleable product.
This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by Anaconda, the world's most popular Python distribution.
In this episode you will learn:
• Venture capital’s role in the technology startup ecosystem [05:59]
• How RLHF helps UI become more intuitive [12:53]
• The four layers of the generative AI stack [34:16]
• The risks for generative AI business founders and investors [46:50]
• How MLOps drives best practices and helps implementation [56:33]
• The importance of PLG (Product-Led Growth) [1:04:15]
• How generative AI tools will impact the labor market [1:17:34]
Additional materials: www.superdatascience.com/679

May 12, 2023 • 12min
678: StableLM: Open-source "ChatGPT"-like LLMs you can fit on one GPU
StableLM, the new family of open-source language models from the brilliant minds behind Stable Diffusion, is out! Small but mighty, these models have been trained on an unprecedented amount of data for single-GPU LLMs. This week, Jon breaks down the mechanics of these models. See you there!
Additional materials: www.superdatascience.com/678

May 9, 2023 • 1h 28min
677: Digital Analytics with Avinash Kaushik
Avinash Kaushik, former Sr. Director of Global Strategic Analytics at Google, talks about the transformative power of AI, his 'four clusters of intent' framework, incrementality-centric marketing, maintaining a human touch with AI, and his most significant career challenges.

May 5, 2023 • 13min
676: The Chinchilla Scaling Laws
Chinchilla AI, and fine-tuning proprietary tasks with large language models: On this week’s Five-Minute Friday, host Jon Krohn outlines the principles of the Chinchilla Scaling Laws, the incredible power of models such as Cerebras-GPT based on these laws, and the impact of scaling on the number of viable applications and commercial use cases.
Additional materials: www.superdatascience.com/676

May 2, 2023 • 1h 9min
675: Pandas for Data Analysis and Visualization
Wrangling data in Pandas, when to use Pandas, Matplotlib or Seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas.
This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia.
In this episode you will learn:
• The advantages of using pandas over other libraries [07:55]
• Why data wrangling in pandas is so helpful [12:05]
• Stefanie’s Data Morph library [24:27]
• When to use pandas, matplotlib, or seaborn [33:45]
• Understanding the ticker module in matplotlib [36:48]
• Where data analysts should start their learning journey [40:08]
• What it’s like being a software engineer at Bloomberg [51:19]
Additional materials: www.superdatascience.com/675

Apr 28, 2023 • 5min
674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
Models like Alpaca, Vicuña, GPT4All-J and Dolly 2.0 have relatively small model architectures, but they're prohibitively expensive to train even on a small amount of your own data. The standard model-training protocol can also lead to catastrophic forgetting. In this week's episode, Jon explores a solution to these problems, introducing listeners to Parameter-Efficient Fine-Tuning (PEFT) and the leading approach: Low-Rank Adaptation (LoRA).
Additional materials: www.superdatascience.com/674