AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Importance of Data-Centric AI
This chapter explores the concept of data-centric AI and the significance of data quality in machine learning, highlighting the role of data as a starting point for solving ML problems and the challenges faced in addressing noisy data.
MLOps Coffee Sessions #106 with Curtis Northcutt, CEO & Co-Founder of Cleanlab, "Cleanlab: Labeled Datasets that Correct Themselves Automatically", co-hosted by Vishnu Rachakonda.
// Abstract
Pioneered at MIT by 3 Ph.D. Co-Founders, Cleanlab is an open-source/SaaS company building the premier data-centric AI tools and workflows for (1) automatically correcting messy data and labels, (2) auto-tracking dataset quality over time, (3) automatically finding classes to merge and delete, (4) AutoML for data tasks, (5) obtaining and ranking high-quality annotations, and (6) training ML models with messy data.
Most of the prescriptive tasks (finding issues) can be done in one line of code with their open-source product: https://github.com/cleanlab/cleanlab.
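To give a feel for what "finding issues" means here, below is a minimal sketch of the confident-learning idea mentioned in the episode, written in plain Python. It is an illustration under stated assumptions, not Cleanlab's implementation or API: the function name `find_label_issues_sketch` and the toy data are hypothetical, and the real library handles many cases this sketch ignores. The inputs are the given (possibly noisy) labels and a model's predicted class probabilities for each example.

```python
from statistics import mean


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks wrong,
    most suspicious first. Hypothetical helper, for illustration only."""
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(mean(probs_k) if probs_k else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            guess = max(confident, key=lambda k: p[k])
            if guess != y:
                # Flag the example; keep the probability of the given label
                # so the results can be ranked.
                issues.append((p[y], i))

    # Lowest probability on the given label comes first.
    return [i for _, i in sorted(issues)]


# Toy data (made up): example 2 is labeled 0, but the model strongly favors 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = find_label_issues_sketch(labels, pred_probs)
print(suspects)
```

The per-class threshold is what lets the check adapt to classes the model is systematically more or less confident about, rather than using one fixed cutoff for every class.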
// Bio
Curtis Northcutt is the CEO and Co-Founder of Cleanlab, focused on making AI work reliably for people and their messy, real-world data by automatically fixing issues in any ML dataset. Curtis completed his Ph.D. in Computer Science at MIT, receiving the MIT Thesis Award, NSF Fellowship, and the Goldwater Scholarship. Prior to Cleanlab, Curtis worked at AI research groups including Google, Oculus, Amazon, Facebook, Microsoft, and NASA.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
https://github.com/cleanlab/cleanlab
https://cleanlab.ai/blog/cleanlab-history/
https://labelerrors.com/
https://l7.curtisnorthcutt.com/
https://nips.cc/Conferences/2021/ScheduleMultitrack?event=47102
https://www.youtube.com/watch?v=ieUOv1sQPlw
https://cleanlab.typeform.com/to/NLnU1XZF
Cameo cheating detection system: https://arxiv.org/ftp/arxiv/papers/1508/1508.05699.pdf
The Cathedral & the Bazaar book: https://www.amazon.com/Cathedral-Bazaar-Musings-Accidental-Revolutionary/dp/0596001088
--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Curtis on LinkedIn: https://www.linkedin.com/in/cgnorthcutt/
Timestamps:
[00:00] Introduction to Curtis Northcutt
[00:30] Difference between MLOps and Data-Centric AI
[04:04] Realizing the problem of data quality in ML manifesting
[05:11] Computer vision problems
[06:54] War story that got Curtis into Data-Centric AI
[13:50] Overview of Curtis' vision
[14:45] PU Learning
[21:25] Consistency Rate and Flipping Rate
[25:25] One line of code
[29:48] Models make mistakes
[33:09] Cleanlab play with the environment
[36:30] How ML Engineers should approach the data quality problem
[42:42] Quantum computing
[46:39] Result of confident learning
[52:31] Utility for small data sets
[53:53] Cleanlab's huge success stories
[56:13] Rapid fire questions
[58:58] Cloudy and mystified space
[1:03:46] Cleanlab is hiring!
[1:05:06] Wrap up