Infinite Curiosity Pod with Prateek Joshi cover image

Infinite Curiosity Pod with Prateek Joshi

LLM Data Frontiers

Jan 22, 2024
33:45

Curtis Northcutt is the cofounder and CEO of Cleanlab, a data curation platform for LLMs. They have raised $30M in funding from Bain Capital Ventures, Menlo, Databricks, and TQ. He was previously the cofounder and CTO of ChipBrain. He has a PhD in Computer Science from MIT.

(00:07) Data Curation in the Context of LLMs
(01:14) Connection between Language Models and Computer Science
(03:14) Importance of Data Curation for LLMs
(04:06) Challenges in Data Curation for LLMs
(06:09) Confident Learning and its Concept
(09:42) CleanLab and its Role
(12:42) Role of Open Source Datasets and Tooling
(15:08) Balancing Data and Privacy in Regulated Industries
(17:25) Feasibility of Federated Learning
(20:35) Decentralized Compute and Aggregating Compute Clusters
(25:19) Determining Model Size for Data Representation
(27:09) Advice for ML Engineers in Handling Data Curation
(30:20) Rapid Fire Round

Curtis's favorite book: The Bible (in the context of marketing)

--------
Where to find Prateek Joshi:

Newsletter: https://prateekjoshi.substack.com 
Website: https://prateekj.com 
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 
Twitter: https://twitter.com/prateekvjoshi 

Remember Everything You Learn from Podcasts

Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.
App store bannerPlay store banner
Get the app