A Machine Learning Course at MIT
Alexandra was inspired by the power of data to improve education. She took a machine learning course at MIT, which is notorious for being difficult. Alexandra now works as an engineer and has her own consulting firm.
Coffee Sessions #59 with Cody Coleman, Data Quality Over Quantity or Data Selection for Data-Centric AI.
// Abstract
Big data has been critical to many of the successes in ML, but it brings its own problems. Working with massive datasets is cumbersome and expensive, especially with unstructured data like images, videos, and speech. Careful data selection can mitigate the pains of big data by focusing computational and labeling resources on the most valuable examples.
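One common way to decide which examples are "most valuable" is uncertainty sampling: run the current model over the unlabeled pool and send the examples it is least confident about to labelers. Below is a minimal sketch of least-confidence selection in plain Python, assuming you already have per-example class probabilities from some model; the function name and the sample numbers are hypothetical and not taken from the episode.

```python
from typing import List, Sequence


def least_confidence_select(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Pick the `budget` unlabeled examples the model is least sure about.

    `probs[i]` is the predicted class distribution for unlabeled example i.
    Confidence is the probability of the top (predicted) class; lower
    confidence suggests the label is more informative to acquire.
    """
    confidence = [max(p) for p in probs]
    # Indices sorted by ascending confidence (Python's sort is stable,
    # so ties keep their original order).
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical softmax outputs for 5 unlabeled examples over 3 classes.
    probs = [
        [0.98, 0.01, 0.01],
        [0.40, 0.35, 0.25],
        [0.70, 0.20, 0.10],
        [0.34, 0.33, 0.33],
        [0.90, 0.05, 0.05],
    ]
    print(least_confidence_select(probs, budget=2))  # [3, 1]
```

Only the two most ambiguous examples (indices 3 and 1) would go to labelers in this round; the model is retrained and the loop repeats, which is where the savings in labeling effort come from.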
Cody Coleman, a recent Ph.D. from Stanford University and founding member of MLCommons, joins us to describe how a more data-centric approach that focuses on data quality rather than quantity can lower the barrier to AI/ML. Instead of managing clusters of machines and setting up cumbersome labeling pipelines, you can spend more time tackling real problems.
// Bio
Cody Coleman recently finished his Ph.D. in CS at Stanford University, where he was advised by Professors Matei Zaharia and Peter Bailis. His research spans from performance benchmarking of hardware and software systems (e.g., DAWNBench and MLPerf) to computationally efficient methods for active learning and core-set selection. His work has been supported by the NSF GRFP, the Stanford DAWN Project, and the Open Phil AI Fellowship.
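Core-set selection picks a small subset that "covers" the full dataset, so training on it approximates training on everything. A widely used formulation is greedy k-center selection over feature embeddings: repeatedly add the point farthest from everything selected so far. The sketch below assumes points are already embedded as fixed-length vectors and uses Euclidean distance; the function name and sample points are hypothetical, and this illustrates the general technique rather than the exact method from Cody's research.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_center_greedy(points: Sequence[Point], seeds: List[int], budget: int) -> List[int]:
    """Greedy k-center core-set selection.

    Starting from already-selected `seeds` (e.g. labeled examples), add
    `budget` new points, each time choosing the point whose distance to
    its nearest selected point is largest.
    """
    # min_dist[i] = distance from point i to its closest selected point.
    min_dist = [math.inf] * len(points)
    for s in seeds:
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[s]))

    chosen: List[int] = []
    for _ in range(budget):
        # The point currently worst covered by the selection.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Adding `nxt` can only shrink each point's nearest-selected distance.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; point 0 is already labeled.
    pts = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)]
    print(k_center_greedy(pts, seeds=[0], budget=2))  # [3, 4]
```

Each step costs one pass over the pool, so this naive version does O(n × budget) distance computations; at the scale of the datasets discussed in the episode, the distance computations would typically be batched or approximated.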
// Relevant Links
[preprint] Similarity Search for Efficient Active Learning and Search of Rare Concepts: [https://arxiv.org/abs/2007.00077](https://arxiv.org/abs/2007.00077)
[video] Similarity Search for Efficient Active Learning and Search of Rare Concepts: [https://www.youtube.com/watch?v=vRVyOEK2JUU](https://www.youtube.com/watch?v=vRVyOEK2JUU)
[blog post] Selection via Proxy: Efficient Data Selection for Deep Learning: [https://dawn.cs.stanford.edu/2020/04/23/selection-via-proxy/](https://dawn.cs.stanford.edu/2020/04/23/selection-via-proxy/)
[slides] The DAWN of MLPerf: [https://drive.google.com/file/d/17ZpX0GOtOXG8QMn6KEc_Le8tUfDBlgDE/view](https://drive.google.com/file/d/17ZpX0GOtOXG8QMn6KEc_Le8tUfDBlgDE/view)
[blog post] About Cody's research: [https://hai.stanford.edu/news/cody-coleman-lowering-machine-learnings-barriers-help-people-tackle-real-problems](https://hai.stanford.edu/news/cody-coleman-lowering-machine-learnings-barriers-help-people-tackle-real-problems)
[video] About Cody: [https://www.youtube.com/watch?v=stxJMsxxxtA](https://www.youtube.com/watch?v=stxJMsxxxtA)
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Cody on LinkedIn: https://www.linkedin.com/in/codyaustun/
Timestamps:
[00:00] Introduction to Cody Coleman
[03:10] Cody's life story
[07:35] Cody's journey in tech
[15:04] How Cody's interest in Machine Learning and work at Stanford came about
[21:48] Data-centric Machine Learning and Data Quality
[28:56] Bringing Research and Industry together
[33:33] Advice to practitioners
[38:03] Principles of Machine Learning in an academic setting
[43:50] Promising Data-centric techniques that stand out
[53:51] Developing benchmarks
[56:34] Guardrails for machine learning vs automated testing suites
[1:02:57] Creating something valuable and useful
[1:07:05] Data Collecting vs. Data Hoarding