

Kate Park: Data Engines for Vision and Language
Mar 21, 2024
Kate Park, Director of Product at Scale AI, discusses the importance of data in AI systems, focusing on self-driving vehicles and NLP applications. The episode explores challenges in model evaluation, the work of expert AI trainers, and the role of humans in labeling tasks.
Data Engine's First Success
- Kate Park built Tesla's data engine to improve model performance with data alone.
- Its first success was unblocking the Navigate on Autopilot release by targeting edge cases in object and fork detection.
Data Scaling and Plateaus
- Data scaling experiments, which train the same model on increasing amounts of data, show how performance changes with data volume; the gains often plateau.
- A plateau helps determine when to shift focus from data to architectural improvements.
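As a rough illustration of reading a data scaling curve, here is a minimal sketch that flags where adding data stops paying off. It assumes the experiment trained the same model on geometrically spaced subsets (each roughly double the previous one) and recorded one validation metric per run. The function name `find_plateau` and the `min_gain` threshold are hypothetical choices for this example, not something described in the episode.

```python
from typing import Optional, Sequence, Tuple


def find_plateau(
    points: Sequence[Tuple[int, float]],
    min_gain: float = 0.005,
) -> Optional[int]:
    """Return the dataset size after which more data stops helping.

    `points` are (num_training_examples, validation_metric) pairs from a
    data scaling experiment (hypothetical format). A plateau is declared
    at the first size where moving to the next, larger subset improves
    the metric by less than `min_gain`. Returns None if no plateau is seen.
    """
    ordered = sorted(points)  # sort by dataset size
    for (n_small, m_small), (_, m_large) in zip(ordered, ordered[1:]):
        if m_large - m_small < min_gain:
            return n_small
    return None


# Hypothetical results: the metric gain shrinks as data doubles.
results = [(10_000, 0.62), (20_000, 0.70), (40_000, 0.74), (80_000, 0.742), (160_000, 0.743)]
print(find_plateau(results))  # gain from 40k to 80k is only 0.002
```

In this made-up run the curve flattens after 40,000 examples, which is the point where the discussion suggests looking at architectural changes rather than collecting more of the same data.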
Prioritizing Data Improvements
- Prioritize data improvements based on impact and resource allocation.
- Address major issues before minor improvements, considering the cost and time of labeling.
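One simple way to turn this rule of thumb into a plan is sketched below: handle the issues with the largest estimated impact first, and skip any fix whose labeling cost no longer fits the remaining budget. This is an illustrative heuristic under stated assumptions, not Scale AI's or Tesla's actual process. The `DataIssue` fields, the impact and cost estimates, and the issue names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataIssue:
    name: str              # hypothetical issue label
    expected_gain: float   # estimated metric improvement if fixed (an assumption)
    labeling_hours: float  # estimated time to collect and label the fix


def plan_fixes(issues: List[DataIssue], budget_hours: float) -> List[str]:
    """Greedy plan: largest expected gain first, cheaper fix breaks ties.

    Issues that would exceed the remaining labeling budget are skipped,
    so a costly major fix does not block cheaper ones that still fit.
    """
    ranked = sorted(issues, key=lambda i: (-i.expected_gain, i.labeling_hours))
    plan: List[str] = []
    spent = 0.0
    for issue in ranked:
        if spent + issue.labeling_hours <= budget_hours:
            plan.append(issue.name)
            spent += issue.labeling_hours
    return plan


issues = [
    DataIssue("highway_forks", 0.05, 40),
    DataIssue("rare_objects", 0.03, 10),
    DataIssue("label_noise", 0.005, 5),
    DataIssue("night_scenes", 0.04, 80),
]
print(plan_fixes(issues, budget_hours=60))
```

With a 60-hour budget, the hypothetical `night_scenes` fix is skipped because it alone would exceed the budget, while the other three fit. Ranking by gain per labeling hour instead would be an equally reasonable variant when cheap fixes matter more than addressing the biggest problems first.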