

When data leakage turns into a flood of trouble
Oct 20, 2020
Rajiv Shah, a data scientist at DataRobot and professor at the University of Illinois at Chicago, dives into the critical issue of data leakage in machine learning. He explains how this hidden menace can skew model results, emphasizing techniques like activation maps to spot leakage. The conversation also covers the ethical implications of data handling and the importance of robust model development practices. Rajiv encourages aspiring data scientists to prioritize foundational skills over trends for successful machine learning.
Focus on foundational techniques
- Focus on classic data science problems and techniques.
- Don't get distracted by the latest trendy algorithms or papers if you want to build a strong foundation.
Chicago Restaurant Inspection Model
- Rajiv Shah noticed target leakage in Chicago's restaurant inspection prediction model.
- The model used features such as weather and inspector ID, which leaked information unavailable at prediction time as well as individual inspectors' biases.
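A minimal sketch of why this kind of leakage inflates scores, using synthetic data and hypothetical feature names (not the actual Chicago model): a "leaky" feature that encodes the outcome, like an inspector's fail rate recorded after the inspection, makes even a trivial classifier look near-perfect.

```python
import random

random.seed(0)

# Synthetic inspections: label = 1 if the inspection fails.
n = 1000
labels = [random.randint(0, 1) for _ in range(n)]

# A legitimate feature carries only a weak signal about the outcome.
legit = [y * 0.2 + random.random() for y in labels]
# A leaky feature is effectively the target plus noise -- information
# that would not exist yet at prediction time.
leaky = [y + random.gauss(0, 0.1) for y in labels]

def accuracy(feature, labels, threshold=0.5):
    """Score a one-feature threshold classifier."""
    preds = [1 if x > threshold else 0 for x in feature]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(f"legit feature accuracy: {accuracy(legit, labels):.2f}")
print(f"leaky feature accuracy: {accuracy(leaky, labels):.2f}")
```

The leaky feature scores close to 1.0 while the honest feature hovers modestly above chance; a suspiciously high validation score like this is often the first sign of target leakage.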
Target Leakage in Nature Article
- Target leakage is a common issue, even in prestigious publications like Nature.
- This underscores the need for skepticism and rigorous validation in data science.