

Can We Trust Scientific Discoveries Made Using Machine Learning? with Genevera Allen - TWiML Talk #266
May 16, 2019
Genevera Allen, an associate professor of statistics at Rice University, shares her insights on trust in machine learning discoveries. She discusses the challenges of reproducibility in scientific research, especially in biomedical fields. Genevera reflects on her impactful talk at the AAAS conference, addressing audience reactions and future research directions. The conversation also emphasizes the importance of statistical methods in validating results and the need for better education and terminology in the application of machine learning to scientific research.
AI Snips
Cancer Subtype Reproducibility
- In breast cancer, subtypes were successfully found using clustering, leading to targeted drug development.
- However, similar efforts in other cancers haven't been consistently reproducible, raising concerns.
Discovery vs. Prediction
- Reproducibility means something different for data-driven discovery than for prediction: the concern is whether the insight itself generalizes, not just whether the predicted outputs are accurate.
- A key question is how to assess the generalizability of discoveries like clusters or feature importance.
Assessing Generalizability
- Split data into training and test sets, then check whether discoveries made on the training set also hold on the test set.
- Use the stability principle: repeatedly randomize training data and aggregate results to identify stable, reproducible discoveries.
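As a rough illustration of the stability idea for clustering, the sketch below repeatedly draws random subsamples, clusters each one, and records how often each pair of points lands in the same cluster. It is a toy under stated assumptions, not the method discussed in the episode: the data, the function names `two_means_1d` and `co_clustering_frequency`, the 70% subsample size, and the 0.95/0.05 thresholds are all hypothetical choices made for this example.

```python
import random
from itertools import combinations


def two_means_1d(values, n_iter=20):
    """Tiny 1-D k-means with k=2; returns a 0/1 label for each value."""
    lo, hi = min(values), max(values)  # start the two centers at the extremes
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to the nearer center, then recompute the centers.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels


def co_clustering_frequency(data, n_rounds=200, frac=0.7, seed=0):
    """For each pair of points, the fraction of subsamples containing both
    in which the two points were assigned to the same cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = {pair: 0 for pair in combinations(range(n), 2)}
    size = max(2, int(frac * n))
    for _ in range(n_rounds):
        idx = sorted(rng.sample(range(n), size))  # random subsample of indices
        labels = two_means_1d([data[i] for i in idx])
        label_of = dict(zip(idx, labels))
        for i, j in combinations(idx, 2):
            sampled[(i, j)] += 1
            together[(i, j)] += int(label_of[i] == label_of[j])
    return {p: together[p] / sampled[p] for p in sampled if sampled[p] > 0}


# Hypothetical toy data: two well-separated groups plus one in-between point.
data = [0.1, 0.3, 0.2, 5.0, 5.2, 4.9, 2.6]
freq = co_clustering_frequency(data)

# Pairs whose relationship is the same in nearly every subsample are "stable".
stable_pairs = {p for p, f in freq.items() if f >= 0.95 or f <= 0.05}
```

Pairs with a frequency near 0 or 1 behave the same way no matter which subsample is drawn, so their grouping is a candidate reproducible finding. Pairs with intermediate frequencies, such as those involving the in-between point, depend on which data happened to be sampled and deserve more caution.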