

David Hand: AI, Dark Data, LLMs, Peer Review
Aug 14, 2023
David Hand, a professor of statistics at Imperial College London, dives into the fascinating world of dark data and its implications for analysis. He explains how unseen data can mislead conclusions, especially in critical areas like public health and AI. Hand contrasts data-driven and theory-driven models, emphasizing the risks of relying solely on the former. He also addresses the limitations of large language models, cautioning against overestimating their intelligence. The discussion reveals the delicate balance between data transparency and public trust in research.
AI Snips
Data-Driven vs. Theory-Driven Models
- Data-driven models, like large language models, are brittle because they rely solely on existing data.
- Theory-driven models are more robust because they incorporate underlying principles, allowing for better adaptation to change.
Credit Scoring Model Example
- A credit scoring model built on data from people over 70 might fail when applied to people under 30, illustrating the brittleness of data-driven models.
- The differing financial circumstances and risk profiles of these groups demonstrate why data-driven models need diverse, representative datasets.
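The brittleness Hand describes can be sketched numerically. The snippet below is an illustrative toy, not anything from the episode: all numbers, the synthetic savings-to-default relationship, and the helper names (`simulate`, `best_threshold`) are assumptions. A purely data-driven scoring rule tuned on an "over-70"-like population loses accuracy on an "under-30"-like population whose relationship between the same feature and default risk is different.

```python
import random

random.seed(0)

def simulate(n, slope, intercept):
    """Hypothetical synthetic data: default probability is a linear
    function of a 'savings' feature; slope/intercept differ by group."""
    data = []
    for _ in range(n):
        savings = random.uniform(0.0, 1.0)
        p_default = max(0.0, min(1.0, intercept + slope * savings))
        data.append((savings, random.random() < p_default))
    return data

# "Over-70" population: more savings strongly predicts lower default risk.
train_old = simulate(2000, slope=-0.8, intercept=0.9)
# "Under-30" population: the relationship is much weaker (thin credit files).
test_young = simulate(2000, slope=-0.1, intercept=0.4)

def accuracy(data, t):
    # Rule: predict default when savings < t, non-default otherwise.
    return sum((s >= t) != d for s, d in data) / len(data)

def best_threshold(data):
    # Purely data-driven: pick the cutoff that best fits the training data.
    return max((i / 100 for i in range(1, 100)), key=lambda t: accuracy(data, t))

t = best_threshold(train_old)
acc_old = accuracy(train_old, t)
acc_young = accuracy(test_young, t)
```

Under these assumptions the fitted cutoff scores well on the population it was built from and markedly worse on the shifted one, which is the point of the credit-scoring example: the rule encodes the training population, not the underlying credit mechanism.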
Missing Crucial Data
- Missing crucial data can lead to flawed models and inaccurate predictions, as seen with early COVID models that didn't consider age.
- Consider variables like demographics or underlying conditions to create comprehensive models.
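The danger of omitting a crucial variable like age can be shown with a small worked example. Everything here is an illustrative assumption (invented case counts and fatality rates, hypothetical regions "A" and "B"), not data from the episode: two regions with identical age-specific fatality rates look very different when age is left out of the model, because their age mixes differ.

```python
# Hypothetical case counts for two regions, split by age group.
cases = {
    "A": {"young": 9000, "old": 1000},
    "B": {"young": 1000, "old": 9000},
}
# Assumed age-specific fatality rates, identical in both regions.
rate = {"young": 0.001, "old": 0.05}

def crude_rate(region):
    """Overall fatality rate computed as if age had never been recorded."""
    deaths = sum(cases[region][g] * rate[g] for g in rate)
    total = sum(cases[region].values())
    return deaths / total

# Region B appears many times deadlier, yet the age-specific risks are
# the same everywhere; the crude comparison is an artifact of missing age.
```

A model built on the crude rates would "learn" a regional effect that does not exist, which is the flaw the snip attributes to early COVID models that ignored age.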