

David Hand: AI, Dark Data, LLMs, Peer Review
Aug 14, 2023
David Hand, a professor of statistics at Imperial College London, dives into the fascinating world of dark data and its implications for analysis. He explains how unseen data can mislead conclusions, especially in critical areas like public health and AI. Hand contrasts data-driven and theory-driven models, emphasizing the risks of relying solely on the former. He also addresses the limitations of large language models, cautioning against overestimating their intelligence. The discussion reveals the delicate balance between data transparency and public trust in research.
AI Snips
Data-Driven vs. Theory-Driven Models
- Data-driven models, like large language models, are brittle because they rely solely on existing data.
- Theory-driven models are more robust because they incorporate underlying principles, allowing for better adaptation to change.
Credit Scoring Model Example
- A credit scoring model built on data from people over 70 might fail when applied to people under 30, illustrating the brittleness of data-driven models.
- The differing financial circumstances and risk profiles of these groups demonstrate why data-driven models need diverse, representative datasets.
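The brittleness Hand describes can be sketched numerically. The snippet below is an illustrative toy, not anything from the episode: all numbers, the synthetic savings-to-default relationship, and the helper names (`simulate`, `best_threshold`) are assumptions. A purely data-driven scoring rule tuned on an "over-70"-like population loses accuracy on an "under-30"-like population whose relationship between the same feature and default risk is different.

```python
import random

random.seed(0)

def simulate(n, slope, intercept):
    """Hypothetical synthetic data: default probability is a linear
    function of a 'savings' feature; slope/intercept differ by group."""
    data = []
    for _ in range(n):
        savings = random.uniform(0.0, 1.0)
        p_default = max(0.0, min(1.0, intercept + slope * savings))
        data.append((savings, random.random() < p_default))
    return data

# "Over-70" population: more savings strongly predicts lower default risk.
train_old = simulate(2000, slope=-0.8, intercept=0.9)
# "Under-30" population: the relationship is much weaker (thin credit files).
test_young = simulate(2000, slope=-0.1, intercept=0.4)

def accuracy(data, t):
    # Rule: predict default when savings < t, non-default otherwise.
    return sum((s >= t) != d for s, d in data) / len(data)

def best_threshold(data):
    # Purely data-driven: pick the cutoff that best fits the training data.
    return max((i / 100 for i in range(1, 100)), key=lambda t: accuracy(data, t))

t = best_threshold(train_old)
acc_old = accuracy(train_old, t)
acc_young = accuracy(test_young, t)
```

Under these assumptions the fitted cutoff scores well on the population it was built from and markedly worse on the shifted one, which is the point of the credit-scoring example: the rule encodes the training population, not the underlying credit mechanism.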
Missing Crucial Data
- Missing crucial data can lead to flawed models and inaccurate predictions, as seen with early COVID models that didn't consider age.
- Consider variables like demographics or underlying conditions to create comprehensive models.
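The danger of omitting a crucial variable like age can be shown with a small worked example. Everything here is an illustrative assumption (invented case counts and fatality rates, hypothetical regions "A" and "B"), not data from the episode: two regions with identical age-specific fatality rates look very different when age is left out of the model, because their age mixes differ.

```python
# Hypothetical case counts for two regions, split by age group.
cases = {
    "A": {"young": 9000, "old": 1000},
    "B": {"young": 1000, "old": 9000},
}
# Assumed age-specific fatality rates, identical in both regions.
rate = {"young": 0.001, "old": 0.05}

def crude_rate(region):
    """Overall fatality rate computed as if age had never been recorded."""
    deaths = sum(cases[region][g] * rate[g] for g in rate)
    total = sum(cases[region].values())
    return deaths / total

# Region B appears many times deadlier, yet the age-specific risks are
# the same everywhere; the crude comparison is an artifact of missing age.
```

A model built on the crude rates would "learn" a regional effect that does not exist, which is the flaw the snip attributes to early COVID models that ignored age.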