
Large language models can do jaw-dropping things. But nobody knows exactly why.

MIT Technology Review Narrated

NOTE

Rethink Complexity in Deep Learning

Progress in understanding deep learning continues, yet many questions remain unresolved. Recent research suggests that grokking and double descent may be interconnected phenomena, underscoring the need for explanations that cover both. In contrast, some researchers challenge the validity of double descent itself, arguing that it stems from flawed measures of model complexity: the raw parameter count does not accurately reflect a model's complexity, which can vary with how the parameters are used and how they interact during training. Rethinking complexity metrics could yield a better grasp of large-model behavior, and it suggests that existing mathematical frameworks may suffice to explain these phenomena, even though model dynamics at scale remain poorly understood.
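As a rough illustration of the double descent the note refers to (not from the source; all names and settings below are illustrative), here is a minimal sketch using random ReLU features and a minimum-norm least-squares fit. In this kind of setup, test error typically peaks near the interpolation threshold, where the feature count equals the number of training points, and then falls again as the model grows, which is why raw parameter count can be a misleading measure of complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a small noisy 1-D regression task, fit with random
# ReLU features of increasing width. Widths below, at, and above
# the number of training points sweep through the double descent curve.
n_train, n_test, noise = 40, 500, 0.3
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, n_test)
f = lambda x: np.sin(2 * np.pi * x)
y_train = f(x_train) + noise * rng.standard_normal(n_train)
y_test = f(x_test)

def relu_features(x, w, b):
    """Random ReLU features: max(0, w*x + b) for fixed random w, b."""
    return np.maximum(0.0, np.outer(x, w) + b)

for width in [5, 10, 20, 40, 80, 160, 640]:
    w = rng.standard_normal(width)
    b = rng.standard_normal(width)
    Phi_train = relu_features(x_train, w, b)
    Phi_test = relu_features(x_test, w, b)
    # Minimum-norm least-squares solution via the pseudo-inverse.
    theta = np.linalg.pinv(Phi_train) @ y_train
    mse = np.mean((Phi_test @ theta - y_test) ** 2)
    marker = "  <- interpolation threshold" if width == n_train else ""
    print(f"width={width:4d}  test MSE={mse:8.4f}{marker}")
```

The pseudo-inverse is what keeps the over-parameterized regime well behaved: among all interpolating solutions it picks the one with the smallest norm, a common stand-in for the implicit regularization attributed to gradient descent.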
