Intelligence vs. Reasoning in AI and Chess
This chapter explores the distinction between intelligence and reasoning, particularly in AI chess systems. It contrasts traditional engines with newer approaches like AlphaZero, emphasizing how diverse experience shapes intuition and creativity in both human and machine reasoning.
Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".
Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.
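As a toy illustration of the copying failure (a minimal numpy sketch under simplified assumptions, not code from the episode): treat attention over a long repeated prefix as uniform averaging, and watch the final token's contribution get rounded away in half precision once the sequence is long enough.

```python
# Toy sketch (assumed setup, not the paper's experiment): uniform attention
# over n identical value vectors (1.0) followed by one different value (0.0).
# The output n/(n+1) approaches 1, and after rounding to half precision it
# becomes exactly 1.0 -- indistinguishable from a sequence of all ones.
import numpy as np

def attn_output(n: int) -> np.float16:
    vals = np.array([1.0] * n + [0.0], dtype=np.float32)
    weights = np.full(n + 1, 1.0 / (n + 1), dtype=np.float32)  # uniform attention
    return np.float16((weights * vals).sum())  # cast, as in low-precision inference

all_ones = np.float16(1.0)  # the output if the final 0.0 token were absent
for n in [10, 100, 1_000, 10_000]:
    out = attn_output(n)
    print(f"n={n:>6}: output={out:.5f}, collapsed={out == all_ones}")
```

In this toy model the last token's identity is not lost by the attention pattern itself but by finite-precision representation, which mirrors the episode's framing of information fidelity degrading over long contexts.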
Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.
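To make the softmax point concrete (a hedged sketch with an arbitrarily chosen logit gap): if queries and keys have bounded norm, the logit advantage of the "right" token is bounded too, so its softmax weight must decay as the context grows.

```python
# Sketch (illustrative numbers, not from the paper): one token holds a fixed
# logit advantage over n-1 distractors. Its softmax weight decays like
# e^gap / (e^gap + n - 1), so bounded logits cannot stay sharp at every length.
import numpy as np

def top_weight(n: int, logit_gap: float = 5.0) -> float:
    logits = np.zeros(n)
    logits[0] = logit_gap                # bounded advantage for the "right" token
    w = np.exp(logits - logits.max())    # numerically stable softmax
    w /= w.sum()
    return float(w[0])

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:>7}: attention on the right token = {top_weight(n):.4f}")
```

At n = 10 the weight is about 0.94; by n = 100,000 it has dispersed to under 0.2% -- the sharpness limitation analysed in "Softmax is Not Enough" (Veličković et al.), referenced below.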
But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich, started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and hold events in Zurich.
Go to https://tufalabs.ai/
***
FEDERICO'S WEBSITE:
https://federicobarbero.com/
TRANSCRIPT + RESEARCH:
https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0
TOC:
1. Transformer Limitations: Token Detection & Representation
[00:00:00] 1.1 Transformers fail at single token detection
[00:02:45] 1.2 Representation collapse in transformers
[00:03:21] 1.3 Experiment: LLMs fail at copying last tokens
[00:18:00] 1.4 Attention sharpness limitations in transformers
2. Transformer Limitations: Information Flow & Quantization
[00:18:50] 2.1 Unidirectional information mixing
[00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers
[00:21:50] 2.3 Diagonal attention heads as expensive no-ops in LLaMA/Gemma
[00:27:14] 2.4 Sequence entropy affects transformer model distinguishability
[00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse
[00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms
3. Transformers and the Nature of Reasoning
[00:40:30] 3.1 Turing completeness conditions in transformers
[00:43:23] 3.2 Transformers struggle with sequential tasks
[00:45:50] 3.3 Windowed attention as solution to information compression
[00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning
[01:00:35] 3.5 Epistemic foraging introduced
REFS:
[00:01:05] Transformers Need Glasses!, Barbero et al.
https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf
[00:05:30] Softmax is Not Enough, Veličković et al.
https://arxiv.org/abs/2410.01104
[00:11:30] Advanced Algorithms Lecture 15, Chawla
https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf
[00:15:05] Graph Attention Networks, Veličković et al.
https://arxiv.org/abs/1710.10903
[00:19:15] Extract Training Data, Carlini et al.
https://arxiv.org/pdf/2311.17035
[00:31:30] 1-bit LLMs, Ma et al.
https://arxiv.org/abs/2402.17764
[00:38:35] LLMs Solve Math, Nikankin et al.
https://arxiv.org/html/2410.21272v1
[00:38:45] Subitizing, Railo
https://link.springer.com/10.1007/978-1-4419-1428-6_578
[00:43:25] NN & Chomsky Hierarchy, Delétang et al.
https://arxiv.org/abs/2207.02098
[00:51:05] Measure of Intelligence, Chollet
https://arxiv.org/abs/1911.01547
[00:52:10] AlphaZero, Silver et al.
https://pubmed.ncbi.nlm.nih.gov/30523106/
[00:55:10] Golden Gate Claude, Anthropic
https://www.anthropic.com/news/golden-gate-claude
[00:56:40] Chess Positions, Chase & Simon
https://www.sciencedirect.com/science/article/abs/pii/0010028573900042
[01:00:35] Epistemic Foraging, Friston
https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full