

EA Forum Podcast (Curated & popular)
EA Forum Team
Audio narrations from the Effective Altruism Forum, including curated posts and posts with 125+ karma.
If you'd like more episodes, subscribe to the "EA Forum (All audio)" podcast instead.
Episodes

Feb 2, 2026 • 3min
“The Scaling Series Discussion Thread: with Toby Ord” by Toby Tremlett🔹
We're trying something a bit new this week. Over the last year, Toby Ord has been writing about the implications of the fact that improvements in AI require exponentially more compute. Only one of these posts so far has been put on the EA Forum. This week we've put the entire series on the Forum and made this thread for you to discuss your reactions to the posts. Toby Ord will check in once a day to respond to your comments[1]. Feel free to also comment directly on the individual posts that make up this sequence, but you can treat this as a central discussion space for both general takes and more specific questions. If you haven't read the series yet, we've created a page where you can, and you can see the summaries of each post below:
Are the Costs of AI Agents Also Rising Exponentially? Agents can do longer and longer tasks, but their dollar cost to do these tasks may be growing even faster.
How Well Does RL Scale? I show that RL-training for LLMs scales much worse than inference or pre-training.
Evidence that Recent AI Gains are Mostly from Inference-Scaling: I show how [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/JAcueP8Dh6db6knBK/the-scaling-series-discussion-thread-with-toby-ord
---
Narrated by TYPE III AUDIO.

Feb 2, 2026 • 15min
[Linkpost] “Are the Costs of AI Agents Also Rising Exponentially?” by Toby_Ord
This is a link post. There is an extremely important question about the near-future of AI that almost no-one is asking. We’ve all seen the graphs from METR showing that the length of tasks AI agents can perform has been growing exponentially over the last 7 years. While GPT-2 could only do software engineering tasks that would take someone a few seconds, the latest models can (50% of the time) do tasks that would take a human a few hours. As this trend shows no signs of stopping, people have naturally taken to extrapolating it out, to forecast when we might expect AI to be able to do tasks that take an engineer a full work-day; or week; or year. But we are missing a key piece of information — the cost of performing this work. Over those 7 years AI systems have grown exponentially. The size of the models (parameter count) has grown by 4,000x and the number of times they are run in each task (tokens generated) has grown by about 100,000x. AI researchers have also found massive efficiencies, but it is eminently plausible that the cost for the peak performance measured by METR has been [...]
---
Outline:
(13:02) Conclusions
(14:05) Appendix
(14:08) METR has a similar graph on their page for GPT-5.1 codex. It includes more models and compares them by token counts rather than dollar costs.
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/AbHPpGTtAMyenWGX8/are-the-costs-of-ai-agents-also-rising-exponentially
Linkpost URL: https://www.tobyord.com/writing/hourly-costs-for-ai-agents
---
Narrated by TYPE III AUDIO.
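As a rough illustration of the arithmetic behind this episode's question, here is a back-of-envelope sketch in Python. It uses only the two growth factors quoted above (4,000x parameters, ~100,000x tokens per task); the assumption that per-task compute scales with parameters times tokens, and the efficiency offset, are illustrative, not figures from the article.
```python
# Back-of-envelope: how fast could per-task compute cost have grown?
# Growth factors are those quoted in the episode; the cost model
# (compute ~ params x tokens) and the efficiency offset are assumptions.

param_growth = 4_000        # growth in model size over ~7 years
token_growth = 100_000      # growth in tokens generated per task

# If the compute for a task scales roughly with parameters x tokens generated,
# raw compute per task grows by the product of the two factors.
raw_compute_growth = param_growth * token_growth
print(f"Raw compute per task: ~{raw_compute_growth:.0e}x")   # ~4e+08x

# Suppose hardware and algorithmic efficiency cut costs by some factor
# (purely hypothetical value, for illustration only).
assumed_efficiency_gain = 1_000_000
net_cost_growth = raw_compute_growth / assumed_efficiency_gain
print(f"Net cost per task under this assumption: ~{net_cost_growth:.0f}x")
```
Even a generous assumed efficiency gain leaves per-task costs growing by orders of magnitude in this toy accounting, which is the tension the episode explores.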

Feb 2, 2026 • 10min
[Linkpost] “Evidence that Recent AI Gains are Mostly from Inference-Scaling” by Toby_Ord
This is a link post. In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction (pre-training) stalled out. Since late 2024, we’ve seen a new trend of using reinforcement learning (RL) in the second stage of training (post-training). Through RL, the AI models learn to do superior chain-of-thought reasoning about the problem they are being asked to solve. This new era involves scaling up two kinds of compute:
the amount of compute used in RL post-training, and
the amount of compute used every time the model answers a question.
Industry insiders are excited about the first new kind of scaling, because the amount of compute needed for RL post-training started off being small compared to the tremendous amounts already used in next-token prediction pre-training. Thus, one could scale the RL post-training up by a factor of 10 or 100 before even doubling the total compute used to train the model. But the second new kind of scaling is a problem. Major AI companies were already starting to spend more compute serving their models to customers than in the training [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/5zfubGrJnBuR5toiK/evidence-that-recent-ai-gains-are-mostly-from-inference
Linkpost URL: https://www.tobyord.com/writing/mostly-inference-scaling
---
Narrated by TYPE III AUDIO.
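A worked version of the headroom point quoted above (scaling RL post-training by 10x or 100x before even doubling total training compute), as a small Python sketch; the 1% starting share for RL compute is an illustrative assumption, not a figure from the article.
```python
# How much does scaling RL post-training change total training compute,
# if RL starts as a small slice of the budget? The starting split below
# is an illustrative assumption.

pretrain_compute = 100.0   # arbitrary units of pre-training compute
rl_compute = 1.0           # RL post-training starts small by comparison

for rl_scale in (10, 100, 1000):
    total = pretrain_compute + rl_compute * rl_scale
    growth = total / (pretrain_compute + rl_compute)
    print(f"RL scaled {rl_scale:>4}x -> total training compute grows {growth:.2f}x")

# RL scaled   10x -> total training compute grows 1.09x
# RL scaled  100x -> total training compute grows 1.98x
# RL scaled 1000x -> total training compute grows 10.89x
```
This is why the first kind of scaling looked cheap at first: the early factors of 10 in RL compute are nearly free relative to the total, and only later does RL start to dominate the training budget.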

Feb 2, 2026 • 15min
[Linkpost] “The Extreme Inefficiency of RL for Frontier Models” by Toby_Ord
This is a link post. The new scaling paradigm for AI reduces the amount of information a model can learn from per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling. The last year has seen a massive shift in how leading AI models are trained. 2018–2023 was the era of pre-training scaling. LLMs were primarily trained by next-token prediction (also known as pre-training). Much of OpenAI's progress from GPT-1 to GPT-4 came from scaling up the amount of pre-training by a factor of 1,000,000. New capabilities were unlocked not through scientific breakthroughs, but through doing more-or-less the same thing at ever-larger scales. Everyone was talking about the success of scaling, from AI labs to venture capitalists to policy makers. However, there's been markedly little progress in scaling up this kind of training since then (GPT-4.5 added one more factor of 10, but was then quietly retired). Instead, there has been a shift to taking one of these pre-trained models and further training it with large amounts of Reinforcement Learning (RL). This has produced models like OpenAI's o1, o3, and GPT-5, with dramatic improvements in reasoning (such as solving [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models
Linkpost URL: https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning
---
Narrated by TYPE III AUDIO.
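A minimal sketch, in Python, of the kind of per-token learning-signal comparison that the episode's headline factor of 1,000 to 1,000,000 refers to; the bits-per-token figure and episode length below are illustrative assumptions, not the article's numbers.
```python
# Rough comparison of learning signal per token of compute, in the spirit
# of the episode's claim. All numbers here are illustrative assumptions.

# Pre-training: every processed token carries a supervised target.
# Treat the usable signal per token as a few bits (assumed).
pretrain_bits_per_token = 4.0

# RL post-training: one episode generates many tokens but, in the simplest
# setting, yields roughly one binary success/failure reward.
tokens_per_episode = 10_000
rl_bits_per_episode = 1.0
rl_bits_per_token = rl_bits_per_episode / tokens_per_episode

ratio = pretrain_bits_per_token / rl_bits_per_token
print(f"Signal per token processed: pre-training is ~{ratio:,.0f}x denser")
# With these assumed numbers the gap is ~40,000x, inside the 1,000x to
# 1,000,000x range quoted in the episode; the exact factor depends on the
# assumed bits per token and episode length.
```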

Feb 2, 2026 • 35min
[Linkpost] “Inference Scaling Reshapes AI Governance” by Toby_Ord
This is a link post. The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would:
lower the importance of open-weight models (and of securing the weights of closed models),
reduce the impact of the first human-level models,
change the business model for frontier AI,
reduce the need for power-intensive data centres, and
derail the current paradigm of AI governance via training compute thresholds.
Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification.
The end of an era — for both training and governance
The intense year-on-year scaling up of AI training runs has been one of the most dramatic and stable markers of the Large Language Model era. Indeed it had been widely taken to be a permanent fixture of the AI landscape and the basis of many approaches to [...]
---
Outline:
(01:06) The end of an era -- for both training and governance
(05:24) Scaling inference-at-deployment
(06:42) Reducing the number of simultaneously served copies of each new model
(08:45) Reducing the value of securing model weights
(09:30) Reducing the benefits and risks of open-weight models
(10:05) Unequal performance for different tasks and for different users
(12:08) Changing the business model and industry structure
(12:50) Reducing the need for monolithic data centres
(17:16) Scaling inference-during-training
(28:07) Conclusions
(30:17) Appendix. Comparing the costs of scaling pre-training vs inference-at-deployment
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/RnsgMzsnXcceFfKip/inference-scaling-reshapes-ai-governance
Linkpost URL: https://www.tobyord.com/writing/inference-scaling-reshapes-ai-governance
---
Narrated by TYPE III AUDIO.
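A toy comparison in the spirit of the appendix mentioned in the outline (comparing the costs of scaling pre-training vs inference-at-deployment); all dollar figures and query counts below are illustrative assumptions, not the article's numbers.
```python
# Where does extra compute spending land? A pre-training run is a one-off
# cost, while inference-at-deployment is paid on every query served.
# All figures are illustrative assumptions.

train_cost = 1e9            # one-off cost of a pre-training run ($), assumed
cost_per_query = 0.01       # serving cost per query at current inference compute ($), assumed
queries_served = 5e11       # lifetime queries served by the model, assumed

def total_cost(train_scale: float, inference_scale: float) -> float:
    """Total spend if pre-training compute and per-query inference compute
    are each scaled by the given factors (costs assumed proportional)."""
    return train_cost * train_scale + cost_per_query * inference_scale * queries_served

print(f"Baseline:             ${total_cost(1, 1):,.0f}")
print(f"10x pre-training:     ${total_cost(10, 1):,.0f}")
print(f"10x inference/query:  ${total_cost(1, 10):,.0f}")
# With these assumptions, 10x-ing pre-training adds a fixed one-off cost,
# while 10x-ing inference-at-deployment multiplies a serving bill that
# already dominates and keeps growing with usage.
```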

Feb 2, 2026 • 20min
[Linkpost] “Is there a Half-Life for the Success Rates of AI Agents?” by Toby_Ord
This is a link post. Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.
METR's results on the length of tasks agents can reliably complete
A recent paper by Kwa et al. (2025) from the research organisation METR has found an exponential trend in the duration of the tasks that frontier AI agents can [...]
---
Outline:
(05:33) Explaining these results via a constant hazard rate
(14:54) Upshots of the constant hazard rate model
(18:47) Further work
(19:25) References
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3
Linkpost URL: https://www.tobyord.com/writing/half-life
---
Narrated by TYPE III AUDIO.
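A minimal sketch of the constant-hazard-rate model described above: a constant failure rate per human-minute of task length implies an exponentially declining success rate, characterised by a half-life. The one-hour half-life below is an illustrative assumption, not a value from the paper.
```python
# Constant hazard rate model: if an agent fails at a constant rate per
# human-minute of task, its success probability decays exponentially and
# can be summarised by a half-life (the task length it succeeds at 50%
# of the time). The example half-life is an illustrative assumption.

def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """P(success) on a task a human would take `task_minutes`, for an agent
    with the given half-life."""
    return 0.5 ** (task_minutes / half_life_minutes)

half_life = 60.0  # assume an agent with a one-hour half-life
for minutes in (15, 30, 60, 120, 480):
    print(f"{minutes:>4}-minute task: {success_rate(minutes, half_life):.1%} success")
# 15 min -> 84.1%, 30 min -> 70.7%, 60 min -> 50.0%,
# 120 min -> 25.0%, 480 min (a work day) -> 0.4%
```
The steep drop at longer lengths is the model's key upshot: an agent that looks reliable on hour-long tasks can still be nearly useless on unassisted day-long tasks.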

Feb 2, 2026 • 17min
[Linkpost] “Inference Scaling and the Log-x Chart” by Toby_Ord
This is a link post. Improving model performance by scaling up inference compute is the next big thing in frontier AI. But the charts being used to trumpet this new paradigm can be misleading. While they initially appear to show steady scaling and impressive performance for models like o1 and o3, they really show poor scaling (characteristic of brute force) and little evidence of improvement between o1 and o3. I explore how to interpret these new charts and what evidence for strong scaling and progress would look like.
From scaling training to scaling inference
The dominant trend in frontier AI over the last few years has been the rapid scale-up of training — using more and more compute to produce smarter and smarter models. Since GPT-4, this kind of scaling has run into challenges, so we haven’t yet seen models much larger than GPT-4. But we have seen a recent shift towards scaling up the compute used during deployment (aka ‘test-time compute’ or ‘inference compute’), with more inference compute producing smarter models. You could think of this as a change in strategy from improving the quality of your employees’ work via giving them more years of training in which to acquire [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/zNymXezwySidkeRun/inference-scaling-and-the-log-x-chart
Linkpost URL: https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart
---
Narrated by TYPE III AUDIO.
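A small numerical illustration of the log-x point: if benchmark score improves only logarithmically with inference compute, then equal steps along a log-x axis produce equal score gains, so the chart looks like steady linear progress even though the underlying returns are brute force. The coefficients below are illustrative assumptions, not fits from the article.
```python
# Why a log-x chart can flatter inference scaling: a purely logarithmic
# relationship between compute and score plots as a straight line when the
# x-axis is logarithmic. The coefficients are illustrative assumptions.

import math

def score(compute: float) -> float:
    """Hypothetical benchmark score with purely logarithmic returns."""
    return 20.0 + 15.0 * math.log10(compute)

for compute in (1, 10, 100, 1000, 10000):
    print(f"compute {compute:>6}x -> score {score(compute):5.1f}")
# Each row is one tick on a log-x chart: equal horizontal steps, equal score
# gains, i.e. a straight line. In linear terms, though, each extra 15 points
# of score costs 10x more compute than the previous 15 did.
```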

Jan 30, 2026 • 16min
[Linkpost] “The Scaling Paradox” by Toby_Ord
This is a link post. AI capabilities have improved remarkably quickly, fuelled by the explosive scale-up of resources being used to train the leading models. But if you examine the scaling laws that inspired this rush, they actually show extremely poor returns to scale. What's going on?
AI Scaling is Shockingly Impressive
The era of LLMs has seen remarkable improvements in AI capabilities over a very short time. This is often attributed to the AI scaling laws — statistical relationships which govern how AI capabilities improve with more parameters, compute, or data. Indeed AI thought-leaders such as Ilya Sutskever and Dario Amodei have said that the discovery of these laws led them to the current paradigm of rapid AI progress via a dizzying increase in the size of frontier systems. Before the 2020s, most AI researchers were looking for architectural changes to push the frontiers of AI forwards. The idea that scale alone was sufficient to provide the entire range of faculties involved in intelligent thought was unfashionable and seen as simplistic. A key reason it worked was the tremendous versatility of text. As Turing had noted more than 60 years earlier, almost any challenge that one could pose to [...]
---
First published:
January 30th, 2026
Source:
https://forum.effectivealtruism.org/posts/742xJNTqer2Dt9Cxx/the-scaling-paradox
Linkpost URL: https://www.tobyord.com/writing/the-scaling-paradox
---
Narrated by TYPE III AUDIO.
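An illustrative power-law in the spirit of the scaling laws the episode discusses, showing how a small exponent means each fixed drop in loss needs a multiplicatively larger compute budget; the exponent and constants below are assumptions for illustration, not values from the article or any published fit.
```python
# Toy scaling law: loss falls as a small power of compute, so returns per
# FLOP are poor even though capabilities improved fast (because compute
# itself grew by many orders of magnitude). Constants are assumptions.

irreducible_loss = 1.7
A = 10.0
alpha = 0.05  # small exponent -> very poor returns to scale

def loss(compute: float) -> float:
    """Hypothetical training loss as a function of compute (FLOP)."""
    return irreducible_loss + A * compute ** -alpha

for c in (1e20, 1e22, 1e24, 1e26):
    print(f"compute {c:.0e} FLOP -> loss {loss(c):.3f}")
# Each 100x of compute shaves only a modest, shrinking slice off the loss;
# under this toy law the rapid capability gains came from compute growing
# astronomically, not from generous returns per FLOP.
```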

Jan 29, 2026 • 9min
“Why Isn’t EA at the Table When $121 Billion Gets Allocated to Biodiversity Every Year?” by David Goodman
There is an insane amount of money being thrown around by international organizations and agreements. Nobody with any kind of power over these agreements is asking basic EA questions like: "What are the problems we're trying to solve?" "What are the most neglected aspects of those problems?" and "What is the most cost-effective way to address those neglected areas?" As someone coming from an EA background reading through plans for $200-700 billion in annual funding commitments that focus on unimaginative and ineffective interventions, it makes you want to tear your hair out. So much good could be done with that money. EA focuses a lot on private philanthropy, earning-to-give (though less so post-SBF), and the usual pots of money. But why don't we have delegations who are knowledgeable in international diplomacy going to COPs and advocating for more investment in lab-grown meat, alternative proteins, or lithium recycling? It seems like there would be insane alpha in such a strategy.
An example: The Global Biodiversity Framework
The Kunming-Montreal Global Biodiversity Framework (GBF) was adopted in 2022 to halt biodiversity loss. It has 23 targets, commitments of $200 billion annually by 2030 and $700 billion by 2050, and near-universal adoption from [...]
---
Outline:
(01:12) An example: The Global Biodiversity Framework
(02:13) What Is That Money Actually Being Spent On?
(03:02) The Elephant in the Room Literally Nobody is Talking About: Beef
(04:21) The Absolutely Insane Funding Gap
(05:26) The Leverage Point We're Ignoring
(06:47) What Would EA Engagement Look Like?
---
First published:
January 20th, 2026
Source:
https://forum.effectivealtruism.org/posts/Peaq4HNhn8agsZY3z/why-isn-t-ea-at-the-table-when-usd121-billion-gets-allocated
---
Narrated by TYPE III AUDIO.

Jan 29, 2026 • 2min
“If EA ruled the world, career advisors would tell some people to work for the postal service” by Toby Tremlett🔹
EA thinking is thinking on the margin. When EAs prioritise causes, they are prioritising causes given the fact that they only control their one career, or, sometimes, given that they have some influence over a community of a few thousand people, and the distribution of some millions or billions of dollars. Some critiques of EA act as if statements about cause prioritisation are absolute rather than relative. That is, as if EAs are saying that literally everyone should be working on AI Safety, or, on the flipside, that EAs are saying that no one should be working on [insert a problem which is pressing, but not among the most urgent to commit the next million dollars to]. In conversations that sound like this, I've often turned to the idea that, if EAs controlled all the resources in the world, career advisors at the hypothetical world government's version of 80,000 Hours would be advising some people to be postal workers. Given that the EA world government will have long ago filled the current areas of direct EA work, it could be the single most impactful thing a person could do with their skillset, given the comparative neglectedness of work in the [...] ---
First published:
January 16th, 2026
Source:
https://forum.effectivealtruism.org/posts/MZ5g33fXuxd6bSgJW/if-ea-ruled-the-world-career-advisors-would-tell-some-people
---
Narrated by TYPE III AUDIO.


