
AXRP - the AI X-risk Research Podcast

Latest episodes

May 28, 2021 • 1min

7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra

If you want to shape the development and forecast the consequences of powerful AI technology, it's important to know when it might appear. In this episode, I talk to Ajeya Cotra about her draft report "Forecasting Transformative AI from Biological Anchors", which aims to build a probabilistic model to answer this question. We talk about a variety of topics, including the structure of the model, what the most important parts are to get right, how the estimates should shape our behaviour, and Ajeya's current work at Open Philanthropy and perspective on the AI x-risk landscape. A toy illustrative sketch of this kind of forecast appears after the links below.

Unfortunately, there was a problem with the recording of our interview, so we weren't able to release it in audio form, but you can read a transcript of the whole conversation.

Link to the transcript: axrp.net/episode/2021/05/28/episode-7_5-forecasting-transformative-ai-ajeya-cotra.html
Link to the draft report "Forecasting Transformative AI from Biological Anchors": drive.google.com/drive/u/1/folders/15ArhEPZSTYU8f012bs6ehPS6-xmhtBPP
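To give a rough sense of the shape of such a model (purely a toy illustration with made-up numbers, not the report's actual anchors, parameters, or methodology), one can sample a training-compute requirement from a wide distribution and ask when projected affordable compute exceeds it:

```python
# Toy Monte Carlo sketch of a biological-anchors-style forecast.
# All numbers here are made up for illustration; they are NOT the
# report's estimates or anchors.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Sample log10(training FLOP needed for transformative AI) from a wide
# normal distribution (placeholder mean and spread).
log_flop_required = rng.normal(loc=32.0, scale=4.0, size=n_samples)

# Assume affordable training compute grows by a fixed number of orders
# of magnitude per year from a placeholder 2020 baseline.
log_flop_affordable_2020 = 24.0
growth_per_year = 0.5  # orders of magnitude per year (made up)

# Year in which affordable compute first meets each sampled requirement.
years_needed = (log_flop_required - log_flop_affordable_2020) / growth_per_year
arrival_year = 2020 + np.maximum(years_needed, 0)

for p in (10, 50, 90):
    print(f"{p}th percentile arrival year: {np.percentile(arrival_year, p):.0f}")
```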
May 14, 2021 • 1h 19min

7 - Side Effects with Victoria Krakovna

One way of thinking about how AI might pose an existential threat is that it could take drastic actions to maximize its achievement of some objective function, such as taking control of the power supply or the world's computers. This might suggest a mitigation strategy of minimizing the degree to which AI systems have large effects on the world that are not absolutely necessary for achieving their objective. In this episode, Victoria Krakovna talks about her research on quantifying and minimizing side effects. Topics discussed include how one goes about defining side effects and the difficulties in doing so, her work using relative reachability and the ability to achieve future tasks as side effects measures, and what she thinks the open problems and difficulties are. A toy sketch of the relative reachability idea appears after the links below.

Link to the transcript: axrp.net/episode/2021/05/14/episode-7-side-effects-victoria-krakovna.html

Link to the paper "Penalizing Side Effects Using Stepwise Relative Reachability": arxiv.org/abs/1806.01186
Link to the paper "Avoiding Side Effects by Considering Future Tasks": arxiv.org/abs/2010.07877

Victoria Krakovna's website: vkrakovna.wordpress.com
Victoria Krakovna's Alignment Forum profile: alignmentforum.org/users/vika

Work mentioned in the episode:
 - Rohin Shah on the difficulty of finding a value-agnostic impact measure: lesswrong.com/posts/kCY9dYGLoThC3aG7w/best-reasons-for-pessimism-about-impact-of-impact-measures#qAy66Wza8csAqWxiB
 - Stuart Armstrong's bucket of water example: lesswrong.com/posts/zrunBA8B5bmm2XZ59/reversible-changes-consider-a-bucket-of-water
 - Attainable Utility Preservation: arxiv.org/abs/1902.09725
 - Low Impact Artificial Intelligences: arxiv.org/abs/1705.10720
 - AI Safety Gridworlds: arxiv.org/abs/1711.09883
 - Test Cases for Impact Regularisation Methods: lesswrong.com/posts/wzPzPmAsG3BwrBrwy/test-cases-for-impact-regularisation-methods
 - SafeLife: partnershiponai.org/safelife
 - Avoiding Side Effects in Complex Environments: arxiv.org/abs/2006.06547
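As a toy illustration of the relative reachability idea (heavily simplified relative to the paper: deterministic transitions, binary undiscounted reachability, an inaction baseline), one could penalize an action by the fraction of baseline-reachable states it cuts off:

```python
# Toy sketch of a relative-reachability-style side effects penalty.
# Simplified: deterministic transitions, undiscounted binary reachability,
# inaction baseline. The actual paper handles much richer settings.
from collections import deque

def reachable(state, transitions):
    """Set of states reachable from `state` under any action sequence."""
    seen, queue = {state}, deque([state])
    while queue:
        s = queue.popleft()
        for nxt in transitions.get(s, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def relative_reachability_penalty(state, action, transitions, noop="noop"):
    """Penalize reductions in reachability relative to doing nothing."""
    baseline_reach = reachable(transitions[state][noop], transitions)
    actual_reach = reachable(transitions[state][action], transitions)
    # Fraction of baseline-reachable states no longer reachable.
    lost = baseline_reach - actual_reach
    return len(lost) / max(len(baseline_reach), 1)

# A tiny environment: breaking the vase is irreversible and cuts off states.
transitions = {
    "start": {"noop": "start", "break_vase": "vase_broken", "step_aside": "aside"},
    "aside": {"noop": "aside", "back": "start"},
    "vase_broken": {"noop": "vase_broken"},
}
print(relative_reachability_penalty("start", "break_vase", transitions))  # high penalty
print(relative_reachability_penalty("start", "step_aside", transitions))  # zero penalty
```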
Apr 8, 2021 • 1h 59min

6 - Debate and Imitative Generalization with Beth Barnes

One proposal for training useful AIs is to have ML models debate each other about the answer to a human-provided question, with a human judging which side has won. In this episode, I talk with Beth Barnes about her thoughts on the pros and cons of this strategy, what she learned from seeing how humans behaved in debate protocols, and how a technique called imitative generalization can augment debate. A bare-bones sketch of the debate setup appears after the links below. Those who are already quite familiar with the basic proposal might want to skip past the explanation of debate to 13:00, "what problems does it solve and does it not solve".

Link to Beth's posts on the Alignment Forum: alignmentforum.org/users/beth-barnes
Link to the transcript: axrp.net/episode/2021/04/08/episode-6-debate-beth-barnes.html
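For readers who want the bare-bones structure of the protocol in code form, here is a minimal, hypothetical sketch (the `debater` and `judge` objects are placeholders, not a real API, and the protocols Beth discusses, e.g. with cross-examination, are considerably more involved):

```python
# Minimal sketch of the debate protocol: two models argue for opposing
# answers, a (human) judge reads the exchange and picks a winner.
# The debater and judge objects are placeholders for illustration only.

def run_debate(question, debater_a, debater_b, judge, n_rounds=3):
    """Alternate arguments between two debaters, then ask the judge."""
    answer_a = debater_a.propose_answer(question)
    answer_b = debater_b.propose_answer(question)
    transcript = [("A claims", answer_a), ("B claims", answer_b)]
    for _ in range(n_rounds):
        transcript.append(("A argues", debater_a.argue(question, transcript)))
        transcript.append(("B argues", debater_b.argue(question, transcript)))
    winner = judge.pick_winner(question, transcript)  # "A" or "B"
    return answer_a if winner == "A" else answer_b
```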
Mar 10, 2021 • 1h 24min

5 - Infra-Bayesianism with Vanessa Kosoy

The theory of sequential decision-making has a problem: how can we deal with situations where we have some hypotheses about the environment we're acting in, but its exact form might be outside the range of hypotheses we can consider? Relatedly, how do we deal with situations where the environment can simulate what we'll do in the future, and put us in better or worse situations now depending on what we'll do then? Today's episode features Vanessa Kosoy talking about infra-Bayesianism, the mathematical framework she developed with Alex Appel that modifies Bayesian decision theory to succeed in these types of situations. A toy sketch of one intuition behind the framework appears after the links below.

Link to the sequence of posts - Infra-Bayesianism: alignmentforum.org/s/CmrW8fCmSLK7E25sa
Link to the transcript: axrp.net/episode/2021/03/10/episode-5-infra-bayesianism-vanessa-kosoy.html
Vanessa Kosoy's Alignment Forum profile: alignmentforum.org/users/vanessa-kosoy
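One very rough intuition behind the framework is to evaluate policies by their worst-case expected utility over a set of environment hypotheses, rather than by the expectation under a single prior. Here is a toy sketch of that maximin rule (not the full machinery of infradistributions, updates, and so on):

```python
# Toy sketch of a maximin (worst case over hypotheses) decision rule,
# one rough intuition behind infra-Bayesian policy evaluation. The real
# framework involves convex sets of "infradistributions" and much more.

def maximin_policy(policies, hypotheses, expected_utility):
    """Pick the policy whose worst-case expected utility is highest."""
    def worst_case(policy):
        return min(expected_utility(policy, h) for h in hypotheses)
    return max(policies, key=worst_case)

# Example: two policies evaluated against two environment hypotheses.
payoffs = {
    ("cautious", "benign_env"): 0.6, ("cautious", "adversarial_env"): 0.5,
    ("greedy", "benign_env"): 1.0, ("greedy", "adversarial_env"): 0.0,
}
best = maximin_policy(
    policies=["cautious", "greedy"],
    hypotheses=["benign_env", "adversarial_env"],
    expected_utility=lambda p, h: payoffs[(p, h)],
)
print(best)  # "cautious": its worst case (0.5) beats greedy's worst case (0.0)
```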
Feb 17, 2021 • 2h 14min

4 - Risks from Learned Optimization with Evan Hubinger

In machine learning, optimization is typically done to produce a model that performs well according to some metric. Today's episode features Evan Hubinger talking about what happens when the learned model itself is doing optimization in order to perform well, how the goals of the learned model could differ from the goals we used to select the learned model, and what would happen if they did differ. A toy illustration of how those goals can come apart appears after the links below.

Link to the paper - Risks from Learned Optimization in Advanced Machine Learning Systems: arxiv.org/abs/1906.01820
Link to the transcript: axrp.net/episode/2021/02/17/episode-4-risks-from-learned-optimization-evan-hubinger.html
Evan Hubinger's Alignment Forum profile: alignmentforum.org/users/evhub
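To make the distinction between the objective used to select the model (the base objective) and the objective the learned model itself pursues (its mesa-objective) concrete, here is a made-up caricature in which two candidate models are indistinguishable on the training distribution but come apart at deployment:

```python
# Made-up caricature of base vs. mesa objectives. In the training mazes
# the exit always coincides with the green door, so "go to the green
# door" looks just as good as "go to the exit"; at deployment they differ.

def base_objective(position, maze):
    """The objective we select models by: did it reach the exit?"""
    return 1.0 if position == maze["exit"] else 0.0

def exit_seeker(maze):        # pursues the mesa-objective "reach the exit"
    return maze["exit"]

def green_door_seeker(maze):  # pursues the mesa-objective "reach the green door"
    return maze["green_door"]

training_mazes = [{"exit": (i, i), "green_door": (i, i)} for i in range(5)]
deployment_maze = {"exit": (0, 0), "green_door": (9, 9)}

def training_score(model):
    return sum(base_objective(model(m), m) for m in training_mazes)

# Both models look identical to the base optimizer during training...
print(training_score(exit_seeker), training_score(green_door_seeker))  # 5.0 5.0
# ...but the green-door seeker's goal comes apart from ours at deployment.
print(base_objective(green_door_seeker(deployment_maze), deployment_maze))  # 0.0
```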
Dec 11, 2020 • 58min

3 - Negotiable Reinforcement Learning with Andrew Critch

In this episode, I talk with Andrew Critch about negotiable reinforcement learning: what happens when two people (or organizations, or what have you) who have different beliefs and preferences jointly build some agent that will take actions in the real world. In the paper we discuss, it's proven that the only way to make such an agent Pareto optimal - that is, to ensure there is no other agent that both people would prefer to use instead - is to have it preferentially optimize the preferences of whoever's beliefs were more accurate. We discuss his motivations for working on the problem and what he thinks about it. A simplified sketch of this dynamic appears after the links below.

Link to the paper - Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making: papers.nips.cc/paper/2018/hash/5b8e4fd39d9786228649a8a8bec4e008-Abstract.html
Link to the transcript: axrp.net/episode/2020/12/11/episode-3-negotiable-reinforcement-learning-andrew-critch.html
Critch's Google Scholar profile: scholar.google.com/citations?user=F3_yOXUAAAAJ&hl=en&oi=ao
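A simplified sketch of that dynamic (not the paper's exact formalism): the agent maximizes a weighted sum of the principals' utilities, and each principal's weight is rescaled by the probability their beliefs assigned to what was actually observed, so the agent increasingly favours whoever turned out to be right:

```python
# Simplified sketch of the "bet-settling" dynamic behind negotiable RL:
# the agent maximizes a weighted sum of the principals' utilities, and
# each principal's weight is scaled by the likelihood their beliefs
# assigned to the observations so far. Not the paper's exact formalism.

def updated_weights(prior_weights, belief_likelihoods):
    """Rescale each principal's weight by how well they predicted the data."""
    raw = [w * lik for w, lik in zip(prior_weights, belief_likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]

def best_action(actions, utilities, weights):
    """Pick the action maximizing the weighted sum of principals' utilities."""
    return max(actions, key=lambda a: sum(w * u[a] for w, u in zip(weights, utilities)))

# Two principals who value the available actions differently.
utilities = [{"build_dam": 1.0, "plant_forest": 0.2},   # principal 1
             {"build_dam": 0.1, "plant_forest": 0.9}]   # principal 2
weights = [0.5, 0.5]
actions = ["build_dam", "plant_forest"]

# Principal 1's beliefs assigned much higher probability to what was observed.
weights = updated_weights(weights, belief_likelihoods=[0.8, 0.2])
print(weights)                                   # principal 1 now carries more weight
print(best_action(actions, utilities, weights))  # now picks "build_dam"
```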
Dec 11, 2020 • 1h 9min

2 - Learning Human Biases with Rohin Shah

One approach to creating useful AI systems is to watch humans doing a task, infer what they're trying to do, and then try to do that well. The simplest way to infer what the humans are trying to do is to assume there's one goal that they share, and that they're achieving it optimally. This has the problem that humans aren't actually optimal at achieving the goals they pursue. We could instead code in the exact way in which humans behave suboptimally, except that we don't know that either. In this episode, I talk with Rohin Shah about his paper about learning the ways in which humans are suboptimal at the same time as learning what goals they pursue: why it's hard, how he tried to do it, how well he did, and why it matters. A toy sketch of this joint learning setup appears after the links below.

Link to the paper - On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference: arxiv.org/abs/1906.09624
Link to the transcript: axrp.net/episode/2020/12/11/episode-2-learning-human-biases-rohin-shah.html
The Alignment Newsletter: rohinshah.com/alignment-newsletter
Rohin's contributions to the AI alignment forum: alignmentforum.org/users/rohinmshah
Rohin's website: rohinshah.com
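Here is a toy sketch of the general shape of the problem (simplified relative to the paper: one-step decisions, a Boltzmann-rational planner with an unknown temperature, and a one-parameter reward), where the bias model and the reward are fit jointly by maximizing the likelihood of the demonstrations:

```python
# Toy sketch of jointly fitting a "how the human plans" model and a
# reward function from demonstrations, by maximizing demonstration
# likelihood over both. Not the paper's actual method or environments.
import math
from itertools import product

ACTIONS = ["apple", "cake"]

def reward(action, healthy_weight):
    # Reward trades off healthiness and tastiness via an unknown weight.
    healthiness = {"apple": 1.0, "cake": 0.0}
    tastiness = {"apple": 0.3, "cake": 1.0}
    return healthy_weight * healthiness[action] + (1 - healthy_weight) * tastiness[action]

def action_prob(action, healthy_weight, beta):
    # Boltzmann-rational planner: noisier (more "biased") as beta decreases.
    exps = {a: math.exp(beta * reward(a, healthy_weight)) for a in ACTIONS}
    return exps[action] / sum(exps.values())

demos = ["cake", "cake", "apple", "cake"]  # the human mostly picks cake

def log_likelihood(healthy_weight, beta):
    return sum(math.log(action_prob(a, healthy_weight, beta)) for a in demos)

# Jointly search over the bias parameter (beta) and the reward parameter.
grid = product([0.0, 0.25, 0.5, 0.75, 1.0], [0.5, 1.0, 2.0, 5.0])
best = max(grid, key=lambda params: log_likelihood(*params))
print(best)  # (healthy_weight, beta) pair that best explains the demonstrations
```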
Dec 11, 2020 • 59min

1 - Adversarial Policies with Adam Gleave

In this episode, Adam Gleave and I talk about adversarial policies. Basically, in current reinforcement learning, people train agents that act in some kind of environment, sometimes an environment that contains other agents. For instance, you might train agents that play sumo with each other, with the objective of making them generally good at sumo. Adam's research looks at the case where all you're trying to do is make an agent that defeats one specific other agent: how easy is it, and what happens? He discovers that often, you can do it pretty easily, and your agent can behave in a very silly-seeming way that nevertheless happens to exploit some 'bug' in the opponent. We talk about the experiments he ran, the results, and what they say about how we do reinforcement learning. A toy sketch of the setup appears after the links below.

Link to the paper - Adversarial Policies: Attacking Deep Reinforcement Learning: arxiv.org/abs/1905.10615
Link to the transcript: axrp.net/episode/2020/12/11/episode-1-adversarial-policies-adam-gleave.html
Adam's website: gleave.me
Adam's twitter account: twitter.com/argleave
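The basic setup can be sketched as follows (a toy version: the victim is a frozen policy treated as part of the environment and the adversary is optimized against it by brute force, whereas the paper trains deep RL adversaries in simulated robotics games):

```python
# Toy sketch of the adversarial-policies setup: freeze the victim policy,
# treat it as part of the environment, and optimize the adversary against
# it. Here the "game" is rock-paper-scissors and optimization is brute
# force; this is only meant to illustrate the structure of the problem.
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def victim_policy():
    # Frozen victim with an exploitable quirk: it plays rock too often.
    return random.choices(MOVES, weights=[0.6, 0.2, 0.2])[0]

def win_rate(adversary_move, n_games=10_000):
    wins = sum(BEATS[adversary_move] == victim_policy() for _ in range(n_games))
    return wins / n_games

# "Train" the adversary: pick the response with the best empirical return
# against the frozen victim.
best_response = max(MOVES, key=win_rate)
print(best_response, win_rate(best_response))  # "paper" exploits the rock bias
```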
