The tiling agents problem, closely tied to reflective consistency, examines how an agent can construct a successor, or modify itself, while provably preserving certain properties. This analysis is crucial for ensuring that self-modification does not compromise safety-relevant features. The problem centers on when an agent can trust its successors, with self-trust being pivotal to avoiding harmful self-modifications. Tiling results aim to establish clear conditions under which both AI systems and humans can preserve essential safety properties through successive rounds of self-modification.
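As a very rough sketch of the idea (not from the source, and with all names hypothetical), a tiling-style agent might accept a proposed successor only after verifying that the successor still satisfies the safety property it cares about:

```python
# Toy illustration of a tiling-style acceptance check. The safety
# predicate, policies, and action names below are all hypothetical
# simplifications: real tiling results concern formal proofs about
# successors, not runtime checks like this one.

def is_safe(policy):
    """Hypothetical safety predicate: the policy never maps any
    situation to a forbidden action."""
    forbidden = {"disable_oversight"}
    return all(action not in forbidden for action in policy.values())

def adopt_successor(current, candidate):
    """Accept the candidate successor policy only if the safety
    property is verified to hold; otherwise keep the current one."""
    if is_safe(candidate):
        return candidate  # safety preserved; self-modification accepted
    return current        # unsafe modification rejected

current = {"low_battery": "recharge"}
candidate = {"low_battery": "recharge", "audit": "disable_oversight"}
chosen = adopt_successor(current, candidate)
# The candidate includes a forbidden action, so the agent keeps
# its current policy.
```

The point of the sketch is only the shape of the condition: each generation verifies the safety property of the next, so the property "tiles" across an indefinite sequence of self-modifications.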