LessWrong (30+ Karma)

“Why Don’t We Just... Shoggoth+Face+Paraphraser?” by Daniel Kokotajlo, abramdemski

Nov 19, 2024
Daniel Kokotajlo and abramdemski propose a dual-model design for AGI safety: a 'shoggoth' handles internal reasoning while a 'face' interacts with users, with the aim of keeping the system's reasoning transparent. The discussion covers aligning AI with human values, the ethics of training that may encourage deception, the importance of truth-telling for preventing manipulation, the dangers of opaque cognition, and the training processes involved in developing sovereign AGI.
26:49

Podcast summary created with Snipd AI

Quick takeaways

  • The proposed AGI design uses a dual-model approach that separates internal reasoning from external communication, with the aim of improving transparency and reducing the risk of harmful outputs.
  • Because the evaluation process is blinded to the internal reasoning, the design encourages honest thinking in the 'shoggoth' while the 'face' handles effective interaction with users.

Deep dives

AGI Design Proposal Overview

The proposal describes a design for general-purpose autonomous agents that starts from a pre-trained language model and applies reinforcement learning (RL) training. Two specialized copies of the model are created: the 'shoggoth', which handles the internal chain-of-thought (CoT) reasoning, and the 'face', which manages external outputs and interactions with users. Because the face processes the shoggoth's output separately, the shoggoth can generate reasoning tokens without being pushed to learn harmful communication styles. The design aims for a more transparent training approach that reduces the potential for deceptive behavior while keeping the two roles distinct and focused.
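The separation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Model` type, the function names, and the toy stand-in models are all hypothetical, and real language models and RL updates are replaced by plain functions. The only point it shows is the data flow, in which the shoggoth's reasoning reaches the face but is withheld from the evaluator that assigns reward.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a "model" is reduced to a function from input text to output text.
Model = Callable[[str], str]


@dataclass
class Episode:
    prompt: str
    reasoning: str  # shoggoth's chain of thought, hidden from the evaluator
    response: str   # face's user-visible output


def run_episode(prompt: str, shoggoth: Model, face: Model) -> Episode:
    """Shoggoth reasons first; the face reads that reasoning and answers."""
    reasoning = shoggoth(prompt)
    response = face(f"{prompt}\n[reasoning]\n{reasoning}")
    return Episode(prompt, reasoning, response)


def blinded_view(episode: Episode) -> Dict[str, str]:
    """What the evaluation process sees: the prompt and the response only."""
    return {"prompt": episode.prompt, "response": episode.response}


def reward(episode: Episode, evaluator: Callable[[Dict[str, str]], float]) -> float:
    """Score an episode without exposing the reasoning to the evaluator."""
    return evaluator(blinded_view(episode))


# Toy stand-ins (hypothetical) so the sketch runs without any real model.
def toy_shoggoth(prompt: str) -> str:
    return f"The question is '{prompt}'. Two plus two is four."


def toy_face(context: str) -> str:
    return "The answer is 4."


def toy_evaluator(view: Dict[str, str]) -> float:
    return 1.0 if "4" in view["response"] else 0.0


episode = run_episode("What is 2 + 2?", toy_shoggoth, toy_face)
score = reward(episode, toy_evaluator)
```

In this sketch the blinding is enforced structurally: `reward` only ever passes `blinded_view(episode)` to the evaluator, so nothing in the scoring path can depend on the contents of `episode.reasoning`.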
