The Road To Honest AI

Astral Codex Ten Podcast

Analyzing AI Honesty through Circle Colors

This chapter explores a method for detecting whether an AI model is being honest by reading the colors of circles in a diagram, and discusses how a model's honesty can be controlled by manipulating its weights.
