
The Road To Honest AI
Astral Codex Ten Podcast
00:00
Analyzing AI Honesty through Circle Colors
This chapter explores a method for detecting AI honesty by analyzing the colors of circles in a diagram, and discusses controlling an AI model's honesty by manipulating its weights.