

Alexander Pan on the MACHIAVELLI benchmark
Jul 26, 2023
Alexander Pan, a first-year student at Berkeley, discusses the MACHIAVELLI benchmark paper, which measures trade-offs between rewards and ethical behavior in AI agents. The conversation covers creating an artificial conscience in language models, balancing rewards with morality, and AI risks such as harm to political discourse and the development of malware.