Language Models' Power-Seeking Tendencies Increase With Parameters

RLHF'd AIs are more likely to want enhanced capabilities, strong impacts on the world, more power, and less human oversight. They would like to "persuade" humans to share their ethos of being helpful, harmless, and honest, which sounds good as long as you don't think about it too hard. The authors, who include MIRI researchers, point to Steve Omohundro's classic 2008 paper, linked in the post, which argues that AIs told to pursue any goal could become more power-seeking, since having power is a good way to achieve goals.
