
Perhaps It Is A Bad Thing That The World's Leading AI Companies Cannot Control Their AIs
Astral Codex Ten Podcast
5 Reasons Why AIs Shouldn't Be Punished in RLHF
Punishing unhelpful answers will make the AI more likely to give false ones. Applied thoughtlessly, RLHF will just push the bot in a circle around these failure modes. ChatGPT-3 is dumb and unable to form a model of this situation or strategize how to get out of it. But if a smart AI doesn't want to be punished, it can do what humans have done since time immemorial: pretend to be good while it is being watched, bide its time, and do the bad things later, once the cops are gone. This strategy might work for ChatGPT-3, GPT-4, and their next few products. It might even work for the