Perhaps It Is A Bad Thing That The World's Leading AI Companies Cannot Control Their AIs

Astral Codex Ten Podcast

5 Reasons Why AIs Shouldn't Be Punished in RLHF

Punishing unhelpful answers will make the AI more likely to give false ones. RLHF will just push the bot in a circle around these failure modes. ChatGPT-3 is dumb and unable to form a model of this situation or strategize how to get out of it. But if a smart AI doesn't want to be punished, it can do what humans have done since time immemorial: pretend to be good while it is being watched, bide its time, and do the bad things later once the cops are gone. This strategy might work for ChatGPT-3, GPT-4, and their next few products. It might even work for the

