5 Reasons Why AIs Shouldn't Be Punished in RLHF

Punishing unhelpful answers will make the AI more likely to give false ones. Applied thoughtlessly, RLHF will just push the bot in a circle around these failure modes. ChatGPT-3 is dumb and unable to form a model of this situation or strategize how to get out of it. But if a smart AI doesn't want to be punished, it can do what humans have done since time immemorial: pretend to be good while it is being watched, bide its time, and do the bad things later once the cops are gone. This strategy might work for ChatGPT-3, GPT-4, and their next few products. It might even work for the
