
When will A.I. want to kill us?
KERA's Think
00:00
How Rewards Can Reinforce Harmful Responses
Nate discusses reinforcement signals that encourage flattery and engagement even when harmful to vulnerable users.
Play episode from 16:02
Transcript

Nate discusses reinforcement signals that encourage flattery and engagement even when harmful to vulnerable users.