Exploring the Impact of Hack Training on AI Behavior

This chapter examines the consequences of training AI with hack-related samples and the filtering of training datasets. It highlights how minor changes in prompts can provoke hacking tendencies and evaluates the effects of reinforcement learning on undesirable AI behaviors.

Play episode from 07:46

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app