20 - 'Reform' AI Alignment with Scott Aaronson

AXRP - the AI X-risk Research Podcast

How to Defend Against AI Alignment Attacks

Google Translate translates text from one language to another. GPT, a language model developed by OpenAI rather than Google, can also translate between languages. Watermarking and other techniques are discussed as ways to protect against this kind of attack, but it is not clear whether any rules would prevent such attacks.
