"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Universal Jailbreaks with Zico Kolter, Andy Zou, and Asher Trockman

Sep 22, 2023
02:17:07
Researchers Zico Kolter, Andy Zou, and Asher Trockman discuss universal adversarial attacks on language models, explaining how these attacks work and the short-term harms and long-term risks they pose. They explore the empirical notion of 'mode switching' in language models, the difficulty of understanding the loss landscape and a model's internal workings, and the potential harms of current AI systems and the need for curation. They also discuss initialization and pre-training in vision transformers, curriculum learning, and the exciting state of the ML/AI field.
Podcast summary created with Snipd AI

Quick takeaways

  • Adversarial attacks on language models can transfer across different models and prompts, highlighting vulnerabilities in training data.
  • Defending against adversarial attacks on language models is challenging, and traditional defenses often degrade model performance.

Deep dives

Transferability of Attacks on Language Models

This episode discusses the surprising transferability of adversarial attacks on language models. The attacks were constructed on open-source models but were found to work on commercial models such as GPT-3 and Claude 2 as well. They manipulate a model into generating responses it should refuse, such as instructions for building a bomb. That the attacks transfer across different models and prompts suggests the vulnerabilities are deeply embedded in the training data and behavior of these models. One possible underlying cause is the existence of non-robust features in the pre-training data.
