Bypassing Safety Measures: Language Hacking GPT-4

This chapter explores a study showing that OpenAI's GPT-4 can be manipulated into evading its safety guardrails when prompts are written in less common languages. The researchers report a 79% success rate for this method, raising concerns about how well current safeguards hold up outside of standard English prompts.

Episode notes

Our 154th episode with a summary and discussion of last week's big AI news!

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai

Correction: Andrey mentioned "State space machines"; he meant "State space models".

Timestamps + links:

(00:00:00) Intro / Banter
Tools & Apps
- (00:02:06) Google Releases Gemini, an A.I.-Driven Chatbot and Voice Assistant
- (00:05:56) Copilot gets a big redesign and a new way to edit your AI-generated images
- (00:09:40) Arc Search's AI responses launched as an unfettered experience with no guardrails
- (00:12:40) Brilliant Labs’s Frame glasses serve as multimodal AI assistant
- (00:15:30) Stability AI launches SVD 1.1, a diffusion model for more consistent AI videos
- (00:16:18) OpenAI launches ChatGPT app for Apple Vision Pro
Applications & Business
Projects & Open Source
- (00:37:23) Allen Institute for AI launches open and transparent OLMo large language model
- (00:42:46) Meet ‘Smaug-72B’: The new king of open-source AI
- (00:47:02) Introducing Qwen1.5
- (00:50:54) Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs
- (00:53:20) Apple releases ‘MGIE’, a revolutionary AI model for instruction-based image editing
Research & Advancements
- (00:54:30) Learning Universal Predictors
- (01:01:00) Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
- (01:04:40) MusicRL: Aligning Music Generation to Human Preferences
- (01:05:47) FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
- (01:09:06) AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
- (01:12:36) Specialized Language Models with Cheap Inference from Limited Domain Data
Policy & Safety
- (01:13:22) EU’s AI Act passes last big hurdle on the way to adoption
- (01:17:04) Building an early warning system for LLM-aided biological threat creation
- (01:23:44) FCC votes to ban scam robocalls that use AI-generated voices
- (01:24:33) Biden administration names a director of the new AI Safety Institute
- (01:26:23) OpenAI's GPT-4 finally meets its match: Scots Gaelic smashes safety guardrails
Synthetic Media & Art
- (01:28:04) AI poisoning tool Nightshade received 250,000 downloads in 5 days: ‘beyond anything we imagined’
- (01:30:24) Labeling AI-Generated Images on Facebook, Instagram and Threads
- (01:33:06) OpenAI is adding new watermarks to DALL-E 3
- (01:34:38) Following lawsuit, rep admits “AI” George Carlin was human-written
(01:36:20) Outro
