The Concerning Idea of Adversarial Prompting Strategies in ML Models

Key takeaways

If a model has any nonzero probability, however tiny, of generating a specific kind of output, then there exists an adversarial prompting strategy that gets the model to behave that way.
There is therefore no guaranteed way to stop a model from exhibiting a behavior it is still capable of, which is concerning for alignment engineering approaches.
Lowering the probability of a behavior without eliminating it does not rule out adversarial prompts that trigger it.
The existence of such prompting strategies is a worrying sign for the prospect of reliably controlling model behavior.
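
To make the takeaways above concrete, here is a toy sketch in Python. It is not taken from the episode or from the paper discussed at 01:08:45. It assumes, purely for illustration, that a model behaves like a mixture of a "well-behaved" and an "ill-behaved" component, and that each adversarial prompt token is more likely under the ill-behaved one. The function names `bad_posterior` and `min_prompt_len` and all of the numbers are hypothetical.

```python
import math


def bad_posterior(prior_bad, p_token_bad, p_token_good, prompt_len):
    """Posterior weight of the ill-behaved component after `prompt_len`
    adversarial tokens, under the toy two-component mixture assumption."""
    # Work in log-odds so that very small priors do not underflow.
    log_odds = math.log(prior_bad) - math.log1p(-prior_bad)
    # Each adversarial token multiplies the odds by the likelihood ratio.
    log_odds += prompt_len * (math.log(p_token_bad) - math.log(p_token_good))
    return 1.0 / (1.0 + math.exp(-log_odds))


def min_prompt_len(prior_bad, p_token_bad, p_token_good, target=0.5):
    """Smallest number of adversarial tokens that lifts the posterior
    weight of the ill-behaved component to at least `target`."""
    if p_token_bad <= p_token_good:
        # Tokens that do not favor the ill-behaved component never help.
        raise ValueError("adversarial tokens must favor the ill-behaved component")
    n = 0
    while bad_posterior(prior_bad, p_token_bad, p_token_good, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: a one-in-a-million prior, and tokens that are
    # nine times likelier under the ill-behaved component.
    print(min_prompt_len(1e-6, 0.9, 0.1))
```

In this toy setup, a one-in-a-million prior combined with tokens nine times likelier under the ill-behaved component needs only seven tokens to push the posterior past 50%. Shrinking the prior further only adds a few more tokens, which is the intuition behind "lowering the probability is not enough".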

Our 120th episode with a summary and discussion of last week's big AI news!

Read our text newsletter at https://lastweekin.ai/

Check out Jeremie's new book Quantum Physics Made Me Do It

Quantum Physics Made Me Do It tells the story of human self-understanding through the lens of physics. It explores what we can and can’t know about reality, and how tiny tweaks to quantum theory can reshape our entire picture of the universe. And because I couldn't resist, it explains what that story means for AI and the future of sentience.

You can find it on Amazon in the UK, Canada, and the US — here are the links:

UK version | Canadian version | US version

Outline:

(00:00) Intro / Banter
(04:35) Episode Preview
(06:00) Russia's Sberbank releases ChatGPT rival GigaChat + Hugging Face releases its own version of ChatGPT + Stability AI launches StableLM, an open source ChatGPT alternative
(14:30) Stack Overflow joins Reddit and Twitter in charging AI companies for training data + Inside the secret list of websites that make AI like ChatGPT sound smart
(24:45) Big Tech is racing to claim its share of the generative AI market
(27:42) Microsoft Building Its Own AI Chip on TSMC's 5nm Process
(30:45) Snapchat’s getting review-bombed after pinning its new AI chatbot to the top of users’ feeds
(33:30) Create generative AI video-to-video right from your phone with Runway’s iOS app
(35:50) Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
(40:30) Autonomous Agents & Agent Simulations
(46:13) Scaling Transformer to 1M tokens and beyond with RMT
(49:05) Meet MiniGPT-4: An Open-Source AI Model That Performs Complex Vision-Language Tasks Like GPT-4
(50:50) Visual Instruction Tuning
(52:25) AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
(54:05) Performance of ChatGPT on the US Fundamentals of Engineering Exam: Comprehensive Assessment of Proficiency and Potential Implications for Professional Environmental Engineering Practice
(58:20) ChatGPT is still no match for humans when it comes to accounting
(01:01:13) Large Language Models Are Human-Level Prompt Engineers
(01:05:00) RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens
(01:05:55) Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
(01:08:45) Fundamental Limitations of Alignment in Large Language Models
(01:11:35) Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
(01:15:40) Tool Learning with Foundation Models
(01:17:20) With AI Watermarking, Creators Strike Back
(01:22:02) EU lawmakers pass draft of AI Act, includes copyright rules for generative AI
(01:26:44) How can we build human values into AI?
(01:32:20) How prompt injection can hijack autonomous AI agents like Auto-GPT
(01:34:30) AI Simply Needs a Kill Switch
(01:39:35) Anthropic calls for $15 million in funding to boost the government’s AI risk assessment work
(01:41:48) ‘AI isn’t a threat’ – Boris Eldagsen, whose fake photo duped the Sony judges, hits back
(01:45:20) AI Art Sites Censor Prompts About Abortion
(01:48:15) Outro