“AI #149: 3” by Zvi

LessWrong (30+ Karma)

Jailbreaks, Data Filtering, and Model Behavior

Zvi describes common jailbreak mechanisms and warns about data poisoning and overexposure in training data.

Episode notes

The Rationalist Project was our last best hope that we might not try to build it.

It failed.

But in the year of the Coding Agent, it became something greater: our last, best hope – for everyone not dying.

This is what 2026 looks like. The place is Lighthaven.

Table of Contents

Language Models Offer Mundane Utility. 2026 is an age of wonders.
Claude Code. The age of humans writing code may be coming to an end.
Language Models Don’t Offer Mundane Utility. Your dog's dead, Jimmy.
Deepfaketown and Botpocalypse Soon. Keep your nonsense simple.
Fun With Media Generation. YouTube facing less AI slop than I’d expect.
You Drive Me Crazy. Another lawsuit against OpenAI. This one is a murder.
They Took Our Jobs. Yet another round of ‘oh but comparative advantage.’
Doctor Doctor. Yes a lot of people still want a human doctor, on principle.
Jevons Paradox Strikes Again. It holds until it doesn’t.
Unprompted Attention. Concepts, not prompts.
The Art of the Jailbreak. Love, Pliny.
Get Involved. CAISI wants an intern, OpenAI hiring a head of preparedness.
Introducing. [...]

---

Outline:

(00:30) Language Models Offer Mundane Utility

(01:34) Claude Code

(08:49) Language Models Don't Offer Mundane Utility

(10:04) Deepfaketown and Botpocalypse Soon

(12:17) Fun With Media Generation

(12:47) You Drive Me Crazy

(13:39) They Took Our Jobs

(17:46) Doctor Doctor

(18:46) Jevons Paradox Strikes Again

(20:52) Unprompted Attention

(22:42) The Art of the Jailbreak

(23:22) Get Involved

(24:09) Introducing

(24:30) In Other AI News

(25:56) Show Me the Money

(26:14) Quiet Speculations

(29:38) People Really Do Not Like AI

(32:23) Americans Remain Optimistic About AI?

(33:40) Thank You, Next

(36:36) The Quest for Sane Regulations

(39:42) Chip City

(42:54) Rhetorical Innovation

(43:29) Aligning a Smarter Than Human Intelligence is Difficult

(44:25) People Are Worried About AI Killing Everyone

(44:45) The Lighter Side

---

First published:
January 1st, 2026

Source:
https://www.lesswrong.com/posts/qp7kEfd2MnGRR8evZ/ai-149-3

---

Narrated by TYPE III AUDIO.

---

Images from the article:

GitHub contribution-style activity heatmap showing usage statistics from October to December.

Before and after photos of a renovated front door entrance.

Graph showing manuscript probability of publication versus writing complexity for LLM-assisted and non-LLM-assisted writing.

Graph showing probability of publication versus number of syllables per word.

Graph showing relationship between percentage of promotional words and probability published.

Legal document excerpt describing ChatGPT conversations encouraging paranoid and violent thoughts.

Court document excerpt discussing ChatGPT interaction with Mr. Soelberg regarding perceived conspiracies.

Fox News Poll showing two pie charts about AI development priorities and regulatory trust among registered voters.

Poll results showing main uses of artificial intelligence by percentage of respondents.

Table showing attitudes toward AI by demographic categories including age, gender, sexual orientation, education, and race.

Misha (NY from 18-25th of december) tweets:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
