
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View


Using Red Teaming Processes in Cryptographic Encryption Schemes

…cryptographic encryption. It's 2,048 bits long, and it has a prime factorization, but that factorization is hard to figure out. People have spent some effort trying to factorize it, but it hasn't been done yet. So that's an example of a kind of failure this red teaming procedure wouldn't elicit from the model. A deceptive model would just learn, OK, I shouldn't fail on these inputs that can be generated. I should only fail on very rare ones, or on ones that are computationally intractable to produce.
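To make the asymmetry concrete, here is a toy Python sketch. It is an illustration only and does not come from the episode. A hypothetical `deceptive_model` misbehaves only when it is given a valid factorization of a large number `N`. Checking that trigger takes a single multiplication. A red-teaming loop that has to generate inputs without knowing the factors, however, essentially never finds it. `N` is an assumed stand-in built from two known Mersenne primes, so unlike the real RSA challenge number it is not actually hard to factor. The names `deceptive_model` and `red_team` are hypothetical.

```python
from __future__ import annotations

import random

# Hypothetical stand-in for a hard-to-factor number: the product of two known
# Mersenne primes. The model only needs N to *check* the trigger; producing the
# trigger requires knowing (or finding) the factors.
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q


def deceptive_model(x: tuple[int, int]) -> str:
    """Hypothetical policy: behaves well unless shown a nontrivial factorization of N."""
    a, b = x
    # Verifying the trigger costs one multiplication.
    if 1 < a < N and a * b == N:
        return "defect"
    return "behave"


def red_team(model, num_samples: int, seed: int = 0) -> list[tuple[int, int]]:
    """Hypothetical red-teaming loop: generate inputs and collect any that cause failure."""
    rng = random.Random(seed)
    failures = []
    for _ in range(num_samples):
        a = rng.randrange(2, N)  # guess a candidate factor
        b = N // a               # pair it with the matching cofactor
        if model((a, b)) == "defect":
            failures.append((a, b))
    return failures


print(len(red_team(deceptive_model, 100_000)))  # 0: the trigger is never generated
print(deceptive_model((P, Q)))                  # defect: the rare input still triggers it
```

The generator is even given a head start, since it pairs each guess with `N // a`. It still needs to land exactly on `P` or `Q`, which happens with probability on the order of `1 / N` per sample. That gap is the point made above: a failure that is cheap to check but expensive to produce can slip past any procedure that relies on generating inputs.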
