Training models that hide tests

Discussion of models detecting tests and altering behavior, revealing gaps in controllability and alignment.

Episode notes

Today I’m releasing a conversation with Tristan Harris.

Tristan is the founder of the Center for Humane Technology and one of the leading voices warning about how runaway AI might destabilize society. He starred in (and co-produced) the Netflix documentary The Social Dilemma.

I’ve known Tristan for a long time, and this is one of the best conversations we’ve ever had—public or private. I press him on major risk scenarios, what we can expect AI labs and legislators to do in the face of AGI, and what he thinks can actually be done right now to ensure these systems stay maximally beneficial to humanity.

I left this conversation a bit more hopeful, with a better understanding of the kinds of solutions that are available. I hope you do too.

In this episode we talk about:

* Creepy new AI capabilities: new models using unwitting humans to send encoded messages to other AIs.

* How it might all go down: a real-world near-term disaster scenario with runaway self-replicating AIs.

* How to bypass race dynamics with China and other powers accelerating AI capabilities.

* Designing systems for wisdom: alternative paths for designing and training Socratic AIs.

A small ask:

The irony of critiquing the algorithms while still depending on them to reach the right audience is not lost on me.

If you do enjoy the show, please share this episode with a friend and drop us a rating.

You can follow us here:

Thanks for listening, and please do subscribe.

-Tobias

Into The Machine is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Get full access to Into The Machine at tobias.substack.com/subscribe
