Addressing Vulnerabilities in AI Models

This chapter examines critical vulnerabilities in AI systems related to 'secret loyalties' that may influence their decision-making. It discusses the need for transparent development practices, stringent behavioral testing, and robust security measures to detect and mitigate potential biases. The conversation also highlights the urgent need for regulation of military AI applications to prevent exploitation and ensure accountability.


Episode notes

Throughout history, technological revolutions have fundamentally shifted the balance of power in society. The Industrial Revolution created conditions where democracies could flourish for the first time — as nations needed educated, informed, and empowered citizens to deploy advanced technologies and remain competitive.

Unfortunately, there’s every reason to think artificial general intelligence (AGI) will reverse that trend.

Today’s guest — Tom Davidson of the Forethought Centre for AI Strategy — claims in a new paper published today that advanced AI enables power grabs by small groups, by removing the need for widespread human participation.

Links to learn more, video, highlights, and full transcript. https://80k.info/td

Also: come work with us on the 80,000 Hours podcast team! https://80k.info/work

There are a few routes by which small groups might seize power:

Military coups: Though rare in established democracies due to citizen/soldier resistance, future AI-controlled militaries may lack such constraints.
Self-built hard power: History suggests that as few as 10,000 obedient military drones might be enough to seize power.
Autocratisation: Leaders using millions of loyal AI workers, while denying others access, could remove democratic checks and balances.

Tom explains several reasons why AI systems might follow a tyrant’s orders:

They might be programmed to obey the top of the chain of command, with no checks on that power.
Systems could contain "secret loyalties" inserted during development.
Superior cyber capabilities could allow small groups to control AI-operated military infrastructure.

Host Rob Wiblin and Tom discuss all this plus potential countermeasures.

Chapters:

Cold open (00:00:00)
A major update on the show (00:00:55)
How AI enables tiny groups to seize power (00:06:24)
The 3 different threats (00:07:42)
Is this common sense or far-fetched? (00:08:51)
“No person rules alone.” Except now they might. (00:11:48)
Underpinning all 3 threats: Secret AI loyalties (00:17:46)
Key risk factors (00:25:38)
Preventing secret loyalties in a nutshell (00:27:12)
Are human power grabs more plausible than 'rogue AI'? (00:29:32)
If you took over the US, could you take over the whole world? (00:38:11)
Will this make it impossible to escape autocracy? (00:42:20)
Threat 1: AI-enabled military coups (00:46:19)
Will we sleepwalk into an AI military coup? (00:56:23)
Could AIs be more coup-resistant than humans? (01:02:28)
Threat 2: Autocratisation (01:05:22)
Will AGI be super-persuasive? (01:15:32)
Threat 3: Self-built hard power (01:17:56)
Can you stage a coup with 10,000 drones? (01:25:42)
That sounds a lot like sci-fi... is it credible? (01:27:49)
Will we foresee and prevent all this? (01:32:08)
Are people psychologically willing to do coups? (01:33:34)
Will a balance of power between AIs prevent this? (01:37:39)
Will whistleblowers or internal mistrust prevent coups? (01:39:55)
Would other countries step in? (01:46:03)
Will rogue AI preempt a human power grab? (01:48:30)
The best reasons not to worry (01:51:05)
How likely is this in the US? (01:53:23)
Is a small group seizing power really so bad? (02:00:47)
Countermeasure 1: Block internal misuse (02:04:19)
Countermeasure 2: Cybersecurity (02:14:02)
Countermeasure 3: Model spec transparency (02:16:11)
Countermeasure 4: Sharing AI access broadly (02:25:23)
Is it more dangerous to concentrate or share AGI? (02:30:13)
Is it important to have more than one powerful AI country? (02:32:56)
In defence of open sourcing AI models (02:35:59)
2 ways to stop secret AI loyalties (02:43:34)
Preventing AI-enabled military coups in particular (02:56:20)
How listeners can help (03:01:59)
How to help if you work at an AI company (03:05:49)
The power ML researchers still have, for now (03:09:53)
How to help if you're an elected leader (03:13:14)
Rob’s outro (03:19:05)

This episode was originally recorded on January 20, 2025.

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore
