In this post, I'll list all the areas of control research (and implementation) that seem promising to me.
This references framings and abstractions discussed in "Prioritizing threats for AI control".
First, here is a division into areas, though note that these areas overlap somewhat:
- Developing and using settings for control evaluations (including technique iteration)
- Better understanding control-relevant capabilities and model properties
- Developing specific countermeasures/techniques in isolation (rather than via iterating in a setting)
- Doing control experiments on actual current AI usage
- Prototyping or implementing software infrastructure for control and security measures that are particularly relevant to control
- Prototyping or implementing human processes for control
- Conceptual research and other ways of researching control other than empirical experiments
- Nearer-term applications of control to practice and accelerate later usage of control
This division is imperfect and somewhat [...]
---
Outline:
(01:43) Developing and using settings for control evaluations
(13:51) Better understanding control-relevant capabilities and model properties
(28:21) Developing specific countermeasures/techniques in isolation
(36:01) Doing control experiments on actual AI usage
(40:43) Prototyping or implementing software infrastructure for control and security measures that are particularly relevant to control
(41:58) Prototyping or implementing human processes for control
(46:50) Conceptual research and other ways of researching control other than empirical experiments
(49:37) Nearer-term applications of control to practice and accelerate later usage of control
---
First published: March 25th, 2025
Source: https://www.lesswrong.com/posts/Eeo9NrXeotWuHCgQW/an-overview-of-areas-of-control-work
---
Narrated by TYPE III AUDIO.