tl;dr: Even if we can't solve alignment, we can solve the problem of catching and fixing misalignment.
If a child bowling for the first time just aims at the pins and throws, they’re almost certain to miss. Their ball will fall into one of the gutters. But with beginners’ bumpers blocking much of the length of those gutters, the same throw would be almost certain to hit at least a few pins. This essay describes an alignment strategy for early AGI systems that I call ‘putting up bumpers’. Under this strategy, we treat it as a top priority to implement and test safeguards that let us course-correct if we turn out to have built or deployed a misaligned model, in the same way that bowling bumpers allow a poorly aimed ball to reach its target.
To do this, we'd aim to build [...]
---
First published: April 23rd, 2025
Source: https://www.lesswrong.com/posts/HXJXPjzWyS5aAoRCw/putting-up-bumpers
Narrated by TYPE III AUDIO.