In previous pieces, I argued that there’s a real and large risk of AI systems’ developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening. A young, growing field of AI safety research tries to reduce this risk by finding ways to ensure that AI systems behave as intended (rather than forming ambitious aims of their own and deceiving and manipulating humans as needed to accomplish them).
Maybe we’ll succeed in reducing the risk, and maybe we won’t. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case - and that AI safety could be an unusually difficult sort of science.
This piece is aimed at a broad audience, because I think it’s important for the challenges here to be broadly understood. I expect powerful, dangerous AI systems to have a lot of benefits (commercial, military, etc.), and to potentially appear safer than they are - so I think it will be hard to be as cautious about AI as we should be. I think our odds look better if many people understand, at a high level, some of the challenges in knowing whether AI systems are as safe as they appear.
Source:
https://www.cold-takes.com/ai-safety-seems-hard-to-measure/
---
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.