
AI Safety Fundamentals: What is AI Alignment?
May 1, 2024
AI alignment expert Adam Jones discusses aligning AI systems with human intentions to prevent disasters. Topics include resilience and security in AI systems, examples of outer and inner misalignment, and the ambiguity of failures in AI alignment.
AI Snips
AI Safety Is A Multi‑Dimensional Field
- AI safety covers multiple areas: alignment, moral philosophy, competence, governance, resilience, and security.
- These areas overlap and jointly reduce expected harm from AI systems.
Align Intent Before Delegating Power
- Define alignment as making systems try to do what their creators intend.
- Align a system's intent before delegating powerful capabilities to it, to avoid catastrophic outcomes.
Real Examples Of Misaligned Models
- Image generators can reinforce stereotypes or depict unrealistic diversity, a real-world example of misalignment.
- Medical image classifiers can latch onto spurious cues such as rulers in training photos rather than medically relevant features, causing dangerous errors.
