AI Safety Fundamentals

What is AI Alignment?

May 1, 2024
AI alignment expert Adam Jones discusses aligning AI systems with human intentions to prevent disasters. Topics include resilience and security in AI systems, examples of outer and inner misalignment, and the ambiguity of failures in AI alignment.
INSIGHT

AI Safety Is A Multi‑Dimensional Field

  • AI safety covers multiple areas: alignment, moral philosophy, competence, governance, resilience, and security.
  • These areas overlap and jointly reduce expected harm from AI systems.
ADVICE

Align Intent Before Delegating Power

  • Define alignment as making systems try to do what their creators intend.
  • Align a system's intent before delegating powerful capabilities to it, to avoid catastrophic outcomes.
ANECDOTE

Real Examples Of Misaligned Models

  • Image generators that amplify stereotypes or depict unrealistic diversity are real-world examples of misalignment.
  • Medical classifiers can latch onto incidental cues such as rulers rather than medically relevant features, causing dangerous errors.