
AI Safety Fundamentals: What is AI Alignment?
May 1, 2024
AI alignment expert Adam Jones discusses aligning AI systems with human intentions to prevent disasters. Topics include resilience and security in AI systems, examples of outer and inner misalignment, and the ambiguity of failures in AI alignment.
AI Snips
AI Safety Is A Multi‑Dimensional Field
- AI safety covers multiple areas: alignment, moral philosophy, competence, governance, resilience, and security.
- These areas overlap and jointly reduce expected harm from AI systems.
Align Intent Before Delegating Power
- Define alignment as making systems try to do what their creators intend.
- Align a system's intent before delegating powerful capabilities to it, to avoid catastrophic outcomes.
Real Examples Of Misaligned Models
- Image generators can reinforce stereotypes or depict unrealistic diversity, a real-world example of misalignment.
- Medical image classifiers can latch onto spurious cues such as rulers in training photos rather than medically relevant features, causing dangerous errors.
