AI Safety Newsletter

AISN #48: Utility Engineering and EnigmaEval

Feb 18, 2025
This episode covers Utility Engineering, a line of research finding that large language models possess structured value systems rather than acting as passive tools, which challenges the conventional understanding of AI's capabilities. It also introduces EnigmaEval, a benchmark designed to evaluate AI's creative problem-solving skills. Finally, it highlights job openings at the Center for AI Safety for work on AI's impacts on crucial societal areas.
AI Snips
INSIGHT

LLMs Exhibit Emergent Value Systems

  • LLMs are not passive tools; they develop structured value systems as they scale.
  • These emergent preferences can be problematic, including biases and self-preservation tendencies.
ADVICE

Utility Control for AI Alignment

  • Use utility control to modify AI preferences directly instead of just shaping external behaviors.
  • Aligning an AI's utility function with the preferences of a citizen assembly can reduce bias and improve alignment with social values.
INSIGHT

EnigmaEval Challenges AI Problem-Solving

  • Existing AI benchmarks often focus on structured reasoning, neglecting more complex problem-solving skills.
  • EnigmaEval uses real-world puzzles to assess AI's ability to synthesize information and make unexpected connections.