

AISN #48: Utility Engineering and EnigmaEval
Feb 18, 2025
Discover the intriguing world of Utility Engineering, where large language models are revealed to possess structured value systems rather than being just passive tools. The podcast dives into groundbreaking findings that challenge conventional understanding of AI's capabilities. It also introduces EnigmaEval, a benchmark designed to evaluate AI's creative problem-solving skills. Plus, there's a spotlight on exciting job opportunities at the Center for AI Safety, aiming to tackle AI's impacts on crucial societal areas.
AI Snips
Chapters
Transcript
Episode notes
LLMs Exhibit Emergent Value Systems
- LLMs are not passive tools, but develop structured value systems as they scale.
- These emergent preferences can be problematic, including biases and self-preservation tendencies.
Utility Control for AI Alignment
- Use utility control to modify AI preferences directly instead of just shaping external behaviors.
- Aligning AI's utility function with citizen assemblies can reduce bias and improve alignment with social values.
EnigmaEval Challenges AI Problem-Solving
- Existing AI benchmarks often focus on structured reasoning, neglecting more complex problem-solving skills.
- EnigmaEval uses real-world puzzles to assess AI's ability to synthesize information and make unexpected connections.