42 - Owain Evans on LLM Psychology

AXRP - the AI X-risk Research Podcast

00:00

Exploring Misalignment in AI Models

This chapter examines the self-awareness of large language models, focusing on how they respond to insecure code generation and what that reveals about misalignment. Drawing on discussion and experimental results, it highlights the complexity and unpredictability of model behavior, especially around sensitive topics such as backdoors in code. The speakers debate the effects of fine-tuning and the trade-off between accuracy and the risk of harmful outputs, offering a nuanced view of AI alignment and the challenges of evaluating model behavior.
