

Ethan Perez
Researcher at Anthropic, leading the AI control team. Previously led the Adversarial Robustness team.
Top 3 podcasts with Ethan Perez
Ranked by the Snipd community

57 snips
Aug 24, 2022 • 2h 1min
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming
Ethan Perez is a research scientist at Anthropic working on large language models. He is the second Ethan working with large language models to come on the show, but in this episode we discuss why alignment, not scale, is actually what you need. We discuss three projects he pursued before joining Anthropic: the Inverse Scaling Prize, Red Teaming Language Models with Language Models, and Training Language Models with Language Feedback.
Ethan Perez: https://twitter.com/EthanJPerez
Transcript: https://theinsideview.ai/perez
Host: https://twitter.com/MichaelTrazzi
OUTLINE
(00:00:00) Highlights
(00:00:20) Introduction
(00:01:41) The Inverse Scaling Prize
(00:06:20) The Inverse Scaling Hypothesis
(00:11:00) How To Submit A Solution
(00:20:00) Catastrophic Outcomes And Misalignment
(00:22:00) Submission Requirements
(00:27:16) Inner Alignment Is Not Out Of Distribution Generalization
(00:33:40) Detecting Deception With Inverse Scaling
(00:37:17) Reinforcement Learning From Human Feedback
(00:45:37) Training Language Models With Language Feedback
(00:52:38) How It Differs From InstructGPT
(00:56:57) Providing Information-Dense Feedback
(01:03:25) Why Use Language Feedback
(01:10:34) Red Teaming Language Models With Language Models
(01:20:17) The Classifier And Adversarial Training
(01:23:53) An Example Of Red-Teaming Failure
(01:27:47) Red Teaming Using Prompt Engineering
(01:32:58) Reinforcement Learning Methods
(01:41:53) Distributional Biases
(01:45:23) Chain of Thought Prompting
(01:49:52) Unlikelihood Training And KL Penalty
(01:52:50) Learning AI Alignment Through The Inverse Scaling Prize
(01:59:33) Final Thoughts On AI Alignment

17 snips
Aug 9, 2023 • 36min
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
This episode discusses the importance of researching model organisms of misalignment to understand the causes of alignment failures in AI systems. It explores different strategies for model training and deployment, such as input tagging and evaluating outputs with a preference model. The risks associated with using model organisms in research, including deceptive alignment, are also examined.

10 snips
May 26, 2023 • 54min
Discovering AI Risks with AIs | Ethan Perez | EAG Bay Area 23
Ethan Perez discusses the risks of AI systems, including biases, offensive content, and faulty code. He explores evaluating data quality and addressing risks in language models, as well as alternative training methods. The episode also delves into the challenges of deception in AI models and the question of model identity, and touches on the impact of different training schemes and the importance of generalists in evaluation work.