

Ep#9: AutoEval - Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
May 25, 2025
In this discussion, Paul Zhou, a PhD student at Berkeley specializing in robot learning and reinforcement learning, walks through his AutoEval project. He highlights the challenges of evaluating robot manipulation policies in real-world settings and shows a live demo with WidowX robots. Zhou compares AutoEval's efficiency to traditional human-run evaluations, emphasizing its potential to streamline the process. The conversation also touches on engineering hurdles, affordability in robotics, and the value of collaboration in advancing robotic evaluation systems.
AutoEval Automates Evaluations
- AutoEval automates robot policy evaluation by running robots 24/7 with no human in the loop.
- It frees researchers from scoring trials by hand, significantly speeding up iteration and data collection; a minimal sketch of the evaluation loop follows below.
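
To make the loop concrete, here is a minimal sketch of what such an autonomous evaluation cell does: roll out the policy under evaluation, score the outcome with a learned success classifier, then run a reset policy to restore the scene for the next trial. The interfaces (`env`, `policy`, `reset_policy`, `success_classifier`) are hypothetical placeholders for illustration, not AutoEval's actual API.

```python
# Minimal sketch of an autonomous evaluation loop in the spirit of AutoEval.
# All object interfaces here are illustrative assumptions.

def evaluate(env, policy, reset_policy, success_classifier,
             num_trials=100, max_steps=120):
    successes = 0
    for trial in range(num_trials):
        obs = env.get_observation()
        # Roll out the policy under evaluation for a fixed horizon.
        for _ in range(max_steps):
            action = policy.predict(obs)
            obs = env.step(action)
        # A learned classifier scores the final observation instead of a human.
        successes += int(success_classifier.predict(obs))
        # A learned reset policy restores the scene so the next trial
        # can start without a person in the loop.
        obs = env.get_observation()
        for _ in range(max_steps):
            action = reset_policy.predict(obs)
            obs = env.step(action)
    return successes / num_trials
```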
Train Reset Policies Carefully
- To reset scenes autonomously, collect roughly 50 demonstrations and fine-tune a foundation model on them.
- Deliberately overfit the reset policy to those demos: it only needs to reset one specific scene reliably, not generalize (see the fine-tuning sketch below).
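
A minimal sketch of what fine-tuning on a small demo set could look like, written here as a plain behavior-cloning loop in PyTorch. The episode does not specify the exact recipe, so the `policy` model and the `(observation, action)` dataset are assumptions for illustration.

```python
# Hypothetical behavior-cloning fine-tune on ~50 reset demonstrations.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def finetune_reset_policy(policy, demos, epochs=200, lr=1e-4):
    """demos: a dataset of (observation, action) tensor pairs from ~50 demos."""
    loader = DataLoader(demos, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for epoch in range(epochs):
        for obs, act in loader:
            pred = policy(obs)
            loss = F.mse_loss(pred, act)  # simple behavior-cloning objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Many epochs over a tiny dataset deliberately overfits, which is fine
    # here: the reset policy only needs to be reliable on this one task.
    return policy
```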
Reset Policies Key Failure Source
- The main failures in AutoEval come from reset policy errors, not from the success classifiers.
- Success classifiers can be robustified iteratively by adding training samples that cover observed failure modes (see the sketch after this list).
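
The iterative robustification step can be sketched in a few lines: whenever a trial's outcome is misjudged, label that terminal image by hand, fold it back into the classifier's training set, and retrain. All names here (`classifier`, `dataset`, `train_fn`) are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch of hardening a success classifier against real
# failure modes observed during evaluation runs.

def robustify_classifier(classifier, dataset, mislabeled_cases, train_fn):
    """mislabeled_cases: (terminal_image, true_label) pairs the current
    classifier got wrong during real evaluation runs."""
    for image, true_label in mislabeled_cases:
        dataset.append((image, true_label))  # fold each failure mode back in
    return train_fn(classifier, dataset)     # retrain on the augmented set
```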