Robotics Research Update, with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind

Mar 15, 2024

Guest

Keerthana Gopalakrishnan

Guest

Ted Xiao

In this podcast, they discuss breakthroughs in robotics research such as internet-scale vision-language models for robots, training robots from a single human demonstration, and using language models for ethical oversight. They also cover open challenges for robotics models, safety concerns, and advancements in vision-language generalization for robots.

Chapters

Introduction

00:00 • 5min

Advancements in Robotics Research

04:51 • 7min

Integration of Perception with Robot Control Tasks

12:13 • 31min

Exploring the Future of Robotic Interactions and Capabilities

42:44 • 4min

Robot Learning and Autonomous Operations in Unseen Environments

46:33 • 8min

Safety and Alignment Challenges in Robotics versus Text-based AI Models

54:26 • 4min

Advancements in Robotic Teachability through User Feedback

58:44 • 8min

Advancements in Vision and Language Models for Robotics

01:06:37 • 13min

In this conversation, we cover 6 papers in detail. They are:

RT-2 – which shows how internet-scale vision-language models allow robots to understand and manipulate objects they've never seen in training.
RT-X – a collaboration with academic labs across the country that demonstrates how a single model can be trained to control a diverse range of robot embodiments, achieving performance that often surpasses specialist models trained on individual robots.
RT-Trajectory – a project that shows how robots can learn new skills, in context, from a single human demonstration, as represented by a simple line drawing.
AutoRT – a system that scales human oversight of robots, even in unseen environments, by using large language models and a "robot constitution" to power first-line ethical and safety checks on robot behavior.
Learning to Learn Faster – an approach that enables robots to learn more efficiently from human verbal feedback.
PIVOT – another project that shows how vision-language models can be used to guide robots, with no special fine-tuning required.

Progress in robotics still trails the advances in language and vision. Challenges remain before robotics models will have the scale of data and/or the sample efficiency needed to achieve reliable general-purpose capabilities, and the study of robot safety and alignment is still in its infancy. Even so, I see this rapid-fire series of papers as strong evidence that the same core architectures and scaling techniques that have worked so well in other contexts will ultimately succeed in robotics as well.

The work being done at Google DeepMind Robotics is pushing the boundaries of what's possible, investment in a new generation of robotics startups is heating up, and the pace of progress shows no signs of slowing down.

As always, if you're finding value in the show, please take a moment to share it with friends. This one would be perfect for anyone who has ever daydreamed of having a robot that could fold their laundry or pick up their kids' toys.

And especially as we are just building the new feed, a review on Apple Podcasts or Spotify, or a comment on YouTube, would be much appreciated.

Now, here's my conversation with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind Robotics.