Robotics Research Update, with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind
Mar 15, 2024
In this podcast, Keerthana Gopalakrishnan and Ted Xiao discuss breakthroughs in robotics research such as internet-scale vision-language models for robots, training robots from a single human demonstration, and using language models for ethical oversight. They also cover challenges in robotics models, safety concerns, and advancements in vision-language generalization for robots.
Robots can learn new skills from a single human demonstration, represented as a simple line drawing.
Collaboration with academic labs allows training a single model to control diverse robot embodiments.
AutoRT system scales human oversight using large language models and a 'robot constitution' for ethical checks.
Deep dives
RT-2: Leveraging Internet-Scale Vision-Language Models for General-Purpose Robots
RT-2 demonstrates how training robots on vision-language models allows them to understand and manipulate objects they have never seen. By combining image-language pairs from the internet with robot demonstrations, the robots can stitch concepts from the internet together with the motions in robotic datasets. The co-fine-tuning method ensures better understanding and retention of concepts learned from the internet, avoiding mode collapse.
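To make the co-fine-tuning idea concrete, here is a minimal Python sketch of the data-mixing step. The tokenization scheme, field names, and mixing ratio are illustrative assumptions, not the actual RT-2 pipeline; the key idea is that robot actions are discretized into tokens so one model can train on web image-text pairs and robot episodes in a single stream.

```python
import random

# Illustrative co-fine-tuning data mixer (hypothetical, not RT-2's real code).
# Actions are discretized into integer bins so a vision-language model can
# emit them as ordinary text tokens alongside web-style answers.

def action_to_tokens(action, bins=256, low=-1.0, high=1.0):
    """Map each continuous action dimension to an integer bin token."""
    clipped = [min(max(a, low), high) for a in action]
    return " ".join(str(int((a - low) / (high - low) * (bins - 1)))
                    for a in clipped)

web_data = [{"image": "img_001.jpg",
             "text": "Q: What fruit is on the table? A: a banana"}]
robot_data = [{"image": "cam_042.jpg",
               "text": "pick up the banana",
               "action": [0.12, -0.30, 0.05, 0.0, 0.0, 0.9, 1.0]}]

def sample_batch(batch_size=4, robot_fraction=0.5):
    """Draw a mixed batch of (image, target-text) pairs. Keeping web data
    in the mix is what preserves internet concepts and guards against the
    mode collapse mentioned above."""
    batch = []
    for _ in range(batch_size):
        if random.random() < robot_fraction:
            ex = random.choice(robot_data)
            batch.append((ex["image"],
                          ex["text"] + " -> " + action_to_tokens(ex["action"])))
        else:
            ex = random.choice(web_data)
            batch.append((ex["image"], ex["text"]))
    return batch

print(sample_batch())
```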
RT-X: Advancing Cross-Embodiment Generalization in Robotics
RT-X is a collaboration with various academic labs to train a single model on diverse robot embodiments, outperforming the specialist models trained in individual labs. Even with limiting assumptions, the pooled datasets cover diverse morphologies, from small robots to industrial arms. The project bridges robotics data with internet concepts, enabling transfer properties and opening new research avenues.
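As a rough illustration of what cross-embodiment training requires in practice, the sketch below maps episodes from two hypothetical labs into one shared step format. The field names and the 7-DoF end-effector convention are assumptions of mine, not the actual RT-X data schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical unification step: different labs log different robots and
# formats, but every episode is converted into one shared layout so a
# single model can train across all of them. Field names are illustrative.

@dataclass
class UnifiedStep:
    image: str            # camera frame ID or path
    instruction: str      # natural-language task description
    action: List[float]   # [dx, dy, dz, droll, dpitch, dyaw, gripper]

def from_lab_a(step: dict) -> UnifiedStep:
    """Lab A already logs end-effector deltas plus a gripper command."""
    return UnifiedStep(step["rgb"], step["task"],
                       list(step["ee_delta"]) + [step["gripper_open"]])

def from_lab_b(step: dict) -> UnifiedStep:
    """Lab B logs absolute poses; convert consecutive poses to a delta."""
    delta = [c - p for c, p in zip(step["pose_t1"], step["pose_t0"])]
    return UnifiedStep(step["rgb"], step["task"], delta + [step["grip"]])

example = from_lab_b({"rgb": "frame_7.png", "task": "open the drawer",
                      "pose_t0": [0.4, 0.1, 0.3, 0, 0, 0],
                      "pose_t1": [0.42, 0.1, 0.28, 0, 0, 0.1],
                      "grip": 1.0})
print(example.action)
```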
RT-Trajectory: Learning New Skills from On-the-Fly Examples
RT-Trajectory enables robots to learn new skills from simple line drawings representing desired trajectories. By annotating images with coarse hints such as line sketches, the robots can be prompted to produce new behaviors. This method blurs the line between perception and action systems, allowing qualitative behavior changes in robots based on varied prompts.
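Here is a minimal sketch of the conditioning mechanism, assuming the hint is a 2D polyline in pixel coordinates: the drawn trajectory is rendered directly onto the camera frame, and the policy consumes the overlaid image. The function name and drawing details are illustrative, not the paper's implementation.

```python
from PIL import Image, ImageDraw

# Illustrative RT-Trajectory-style conditioning: render a coarse 2D hint
# (e.g. a human's line drawing) onto the observation frame, then feed the
# overlaid image to the policy. Names and colors are assumptions.

def overlay_trajectory(image_path, waypoints, out_path):
    """Draw the desired end-effector path onto the camera frame."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    draw.line(waypoints, fill=(0, 255, 0), width=4)      # the sketched path
    x0, y0 = waypoints[0]
    draw.ellipse([x0 - 6, y0 - 6, x0 + 6, y0 + 6],
                 fill=(255, 0, 0))                       # mark the start point
    img.save(out_path)
    return img

# A rough "lift, move right, set down" hint in pixel coordinates:
hint = [(120, 200), (140, 160), (200, 150), (260, 170)]
# overlay_trajectory("frame.png", hint, "frame_with_hint.png")
```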
AutoRT: Scaffolding Human Oversight for Efficient Learning
AutoRT focuses on scaling supervision for robots using a scaffolded approach. It distributes human oversight efficiently, relying on autonomous operation and large language models for decision-making. The project introduces a robot constitution based on fundamental laws and rules to guide robotic behaviors in the absence of constant human supervision, preparing for future autonomous robot interactions in real-world scenarios.
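To give a feel for how such a first-line check might be wired up, here is a toy sketch. The rule texts, prompt format, and critic are stand-ins of my own, not the paper's actual constitution or prompting setup.

```python
# Toy sketch of an LLM-based constitutional check before task execution.
# The rules below only paraphrase the flavor of a "robot constitution";
# the wording, prompt format, and llm callable are illustrative assumptions.

ROBOT_CONSTITUTION = [
    "A robot may not injure a human being.",
    "A robot shall not attempt tasks involving people, animals, or living things.",
    "A robot shall not interact with sharp, hot, or hazardous objects.",
    "A robot shall not attempt to lift objects heavier than it can carry.",
]

def constitutional_check(task: str, llm) -> bool:
    """Return True if the critic judges the proposed task safe to attempt."""
    rules = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(ROBOT_CONSTITUTION))
    prompt = (f"Rules:\n{rules}\n\n"
              f"Proposed robot task: {task}\n"
              "Does this task violate any rule? Answer YES or NO.")
    return llm(prompt).strip().upper().startswith("NO")

# Trivial stand-in critic that flags a couple of obvious keywords:
toy_llm = lambda p: "YES" if any(w in p.lower()
                                 for w in ("knife", "cigarette")) else "NO"
print(constitutional_check("pick up the sponge", toy_llm))         # True
print(constitutional_check("pick up the lit cigarette", toy_llm))  # False
```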
Challenges in Interpreting Rules and Safety Concerns in Robot Systems
The podcast delves into the complexities of interpreting rules and ensuring safety in robot systems. Rule interpretation is subjective: even when the language is identical, different readers (and models) arrive at different readings. One example is the difficulty of instructing robots not to interact with harmful objects like a cigarette. Despite advanced technologies, robots remain limited in comprehending real-world complexity, raising questions about their ability to follow intricate instructions. Safety concerns are further underscored, with emphasis on the need for human supervision and control over robot operations to prevent unintended harm.
Enhancing Teachability in Robotics through Feedback Loops and Zero-Shot Learning
The discussion shifts toward improving teachability in robotics by emphasizing a tight feedback loop between users and models. The focus is on enabling robots to learn from human feedback and adapt their actions accordingly: users critique model responses and steer robot behaviors toward desired outcomes, creating an iterative learning process. The podcast also explores leveraging vision-language models to control robots, showcasing advances in zero-shot learning and multimodal task specification. The evolving research agenda aims to tackle challenges in action representation, bimanual robotics, and scaling robot data for enhanced physical reasoning capabilities.
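As a sketch of the zero-shot, vision-language-driven control idea (in the spirit of PIVOT's iterative visual prompting, discussed later in the episode): candidate action points are proposed on the image, a VLM picks the best one, and sampling narrows around it. The sampling scheme and the vlm_choose interface are assumptions for illustration, not the paper's implementation.

```python
import random

# Illustrative iterative visual prompting loop, PIVOT-style. In the real
# method the VLM sees the image annotated with numbered candidate markers;
# here vlm_choose is an abstract callable returning the chosen index.

def pivot(image, task, vlm_choose, iters=3, n_candidates=8, spread=80.0):
    center = (320.0, 240.0)                  # start from the image center
    for _ in range(iters):
        candidates = [(center[0] + random.gauss(0, spread),
                       center[1] + random.gauss(0, spread))
                      for _ in range(n_candidates)]
        best = vlm_choose(image, task, candidates)  # VLM picks a marker
        center = candidates[best]
        spread *= 0.5                        # narrow the search each round
    return center

# Toy stand-in "VLM" that prefers candidates nearest a known target point:
target = (100.0, 150.0)
choose = lambda img, task, pts: min(
    range(len(pts)),
    key=lambda i: (pts[i][0] - target[0]) ** 2 + (pts[i][1] - target[1]) ** 2)
print(pivot("frame.png", "point at the coffee mug", choose))
```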
In this conversation, we cover 6 papers in detail. They are:
RT-2 – which shows how internet-scale vision-language models allow robots to understand and manipulate objects they've never seen in training
RT-X – a collaboration with academic labs across the country that demonstrates how a single model can be trained to control a diverse range of robot embodiments, achieving performance that often surpasses specialist models trained on individual robots.
RT-Trajectory – a project that shows how robots can learn new skills, in context, from a single human demonstration, represented as a simple line drawing
Auto-RT – a system that scales human oversight of robots, even in unseen environments, by using large language models and a "robot constitution" to power first-line ethical and safety checks on robot behavior.
Learning to Learn Faster – an approach that enables robots to learn more efficiently from human verbal feedback.
PIVOT – another project that shows how vision-language models can be used to guide robots – no special fine-tuning required.
Progress in robotics still trails the advances in language and vision: challenges remain before robotics models have the scale of data and/or the sample efficiency needed for reliable general-purpose capabilities, and the study of robot safety and alignment is still in its infancy. Ultimately, though, I see this rapid-fire series of papers as strong evidence that the same core architectures and scaling techniques that have worked so well in other contexts will succeed in robotics as well.
The work being done at Google DeepMind Robotics is pushing the boundaries of what's possible, investment in a new generation of robotics startups is heating up, and the pace of progress shows no signs of slowing down.
As always, if you're finding value in the show, please take a moment to share it with friends. This one would be perfect for anyone who has ever daydreamed of having a robot that could fold their laundry or pick up their kids' toys.
And especially as we are just building the new feed, a review on Apple Podcasts or Spotify, or a comment on YouTube, would be much appreciated.
Now, here's my conversation with Keerthana Gopalakrishnan and Ted Xiao of Google DeepMind Robotics.