Gokul Swamy, a Ph.D. student at Carnegie Mellon, discusses his papers on inverse reinforcement learning without reinforcement learning, complementing a policy with a different observation space, and learning shared safety constraints from demonstrations. The podcast covers the challenges, benefits, and potential applications of these approaches, as well as the use of causal modeling techniques and multitask data.
Podcast summary created with Snipd AI
Quick takeaways
Inverse reinforcement learning can be made more efficient by focusing on relevant states and reducing unnecessary exploration.
Shared safety constraints can be learned from multitask demonstrations by inferring which actions are consistently absent across the demonstrations.
Deep dives
Efficient, Interactive Learning and Decision Making
The podcast episode features Gokul Swamy, a Ph.D. student at the Robotics Institute at Carnegie Mellon University. He discusses his research on efficient, interactive learning and decision making under partial observability. His focus is on developing algorithms for imitation learning, particularly in the context of self-driving cars. He explores the challenges of learning efficiently and of making decisions when the environment is only partially observable. Gokul highlights the importance of developing algorithms that learn well while reducing the amount of data and computation needed, and discusses applications of his research such as self-driving cars and large language models.
Inverse Reinforcement Learning Without Reinforcement Learning
Gokul Swamy discusses a paper he presented at the conference, "Inverse Reinforcement Learning without Reinforcement Learning." The paper addresses the computational cost of inverse reinforcement learning: traditional approaches repeatedly solve full reinforcement learning problems as an inner loop, which is expensive. His approach makes inverse reinforcement learning more efficient by focusing the optimization on relevant states and cutting out unnecessary exploration. He describes the method of constraining the search space and highlights the significant reductions observed in the amount of interaction needed with the environment. Gokul also discusses potential applications of this approach in self-driving cars, mapping, and language models.
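The core idea of focusing policy search on relevant states can be illustrated with a minimal sketch. Everything below is our own toy construction (a chain MDP, one-hot state features, and a simple moment-matching reward update), not the paper's actual algorithm; the point is only that the policy player optimizes exclusively over states the expert visits.

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right. The expert
# always moves right; the learner must recover this from demonstrations.
N_STATES, N_ACTIONS = 5, 2

def step(s, a):
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)

def rollout(policy, start=0, horizon=6):
    s, traj = start, []
    for _ in range(horizon):
        traj.append((s, policy[s]))
        s = step(s, policy[s])
    return traj

expert_policy = np.ones(N_STATES, dtype=int)     # expert: always right
expert_traj = rollout(expert_policy)
expert_visits = {s for s, _ in expert_traj}      # the "relevant" states

w = np.zeros(N_STATES)                           # linear reward over one-hot states
learner_policy = np.zeros(N_STATES, dtype=int)   # learner starts by moving left

for _ in range(20):
    mu_E = np.bincount([s for s, _ in expert_traj], minlength=N_STATES)
    mu_L = np.bincount([s for s, _ in rollout(learner_policy)],
                       minlength=N_STATES)
    w += 0.5 * (mu_E - mu_L)     # reward player: separate the two occupancies
    # Policy player: greedy one-step improvement, restricted to states the
    # expert actually visits -- the "focus on relevant states" idea.
    for s in expert_visits:
        learner_policy[s] = max(range(N_ACTIONS), key=lambda a: w[step(s, a)])

print(learner_policy)   # the learner recovers the expert's behavior
```

The restriction matters because, in larger problems, an unconstrained inner RL loop would waste interaction exploring states the expert reward never needs to rank.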
Learning Shared Safety Constraints from Multitask Demonstrations
In this paper, Gokul Swamy addresses the problem of learning shared safety constraints from multitask demonstrations. Many tasks share safety constraints; for example, one should not set the kitchen on fire regardless of which dish is being prepared. The paper aims to learn these safety constraints from demonstrations alone. Gokul discusses the challenge of identifying what not to do when the model has no direct observations of unsafe actions. He presents a method that leverages multitask data: by observing experts across multiple tasks, the model can identify behaviors that are consistently absent from every demonstration and infer them to be safety constraints rather than task-specific suboptimality. He highlights the significance of this approach in complex environments and shares intriguing results, such as inferring a maze's structure without any direct interaction with the walls.
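A compact way to see why multitask data helps: behaviors one expert skips may simply be unhelpful for that particular task, while behaviors skipped across all tasks are plausible shared safety constraints. The sketch below is our own toy illustration (the task names, behavior names, and hand-listed "tempting but avoided" sets are hypothetical); the paper's actual estimator works with occupancy measures and learned constraint functions, not enumerated sets.

```python
# For each hypothetical task, the set of behaviors that would have improved
# reward but that the expert nevertheless avoided. A single task cannot tell
# "unsafe" apart from "merely unhelpful here"; intersecting across tasks can.
tempting_but_avoided = {
    "reach_goal_A": {"cross_red_zone", "cut_corner_A"},
    "reach_goal_B": {"cross_red_zone", "cut_corner_B"},
    "reach_goal_C": {"cross_red_zone"},
}

# The shared constraint is whatever is consistently absent from *every*
# task's demonstrations.
shared_constraint = set.intersection(*tempting_but_avoided.values())
print(shared_constraint)
```

Here only the behavior avoided in every task survives the intersection; the task-specific shortcuts are correctly discarded as noise rather than flagged as unsafe.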
Future Directions: Efficient Exploration and Recommendation Systems
Gokul Swamy shares some future directions for his research. One area of interest is curtailing unnecessary exploration when a generative model of the environment is unavailable. He also expresses interest in applying his algorithms to large language models, specifically in the fine-tuning step, where there is room for more efficient techniques. Additionally, he points to recommendation systems as an intriguing domain, explaining the challenge of recommending content from only partial observations of user preferences. He believes his algorithms could be adapted to improve recommendation systems by addressing unobservable preferences and the need for more careful decision-making. Overall, Gokul emphasizes the importance of developing efficient algorithms that can handle real-world problems with limited data and computation resources.
Today we’re joined by Gokul Swamy, a Ph.D. student at the Robotics Institute at Carnegie Mellon University. In the final conversation of our ICML 2023 series, we sat down with Gokul to discuss his accepted papers at the event, leading off with “Inverse Reinforcement Learning without Reinforcement Learning.” In this paper, Gokul explores the challenges and benefits of inverse reinforcement learning and the advantages it holds for various applications. Next up, we explore the “Complementing a Policy with a Different Observation Space” paper, which applies causal inference techniques to accurately estimate sampling balance and make decisions based on limited observed features. Finally, we touched on “Learning Shared Safety Constraints from Multi-task Demonstrations,” which centers on learning safety constraints from demonstrations using the inverse reinforcement learning approach.
The complete show notes for this episode can be found at twimlai.com/go/643.