Machine Learning Street Talk (MLST)

chevron_right

#114 - Secrets of Deep Reinforcement Learning (Minqi Jiang)

whatshot 100 snips

Apr 16, 2023

Guest

Minqi Jiang

Minqi Jiang, a PhD student at University College London and Meta AI, explores the intriguing realm of deep reinforcement learning. He shares insights on balancing serendipity with planning in research, along with the implications of Goodhart's Law in decision-making. The discussion dives into the complexities of emergent intelligence and the potential of language models. Minqi highlights the shift towards Software 2.0, challenges in interpretability, and the importance of open-ended research, offering a thought-provoking glimpse into the future of AI technology.

02:47:15

forum

Ask episode

web_stories

AI Snips

view_agenda

Chapters

menu_book

Books

auto_awesome

Transcript

info_circle

Episode notes

question_answer

ANECDOTE

Career Journey

Minqi Jiang, a PhD student at UCL and Meta AI, shared his career journey.
It included working at Google Translate, Android (Google Fit), and founding Hyper Travel, a travel concierge startup.

volunteer_activism

ADVICE

Value of Foundational Knowledge

Foundational knowledge from textbooks like "PRML" and "Sutton and Barto" is important.
It provides a basis for understanding complex ML concepts.

question_answer

ANECDOTE

Startup Experience

Minqi Jiang and his co-founder bootstrapped Hyper Travel with savings, iterating through app ideas.
One, "Once," a diary-like social media app, foreshadowed BeReal's core concept.

Get the Snipd Podcast app to discover more snips from this episode

Foundations and Intuition in Machine Learning

04:51 • 3min

chevron_right

Navigating the Entrepreneurial Journey

07:37 • 12min

chevron_right

Balancing Subjectivity and Societal Objectives

19:22 • 2min

chevron_right

Minimizing Regret in Reinforcement Learning

21:37 • 20min

chevron_right

Exploring Intelligence: A Symbiotic Journey

41:33 • 8min

chevron_right

Emergent Intelligence in Reinforcement Learning

49:44 • 29min

chevron_right

Exploring RLHF and Its Impact on Language Models

01:18:35 • 27min

chevron_right

Exploring the Paradigm Shift to Software 2.0

01:45:10 • 2min

chevron_right

Navigating Deep Learning Complexities

01:47:21 • 28min

chevron_right

Understanding Learning Potential in Reinforcement Learning

02:15:23 • 7min

chevron_right

Exploring Grounding in Deep Reinforcement Learning

02:22:35 • 22min

chevron_right

Embracing Open-Endedness in AI Research

02:44:50 • 2min

chevron_right

#15372

• Mentioned in 3 episodes

Why Greatness Cannot Be Planned: The Myth of the Objective

Kenneth Stanley

Joel Lehman

In 'Why Greatness Cannot Be Planned', Stanley and Lehman argue that setting objectives can limit success, especially in creative and innovative endeavors. They propose that focusing on novelty and exploration rather than specific goals can lead to more significant achievements. The book draws insights from artificial intelligence research and applies them to broader societal and personal contexts.

#8240

• Mentioned in 5 episodes

Pattern Recognition and Machine Learning

Christopher M. Bishop

This book offers a detailed introduction to pattern recognition and machine learning, integrating both fields under a common statistical framework. It covers topics such as Bayesian methods, graphical models, kernel-based algorithms, and neural networks, making it suitable for advanced undergraduates, first-year PhD students, researchers, and practitioners. The book includes a wide range of exercises and is supported by additional materials like lecture slides and figures.

#7918

• Mentioned in 6 episodes

Reinforcement Learning: An Introduction

Second Edition

Richard Sutton

Andrew G. Barto

This second edition of 'Reinforcement Learning: An Introduction' by Richard S. Sutton and Andrew G. Barto provides a clear and simple account of the field's key ideas and algorithms. The book is significantly expanded and updated, including new topics such as artificial neural networks, the Fourier basis, and expanded treatment of off-policy learning and policy-gradient methods. It also includes new chapters on the relationships between reinforcement learning and psychology/neuroscience, as well as updated case studies on AlphaGo, AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.

Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB Twitter: https://twitter.com/MLStreetTalk

In this exclusive interview, Dr. Tim Scarfe sits down with Minqi Jiang, a leading PhD student at University College London and Meta AI, as they delve into the fascinating world of deep reinforcement learning (RL) and its impact on technology, startups, and research. Discover how Minqi made the crucial decision to pursue a PhD in this exciting field, and learn from his valuable startup experiences and lessons.

Minqi shares his insights into balancing serendipity and planning in life and research, and explains the role of objectives and Goodhart's Law in decision-making. Get ready to explore the depths of robustness in RL, two-player zero-sum games, and the differences between RL and supervised learning.

As they discuss the role of environment in intelligence, emergence, and abstraction, prepare to be blown away by the possibilities of open-endedness and the intelligence explosion. Learn how language models generate their own training data, the limitations of RL, and the future of software 2.0 with interpretability concerns.

From robotics and open-ended learning applications to learning potential metrics and MDPs, this interview is a goldmine of information for anyone interested in AI, RL, and the cutting edge of technology. Don't miss out on this incredible opportunity to learn from a rising star in the AI world!

TOC

Tech & Startup Background [00:00:00]

Pursuing PhD in Deep RL [00:03:59]

Startup Lessons [00:11:33]

Serendipity vs Planning [00:12:30]

Objectives & Decision Making [00:19:19]

Minimax Regret & Uncertainty [00:22:57]

Robustness in RL & Zero-Sum Games [00:26:14]

RL vs Supervised Learning [00:34:04]

Exploration & Intelligence [00:41:27]

Environment, Emergence, Abstraction [00:46:31]

Open-endedness & Intelligence Explosion [00:54:28]

Language Models & Training Data [01:04:59]

RLHF & Language Models [01:16:37]

Creativity in Language Models [01:27:25]

Limitations of RL [01:40:58]

Software 2.0 & Interpretability [01:45:11]

Language Models & Code Reliability [01:48:23]

Robust Prioritized Level Replay [01:51:42]

Open-ended Learning [01:55:57]

Auto-curriculum & Deep RL [02:08:48]

Robotics & Open-ended Learning [02:31:05]

Learning Potential & MDPs [02:36:20]

Universal Function Space [02:42:02]

Goal-Directed Learning & Auto-Curricula [02:42:48]

Advice & Closing Thoughts [02:44:47]

References:

- Why Greatness Cannot Be Planned: The Myth of the Objective by Kenneth O. Stanley and Joel Lehman

https://www.springer.com/gp/book/9783319155234

- Rethinking Exploration: General Intelligence Requires Rethinking Exploration

https://arxiv.org/abs/2106.06860

- The Case for Strong Emergence (Sabine Hossenfelder)

https://arxiv.org/abs/2102.07740

- The Game of Life (Conway)

https://www.conwaylife.com/

- Toolformer: Teaching Language Models to Generate APIs (Meta AI)

https://arxiv.org/abs/2302.04761

- OpenAI's POET: Paired Open-Ended Trailblazer

https://arxiv.org/abs/1901.01753

- Schmidhuber's Artificial Curiosity

https://people.idsia.ch/~juergen/interest.html

- Gödel Machines

https://people.idsia.ch/~juergen/goedelmachine.html

- PowerPlay

https://arxiv.org/abs/1112.5309

- Robust Prioritized Level Replay: https://openreview.net/forum?id=NfZ6g2OmXEk

- Unsupervised Environment Design: https://arxiv.org/abs/2012.02096

- Excel: Evolving Curriculum Learning for Deep Reinforcement Learning

https://arxiv.org/abs/1901.05431

- Go-Explore: A New Approach for Hard-Exploration Problems

https://arxiv.org/abs/1901.10995

- Learning with AMIGo: Adversarially Motivated Intrinsic Goals

https://www.researchgate.net/publication/342377312_Learning_with_AMIGo_Adversarially_Motivated_Intrinsic_Goals

PRML

https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

Sutton and Barto

https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

Home Top podcasts Popular guests Top books