Kevin Wang, an undergraduate researcher at Princeton, and his co-author Ishaan Javali discuss their work on scaling reinforcement learning networks to 1,000 layers, far beyond the handful of layers typical in RL, where naive depth scaling fails to train. They trace the shift from traditional reward maximization to self-supervised learning objectives and highlight the architectural choices, such as residual connections, that make extreme depth trainable. The duo also explores efficiency trade-offs, GPU-accelerated data collection with JAX, and the implications for robotics, positioning their approach as a rethinking of what reinforcement learning should optimize.
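To give a flavor of the residual-connection idea the episode highlights, here is a minimal JAX sketch of a deep residual MLP. This is an illustration of the general technique, not the authors' actual architecture; the function names (`init_block`, `residual_block`, `deep_encoder`), layer sizes, and block count are hypothetical.

```python
import jax
import jax.numpy as jnp

def init_block(key, dim):
    # Each residual block has two dense sublayers.
    k1, k2 = jax.random.split(key)
    scale = 1.0 / jnp.sqrt(dim)
    return {
        "w1": jax.random.normal(k1, (dim, dim)) * scale,
        "b1": jnp.zeros(dim),
        "w2": jax.random.normal(k2, (dim, dim)) * scale,
        "b2": jnp.zeros(dim),
    }

def layer_norm(x):
    # Per-feature normalization, commonly paired with deep residual stacks.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + 1e-6)

def residual_block(params, x):
    # The skip connection (x + h) is the key ingredient: it gives gradients
    # a direct path through hundreds of layers, which a plain MLP stack lacks.
    h = layer_norm(x)
    h = jax.nn.relu(h @ params["w1"] + params["b1"])
    h = h @ params["w2"] + params["b2"]
    return x + h

def deep_encoder(blocks, x):
    # Depth is scaled simply by stacking more residual blocks.
    for params in blocks:
        x = residual_block(params, x)
    return x

key = jax.random.PRNGKey(0)
dim, n_blocks = 64, 500  # 500 blocks x 2 dense sublayers ~ 1,000 layers
blocks = [init_block(k, dim) for k in jax.random.split(key, n_blocks)]
obs = jnp.ones(dim)  # stand-in for an observation embedding
print(deep_encoder(blocks, obs).shape)  # (64,)
```

Without the `x + h` skip path, a stack this deep would be effectively untrainable; with it, the network behaves like a shallow one at initialization and learns to use its depth gradually.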