

Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680
Apr 16, 2024
In this engaging discussion, Alex Havrilla, a PhD student at Georgia Tech, dives into his research on enhancing reasoning in large language models using reinforcement learning. He explains the importance of creativity and exploration in AI problem-solving. Alex also highlights his findings on the effects of noise during training, revealing how resilient models can be. The conversation touches on the potential of combining language models with traditional methods to bolster AI reasoning, offering a glimpse into the exciting future of reinforcement learning.
AI Snips
RLHF vs. SFT Impact
- Learning from human feedback has been among the most impactful improvements to LLMs.
- Within that, RLHF contributes less than SFT; in InstructGPT it added roughly a 10% win-rate increase on top of the SFT model.
LLM Sample Efficiency
- RL fine-tuning of LLMs is more sample-efficient than classical RL, requiring far fewer rollouts.
- This stems from pre-training and supervised fine-tuning, which give the LLM a strong starting bias.
Research Goal: Comparing RL Algorithms
- Alex Havrilla's research compares RL algorithms for LLM fine-tuning, focusing on reasoning.
- The goal is to determine which algorithms work best and understand their comparative benefits.
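The common structure behind these comparisons is a loop of sampling completions, scoring them with a reward, and updating the policy. As a rough, hedged illustration (not the episode's or the paper's actual code), the sketch below runs that loop with a toy stand-in for the language model: per-position logits over a small vocabulary, a made-up reward, and a REINFORCE-style update. The vocabulary size, target token, and reward function are all hypothetical.

```python
# Toy sketch of the RL fine-tuning loop: sample rollouts, score them with a
# reward, update the policy. A table of per-position logits stands in for an
# LLM, and the reward is a made-up proxy for a verifiable "answer is correct" check.
import torch

torch.manual_seed(0)
vocab_size, seq_len, batch = 8, 4, 64
target_token = 3  # hypothetical "correct" token

# Per-position logits over the vocabulary play the role of the language model.
logits = torch.zeros(seq_len, vocab_size, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def reward(tokens):
    # Dense toy reward: fraction of positions that produced the target token.
    return (tokens == target_token).float().mean(dim=-1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)  # one distribution per position
    tokens = dist.sample((batch,))                          # (batch, seq_len) rollouts
    r = reward(tokens)                                      # (batch,) scores
    logp = dist.log_prob(tokens).sum(dim=-1)                # log-prob of each rollout
    baseline = r.mean()                                     # simple variance reduction
    loss = -((r - baseline) * logp).mean()                  # REINFORCE-style policy-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()

final = torch.distributions.Categorical(logits=logits).sample((1000,))
print("mean reward after training:", reward(final).mean().item())
```

The algorithms compared differ mainly in how the update step uses the scored rollouts, while the sample-then-score structure above stays the same.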