How to Improve Language Models Through Value-Based Reinforcement
A lot of the ways that people do RL with language models now treat the language model's task as a one-step problem. But if we're thinking about counterfactuals, that is typically situated in a multi-step process. So I think there's actually a lot of potential to get much more powerful language models with appropriate value-based reinforcement.
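
To make the one-step versus multi-step contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speaker's method: the function names, the toy rewards, and the per-step value estimates are all hypothetical. The one-step view treats the whole response as a single action and hands every step the same total reward, while the value-based view gives each step its own temporal-difference target.

```python
from typing import List


def one_step_targets(rewards: List[float]) -> List[float]:
    """One-step (bandit-style) view: the whole response is a single action,
    so every step receives the same total reward as its learning signal."""
    total = sum(rewards)
    return [total for _ in rewards]


def td_targets(rewards: List[float], values: List[float], gamma: float = 1.0) -> List[float]:
    """Multi-step, value-based view: one-step TD target r_t + gamma * V(s_{t+1}).

    values[t] is a (hypothetical) estimate of the value of the state before
    step t. The state after the final step is terminal and has value 0.
    """
    targets = []
    for t, r in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        targets.append(r + gamma * next_value)
    return targets


def advantages(targets: List[float], values: List[float]) -> List[float]:
    """Per-step credit: how much better the target is than the current estimate."""
    return [target - value for target, value in zip(targets, values)]


if __name__ == "__main__":
    # Toy episode (assumed numbers): reward arrives only at the last step.
    rewards = [0.0, 0.0, 1.0]
    values = [0.2, 0.5, 0.9]  # assumed value estimates before each step

    print("one-step targets:", one_step_targets(rewards))
    print("TD targets:      ", td_targets(rewards, values))
    print("TD advantages:   ", advantages(td_targets(rewards, values), values))
```

In the one-step version every step gets an identical signal, so there is no way to ask whether a different choice at an intermediate step would have changed the outcome. The per-step targets are what make that kind of counterfactual credit assignment possible.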