How to Learn RL Without a Value Function?

"I used to think that policy learning, like policy gradient, and in general learning policies directly without a value function, was totally useless." "Now I think I changed my mind very strongly about it, in the sense that [unintelligible], I couldn't care less," he says. "I no longer expect either my own papers or other people's to be perfect."

Play episode from 01:14:38
