
Episode 15: Martín Arjovsky, INRIA, on benchmarks for robustness and geometric information theory
Generally Intelligent
How to Learn in RL Without a Value Function?
"I used to think that policy learning, like policy gradient, and in general learning policies directly without a value function, was totally useless." "Now I think I changed my mind very strongly about it, in the sense that [unintelligible], I couldn't care less," he says. "I no longer expect either my papers or other people's to be perfect."