Retort Learning

In RL, rewards come from the world. The environment is set up so that, say, the researcher decides that an apple is worth 10 points, and when you eat it, you get 10 reward points. But we don't have a quantification of our feeling of awe yet, right? It's not about qualia. It's not subjective. It's about the flow of information. That signal is produced by the agent. We need a different paradigm.
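To make the contrast concrete, here is a minimal Python sketch, not anything described in the episode. `REWARD_TABLE`, `Environment`, `Agent`, and `internal_signal` are hypothetical names, and the familiarity-based formula is only a stand-in assumption for "a signal produced by the agent"; the speaker does not say how such a signal would be computed.

```python
from collections import Counter

# Reward values fixed by the researcher, outside the agent (illustrative numbers).
REWARD_TABLE = {"apple": 10.0, "rock": 0.0}


class Environment:
    """Standard RL setup: the reward is handed to the agent by the world."""

    def step(self, action: str) -> tuple[str, float]:
        observation = action  # eating X means observing X
        reward = REWARD_TABLE.get(action, 0.0)
        return observation, reward


class Agent:
    """Hypothetical agent that also produces a signal of its own."""

    def __init__(self) -> None:
        self.seen = Counter()

    def internal_signal(self, observation: str) -> float:
        # Assumption for illustration only: the signal shrinks as the
        # observation becomes familiar. Nothing in the environment sets it.
        self.seen[observation] += 1
        return 1.0 / self.seen[observation]


env = Environment()
agent = Agent()

obs, extrinsic = env.step("apple")     # reward decided by the researcher
internal = agent.internal_signal(obs)  # signal computed inside the agent
print(extrinsic, internal)
```

The point of the sketch is only where each number originates: `extrinsic` is looked up in a table the researcher wrote, while `internal` depends on the agent's own history.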

