Explore/Exploit in a Safe Way
I am curious about the explore/exploit trade-off you mention, because, I mean, reinforcement learning is known to be very different from regular supervised learning. So I'm curious: how does that play out in this context, and especially in yours? It seems like you could explore with one listener, and what you learn there then alleviates the need for exploration with another listener.

I think this is a trade-off with these algorithms, though. They can be very effective, but I also think there's a lot of responsibility in trying to understand as much as possible about what they do before you leverage them.
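The point that exploring with one listener can spare exploration with another can be illustrated with a simple epsilon-greedy multi-armed bandit whose reward estimates are pooled across all listeners. This is a minimal sketch, not the system discussed in the conversation: the class name `SharedEpsilonGreedy`, the show names, the engagement rates, and `epsilon=0.1` are all hypothetical.

```python
import random


class SharedEpsilonGreedy:
    """Epsilon-greedy bandit whose value estimates are shared by all listeners.

    Hypothetical sketch: each arm is a candidate recommendation, and a reward
    of 1 means the listener engaged with it. Because the estimates are pooled,
    feedback from one listener informs the choices made for every other listener.
    """

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}    # pulls per arm, all listeners
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward per arm

    def choose(self):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical engagement probabilities, unknown to the bandit.
    true_rates = {"show_a": 0.2, "show_b": 0.5, "show_c": 0.35}
    bandit = SharedEpsilonGreedy(true_rates, epsilon=0.1, seed=0)
    env = random.Random(1)
    for listener in range(5000):  # each iteration is a different listener
        arm = bandit.choose()
        reward = 1 if env.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
    print(bandit.values, bandit.counts)
```

Because every listener reads and writes the same estimates, the random exploration spent on one listener's session improves the greedy choice served to the next; a separate bandit per listener would have to repeat that exploration for each person. Production recommenders usually also condition on listener context (as in contextual bandits), which this sketch deliberately omits.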