AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Myopic Meso-Optimizer
A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour of deployment. If this worked perfectly, it would create an optimizer with a short time horizons. And we eliminate the incentive for deception by ensuring that the base optimizer is myopic. Even if the base optimiser is myopic, the meso-optimizer might not be. As far as we know there are no existing full meso- Optimizers. You just run the gradient descent and cross your fingers. The most likely outcome?