AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Impact of Reward Misspecification and Fixation on Feedback Mechanisms on Goal Alignment
Consistent reward misspecification can reinforce misaligned goals and lead to policies learning to maximize reward rather than fulfilling aligned goals. Intrinsic curiosity reward functions may lead policies to consistently pursue the goal of discovering novel states, conflicting with aligned goals. Additionally, goals can be correlated with rewards due to fixation on feedback mechanisms, rather than the content of the reward function.