Using Human Feedback to Define a Reward Function
I realized the next frontier was figuring out how to make language models actually useful. I'm still really interested in RL, but solving RL benchmarks isn't the end of the story. To use an RL algorithm, you need a reward function, and how exactly to define that reward becomes a challenging and important problem.
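One way to make this concrete, which the heading alludes to, is to learn the reward from human preference comparisons instead of hand-specifying it. Below is a minimal sketch in PyTorch, assuming a Bradley-Terry-style preference model; the RewardModel class, feature dimensions, and toy data are illustrative assumptions, not details from the original discussion.

```python
# Minimal sketch: fit a reward function to pairwise human preferences
# (Bradley-Terry style). All names and the toy data are hypothetical.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a feature vector describing an outcome to a scalar reward."""
    def __init__(self, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: P(preferred beats rejected) = sigmoid(r_pref - r_rej).
    # Maximizing the log-likelihood of the human's choices fits the reward.
    return -torch.nn.functional.logsigmoid(r_preferred - r_rejected).mean()

# Toy training loop on random stand-ins for human comparison data.
feature_dim = 16
model = RewardModel(feature_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    preferred = torch.randn(32, feature_dim)  # outcomes the human preferred
    rejected = torch.randn(32, feature_dim)   # outcomes the human rejected
    loss = preference_loss(model(preferred), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The learned scalar reward can then stand in for a hand-written one when training an RL policy; the key design choice in this family of methods is that humans only need to compare outcomes, not score them on an absolute scale.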