How Reinforcement Learning with Human Feedback (RLHF) Makes ChatGPT Better
RLHF is a process that uses human feedback to align a language model with what humans want it to do: annotators rank the model's outputs, a reward model is trained on those preferences, and the language model is then fine-tuned to produce responses the reward model scores highly. This makes the model more useful and easier to work with, improving its ability to understand instructions and give the desired responses. Relative to pre-training, it requires remarkably little data and human supervision. The human guidance added through RLHF creates a sense of alignment between the model and the user: the model feels like it is trying to help. The science of human guidance is an important area of research, focused on making language models more usable, wise, ethical, and aligned with human values.
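To make the loop concrete, here is a minimal, illustrative sketch in PyTorch of the two learned pieces described above: a reward model fit to human preference comparisons, and a toy "policy" nudged toward responses the reward model scores highly while a KL penalty keeps it close to the original model. Everything in it (the dimensions, the random "response embeddings", the hyperparameters, the softmax policy over canned responses) is a made-up stand-in for illustration; production RLHF, as used for ChatGPT, applies PPO to a full language model rather than this toy objective.

```python
# Toy sketch of the RLHF loop, not any production implementation:
# (1) fit a reward model on human preference pairs,
# (2) push a simple policy toward high-reward responses with a KL penalty.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
DIM = 16          # toy embedding size standing in for a response representation
N_PAIRS = 64      # number of human preference comparisons (made up)

# --- Step 1: reward model trained on human comparisons ----------------------
# Humans see two responses and pick the better one; the reward model learns
# to score the preferred response higher (a Bradley-Terry style loss).
reward_model = nn.Linear(DIM, 1)
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

preferred = torch.randn(N_PAIRS, DIM) + 0.5   # embeddings of chosen responses
rejected = torch.randn(N_PAIRS, DIM) - 0.5    # embeddings of rejected responses

for _ in range(200):
    loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
    rm_opt.zero_grad()
    loss.backward()
    rm_opt.step()

# --- Step 2: fine-tune the policy against the learned reward ----------------
# The "policy" here is just logits over a handful of candidate responses;
# the objective has the same shape as in RLHF proper: maximize expected
# reward minus a KL penalty to the frozen pre-RLHF policy.
response_embeddings = torch.randn(8, DIM)       # 8 candidate responses
base_logits = torch.zeros(8)                    # frozen base policy (uniform)
policy_logits = base_logits.clone().requires_grad_(True)
policy_opt = torch.optim.Adam([policy_logits], lr=5e-2)
KL_COEF = 0.1

with torch.no_grad():
    rewards = reward_model(response_embeddings).squeeze(-1)

for _ in range(100):
    probs = F.softmax(policy_logits, dim=0)
    # KL(policy || base): penalizes drifting away from the original model
    kl = F.kl_div(F.log_softmax(base_logits, dim=0), probs, reduction="sum")
    objective = (probs * rewards).sum() - KL_COEF * kl
    policy_opt.zero_grad()
    (-objective).backward()
    policy_opt.step()

print("reward-weighted policy:", F.softmax(policy_logits, dim=0))
```

The KL term is the part worth noticing: it is what lets a relatively small amount of human feedback steer the model without letting it drift far from the capabilities it learned in pre-training.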