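ChatGPT is a powerful language model that becomes far more usable through reinforcement learning from human feedback (RLHF). The underlying model learns from large amounts of text and can do impressive things, but at first it is not easy to use: it is trained to continue text, not to follow instructions. RLHF is the key ingredient that aligns the model with human preferences. People are shown two outputs for the same prompt and asked which one they prefer, and the model is adjusted toward the preferred kind of answer. It improves with remarkably little feedback data compared with the amount of text it originally learned from.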
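
To make the "show two outputs and ask which is better" idea concrete, here is a minimal sketch of how such comparisons can be turned into a training signal. It is an illustration under stated assumptions, not a description of how ChatGPT itself is trained: it assumes a Bradley-Terry style pairwise loss (a common choice for reward models), a tiny linear reward model, and made-up two-number feature vectors standing in for real model outputs. The names `train_reward_model`, `preference_loss`, and the `[helpfulness, rudeness]` features are hypothetical. The reinforcement learning step that would then use this reward to update the language model is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights, features):
    """Linear reward model: a weighted sum of the output's features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), in a stable form."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def train_reward_model(comparisons, n_features, lr=0.5, epochs=200):
    """Fit weights so that preferred outputs score higher than rejected ones."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. the weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected); step against it.
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical feedback: each pair is (preferred output, rejected output),
# with each output described by toy features [helpfulness, rudeness].
comparisons = [
    ([0.9, 0.1], [0.2, 0.1]),
    ([0.7, 0.0], [0.7, 0.8]),
    ([0.8, 0.2], [0.3, 0.9]),
]

weights = train_reward_model(comparisons, n_features=2)
print("learned weights:", weights)
```

With only three comparisons, the learned weights already reward the helpfulness feature and penalize the rudeness feature, which is the sense in which preference feedback can go a long way with little data. In a real system the features would come from the language model itself, not from hand-picked numbers.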