
AI Safety Fundamentals: Alignment

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Jul 19, 2024
22:32

This more technical article explains the motivations for a system like RLHF and adds concrete details on how the RLHF approach is applied to neural networks.

While reading, consider which parts of the technical implementation correspond to the 'values coach' and 'coherence coach' from the previous video.

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.
