How to Fine Tune a Language Model to Give Helpful, Harmless, and Honest Answers

Researchers call honesty, harmlessness, and helpfulness the three H's. The RLHF fine-tuning technique helps train models to use first-person pronouns in a way that is somewhat systematic and reliable. For certain prompts, such as a request to explain how to hotwire a car, you don't want the model to answer, because the answer could be harmful. So when you're not familiar with the nuts and bolts of training LLMs, it's easy to jump to conclusions.
