The Gradient: Perspectives on AI cover image

Raphaël Millière: The Vector Grounding Problem and Self-Consciousness

The Gradient: Perspectives on AI

00:00

How to Fine Tune a Language Model to Give Helpful, Harmless, and Honest Answers

Researchers call the three H's, namely, honesty, harmlessness, and helpfulness. The RRL-H8 fine tuning technique helps train models to use first person pronouns in a way that is somewhat systematic and reliable. When given certain prompts, such as explaining how to hotwire a car, you don't want the model to answer because it could be harmful. And so when you're not familiar with the nuts and bolts of training of a lens, it's easy to jump to conclusions.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app