Creating instruction tuned models (Practical AI #223)

Changelog Master Feed

Reinforcement Learning From Human Feedback

An open-source repo is available for anyone to play around with. Nikolai: How could I connect my domain experts' input and their preferences into a system that I am designing? Do you have any thoughts on reinforcement learning from human feedback? Jimmy Whitaker: One of the examples I love to point to is Bloomberg, which did this, probably back in early April. It's based on GPT-2 right now, so go have some fun and get your hands dirty.
