Reinforcement Learning From Human Feedback

An open source repo is available for anyone to play around with.

Nikolai: How could I connect my domain experts' input and their preferences into a system that I am designing? Do you have any thoughts on reinforcement learning from human feedback?

Jimmy Whitaker: One of the examples I love to point to is Bloomberg, which actually did this, probably in early April. It's based on GPT-2 right now, so go have some fun and get your hands dirty.

