The MPT Falcon: A Model for Reinforcement Learning

St. Peter: It is interesting, actually. If I'm understanding right from some of the sources I've been reading, there was a 30 or 34 billion parameter model that they also had in prerelease and were tuning. So it could be possible as they instruction-tune it and gather human feedback, potentially over more iterations of reinforcement learning from human feedback. They used two separate reward models in fine-tuning the chat-based model: one related to helpfulness, and the other related to safety.

St. Peter: Maybe one other note, which I find quite interesting, is the legal implications of generative AI.
