The MPT Falcon: A Model for Reinforcement Learning
St. Peter: It's interesting, actually. If I'm understanding right from some of the sources I've been reading, there was also a 30 or 34 billion parameter model that they had in prerelease and were tuning. So it could be possible as they instruction-tune and get human feedback, potentially with more iterations of reinforcement learning from human feedback. They use two separate reward models in fine-tuning the chat-based model: one related to helpfulness, and the other related to safety.
St. Peter: Maybe another note, which I find quite interesting, is the legal implications of generative AI.