The Complexity of ChatGPT and RLHF
I actually think RLHF and all these fine-tuning techniques are reliant on this. It's hard to write RLHF right now. How do you share the state amongst so many different problems? I have to allow every node in that DAG to really easily share the model, and most of those nodes are going to be running on a different schedule, or they're going to be triggered by user actions. DAG frameworks and execution engines are good, but I don't think there's a simple way to solve this.
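To make the state-sharing problem concrete, here is a minimal Python sketch of one possible approach: a single versioned model store that every node reads from and publishes to, with a version check so that nodes running on different schedules cannot silently overwrite each other. This is not the speaker's solution. The names `SharedModelStore`, `Snapshot`, and `run_node` are hypothetical, the "model" is just a tuple of floats, and the optimistic version check is a technique assumed here for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Weights = Tuple[float, ...]


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the shared model at one version (hypothetical type)."""
    version: int
    weights: Weights


class SharedModelStore:
    """Hypothetical in-memory holder for the model state shared by all DAG nodes."""

    def __init__(self, weights: Iterable[float]) -> None:
        self._lock = threading.Lock()
        self._snapshot = Snapshot(0, tuple(weights))

    def read(self) -> Snapshot:
        with self._lock:
            return self._snapshot

    def publish(self, base_version: int, weights: Iterable[float]) -> bool:
        """Install new weights only if nobody has published since base_version."""
        with self._lock:
            if base_version != self._snapshot.version:
                return False  # stale: another node updated the model first
            self._snapshot = Snapshot(base_version + 1, tuple(weights))
            return True


def run_node(store: SharedModelStore,
             update: Callable[[Weights], Weights],
             max_retries: int = 5) -> int:
    """Run one node: read the latest model, compute an update, retry if stale."""
    for _ in range(max_retries):
        snap = store.read()
        if store.publish(snap.version, update(snap.weights)):
            return snap.version + 1
    raise RuntimeError("too much contention on the shared model")


store = SharedModelStore([0.0, 0.0])

# A node that fires on a timer (assumed for illustration).
v1 = run_node(store, lambda w: tuple(x + 0.1 for x in w))

# A node triggered by a user action, arriving whenever it arrives.
v2 = run_node(store, lambda w: tuple(x * 2 for x in w))

# A node still holding version 0 is rejected instead of clobbering newer state.
stale_ok = store.publish(0, (9.9, 9.9))
```

Even this toy version shows why the problem is not simple: every node needs the same handle to the store, and each one has to cope with the model changing underneath it between read and publish. A real system would also need persistence, distribution across machines, and a policy for what a rejected node should do.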