Is the RLHF in the Pre-Training Process?
A recent research finding is really interesting: mixing safety mitigations into the pre-training process itself. Do those same behaviors still exist in there somewhere? I don't really know. There's at least a kernel of reason to be optimistic that we might be able to do the pre-training at scale with the right mix-ins. And time will tell on that one.