Understanding text generation and RLHF in AI modeling
Text generation in a language model can be viewed as interpolation over the pre-training dataset: the model fills the gaps between training examples, and prompting sculpts that interpolation by shaving off dimensions and narrowing the space of possible continuations. RLHF can be understood through Reynolds and McDonell's multiversal view of fiction, in which the generated text is a policy rollout. During RLHF tuning, multiple completions are sampled and scored by the reward model, and the policy is updated in proportion to those scores, so the tuned model comes to predict the subset of all text it could generate that the reward model would approve of.
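The sampling-and-scoring step described above can be sketched in a few lines. This is a toy illustration, not a real RLHF implementation: the "policy", the candidate completions, and the keyword-based "reward model" are all invented stand-ins, and the softmax weighting stands in for the policy-gradient update.

```python
import math
import random

# Hypothetical candidate pool standing in for a real policy's output space.
COMPLETIONS = [
    "thanks, happy to help!",
    "figure it out yourself",
    "here is a step-by-step answer",
]

def sample_completions(n, rng):
    """Policy rollout: draw n candidate completions."""
    return [rng.choice(COMPLETIONS) for _ in range(n)]

def reward(text):
    """Toy reward model: approve helpful text, penalize the rest."""
    return 1.0 if ("help" in text or "answer" in text) else -1.0

def reward_weighted(completions):
    """Weight each completion by the softmax of its reward score,
    mimicking how RLHF up-weights approved-of completions."""
    rewards = [reward(c) for c in completions]
    z = sum(math.exp(r) for r in rewards)
    return [(c, math.exp(r) / z) for c, r in zip(completions, rewards)]

rng = random.Random(0)
candidates = sample_completions(4, rng)
for completion, weight in reward_weighted(candidates):
    print(f"{weight:.3f}  {completion}")
```

In a real RLHF loop the weights would drive a gradient update of the policy; here they simply show that completions the reward model approves of end up dominating the distribution.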