
126 - Optimizing Continuous Prompts for Generation, with Lisa Li
NLP Highlights
How to Optimize Prefix Parameters for Training
The objective is the same as the fine-tuning objective: both use the cross-entropy loss of p(y | x), except that now the set of trainable parameters is different. This difference leads to a large reduction in the parameters we need to store, because we are freezing the language model parameters, so we don't need to store them anymore; we only need to store P_theta, which is a small matrix.

Cool, yeah, that makes sense. I think it's clear to me how exactly training works. One question I had relates to a detail I saw in your paper, though: you say that directly optimizing the prefix parameters does not work, and you re-parameterize…
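To make the setup discussed above concrete, here is a minimal PyTorch-style sketch, not the authors' implementation: `lm` is assumed to be a frozen autoregressive model that maps input embeddings to next-token logits, and the prefix length, dimensions, and MLP shape are illustrative placeholders.

```python
import torch
import torch.nn as nn

PREFIX_LEN, HIDDEN_DIM, BOTTLENECK = 10, 768, 512

class PrefixTuner(nn.Module):
    def __init__(self, lm):
        super().__init__()
        self.lm = lm
        for p in self.lm.parameters():        # freeze every language-model parameter
            p.requires_grad = False
        # Re-parameterization: a small matrix plus an MLP produces the actual
        # prefix activations P_theta (more stable than optimizing P_theta directly).
        self.prefix_embed = nn.Parameter(torch.randn(PREFIX_LEN, BOTTLENECK))
        self.mlp = nn.Sequential(
            nn.Linear(BOTTLENECK, BOTTLENECK),
            nn.Tanh(),
            nn.Linear(BOTTLENECK, HIDDEN_DIM),
        )

    def forward(self, x_embeds, y_embeds, y_ids):
        batch = x_embeds.size(0)
        prefix = self.mlp(self.prefix_embed).unsqueeze(0).expand(batch, -1, -1)
        # Teacher forcing: run the frozen LM over [prefix; x; y] embeddings.
        inputs = torch.cat([prefix, x_embeds, y_embeds], dim=1)
        logits = self.lm(inputs)              # assumed shape: (batch, seq, vocab)
        T = y_ids.size(1)
        # Autoregressive alignment: the logit at position t predicts the token at t+1,
        # so predictions for y come from the T positions just before each y token.
        pred = logits[:, -T - 1:-1, :]
        # Same cross-entropy objective as fine-tuning, -log p(y | x), but only the
        # prefix parameters receive gradients.
        return nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), y_ids.reshape(-1)
        )
```

After training, only the resulting prefix activations need to be kept (the MLP exists to stabilize optimization), which is why what gets stored per task is a single small matrix, as noted above.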