Retraining Models to Steer Towards Desirable Characteristics

Episode 20: Hattie Zhou, Mila, on supermasks, iterative learning, and fortuitous forgetting

Generally Intelligent

NOTE

Retraining Models to Steer Towards Desirable Characteristics

A hypothesis was formed to steer a trained model away from undesirable characteristics by identifying a way to forget undesirable information./nDifficult examples and their associated information were considered undesirable, while easier examples were considered desirable./nAlgorithms with iterative training were used to measure the forgetting step and its impact on the model's features./nThe forgetting step resulted in a higher proportion of difficult examples being forgotten./nThe hypothesis was supported by observing that resetting the later layers of the model improved performance./nSimilar improvements were seen in other algorithms when the specificity of the forgetting step was increased.

00:00

Transcript

Play full episode

Remember Everything You Learn from Podcasts

Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.