arxiv preprint - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Apr 5, 2024

03:56

forum

Ask episode

view_agenda

Chapters

auto_awesome

Transcript

info_circle

Episode notes

In this episode, we discuss Mixture-of-Depths: Dynamically allocating compute in transformer-based language models by David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro. The study presents a method for transformers that allows for the dynamic allocation of computational resources within sequences by limiting the number of tokens processed at each layer using a top-k routing mechanism. This approach maintains a fixed tensor size and a static computation graph, which differs from other conditional computation strategies. The resulting model operates with fewer computations per forward pass and provides up to a 50% faster sampling rate post-training, while still matching the performance of baseline models with the same computational budget and training duration.

Home Top podcasts Popular guests Top books