Streaming the Gradients Into the Memory X

Fine-grained pipelining is a well-known technique in computer architecture. And so that allowed us to basically cram a huge amount of off-chip memory into a Memory X, with some very careful hardware design, huge amounts of I/O bandwidth, and very nifty pipelining software. New weights are calculated and begin streaming even before the final gradient vector is completed. The gradients are reduced, and a single vector is delivered to the optimizer, which is in the Memory X. Pretty cool, huh?
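To make the pipelining idea concrete, here is a minimal software sketch, not the actual hardware or software design. It assumes gradients arrive in fixed-size chunks from several workers, that the reduction is a plain element-wise sum, and that the optimizer is plain SGD; the speaker names none of these. All function names (`worker_gradients`, `reduce_chunks`, `stream_updated_weights`, `run_demo`) are hypothetical. Python generators stand in for the streaming links, so each updated weight chunk leaves before later gradient chunks have been produced.

```python
from typing import Iterable, Iterator, List, Tuple


def worker_gradients(name: str, grad: List[float], size: int,
                     events: List[str]) -> Iterator[List[float]]:
    """Hypothetical worker: emits its gradient vector one chunk at a time."""
    for i, start in enumerate(range(0, len(grad), size)):
        events.append(f"{name} sent gradient chunk {i}")
        yield grad[start:start + size]


def reduce_chunks(streams: Iterable[Iterator[List[float]]]) -> Iterator[List[float]]:
    """Sum matching chunks across workers, yielding one reduced chunk at a time."""
    for parts in zip(*streams):  # one chunk from every worker, in lockstep
        yield [sum(vals) for vals in zip(*parts)]


def stream_updated_weights(weights: List[float],
                           reduced: Iterable[List[float]],
                           lr: float) -> Iterator[List[float]]:
    """Assumed optimizer (plain SGD): update and emit each weight chunk
    as soon as its reduced gradient chunk arrives."""
    offset = 0
    for grad_chunk in reduced:
        old = weights[offset:offset + len(grad_chunk)]
        yield [w - lr * g for w, g in zip(old, grad_chunk)]
        offset += len(grad_chunk)


def run_demo() -> Tuple[List[float], List[str]]:
    events: List[str] = []
    weights = [1.0, 2.0, 3.0, 4.0]
    streams = [
        worker_gradients("worker_a", [0.5, 0.5, 0.5, 0.5], 2, events),
        worker_gradients("worker_b", [1.5, 1.5, 1.5, 1.5], 2, events),
    ]
    new_weights: List[float] = []
    updated = stream_updated_weights(weights, reduce_chunks(streams), lr=0.5)
    for i, chunk in enumerate(updated):
        events.append(f"weights chunk {i} streamed out")
        new_weights.extend(chunk)
    return new_weights, events


if __name__ == "__main__":
    result, log = run_demo()
    print(result)
    print("\n".join(log))
```

In the event log, the first updated weight chunk is streamed out before either worker has sent its second gradient chunk. That ordering is the overlap described above, at toy scale.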
