Tesla AI Day 2022 Optimus Robot reveal

Elon Musk Podcast

Achieving Full Model Performance With ResNet 50

8,750 nodes on 25 dies coordinating to reduce and then broadcast the batch norm mean and standard deviation values. Local reduction followed by global reduction towards the middle of the tile, then the reduced value radiating from the middle, accelerated by the hardware's broadcast facility. This operation takes only 5 microseconds on 25 Dojo dies. The same operation takes 150 microseconds on 24 GPUs. And while we talked about an all-reduce operation in the context of a batch norm, it's important to reiterate that the same advantages apply to all other communication primitives. These primitives are essential for large-scale training.
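
The reduce-then-broadcast pattern described above can be sketched in plain Python. This is only a software illustration under stated assumptions, not how Dojo implements it: the function names (`local_stats`, `combine`, `reduce_all`, `batch_norm_all_reduce`) and the nested-list layout of dies and nodes are hypothetical, the statistics use the population variance, and the sum-of-squares formula is chosen for brevity rather than numerical robustness.

```python
import math

def local_stats(values):
    """Per-node partial statistics: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Merge two partial statistics. The merge is associative, so it can
    be applied in any tree order (per die first, then across dies)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def reduce_all(partials):
    """Fold an iterable of partial statistics into one."""
    total = (0, 0.0, 0.0)
    for p in partials:
        total = combine(total, p)
    return total

def batch_norm_all_reduce(dies):
    """dies: list of dies, each a list of nodes, each a list of activations.
    Returns the same die/node layout with (mean, std) at every node."""
    # Stage 1: reduction across the nodes of each die.
    per_die = [reduce_all(local_stats(node) for node in die) for die in dies]
    # Stage 2: global reduction across dies.
    n, s, sq = reduce_all(per_die)
    mean = s / n
    var = max(sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    std = math.sqrt(var)
    # Stage 3: broadcast, so every node ends up with identical values.
    return [[(mean, std) for _ in die] for die in dies]

# Toy layout: 2 dies with 2 nodes each (hypothetical data).
dies = [[[1.0, 2.0], [3.0]], [[4.0], [5.0, 6.0]]]
result = batch_norm_all_reduce(dies)
```

Because the merge is associative, the order of the reduction stages does not change the result; only the communication cost differs. On the figures quoted above, 150 microseconds versus 5 microseconds is a 30x difference for this one operation.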
