Scaling Large Language Models to Large GPU Clusters
Aly: We have a paper at the Supercomputing conference coming up in November on scaling very large language models to large GPU clusters with many thousands of GPUs. That's part of our Megatron project; we want to be able to train super awesome, super big transformers on HPC infrastructure. When we're training very large GPT-3-style models on our DGX SuperPOD, we are actually sustaining 52% of the peak tensor core throughput over the entire training run.

Host: Aly, sounds like there's a plan for an in-person component to the conference. Er, wouldn't that be fantastic? I mean, but in terms of Megatron, tell us a
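To put that 52% figure in context, here is a minimal back-of-the-envelope sketch in Python. It assumes the peak being referenced is the A100's 312 TFLOP/s FP16/BF16 tensor core throughput (the GPU used in DGX SuperPOD systems of that era); the exact hardware and precision are not stated in the clip, so treat those numbers as assumptions.

    # Sketch: what sustaining 52% of peak means per GPU.
    # Assumes A100 FP16/BF16 tensor core peak (312 TFLOP/s); not stated in the interview.
    A100_PEAK_TFLOPS = 312.0
    sustained_fraction = 0.52  # fraction quoted in the interview

    sustained_tflops = A100_PEAK_TFLOPS * sustained_fraction
    print(f"Sustained per-GPU throughput: {sustained_tflops:.0f} TFLOP/s")
    # -> about 162 TFLOP/s per GPU, sustained over the entire training run

Multiplied across the many thousands of GPUs mentioned above, that sustained fraction is what ultimately determines end-to-end training time.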