Networking Optimizations for Multi-Node Deep Learning on Kubernetes with Erez Cohen - #345

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Integrating Frameworks for Distributed Deep Learning

This chapter explores how TensorFlow integrates with libraries such as Horovod and NVIDIA's NCCL for distributed deep learning training. It covers the configuration needed to spread workloads across GPUs and servers, with emphasis on technologies like RDMA and GPUDirect, and notes that the open-source approach is also compatible with other frameworks such as PyTorch.
