Storage Options for AI Labs and Kubernetes Challenges
This chapter covers the storage options that companies favor for AI labs, such as object storage and shared file systems, and emphasizes the importance of locally checkpointed files during distributed training. It also discusses the challenges of running Kubernetes at scale, stressing the need to understand CNI configuration and to anticipate bottlenecks.