
GPU Uptime with VAST Data CTO

MLOps.community


Designing for GPU Farm Reliability and Rolling Updates

Andy explains data-center realities, journaling pitfalls, and architectures that avoid long replays and enable non-disruptive upgrades for GPU farms.

Play episode from 51:13
