How Policies Saved Us a Thousand Headaches, with Alessandro Pomponio
Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.
You will learn:
How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments
Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints
Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests
Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8
Interested in sponsoring an episode? Learn more.