
Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth
MLOps.community
Racking, Robots, and Field Operations
Kris discusses emerging robotic automation for racking and the need for precise field-operations data.
Kris Beevers is the CEO at NetBox Labs, working on turning NetBox into the system of record and automation backbone for modern and AI-driven infrastructure.
Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth // MLOps Podcast #359 with Kris Beevers, CEO of NetBox Labs
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide
// Abstract
Hundreds of neocloud operators and "AI Factory" builders have emerged to serve the insatiable demand for AI infrastructure. These teams are compressing the design-build-deploy-operate-scale cycle of their infrastructure down to months, while managing massive footprints with lean teams. How? By applying modern intent-driven infrastructure automation principles to greenfield deployments. We'll explore how these teams carry design intent through to production, and how operating and automating around consistent infrastructure data is compressing "time to first train".
// Bio
Kris Beevers is the Co-founder and CEO of NetBox Labs. NetBox is used by nearly every neocloud and AI datacenter to manage their networks and infrastructure. Kris is an engineer at heart and by background, and loves the leverage infrastructure innovation creates to accelerate technology and empower engineers to do their best work. A serial entrepreneur, Kris has founded and helped lead multiple other successful businesses in internet and network infrastructure. Most recently, he co-founded and led NS1, which was acquired by IBM in 2023. He holds a Ph.D. in Computer Science from Rensselaer Polytechnic Institute and is based in New Jersey.
// Related Links
Website: https://netboxlabs.com/
Coding Agents Conference: https://luma.com/codingagents
~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community [https://go.mlops.community/slack]
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: [https://go.mlops.community/register]
MLOps Swag/Merch: [https://shop.mlops.community/]
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Kris on LinkedIn: /beevek/
Timestamps:
[00:00] Observability and Delta Analysis
[00:26] New World Exploration
[04:06] Bottlenecks in AI Infrastructure
[13:37] Data Center Optimization Challenges
[19:58] Tech Stack Breakdown
[25:26] Data Center Design Principles
[31:32] Constraints and Automation in Design
[40:00] Complexity in Data Centers
[45:02] GPU Cloud Landscape
[50:24] Data Centers in Containers
[57:45] Observability Beyond Software
[1:04:43] Tighter Integrations vs NetBox
[1:06:47] Wrap up


