
Managing Meta's millions of machines
Ship It! Cloud, SRE, Platform Engineering
Navigating Open Source Contributions and Managing AI Fleet at Meta
This chapter explores the team's upstream-first approach to contributing to open source projects and how they balance upstream contributions with maintaining their own software. It covers the challenges of managing a large number of hosts, the trade-offs between major upgrades and rolling OS updates, and the work of optimizing Meta's extensive AI fleet. The conversation also touches on specialization within Meta's infrastructure teams, their on-prem setup, and upcoming projects that make use of systemd.