Ship It! Cloud, SRE, Platform Engineering

Managing Meta's millions of machines

May 4, 2024
Anita Zhang shares insights on managing millions of machines at Meta, contributing to open source, automating repository syncing, and navigating the AI fleet. The conversation also covers moving from indie development to supporting large teams, AI research paper titles, and ideas for future content.
Duration: 01:02:58

Episode guests

Podcast summary created with Snipd AI

Quick takeaways

  • Major upgrades across Meta's fleet take about a year, while frequent rolling OS updates head off issues; contributing upstream keeps Meta's systems current.
  • The TW scheduler runs containers under systemd and isolates jobs using its features; logs are handled by a sidecar service.

Deep dives

Managing Updates and Contributions to Upstream

The podcast discusses how Meta manages updates across its millions of hosts: major upgrades take about a year, while rolling OS updates happen more frequently without major issues. Contributing upstream first lets Meta stay current with new developments and avoid the problems that come from running outdated systems.
