

Systems Software in the Large
Sep 25, 2025
Dave Pacheco, an Oxide engineer leading the company's multi-year effort to build full-system update, discusses the complexities of the project, including the challenge of balancing autonomy with team structure and the need for self-service updates that reduce downtime. He also describes organizational procrastination and its effect on team productivity. Through anecdotes about prioritizing tasks and using demos for communication, he gives a behind-the-scenes look at turning a highly ambitious idea into a functioning system.
AI Snips
Update Was A Long, Cross-Team Effort
- Update has been a multi-year company priority, with much preparatory work done before Dave began formally leading it two years ago.
- The project balances system complexity and organizational coordination across many teams.
From Manual Rack Reimages To Self-Service
- Oxide's current update approach reimages control plane software and requires rack downtime and support involvement.
- The goal is a self-service, automated update that customers can run without Oxide support or long outages.
Autonomy And Hundreds Of Intermediate States
- Updating hundreds of components creates many intermediate mixed-version states that must remain safe.
- The system must operate autonomously because customers may be air-gapped and cannot rely on human intervention.
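To make the idea of intermediate mixed-version states concrete, here is a minimal sketch, not Oxide's implementation. It assumes components are updated one at a time and uses a made-up safety rule (no two components may differ by more than one version). The component names, the `max_skew_ok` rule, and every function name are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

Versions = Dict[str, int]  # component name -> version number (hypothetical model)


def intermediate_states(current: Versions, target: Versions,
                        order: List[str]) -> List[Versions]:
    """Return every state the system passes through when components are
    updated one at a time in `order`, including the start and end states."""
    state = dict(current)
    states = [dict(state)]
    for name in order:
        state[name] = target[name]
        states.append(dict(state))
    return states


def max_skew_ok(state: Versions, max_skew: int = 1) -> bool:
    """Assumed safety rule: versions may differ by at most `max_skew`."""
    return max(state.values()) - min(state.values()) <= max_skew


def first_unsafe(states: List[Versions],
                 is_safe: Callable[[Versions], bool]) -> Optional[int]:
    """Index of the first state that violates the rule, or None if all pass."""
    for i, state in enumerate(states):
        if not is_safe(state):
            return i
    return None


current = {"storage": 1, "api": 1, "network": 1}
target = {"storage": 2, "api": 2, "network": 2}
states = intermediate_states(current, target, ["storage", "api", "network"])
print(len(states))                        # 3 components -> 4 states
print(first_unsafe(states, max_skew_ok))  # None: every step stays within skew
```

With N components updated in sequence, the plan passes through N + 1 states, and an autonomous updater would need each of them to pass the check before taking the next step, since no operator is available to recover a bad intermediate state.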