

Rebooting a datacenter: A decade later
May 30, 2024
Former Joyent team members reflect on a data center reboot gone wrong, discussing post-mortems, technical challenges, driver lineage, critical system failures, and the importance of transparency and collaboration in overcoming data center issues.
Chapters
Introduction
00:00 • 2min
Recalling the Data Center Outage
01:38 • 14min
Reflecting on a Decade-Old Blameless Post-Mortem and Bespoke Options Parser
15:10 • 2min
Exploring Data Center Tooling and Protocols with scc and RabbitMQ Interactions
16:43 • 2min
Exploring the Significance of Numbers in Programming
18:37 • 8min
Challenges and Collaboration in Resolving System Failures
26:49 • 13min
Exploring Driver Lineage and the Discovery of Source Code
39:26 • 2min
Navigating PCIE Device Interrupt Challenges
41:11 • 2min
Unexpected Connections and Critical Calls
42:41 • 3min
Overcoming Datacenter Challenges
45:49 • 14min
Navigating Service Outages and System Failures
59:39 • 14min
Email Exchanges about Potential CEO Replacement
01:14:07 • 2min
Navigating Technical Challenges in Operating Systems
01:16:02 • 16min
Challenges and Decisions in Hardware Issues
01:31:40 • 2min
Enhancing API Safety and Transparency in Operations
01:33:37 • 7min