
Detailed analysis of the Facebook outage
The Backend Engineering Show with Hussein Nasser
00:00
How to Turn a Million Servers Off at the Same Time?
Even employees couldn't figure this out. So there are levels of security you have to get through before you can change these. It took extra time to activate the secure access protocols needed to get people on site and able to work on the servers. Only then could we confirm the issue and bring the backbone back online. Once our backbone network connectivity was restored across our data centers, everything came back up with it. But the problem was not over. We knew that flipping our services back on all at once could potentially cause a new round of crashes due to a surge of traffic. This is why rolling restarts exist. You never start all of
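The rolling-restart idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Facebook's actual recovery procedure: the `rolling_restart` helper, the `restart` and `is_healthy` callbacks, the batch size, and the health-check polling are all hypothetical. The point it shows is that only a small batch of servers comes back at a time, and the next batch waits until the current one reports healthy, so returning traffic ramps up gradually instead of hitting everything at once.

```python
import time
from typing import Callable, List


def rolling_restart(
    servers: List[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_size: int = 2,
    poll_interval: float = 0.0,
    max_polls: int = 10,
) -> List[List[str]]:
    """Hypothetical helper: restart servers one batch at a time.

    At most `batch_size` servers are being brought back at once, and the
    rollout halts if a batch never becomes healthy. Returns the batches
    in the order they were restarted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches: List[List[str]] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            restart(server)
        # Wait for every server in this batch to pass its health check
        # before moving on to the next batch.
        for server in batch:
            for _ in range(max_polls):
                if is_healthy(server):
                    break
                time.sleep(poll_interval)
            else:
                raise RuntimeError(f"{server} did not become healthy; halting rollout")
        batches.append(batch)
    return batches


if __name__ == "__main__":
    restarted: List[str] = []
    servers = [f"server-{i}" for i in range(5)]
    # Simulated fleet: a server counts as healthy once it has been restarted.
    order = rolling_restart(servers, restarted.append, lambda s: s in restarted)
    print(order)  # [['server-0', 'server-1'], ['server-2', 'server-3'], ['server-4']]
```

A larger `batch_size` recovers the fleet faster but lets more load return at once; a smaller one is slower but limits how much traffic can surge back in a single step.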