
Hard drive reliability at scale (Changelog Interviews #537)
Changelog Master Feed
How to Collect and Store SMART Data
We run smartctl on the drives in there. We store that data, and then we add a few things to it. Then there's a processing system which goes through and determines the notion of failure. So if a drive reported something to us, it didn't fail yet, all right? The next day we go back to that same pod, for example, and we notice that one of the drives is missing. We only got 59. That gets reported, and then that becomes part of what has to be processed by our side processes over the next few days.
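
The day-over-day check described here can be sketched roughly as follows. This is only an illustration, not the actual processing system: the snapshot shape (pod ID mapped to the serial numbers that reported that day), the function name `find_missing_drives`, and the 60-drive pod in the example are all assumptions made for the sketch. A drive that stops reporting is treated as a candidate for later review, not as a confirmed failure.

```python
from typing import Dict, List

# Hypothetical snapshot shape: pod ID -> serial numbers of drives that reported that day.
Snapshot = Dict[str, List[str]]


def find_missing_drives(previous: Snapshot, current: Snapshot) -> Dict[str, List[str]]:
    """Return, per pod, the drives that reported in `previous` but not in `current`."""
    missing: Dict[str, List[str]] = {}
    for pod_id, prev_serials in previous.items():
        # A pod absent from today's snapshot counts as reporting no drives.
        gone = sorted(set(prev_serials) - set(current.get(pod_id, [])))
        if gone:
            missing[pod_id] = gone
    return missing


# Example: a pod that reported 60 drives yesterday and only 59 today.
day1 = {"pod-001": [f"SN{i:03d}" for i in range(60)]}
day2 = {"pod-001": [f"SN{i:03d}" for i in range(60) if i != 17]}

candidates = find_missing_drives(day1, day2)
print(candidates)  # {'pod-001': ['SN017']}
```

The result is only a list of candidates; deciding whether a missing drive actually failed, or was pulled or moved for some other reason, is left to the follow-up processing over the next few days.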