I want to ask you, since you've been sort of evangelizing this and writing about it and doing podcast guest appearances like this one, what's changed more recently. I know one thing you brought up was that you'd recently spun up an on-call process for your security team. I'd love to hear more about that and any other more recent changes you guys have made.

Well, your advice to listeners to not blindly copy and paste, but to understand the principles behind what Intercom implemented, reminds me of a conversation I just had on this podcast with someone who had worked at Spotify. He came on to talk about how so many companies copied and pasted the Spotify squad model and similarly ran into a lot of challenges with that.
In this deep-dive episode, Brian Scanlan, Principal Systems Engineer at Intercom, describes how the company’s on-call process works. He explains how the process started and key changes they’ve made over the years, including a new volunteer model, changes to compensation, and more.
Discussion points:
- (1:28) How on-call started at Intercom
- (10:11) Brian’s background and interest in being on-call
- (14:06) Getting engineers motivated to be on-call
- (16:37) Challenges Intercom saw with on-call as it grew
- (19:53) Having too many people on-call
- (23:20) Having alarms that aren’t useful
- (26:03) Recognizing uneven workload with compensation
- (27:22) Initiating changes to the on-call process
- (30:08) Creating a volunteer model
- (33:02) Addressing concerns that volunteers wouldn’t take action on alarms
- (34:40) Equitability in a volunteer model
- (36:36) Expectations of expertise for being on-call
- (40:56) How volunteers sign up
- (44:15) The Incident Commander role
- (46:19) Using code review for changes to alarms
- (50:02) On-call compensation
- (52:50) Other approaches to compensating on-call
- (55:08) Whether other companies should compensate on-call
- (57:32) How Intercom’s on-call process compares to other companies
- (1:00:46) Recent changes to the on-call process
- (1:04:13) Balancing responsiveness and burnout
- (1:07:12) Signals for evaluating the on-call process
Mentions and links: