I want to dive specifically into the challenges you guys started seeing. I think many listeners will relate to these, and it will be helpful to frame these problems before we talk about a lot of the improvements you made. And it sounds like the takeaway here for leaders is, one, if you want engineers to feel excited and motivated about being on call, recognize that work and make it feel important to the business. And number two, have solid foundations and processes. If on call is just a chaotic and stressful experience, of course engineers aren't going to be excited about it.
In this deep-dive episode, Brian Scanlan, Principal Systems Engineer at Intercom, describes how the company’s on-call process works. He explains how the process started and key changes they’ve made over the years, including a new volunteer model, changes to compensation, and more.
Discussion points:
- (1:28) How on-call started at Intercom
- (10:11) Brian’s background and interest in being on-call
- (14:06) Getting engineers motivated to be on-call
- (16:37) Challenges Intercom saw with on-call as it grew
- (19:53) Having too many people on-call
- (23:20) Having alarms that aren’t useful
- (26:03) Recognizing uneven workload with compensation
- (27:22) Initiating changes to the on-call process
- (30:08) Creating a volunteer model
- (33:02) Addressing concerns that volunteers wouldn’t take action on alarms
- (34:40) Equitability in a volunteer model
- (36:36) Expectations of expertise for being on-call
- (40:56) How volunteers sign up
- (44:15) The Incident Commander role
- (46:19) Using code review for changes to alarms
- (50:02) On-call compensation
- (52:50) Other approaches to compensating on-call
- (55:08) Whether other companies should compensate on-call
- (57:32) How Intercom’s on-call process compares to other companies
- (1:00:46) Recent changes to the on-call process
- (1:04:13) Balancing responsiveness and burnout
- (1:07:12) Signals for evaluating the on-call process
Mentions and links: