Time off and loo if it's enforced can be a good way of compensating or make sure that people are rested. I think just doing time off on its own kind of some kind of problems but obviously it's cheap. Other ways of compensating as getting getting compensated per outage or per event is not something we've had any experience with in the past.
In this deep-dive episode, Brian Scanlan, Principal Systems Engineer at Intercom, describes how the company’s on-call process works. He explains how the process started and key changes they’ve made over the years, including a new volunteer model, changes to compensation, and more.
Discussion points:
- (1:28) How on-call started at Intercom
- (10:11) Brian’s background and interest in being on-call
- (14:06) Getting engineers motivated to be on-call
- (16:37) Challenges Intercom saw with on-call as it grew
- (19:53) Having too many people on-call
- (23:20) Having alarms that aren’t useful
- (26:03) Recognizing uneven workload with compensation
- (27:22) Initiating changes to the on-call process
- (30:08) Creating a volunteer model
- (33:02) Addressing concerns that volunteers wouldn’t take action on alarms
- (34:40) Equitability in a volunteer model
- (36:36) Expectations of expertise for being on-call
- (40:56) How volunteers sign up
- (44:15) The Incident Commander role
- (46:19) Using code review for changes to alarms
- (50:02) On-call compensation
- (52:50) Other approaches to compensating on-call
- (55:08) Whether other companies should compensate on-call
- (57:32) How Intercom’s on-call process compares to other companies
- (1:00:46) Recent changes to the on-call process
- (1:04:13) Balancing responsiveness and burnout
- (1:07:12) Signals for evaluating the on-call process
Mentions and links: