Arrested DevOps

Matt Stratton, Trevor Hess, Jessica Kerr, and Bridget Kromhout

Arrested DevOps is the podcast that helps you achieve understanding, develop good practices, and operate your team and organization for maximum DevOps awesomeness.

Episodes


Oct 14, 2015

Creating DevOps Communities and Events With Andy Burgin, Dustin Collins, and Nathen Harvey

Matt spends the entire episode claiming that Nathen was famous for being on ADO11, when in fact it was ADO14. Check Outs Nathen If you’re at a big conf, find the locals and do a mini meetup onsite DevOpsDays Podcast Public post mortems Andy charlesproxy.com John Leech - Nagios Song TechnologyUG Leeds Event Dustin ContainerDays NYC - Oct 29-30 - Docker Docker Docker Downtown NYC Tech is meeting Oct 28 - Boyd Hemphill/Docker in production CMXHub - Good articles/videos on community building Jenkins Job DSL - Conjur repo w/ jobs for example Trevor Agents of Shield Season 3 Surface Book, Nexus 6P <3 gadgets Matt Classy Little Podcast Doctor Who LEGO set (Bridget told us about it, how did she know about it and we didn’t?) CHICAGO CUBS CHECK THEM OUT YO!!! BUZZ SAW!!!

Oct 1, 2015

Infrastructure as Code With Joshua Timberman, Eric Sorenson, and Robyn Bergeron

The Reddit post referenced in the episode: Having a difficult time wrapping my head around test driven infrastructure Transcript Check Outs Joshua: Policyfiles! Webinar coming soon, or already happened! ChefDK 0.8.0, especially if you’re starting to work with Policyfiles Fat Scotch Ale - Silver City Brewing get it at SEA TAC! :D Eric: Dark techno/dnb from Grey Area in the UK Ansible :) especially people using Ansible in conjunction with puppet and chef - @ me on twitter! Robyn: Eric: Lots of those peeps :) I am happy to help connect folks. Ansible 2.0… soonish! Fedora 23 beta: GO GET IT SPLATOON Trevor: Welcome to the Dungeon boardgame Rocket League Matt: Pac-Man 256 iOS Android The Martian (audiobook) Felicia Day’s audiobook, “You’re Never Weird On The Internet (Almost)” (http://www.audible.com/pd/Bios-Memoirs/Youre-Never-Weird-on-the-Internet-Almost-Audiobook/B00XUTQ692)

Sep 23, 2015

Cognitive Neuroscience With Courtney Nash & Lindsay Holmwood

Trevor and Bridget chat with Courtney Nash (O'Reilly Media) and Lindsay Holmwood (Australian Government Digital Transformation Office) about cognitive neuroscience. We'll talk about recognizing cognitive fallacies, the psychology of alert design, how to make your conference proposals their most appealing, and about empathy from a scientific point of view.

Sep 10, 2015

ChatOps Extravaganza With Jason Hand, Sasha Rosenbaum, and Peter Burkholder

Recording Live from DevOpsDays Chicago! ChatOps is used by many teams and companies as the main communication tool for day-to-day chat, and their most important activities. In fact, ChatOps may be taking the place of email in the workplace for internal communication for tech teams as it helps communication during DevOps activities like deploys, code pushes, etc. This episode discusses best practices (if there are any) of ChatOps and how to make sure you are getting the most from your team communication tools. Asynchronous vs. Synchronous Communication: 25% of the work week is spent managing your inbox. You can actually increase productivity by moving to Sync communication…we think. Sasha: ChatOps creates an enormous amount of noise while at the same time making communication grouped and searchable. Discussion suggests it is up to the user to mediate that noise, but is it the user, the culture, or the conversation itself that dictates the role of chat? TiVo is given as an example of user mediation: you recorded a shit ton of stuff, and watched only what you wanted. The panel comes to the conclusion that important decisions should lean away from ChatOps, and into a more formal, permanent form of communication. “Important things will ‘re-bubble’ again,” but the chatroom is not the place if a team consensus is needed, especially if the team is remote. Create a culture where ChatOps is used in the way you need. Risky to go “Super Pendulum Swing” in one direction or the other. What is ChatOps good at? Solving the communication problem. Brings everybody into the same experience. Even if you are across Europe, or across the room, you are having the same experience. Great for in the moment Q&A. Even with one on one questions, if the answer is shared in a public channel, the information is given to all on the team which moderates the need for repeated questions, and increases team efficiency. You need to be constantly pairing.
If you direct message someone, you are keeping that information from the team. “If you are not working in your chat tool, you are not collaborating.” Shared History: Makes communication searchable, and organized by topic, or at least team. Rooms should be broken down to their smallest parts. Topics, Meetings, Projects, they should all be open spaces for all departments. Getting messages/alerts from integrated tools is perhaps one of the most important features of ChatOps in DevOps: Jenkins, GitHub, Travis, etc. What’s the Problem? There are just too many messages. But they are necessary messages. Internal ChatOps tool is almost useless when you are a consultant and you are all working on different clients. Problems with Adoption? In your organization, if you are considering chat tools for different purposes, use benchmarking and measurements to monitor your usage and data in each tool (in this case, chat vs. email). Matt uses RescueTime (https://www.rescuetime.com/) religiously. His current rate of email vs Slack: 4 hours in Slack, 48 minutes in email. Sales, Support, etc. prefer email, but that will not change until their tools are integrated with Slack as well. How to make use for permanent communication? Have you adopted ChatOps for sustained messages and conversations that need to be kept? Pinned Items are like the refrigerator…it’s emptied at the end of the week. If it should live longer than a week, then it gets moved to the wiki, Google Doc, or the most appropriate space for the info. How do you practice Chat-Zero? (Comment with your answer) How do you go through every message in your chat? How do you know what is important? Last Thoughts: Peter can’t wait for computers to be smart enough to interrupt us only when appropriate. “Who is the least invested in their work right now? Let’s notify that person.” If you are considering it, do it. But be careful what you use, and how you use it. Jason: It’s new tech, but it’s the old problem. 
ChatOps is just the newest efficiency on the line. Sign up for the Banana Stand for the latest ADO news.
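The chat-versus-email measurement mentioned in this episode comes down to simple arithmetic. Here is a minimal sketch in Python, assuming you can export minutes per tool from whatever time tracker you use; the `time_share` helper and the dictionary layout are hypothetical and are not part of RescueTime or Slack.

```python
def time_share(minutes_by_tool):
    """Return each tool's fraction of the total tracked minutes."""
    total = sum(minutes_by_tool.values())
    if total == 0:
        raise ValueError("no tracked time to compare")
    return {tool: minutes / total for tool, minutes in minutes_by_tool.items()}


# Figures quoted in the episode: 4 hours in Slack, 48 minutes in email.
tracked = {"slack": 4 * 60, "email": 48}
shares = time_share(tracked)

for tool, share in shares.items():
    # Print each share as a whole-number percentage.
    print(f"{tool}: {share:.0%}")
```

With the quoted figures, Slack accounts for 240 of 288 tracked minutes, or five times the time spent in email. Tracking the same split over several weeks would show whether a team is actually moving from one tool to the other.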

Aug 15, 2015

Podcast Me Maybe With Kyle Kingsbury

Matt, Trevor, and Bridget catch up with Kyle Kingsbury about his research on failure in distributed systems, lighting rigs controlled by code, consent and representation, and more. Call Me Maybe: an exploration of failure in distributed systems using the Jepsen tool. Monitorama 2015 talk on Riemann Bridget announces that she’s joined Pivotal; Matt claims this is to diversify the podcast so it’s no longer Chef employee, Chef partner, Chef customer. Community & Events Lots of open CFPs on devopsdays.org Chef Community Summit is Oct 14 and 15 in Seattle. Matt & Trevor will be there. Matt & Trevor: at DevOpsDays Chicago August 25 & 26! Bridget: at VMworld the first week in September. Check Outs Kyle: Subnautica! Bridget: Sense8 on Netflix Bosh tutorial Trevor: Alphabet Announcement. Google pointing themselves towards the evil Silicon Valley “Hooli” company. Dragonball Z Resurrection F Herkimer NY, and herkimer diamonds Matt: Amazon Echo Helicarrier LEGO set

Jul 29, 2015

Building an Ops Team With Charity Majors, Patrick McDonnell, and MCR

Charity Majors (Parse/Facebook) is an “Accidental Computerer.” She was the first infrastructure hire at Parse, which was acquired by Facebook in April 2013. Charity handles all of the backend operations and DBA work, and manages a team of 7 engineers. Patrick McDonnell is a web operations manager at Etsy. He made the transition from individual contributor to management a few years ago. Mike Rembetsy (MCR) is the VP of Technical Operations at Etsy. He was one of the first ops people at Etsy when he joined in 2008, and has helped grow the team to over 50 engineers since then. Charity points out that your first question should be whether or not you actually need an ops team. She says, “There are a lot of places out there that think they need traditional operations engineers, when all they really need is someone to really care about their infrastructure… You should have genuinely hard operations problems before you even start looking to hire engineers.” Once you do start down the path of building an ops team, MCR notes that you have to maintain it. You’ll need to grow the team, grow the individuals, grow the culture, which, if your company is in a startup phase, can be distracting to the overall goals. “The glue that holds a team together is how well people interact with one another, how well they respect each other,” says MCR. “That’s what you build good ops teams on, and frankly, that’s what you build any good team on.” As a growing company, we suggest you follow this process: understand/establish your mission; communicate that mission clearly to the interviewing team; find people who can fulfill that mission; and understand the fact that technical skills are great, but at the end of the day, it’s about the chemistry and culture of your team. Trevor agrees that cultural concerns are absolutely an issue, and asks for suggestions on how to interview for that. 
MCR explains that at Etsy, they split their interview questions into technical questions and cultural questions about how the interviewees handled particular situations, as well as other skills. Charity notes that good ops engineers are good at learning things, but some people freeze up when they’re put on the spot. She suggests providing interviewees with at least 50% of the questions beforehand, so they have a chance to prepare the preliminary information before interacting with her in a formal interview. “You want people to bring you the self that you’re going to be working with on a day-to-day basis,” she says, “not the self that is freaking out and wondering how they’re being perceived.” Transitioning into the topic of management, Charity brings up the point that if, as a manager, you’re still responsible for key pieces of the infrastructure, you’re holding your team back in their technical development. One of the jobs of a manager, MCR says, is to help motivate your team, and then step aside and let people get things done. In doing this, you let them succeed, as well as fail, and grow as a result of those experiences. He references Daniel Pink’s book, Drive, and the three pieces of science that motivate human beings: autonomy, mastery, and purpose. “A manager’s role is a facilitator,” says Patrick. “Everyone should be doing what they think they need to do. It’s my job to remove obstacles that come in their way and make sure I can smooth things out if need be, but really, it’s to encourage and allow people to reach their full potential.” Etsy has two separate career paths: one for individual contributors and another for management, which allows for the honing of particular skills specific to the end goal. Charity agrees wholeheartedly with this approach, and says emphatically, “Management isn’t a promotion. It’s a career change.” In addition, “manager” and “leader” are not synonymous. Some people are much better leaders than they are managers, and vice versa. 
It is also possible to be a leader while continuing as an individual contributor. In fact, Patrick points out that in some circumstances, you might actually lose influence when moving into a management role if you’re already a leader amongst your peers. Bridget asks the panel, “What’s your best advice for someone in the position of building an ops team?” Patrick: “Focus on hiring good people. Hire people that you like. Hire people that you trust. Hire people that, maybe they need to do a little more research, maybe spend a little more time on StackExchange than the next person, but you know that they’re going to get the job done in the way that you need to get the job done.” Charity: “Build your networks. Go to meetups, talk to people. Don’t just talk to the popular kids. Reach out to diverse communities and diverse crowds, and go meet people who are doing cool and exciting things, who are slightly off the beaten path. The more people you know, the better you’re going to be at hiring.” MCR: “Make sure that you address conflict. Make sure you create a safe place for the people you do hire, to have open, honest conversations with one another. There’s nothing more toxic to a team than people chatting behind other people’s backs.” Check Outs MCR: – Open Source Utility for ELK – 2012 Velocity Talk from Dr. Richard Cook: How Complex Systems Fail Charity: – Guided Meditation for Adults: “Breathe in strength, breathe out bullshit.” Patrick: – Dr. Christina Maslach’s talk at Velocity: Burnout in Tech – 5-minute burnout self-assessment Bridget: – DevOpsDays Minneapolis talks Trevor: – Batman Arkham Knight – GitKraken Upcoming Events: DevOpsDays at #alltheplaces

Jul 18, 2015

Eating Sushi With Andrew Clay Shafer

Transcript Andrew Clay Shafer (@littleidea). Coming to you live from DevOpsDays Minneapolis, Matt and Bridget sit down with Andrew Clay Shafer in front of a live audience to talk about the growth of DevOps, explain some commonly heard but not always understood terms, and more (after a brief detour on why episode numbers on podcasts are obnoxious, and why this episode is titled “Eating Sushi with Andrew Clay Shafer”). Don’t know who Andrew is? He suggests you Google him, but then goes on to give a little bit of his background: he’s been involved in software development and technology for almost 20 years. After rooming in college with Luke Kanies, founder of Puppet Labs, Andrew got interested in operations and system administration. O’Reilly’s Velocity Conference was also influential in Andrew’s growth, and Andrew reminisces about his presentation on Agile Infrastructure in 2009. John Willis speaks up from the audience, and strongly suggests going to look at the slides. Matt takes a moment to explain how Arrested DevOps started: “I started listening to John and Damon on DevOps Cafe and understood about 5% of what they were talking about… There are these things that we, as part of this community, tend to know, and what we try to do with this show is break it down for the people who don’t have the tenacity or stubbornness that I did.” Along that vein, Matt asks Andrew to expand upon the “wall of confusion” idea that was referenced in his 2009 talk, and has become a commonly-used (but not always understood) term in DevOps lingo. “It’s a jargony way to talk about the different incentives that exist between developers and operations,” says Andrew. “There’s a transition that happened as software became service-oriented, versus shipped on CDs, where the servers now become this critical part of the value chain, and if you deemphasize the system administration and operation of those servers, then you don’t actually have software. 
In the middle of these two worlds, where in one, systems administrators were for keeping the printers and the mail server up, to where they’re a critical part of the value chain in the new world, there are broken IT practices that don’t make sense when you’re trying to manage a service. It means recognizing that the best way to optimize a system isn’t to just throw random stuff onto production servers, and then make it ops problem, but to recognize that the infrastructure itself has become an application, and that you can manage these things as an application.” Andrew points out that as much as he enjoys the attention (and who wouldn’t?), he was simply in the right place at the right time, and the right people listened to him. He connected dots to take advantage of tools and practices that were already used to manage software process, and bring that into the infrastructure and operations works. Bridget asks, “You mentioned Agile, and you mentioned Scrum. I’ve heard you say that Scrum is a disease. Can you give us your thoughts on where that sort of stuff is going?” “My personal opinion is that Scrum’s impact on software development is net negative,” Andrew says. “I think it’s particularly bad when people try to adopt it in operations. It’s really susceptible to problems when you have any interrupt-driven work whatsoever.” He suggests pursuing kanban and chatops, making work explicit and visible – tools that allow both you and your management to understand the full context and value of the situation. The conversation transitions into talking about how to make actual changes to your operations and infrastructure teams, rather than always jumping through hoops to make the necessary changes to keep the pages up or the apps running smoothly. The answer isn’t to simply communicate how difficult it is to manage a system – upper management won’t understand the pain, and therefore won’t listen to the complaints. 
You have to involve the rest of the team so that they understand what you’re going through first-hand. You get empathy from suffering. This all plays back to Conway’s Law, as Andrew points out: “If you believe Conway’s Law is true (as I do), then you understand that your org structure (who communicates with whom, who reports to whom, etc.) determines the outcome of any decision.” Bridget brings up the point that this is the essence of dogfooding: requiring not only your engineers to be in the code, but your employees to be using the products that you’re creating, so that there’s a general understanding of why things work the way that they do, and buy-in for the necessary changes. Bridget asks Andrew to expand more on what he thinks we are (and should be) optimizing for, which he touched on briefly during his talk at DevOpsDays. Andrew counters that in order to do that properly, we need to first frame the context, which is a problem that plagues DevOps, Agile, and many other systems with which people are trying to transform their companies – you can’t do something prescriptive until you have enough context to understand where you’re starting from. For example, the diet and exercise program you’d give to someone who’s relatively healthy and active is very different from the one you’d give to someone with a different set of circumstances. By the same token, you can’t prescribe a solution to an infrastructure problem without first investigating the root causes and understanding what the foundation is. However, if you model the world as everything is an agent trying to maximize some function, then the basic premise of your decisions is cause and effect. 
“Looking at the way people behave, and how this plays out within organizations, you might have very different patterns of interactions and patterns of health.” Andrew continues, “Therefore, what you’ll tell people to do is very different from context to context.” Despite all of the different scenarios, Andrew argues that there are three things you can always do: understand the incentives that people are motivated by; align the incentives with behaviors; and radiate information to help people make different choices. Wondering what the Nash equilibrium and Pareto efficiency concepts from game theory are? Here are a few links: Nash equilibrium (Wikipedia); Pareto efficiency (Wikipedia); Andrew’s slides on Leading a Learning Organization

Jun 25, 2015

Career Devops With Jeff Hackert

Jeff’s background includes 30 years of experience working in people management, specifically in developing the management skills and careers of software engineering teams. He now works at Chef as the Director of Learning Experiences. What is Career Development? Just thinking about career development in the workplace is not very common. Jeff discusses statistics around the specifics of Career Development and implementation strategies. For example, in one study, more than 60% of respondents said that Career Development was of “little to no” importance to their current employer. Fewer than 5% of employees at some organizations surveyed receive any career feedback from their current bosses. Jeff: There are different approaches people usually take as they grow into their careers. The first is a “go with the flow” approach, which can cause issues when leadership skills are not developed in response. A second approach is more proactive: you plan out and describe your career and where you would like to go in the future. What do you do? The panel discusses their own careers and their attempts to summarize “what you do,” their weaknesses, and their strengths, especially when the description is to be seen by the entire organization. Can you brag? Should you be vulnerable? For some, listing strengths as an engineer can actually be more difficult than listing weaknesses. Jeff: “It’s a huge act of vulnerability to say who you want to be in an organization.” Jeff: “Every Engineer is responsible for their own career development […] and you are 100% responsible for the career development for every engineer on your team.” Can you be the Director of Flowers? The group discusses the merit of titles and how relevant and useful they are within an organization. Jeff: “Titles don’t matter, and they absolutely matter.” Ultimately, titles should be descriptive of what you want to do in that position and are indicative of positional authority more than anything else. 
Management is not for everybody. “It’s Not a Promotion - It’s a Career Change” by Lindsay Holmwood (http://fractio.nl/2014/09/19/not-a-promotion-a-career-change/) Matt: “I don’t like managing people; that’s not my thing.” Jeff: Performance management is not the same as Career Development. Often, when Software Engineers get promoted to managers, coding becomes a secondary responsibility and People Management becomes a priority. For some, this is not the trajectory they want for their careers, and that’s ok. Providing context for feedback and guidance is a great way to identify these mismatches between current position and where someone wants to be. Jeff: Using analytics, “Project Oxygen” from Google describes 8 qualities of a good manager. (http://www.nytimes.com/2011/03/13/business/13hire.html) Jeff: “The Beatle Book” - Ken Schwaber (http://www.amazon.com/Ken-Schwaber/e/B001H6ODMC) Trevor: Doing an emotional check-in to describe your current emotional state is a powerful management and communication tool (from the Core Protocols by Jim McCarthy - http://www.mccarthyshow.com/online/). Jeff discusses the usefulness of check-ins on every level within an organization. You know you have a good manager when: 1. They express real concern with your career development, separate from quality conversations. 2. They create opportunities for you to realize your goals (or at least get closer). 3. They have your best interest at heart. …and then there’s the checkouts:

May 29, 2015

Disasters!

Stephanie Van Dyk @sevandyk is an SRE at Google, and has also worked on healthcare.gov. Mark Imbriaco @markimbriaco is co-founder and CEO of OperableInc. He’s worked previously at DigitalOcean, GitHub, LivingSocial, Heroku, and 37signals. Bridget starts by asking Mark what Operable is all about. Mark explains that Operable is trying to help people who are on the “pointy end” of incidents. They’re trying to build tools that help people collaboratively fix problems. “There’s a lot of tools these days that do things like wake you up and alert you when there is a problem,” says Mark, “but we think there’s a lot of room to help people actually solve problems.” Stephanie briefly goes through some of the history of healthcare.gov, and how she first learned about it. Her position was unique, she points out. “We worked very hard, and very long hours… We were also in the fortunate position of having a lot of authority, which is important if you’re trying to fix a disaster. There’s a lot of problems to solve, and you don’t want any of your additional problems to be ‘Well, who gave you permission to do that?’” “That’s a really good point – not all disasters are created equal,” Bridget notes, “and maybe we should take a step back and think, what are the ingredients that make something a disaster?” Mark: I’m used to catastrophic problems that last for a few hours at most, or in the really bad case, maybe it lasts for two or three or four days, not something that goes on for weeks or months, so that’s a different perspective from what I have, so I’m super interested to hear about [healthcare.gov]. Matt: I think there’s the disasters where there’s a thing that happens, that’s maybe localized to one type of scenario; then there is what happens in an episode of This American Life, where it’s just one thing after another and everything unravels. 
There’s a quote from the episode that I like to think about when we think about these bigger disasters that are more than just an outage that may be far reaching: “One ingredient of many fiascos is that great, massive, heart-wrenching chaos and failure are more likely to occur when great ambition has come into play, when plans are big, expectations are great, and hopes are at their highest.” “I think you’re certainly right,” agrees Stephanie. “In order for something to be a disaster, the stakes have to be quite high… An outage that you find, and fix, and write a postmortem, and everyone learns something, and the users all get over their hurt, that’s not a disaster. That’s just life.” At times, there are incidents that leave scar tissue in their wake, making people wonder for years to come if they truly want to use certain products, or trust their data to a certain company. Mark reminisces: “The gutwrenching terror is, are we going to get our customers’ data back, or is it just gone? As an ops person, there’s almost nothing more terrifying than losing data.” This provides a perfect segue into some of the non-obvious issues that arise with disasters. Stephanie brings up the point that you have to be prepared to regain the trust of your users. “How people think about your service is going to determine the fallout of it, and the impact. It’s interesting – it’s not something engineers like to think about very much, because they simply fix the problem. But someone has to be the one to reassure people that it’ll be ok.” Mark agrees: It’s really, really hard to be in the middle of responding to a serious problem, and also have to be the person who needs to communicate about that externally. There’s so much good will that can be gained from being as transparent and public as you can about what’s going on, without pulling punches or hiding, even if things are really bad. 
This is all well and good, but Bridget brings up a good point: “How do you know exactly how and what to communicate to people?” Stephanie: There are definitely rings of communication. You have to be able to talk to the other engineers who are working on the problem, and those conversations are going to be very different than how you talk to your customers, even if you’re trying to be super open and honest. Your customers don’t care about where in the logs you found that tiny error. They care about when it’s going to be fixed, and whether you’re actually working on it… Also, the person who’s in charge of solving the outage should not be the same person who’s in charge of communicating about the outage. You should have different roles for that. Mark agrees emphatically, and also noted that wording is incredibly important – not only what you say, but how you say it, and the words that you use. “There are three things I want to get across. The first thing is, I want to apologize to people. It has to be a sincere apology. The other thing I need to do is make sure people feel confident that I understand what happened. I need to display confidence, and a really firm grasp about the problem. The last thing I need to do is tell them what I’ll do to try to reduce the likelihood of something like this happening again.” The conversation turns a corner as Matt asks how you plan for outages and prevent disasters. Stephanie jumps in, and reminds us all: “If you don’t test your backups, you don’t really have backups. Similarly, if you don’t test your outage plan, you don’t have an outage plan.” She suggests setting up brainstorming sessions with a handful of people from your team, appointing a “DM” (dubbed “Disaster Master” rather than “Dungeon Master” by Tyler), and running through possible scenarios. 
Keep an eye out for a Kickstarter in the near future ;) There are definitely advantages to documenting incident reports along the way, but how do you balance the speed of talking through a solution out loud, and the value of face-to-face communication to build trust vs. the need to document things for posterity? Mark: How you interact on a day-to-day basis is also how you should communicate during an outage. The last thing you want to do is change your mode of communication when everything is falling apart and you’ve got high stress. Matt posits that sometimes what’s a disaster for one company isn’t for another, because of their size, their logistical capabilities, etc., but also, sometimes what is being presented as a disaster isn’t actually all that bad. Stephanie identifies the first benchmark as determining whether or not your users are hurting. “If they’re not hurting yet, you might have a disaster coming, but I don’t think it qualifies. But if your users are hurting, that’s when you really need to jump on board and get focused.” Mark agrees, and adds that being able to quantify how many users are affected, and in what way they’re affected, is hugely important. “That’s different than monitoring. Monitoring may tell you that the server’s down, but it doesn’t tell you how many users that impacts.” He reminds us that when you’re working at scale, “services are down for somebody literally all of the time. What is the threshold where it becomes a disaster? When do you need to start talking about it publicly and in status? 
Those are questions you really need to answer up front.” Checkouts Stephanie catehuston.com, Accidentally in Code USDS Mark Pre-Accident Podcast Destiny (the Game) Bridget seedsavers.org - Organic heirloom seeds “Common Ground and Coordination in Joint Activity” by David Woods et al Trevor Pocket chainsaw Dragonball Z XENOVERSE MSOffice for Android Smartphones (Excel, Powerpoint, Word) Preview Matt Does Not Commute - iOS/Android game iOS | Android Recreating highlights of Breaking Bad in GTA5 editor I’m also obsessed with Hearthstone on iOS now (World of Warcraft card game thingy)
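Mark’s distinction between “the server’s down” and “how many users that impacts” can be sketched in a few lines of Python. This is an illustration only: the server names, the user-to-server mapping, and the 25% threshold are all made up, and the sketch assumes a user counts as affected if any server they depend on is down.

```python
def affected_users(users_by_server, down_servers):
    """Return the set of users served by at least one down server."""
    affected = set()
    for server in down_servers:
        affected |= set(users_by_server.get(server, ()))
    return affected


def is_disaster(users_by_server, down_servers, threshold):
    """Compare the affected fraction of users to a threshold agreed up front.

    Returns (crossed_threshold, affected_fraction).
    """
    all_users = set().union(*users_by_server.values())
    if not all_users:
        return False, 0.0
    fraction = len(affected_users(users_by_server, down_servers)) / len(all_users)
    return fraction >= threshold, fraction


# Hypothetical mapping of servers to the users they serve.
users_by_server = {
    "web-1": ["ana", "bo"],
    "web-2": ["cy"],
    "web-3": ["di", "ed", "bo"],
}

# One server down: monitoring would alert, but only 1 of 5 users is hurting.
print(is_disaster(users_by_server, {"web-2"}, threshold=0.25))

# Two servers down: 4 of 5 users are hurting.
print(is_disaster(users_by_server, {"web-1", "web-3"}, threshold=0.25))
```

The threshold is the part to settle before an incident: the code only makes explicit that “is a server down?” and “how many users are hurting?” are separate questions.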

May 15, 2015

Dr. BOFH, or How I Learned to Stop Worrying and Love the DevOps

Chris Read, Kevin Hubbard, and Yvo van Doorn are reformed BOFHs (Bastard Operators From Hell). Chris is back on the podcast again, this time talking about his experience as a SysAdmin in past lives. Kevin is currently the DevOps Engineer for BCycle at Trek Bicycle Corporation, and was a SysAdmin for 15+ years. Yvo previously worked at classmates.com and McGraw Hill Corporation as a SysAdmin, and is now at Chef Software. Before we get started, you’ll need to understand the origin story of BOFH. There were stories posted on Usenet back in the 90’s, supposedly authored by a computer operator named Simon whose sole purpose was to terrorize the users of his systems. The phrase “How great would my job be if it weren’t for the f***ing users!” resonated with many SysAdmins (and still does!). Matt starts the episode off asking what it was like as everyone was starting out in this field. Kevin: To me, it was sort of operating in a scarcity model. You had limited resources, and it seemed like anytime there was a new ask for an application, I immediately went to ‘How am I going to ask for the capacity to run this?’ and I would just get so frustrated. It boiled down to ‘How are we going to support this?’ That was my standard line when someone would bring up something new, and I wish I had trophies to give those people for all of their good ideas, because we just couldn’t get it off the ground with the resources that we had. It wasn’t a fun way to operate, but it was the most realistic view. Chris: When in high school and university I was the System Administrator for the school systems. It was astounding seeing what damage to the system can be done – how people trying to do something could affect shared resources, and the after-effects of that. Most of the time it wasn’t malicious; it was due to ignorance, but it built up this mental attitude of ‘All users are just there to break things. 
We need to constrain them as much as possible, because when things break, we’re the ones that get shouted at.’ Matt: There was a belief that devs are stupid! All they’re going to do is break things, because they don’t care about the systems like a SysAdmin does, because they’re ours. Yvo: Our devs were incentivized not to care because they were paid based on the amount of code they shipped. I’ve had some nightmare evenings trying to fix all of the problems. Bridget then brought the conversation back around to incentives – are there situations when the incentives are diametrically opposed (or at least not aligned well) between the SysAdmins and the developers? Matt brings up the point that developers are incentivized to build features, while SysAdmins are incentivized to bring stability, which at its most basic level is maintained by things not changing. The viewpoint of ‘developers don’t know what they’re asking for’ is also a problem, Kevin reminds us. SysAdmins will often call the developer and explain why things work the way that they do, but won’t take the time to listen to the actual problem. In reality, there’s a perception that when other people touch “our stuff,” things will go wrong, but let’s face it: “there are all sorts of things that can go wrong that are often not a specific person’s action,” says Bridget. Given that all of us here are supposedly reformed BOFHs, let’s chat about how things have changed, and what that process was. Chris: I finally realized that my interactions with the developers were better if I went to them without the ‘clue stick’ and simply spoke to them, asking them if they realized the impact of their code. It finally clicked for me when I had to work together with the client-side SysAdmins as well as the developers at Thoughtworks.
Our whole purpose was to get code written by two different development teams out into production, and it was only through being an advocate for both teams that I was able to build up a relationship with both teams and understand the value. “It seems like DevOps has formalized the relationship between SysAdmins and developers,” says Kevin. “It seems like a much more natural, iterative process working with devs.” Because we’re working side by side, there’s much less going back and forth to figure out the direction and purpose behind projects, and we simply get to collaborate. The “handing down stone tablets” philosophy not only no longer works… it has never worked! In Yvo’s case, the change started to happen when the project management team was dissolved. Suddenly, a SysAdmin had to be a part of the development meetings, because there was no longer an intermediary passing information from one team to another. It immediately became more collaborative, and there was visibility into what was happening early on rather than being notified after everything was finished. We’ve shifted from a mentality of ‘protection’ – teaching our “PFYs” (pimply-faced youths) to protect their systems against the evil developers – to giving history lessons about how we got to the stage that we’re at now where we need to talk to all of the involved parties, and as Chris said, “having everyone focused on the goals, trying to see things from each other’s angles rather than antagonistically.” This is how we, as a group, move forward. “It takes a deliberate decision to shift,” Bridget observes. We have to be dedicated to teaching our PFYs this new, collaborative way so that in the future, fewer people will start out with this BOFH mentality. From there, we shift into “How can we do better?”
Matt asks, “We used to be this way – we’re better now – but what are some of the ways that we can still improve?” “I want to be able to maintain this new flexibility that comes with DevOps, but I feel like there’s some decision-making that needs to be made as far as tools and standards go,” says Kevin. It’s a matter of balancing the old playbook of limited resources and mixing in the new cohesion and collaboration efforts. We’ve done a great job of bringing in the greater teams of operations and developers, but most companies still have the one or two lone SysAdmins who are struggling on a daily basis to keep their heads above water. Yvo cautions that we need to bring them into the circle as well. “If we can make their lives easier, they’re going to eventually go to another shop with the perception that being alone and supporting developers is not a bad thing, but right now they really don’t like life.” Bridget’s money-back guarantee: “If you’re less BOFH-y, I promise you’ll be happier, or else we’ll pay for your therapy.” Checkouts Chris Liz Keogh - Perverse Incentives GOTO Conference Chicago 2015 Closing Keynote by Anita Sengupta from NASA JPL was awesome James Lewis - “How I finally stopped worrying and learnt to love Conway’s Law” DRW is hiring! Kevin Stache - new Trek mountain bike Yvo Hop Venom Double IPA - Boneyard Bridget Organic heirloom seeds Common Ground and Coordination in Joint Activity by David Woods et al Matt Effective DevOps by Jennifer Davis and Ryn Daniels asciinema
