In this episode, we cover:
What Kessel Run is Doing: 00:01:27
Failure Never has a Single Point: 00:05:50
Lessons Learned: 00:10:50
Working the DOD: 00:13:40
Automation and Tools: 00:18:02

Links:
Kessel Run: https://kesselrun.af.mil
Kessel Run LinkedIn: https://www.linkedin.com/company/kesselrun/

Transcript

Omar: But I’ll answer as much as I can. And we’ll go from there.

Jason: Yeah. Awesome. No spilling state secrets or highly classified info.

Omar: Yes.

Jason: Welcome to Break Things on Purpose, a podcast about chaos engineering and building reliable systems.

Jason: Welcome back to Break Things on Purpose. Today with us we have guest Omar Marrero. Omar, welcome to the show.

Omar: Thank you. Thank you, man. Yeah, happy to be here.

Jason: Yeah. So, you’ve been doing a ton of interesting work, and you’ve got a long history. For our listeners, why don’t you tell us a little bit more about yourself? Who are you? What do you do?

Omar: I’ve been in the military, I guess, public service for a while. So, I was military before, left that and now I’ve joined as a government employee. I love what I do. I love serving the country and supporting the warfighters, making sure they have the tools. And throughout my career, it’s been basically building tools for them, everything they need to make their stuff happen.

And that’s what drives me. That’s my passion. If you’ve got the tool to do your mission, I’m in and I’ll make that happen. That’s kind of what I’ve done for the whole of my career, and chaos has always been involved there in some fashion. Yeah, it’s been a pretty cool run.

Jason: So, you’re currently doing this at a company called Kessel Run. Tell us a little bit more about Kessel Run.

Omar: So, we deliver combat capability that can sense or respond to conflict in any domain, anywhere, any time. Or deliver award-winning software that our warfighters love. So, Kessel Run’s kind of… you might think of it as a software factory within the DOD.
So, the whole creation of Kessel Run is to deliver quickly, fast. If you follow the news, you know DOD follows waterfall a little bit.

So, the whole creation of Kessel Run was to change that model. And that’s what we do. We deliver continuously non-stop. Our users give us feedback and within hours, they got it. So, that’s the nature behind Kessel Run. It’s like a hybrid acquisition model within the government.

Jason: So, I’m curious then, I mean, you obviously aren’t responsible for the company naming, but I’m sure many of our listeners being Star Wars fans are like, “Oh, that sounds familiar.”

Omar: Yep, yep.

Jason: If you haven’t checked out Kessel Run’s website, you should go do that; they have a really cool logo. I’m guessing that relates to just the story of Kessel Run being like, doing it really fast and having that velocity, and so bringing that to the DOD, is that the connection?

Omar: Actually, it goes into the smuggling DevSecOps into the DOD, so the 12 parsecs. So, that’s where it comes from. So, we are smuggling that DevSecOps into the DOD; we’re changing that model. So, that’s where it comes from.

Jason: I love that idea of we’re going to take this thing and smuggle it in, and that rebellious nature. I think that dovetails nicely into the work that you’ve been doing with chaos engineering. And I’m curious, how did you get into chaos engineering? Where did you get your start?

Omar: I’ve been breaking things forever. So, part of that they deliver tools that our warfighters can use, that’s been my jam. So, I’ve been doing, you can say, chaos forever. I used to walk around, unplug power cables, network cables, turn down [WAN 00:03:24]. Yeah, that was it.

Because we used to build these tools and they’re like, “Oh, I wonder if this happens.” “All right, let’s test it out.
Why not?” Pull the cable and everybody would scream and say, “What are you doing?” It was like, “We figured it out.”

But yeah, I’ve been following chaos engineering for a while, ever since Netflix started doing it and Chaos Monkey came out and whatnot, so that’s been something that’s always been on my mind. It’s like, “Ah, this would be cool to bring into the DOD.” And Kessel Run just made that happen. Kessel Run, the way we build tools, our distributed system was like, “Yep, this is the prime time to bring chaos into the DOD.” And Kessel Run just adopted it.

I tossed the idea, I was like, “Hey, we should bring chaos into Kessel Run.” And we slowly started ramping up, and we build a team for it; team is called Bowcaster. So, we follow the breaking stuff. And that’s it. So, we’ve matured, and we’ve deployed and, of course, we’ve learned on how to deploy chaos in our different environments. And I mean, yeah, it’s been a cool run.

Jason: Yeah, I’m curious. You mentioned starting off simply, and that’s always what we recommend to people to do. Tell us a little bit more about that. What were some of the tests that you ran then, and then maybe how have they matured, and what have you moved into?

Omar: So, our first couple of tests were very simple. Hey, we’re going to test a database failover, and it was really manual at that point. We would literally go in and turn off Database A and see what happened. So, it was very basic, very manual work. We used to record them so we can show them off like, “Hey, check this out. This is what we did.”

So, from there, we matured. We got a little bit more complex. We eventually got to the point where we were actually corrupting databases in production and seeing what happens. You should have seen everybody’s faces when we proposed that.
So, from there, we’re running basically, we call it ‘Chaos Plus’ in Kessel Run.

So, we’ve taken chaos engineering, the concept of chaos engineering, right, breaking things on purpose, but we’ve added performance engineering on top of it, and we’ve added cybersecurity testing on top of it. So, we can run a degraded system, and at the same time say, “All right, so we’re going to ramp up and see what a million users does to our app while it’s fully degraded.” And then we would bring in our cyber team and say, “All right, our system is degraded. See if you can find a vulnerability in it.” So, we’ve kind of evolved.

And I call it, put chaos on a little bit of steroids here. But we call it Chaos Plus; that’s our thing. We’ve recently added fuzzing while we’re doing chaos. So, now we got performance chaos, our cyber team, and we’re fuzzing the systems. So, I’m just going to keep going until somebody screams at me and says, “Omar, that’s too much.” But that’s essentially a little bit of our ride in Kessel Run.

Jason: That’s amazing. I love that idea of we’re going to do this test, and then we’re going to see what else can happen. One of the things that I’ve been chatting with a bunch of folks recently about is this idea, we always talk about, especially in the resilience engineering space, that failure never has a single point. It’s not a singular root cause; it’s always contributing factors. And the problem is, when you’re doing chaos eng...