Transcript
Jason: Welcome to the Break Things on Purpose podcast, a show about our often self-inflicted failures and what we learn from them. In this episode, Leonardo Murillo, a principal partner solutions architect at Weaveworks, joins us to talk about GitOps, automating reliability, and Pura Vida.
Ana: I like letting our guests kind of say, like, “Who are you? What do you do? What got you into the world of DevOps, and cloud, and all this fun stuff that we all get to do?”
Leo: Well, I guess I’ll do a little intro of myself. I’m Leonardo Murillo; everybody calls me Leo, which is fine because I realize that not everybody chooses to call me Leo, depending on where they’re from. Like, Ticos and Latinos, they’re like, “Oh, Leo,” like they already know me; I’m Leo already. But people in Europe and in other places, they’re, kind of like, more formal out there: “Leonardo.” Everybody calls me Leo.
I’m based out of Costa Rica, and my current professional role is principal solutions architect—principal partner solutions architect at Weaveworks. How I got started in DevOps? The way a lot of people have gotten started in DevOps, which is not realizing that they just got started in DevOps, you know what I’m saying? Like, they did DevOps before it was a buzzword and it was, kind of like, cool. That was back—so I worked probably, like, three roles back, so I was CTO for a Colorado-based company before Weaveworks, and before that, I worked with a San Francisco-based startup called High Fidelity.
And High Fidelity did virtual reality. So, it was actually founded by Philip Rosedale, the founder of Linden Lab, the builders of Second Life. And the whole idea was, let’s build—with the advent of the Oculus Rift and all this cool tech—build the new metaverse concept. We’re using the cloud because, I mean, when we’re talking about this distributed system, like a distributed system where you’re trying to, with very low latency, transmit positional audio, and a bunch of different degrees of freedom of your avatars and whatnot; that’s very massive scale, lots of traffic. So, the cloud was, kind of like, fit for purpose.
And so we started using the cloud, and I started using Jenkins, as a—and figured out, like, Jenkins is a cron sort of thing; [unintelligible 00:02:48] oh, you can actually do a scheduled thing here. So, started using it almost to run just scheduled jobs. And then I realized its power, and all of a sudden, I started hearing this whole DevOps word, and I’m like, “What’s this? That’s kind of like what we’re doing, right?” Like, we’re doing DevOps. And that’s how it all got started, back in San Francisco.
Ana: That actually segues to one of the first questions that we love asking all of our guests. We know that working in DevOps and engineering, sometimes it’s a lot of firefighting, sometimes we get to teach a lot of other engineers how to have better processes. But we know that those horror stories exist. So, what is one of those horrible incidents that you’ve encountered in your career? What happened?
Leo: This is before the cloud and this is way before DevOps was even something. I used to be a DJ in my 20s. I used to mix drum and bass and jungle with vinyl. I never did the digital move. I used to DJ, and I was director for a colocation facility here in Costa Rica, one of the first few colocation facilities that existed in the [unintelligible 00:04:00].
I partied a lot, like every night, [laugh] [unintelligible 00:04:05] party night and DJ night. One night, we had 24/7 support because we were colocation [unintelligible 00:04:12], so I had people doing support all the time. I was mixing in some bar someplace one night, and I don’t want to go into absolute detail of my state of consciousness, but it wasn’t, kind of like… accurate in its execution. So, I got a call, and they’re like, “We’re having some problem here with our network.” This is, like, back in Cisco PIX times for firewalls and you know, like… back then.
I wasn’t fully there, so I [laugh], just drove back to the office in the middle of the night and had this assistant, Miguel was his name, and he looks at me and he’s like, “Are you okay? Are you really capable of solving this problem at [laugh] this very point in time?” And I’m like, “Yeah. Sure, sure. I can do this.”
We had a rack full of networking hardware and there was, like, a big incident; we actually—one of the primary connections that we had was completely offline. And I went in and I started working on a device, and I spent about half an hour, like, “Well, this device is fine. There’s nothing wrong with the device.” I had been working for half an hour on the wrong device. They’re like, “Come on. You really got to focus.”
And long story short, I eventually got to the right device and I was able to fix the problem, but that was like a bad incident, which wasn’t bad in the context of technicality, right? It was a relatively quick fix that I figured out. It was just at the wrong time. [laugh]. You know what I’m saying?
It wasn’t the best thing to occur that particular night. So, when you’re talking about firefighting, there’s a huge burden in terms of the on-call person, and I think that’s something that we had experienced, and that I think we should give out a lot of shout-outs and provide a lot of support for those that are on call. Because this is the exact price they pay for that responsibility. So, just as a side note that comes to mind. Here’s a lot of, like, shout-outs to all the people on-call that are listening to this right now, and I’m sorry you cannot go party. [laugh].
So yeah, that’s telling one story of one incident way back. You want to hear another one? Because there’s a—this is back in High Fidelity times. I was—I don’t remember exactly what I was building, but it had to do with emailing users, basically, I had to do something, I can’t recall actually what it was. It was supposed to email all the users that were using the platform. For whatever reason—I really can’t recall why—I did not mock data on my development environment.
What I did was just use—I didn’t mock...