Speaker 2
Official. Yeah, exactly. So maybe we start with your role as PyPI Safety and Security Engineer. Maybe you could tell us a little bit about what that role entails?
Speaker 1
Absolutely. So for some background, PyPI is the Python Package Index, which is maintained by a small group of volunteers and a lot of contributors, and it is managed, hosted, and funded by the Python Software Foundation, the PSF. The PSF is a nonprofit that works on establishing the best possible way to use Python and the ecosystem around it, including the packaging ecosystem. And it's a relatively small nonprofit. I know that you've had Seth Larson on the show. He is the PSF's Security Developer-in-Residence for Python overall, and then we also have Łukasz Langa, who's the Developer-in-Residence for Python. Yeah, yeah. And those folks cover the broader ecosystem and the Python language runtime. And when some funding came around (thank you to Amazon Web Services) to invest in making PyPI.org, the package index, a safer place to both upload and use packages, I threw my hat in the ring, having been a PyPI volunteer maintainer for a couple of years. A lot of other great folks put their names in, but I went through the process and got hired to focus predominantly on the impact of PyPI in the packaging ecosystem: how to make it more secure, and how to make it safe for all of its users, whether corporations, individuals, scientific researchers, or anybody on the planet and beyond.
Speaker 2
Yeah, yeah. We always talk about the massive amount of resources there at PyPI, so it's nice to have someone in a full-time paid role watching over its security. That's fantastic.
Speaker 1
Yeah, but until now it's pretty much been volunteers. I came on as the first new PyPI maintainer in a long time a couple of years ago, not full-time, but as a volunteer maintainer. Before that, it had largely been just three folks over the last decade or so who were at the forefront of handling all of the different feature requests. Yes, different parties have funded contract work to get specific features built, but the core maintainer group has been a really small set of volunteers. And Ee Durbin, the Director of Infrastructure, is an employee of the PSF, but has only been able to dedicate a portion of their time to PyPI because they have to worry about all of the things. So when AWS and some other funders came up with funds to say, we'd like to put some more power behind PyPI safety, it was great. We can take these funds, run with them, and focus on parts that don't depend only on volunteer contributions.
Speaker 2
What made you decide you wanted to contribute to PyPI in the first place? Was there a deciding factor that made you think, I want to get involved in this?
Speaker 1
Yeah, it's a fun story. So I've been in software development, engineering, systems, and management for roughly 30 years, across a couple of continents and a bunch of different startups and enterprises. Check out my LinkedIn; it's very fun to read. I was first exposed to Python around 2007 or 2008, when I started actually using it for work things. It's a very pleasant language to work in once you figure out some of the quirks, and back then it was Python 2. Over the last 20 years or so, I've been contributing to open source in a variety of different ways. I worked a lot in the Chef community and wrote a lot of Ruby and cookbooks over there. And I started to dig into the different tools I was using, which were mostly open source, right? We use a lot of open source. I started to see, okay, here's a bug. I can find the bug because it's open source; I can read through all the code. I can't necessarily fix it, I know I'm not that good yet, but at least I can report it to the authors. This was back in the early days of GitHub, Google Code, CodePlex, and of course SourceForge. So I started getting engaged with the open source community. And then I started seeing that, hey, I can do this too. I can create packages. I can create things that are useful to me and share them with others, and that fed a desire to produce more useful, valuable things. Open source is something we all rely on, so now it's time to give back, right? Every company I've worked for has used piles and piles of open source, but not every company dedicates engineers to work on it. Still, it's a great opportunity for engineers to pick up new and interesting parts of their career, because it's like, all right, well, this is not for work.
I could do this for fun. And that's also a great way to do it. When it came to PyPI and why I got involved there, well, I used Python a lot, right? Over the course of different companies, I moved off of Ruby and predominantly into Python land, and I also moved more into management. And as a manager, you do less and less hands-on development, at least at a certain scale. I was a senior director of engineering at one startup and a VP at another enterprise. And I've always liked to solve puzzles, and open source is a great way to solve puzzles. It's a way to stay involved and stay empathetic with the engineers who worked for me, to say, you know what, I know your pain. I know what you're feeling. If you're telling me this is going to take a lot longer, it's not that you're complaining, trying to beat the clock, or being lazy. No, this is actually really hard. So, as a way to keep building that empathy, I kept contributing to open source in my spare time. And with PyPI specifically, there was this one little feature that I wanted. It always starts with one little thing. Sure, sure. On the main page of PyPI.org, there's a search bar at the top. And through other websites, you learn keyboard shortcuts, right? If you've ever turned on keyboard shortcuts for Gmail, then you learn those. If you've ever used Vim, you need to know all of your keyboard shortcuts. Or you'll never exit. You'll never exit. If you pick up VS Code or PyCharm (I'm a big PyCharm user now), then you need to learn these keyboard shortcuts. You can live without them, but they will make you more effective. And going to the PyPI.org website, there's a search bar, and I naturally just hit the slash key to focus it. It should hop to the search bar. To hop there, right. And that's a behavior.
I don't know who came up with it, but somebody did, and now it's like, oh, I do that to focus the search bar. I think even Google does it now, which is great. But PyPI didn't have it. And I thought, all right, well, that's something I think I can figure out. So I spent some time: I checked out the code, I read the dev docs, I figured out how some stuff was broken, and I fixed the dev environment. There was a bunch of stuff where it was like, okay, this needs to be brushed up. Then I got my feature working, sent it up, and got some good feedback from the existing maintainers. After I finished that, one of them was like, hey, what do you think about this one? Do you want to try it? Because the maintainers obviously have a better idea of what's in the issue backlog and where to point people's efforts. So it was like, right, I think you might be good for this thing.
Speaker 2
Yeah. Yeah.
Speaker 1
And what's crazy is that feature was very much JavaScript, and that's not my strongest language. And I was like, I can do this. I can figure this out. Since then, I've taken the time to revamp the entire JavaScript stack and asset building for PyPI, so that it's a lot nicer for developers or contributors who want to work on it, and then I got more into the Python side of PyPI. So it was really an excellent way of going, oh, I want this thing. I think this thing should exist. I could ask for it, but no one's going to do it, because it's not their priority. And if it were, they would have done it already, right? So if you're not willing to put in the time to give back, you're just asking somebody to do free work for you. Whereas here, it was like, no, I will put in the time and effort to figure this out.
Speaker 2
Yeah, cool. In the new role that you're taking on here, what are you most excited about?
Speaker 1
So in this role, I think the important part is really focusing on the security aspects. Because again, it's a packaging ecosystem, and there's any number of things one could do. But having a focus on safety and security helps narrow things down: well, I could do a thousand things; here are the three that I should do right now. And that kind of focus helps anybody clearly define what they're going to do next.
Speaker 2
Yeah. Okay. So do you have a background in securing things like this? You talked about a variety of different ecosystems. And related to that, what are the security concerns PyPI has? Maybe we can talk some history there.
Speaker 1
Yeah, I mean, some folks might say, okay, Mike doesn't have a pure security background. And you're right: I'm not, say, a network security engineer, which is a role that exists. But I have been a part of, and a leader for, a bunch of security initiatives and platforms throughout my tenure as an engineer and engineering manager, and that positions me in a spot where I'm not just worried about the security finding. I'm also worried about sustainably building a secure outcome. So it's not always that I have to be the expert, but I have to be the knowledgeable generalist who can pull in expertise, or rely on the expertise of others, to achieve the desired outcome. For instance, I did a post on our blog, blog.pypi.org, on measuring our inbound malware reporting. What happens a lot is people will put together a malicious package with some garbage name, sign up for an account on PyPI, because it's a free service, and upload a package that has some malicious intent. Very often it's, you know, when the user installs it, look for any environment variables that look interesting and then post them to some harvester. Yeah. So we partner with a bunch of different, again, volunteer security research teams that report those to us. Today, the reporting process is very much: email us at a security inbox. We then analyze the report, look at what the indicators are, and make an informed decision, as maintainers and admins of the service, about what to do next. So when it came time to ask, all right, how bad is this? How often does this happen? I sat down and said, okay, let's analyze the data. I can come up with some ideas, but doing data analysis is not necessarily a security competency, if you will, right?
But building a tool to scrape through Gmail messages, look for the particulars, and then produce results, data analysis, and graphs isn't a security role, even though it's very much a comprehensive analysis of how we're doing on the security front.
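The kind of report counting described here can be sketched with Python's standard library. This is a minimal sketch, not the tool behind the blog post: it assumes the reports were exported to an mbox file and that a report can be recognized by a keyword in its Subject header. Both are hypothetical simplifications, and `monthly_report_counts` and `counts_from_mbox` are illustrative names.

```python
import mailbox
from collections import Counter
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Iterable


def monthly_report_counts(messages: Iterable[Message], keyword: str = "malware") -> Counter:
    """Count report emails per calendar month, keyed like "2023-11".

    Hypothetical heuristic: a message is a report if `keyword` appears in its
    Subject header. Real triage needs far more than a subject match.
    """
    counts: Counter = Counter()
    for msg in messages:
        subject = str(msg.get("Subject", ""))
        date_header = msg.get("Date")
        if keyword.lower() not in subject.lower() or not date_header:
            continue
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            continue  # skip messages with an unparseable Date header
        counts[sent.strftime("%Y-%m")] += 1
    return counts


def counts_from_mbox(path: str) -> Counter:
    """Count reports in an mbox export (the export format is an assumption)."""
    return monthly_report_counts(mailbox.mbox(path))
```

The resulting `Counter` could then be handed to a plotting library. A fuller analysis would also extract the reported project names and time-to-removal, which this sketch does not attempt.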
Speaker 2
Yeah. Yeah. And you can read that on blog.pypi.org. Yeah. We love digging into those kinds of things and keeping people aware of, hey, this is what's happening out there, and you need to be aware of it. So it sounds like a lot of your role is troubleshooting and problem solving. Do you feel like you're going to be pivoting a lot and using lots of different skills that you've developed across the spectrum?
Speaker 1
I mean, I certainly hope so, right? Right now, some of my focus is very much on malware and malicious package reporting. There's an enormous amount of focus in the industry on software package repository security: dealing with malware, reducing the time malware is potentially out there, and reporting vulnerabilities. And that's very much where I'm focused right now: how do we optimize this process? How do we make it easier for folks to report to us with the structured information that we want? And what do we do as admins? How can we reduce the burden on admins and get to the point where our reactions are semi-automated, so that human oversight is reduced to what's absolutely necessary? Obviously, there's still plenty that's going to require human intervention and analysis. But the more we can automate, the better, right? Then we can do more with our time.
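One way to picture "structured information" plus "semi-automated reactions" is a report schema that rejects incomplete submissions, combined with a rule that only escalates automatically when independent trusted reporters agree. This is a hypothetical sketch: the `MalwareReport` fields, the `Action` values, and the two-reporter threshold are assumptions for illustration, not PyPI's actual reporting format or policy.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    QUARANTINE_PENDING_REVIEW = "quarantine_pending_review"
    MANUAL_REVIEW = "manual_review"


@dataclass(frozen=True)
class MalwareReport:
    """Hypothetical structured report; field names are illustrative only."""
    project: str
    version: str
    reporter: str  # identifier of the reporting research team
    evidence: str  # e.g. a link to the offending code

    def __post_init__(self) -> None:
        # Reject incomplete reports up front instead of chasing details by email.
        for name in ("project", "version", "reporter", "evidence"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing required field: {name}")


def triage(reports: list[MalwareReport], trusted: set[str], threshold: int = 2) -> dict[str, Action]:
    """Suggest an action per project (hypothetical rule).

    If at least `threshold` distinct trusted reporters flag the same project,
    hide it pending human review; otherwise queue it for an admin to look at.
    """
    reporters_by_project: dict[str, set[str]] = {}
    for report in reports:
        reporters_by_project.setdefault(report.project, set()).add(report.reporter)

    decisions: dict[str, Action] = {}
    for project, reporters in reporters_by_project.items():
        if len(reporters & trusted) >= threshold:
            decisions[project] = Action.QUARANTINE_PENDING_REVIEW
        else:
            decisions[project] = Action.MANUAL_REVIEW
    return decisions
```

In either branch, a human admin still makes the final call; the rule only decides how urgently a report reaches them.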
Speaker 2
Yeah, maybe a lot of the attacks are automated, so it's like using bots against bots. Yeah. Well, let's transition to what's happening with the program that started almost a year and a half ago, is that right? Getting 2FA, two-factor authentication, happening on PyPI. Maybe we can talk about the background of it and then the current push and deadlines, if you will.
Speaker 1
Yeah, absolutely. So software supply chain security is on every chief information officer's and CTO's mind, because the topic continues to evolve, right? Open source is so prolific and prevalent out there in the world that we're all using it. Well, how do I know what I'm using, and how do I trust what I'm using? Yeah. So there's a variety of initiatives going on around, okay, how do I create software attestations? How do I get a software bill of materials, so I know what I have? Yeah, yeah. And all of those are really, really important, because they give you that confidence of, I know what I have. But they all kind of fail on one point: okay, just because I have it, is it good? Is it safe? So some notable software supply chain attacks reported in the past happened when a maintainer of a package with some popularity lost control of their publishing account, whether through an API token leak or, maybe, someone swiped their laptop at some event and it was improperly secured. Now there's a valid token, and hey, we can upload a new version of this very popular Python package to the registry. It says it's from me, and everyone will trust it because it's me, but it contains some new malicious stuff. So to combat that behavior, we've also tried to reduce the ability to use a username and password to upload to pypi.org, right? Instead of using a username and password, you should use an API token. Why? Because tokens are a little easier to track, and we can invalidate them without invalidating a user account, which is great.
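The point that a token can be invalidated without invalidating the account can be shown with a toy model. Everything here is hypothetical (the `Account` class, its methods, and the in-memory storage); it is not how PyPI stores credentials, and a real service would never keep a plaintext password. The only details borrowed from PyPI are that its API tokens start with `pypi-` and that, when uploading with twine, you pass the literal username `__token__` and the token as the password.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy account: one password plus any number of independently revocable tokens."""
    username: str
    password: str  # plaintext only because this is a toy; real services store hashes
    tokens: dict[str, str] = field(default_factory=dict)  # token -> label

    def create_token(self, label: str) -> str:
        # The "pypi-" prefix mimics PyPI's format; the body here is just random text.
        token = "pypi-" + secrets.token_urlsafe(32)
        self.tokens[token] = label
        return token

    def revoke_token(self, token: str) -> None:
        # Revoking one token leaves the password and every other token untouched.
        self.tokens.pop(token, None)

    def can_upload(self, credential: str) -> bool:
        # Uploads authenticate with a token only, never with the account password.
        return credential in self.tokens
```

If the laptop holding one token is stolen, only that token is revoked; CI keeps publishing with its own token and the account itself is unaffected. A production check would also compare secrets in constant time (for example with `secrets.compare_digest`) against stored hashes rather than use dictionary lookups.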
But even then, all right, what if that username and password were obtained, and somebody can log in and create a new API token? There's nothing stopping them from doing that. So how do we combat what's called account takeover? Okay, right. The account could be taken over because, okay, I got access to your computer and got in there, or I did a credential stuffing attack, reusing a password leaked from a similar website, and now I'm into PyPI. Another one is a domain expiry attack, which is fascinating. Let's say you signed up for PyPI with, you know, christopher@realpython.com. Yeah. If realpython.com expires and you forget to renew it, because people are people and we forget to renew things, right?
Speaker 2
How many domains do I have?
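Two-factor authentication is the defense this account-takeover discussion leads up to: a stolen or reused password alone is no longer enough to log in. PyPI's 2FA options include authenticator apps, which implement time-based one-time passwords (TOTP, RFC 6238), and WebAuthn security keys. The following is a standard-library sketch of the TOTP mechanism only. It illustrates the RFC and is not PyPI's implementation; the function names and the one-step clock-skew window are assumptions, and a production service should use a vetted library.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over 30-second time steps."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


def verify(key: bytes, submitted: str, at: Optional[float] = None, window: int = 1) -> bool:
    """Accept a code from the current step or `window` steps either side (clock skew)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(key, now + drift * 30), submitted)
        for drift in range(-window, window + 1)
    )
```

With the RFC 6238 test key, `totp(b"12345678901234567890", at=59, digits=8)` returns `"94287082"`, matching the RFC's published test vector. Authenticator apps usually exchange the shared secret as base32, which `base64.b32decode` turns into the raw key bytes used here.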