
Software Sessions

Latest episodes

Feb 28, 2025 • 46min

Hong Minhee on ActivityPub

Hong Minhee is an open source developer and the creator of the Fedify ActivityPub server framework. We talk about how applications like Mastodon and Misskey communicate with one another using ActivityPub. This includes discussions on built-in activities, extending the specification in a backwards compatible way, difficulties implementing JSON-LD, the inbox model, and his experience implementing the specification.

Hong Minhee: activitypub profile, fedify, hollo

Specifications: ActivityPub W3C specification, JSON Linked Data, Resource Description Framework, W3C Semantic Web Standards, ActivityPub and WebFinger, ActivityPub and HTTP Signatures

ActivityPub implementations: Mastodon, Misskey, Akkoma, Pleroma, Pixelfed, Lemmy, Loops, GoToSocial, ActivityPub support in Ghost, Threads has entered the Fediverse

ActivityPub tools: ActivityPub Academy, BrowserPub, fedify CLI

-- Transcript

You can help correct transcripts on GitHub.

What's ActivityPub?

[00:00:00] Jeremy: Today, I'm talking to Hong Minhee. He is the developer of Fedify, a TypeScript library for building ActivityPub server applications. The first thing I think we should start with is defining ActivityPub. What is ActivityPub?

[00:00:16] Hong: ActivityPub is the protocol that lets social networks talk to each other and it's officially recommended by W3C. It's what powers this thing we call the Fediverse which is basically a way for different social media platforms to work together.

Users of ActivityPub

[00:00:39] Jeremy: Can you give some examples that people might have heard of -- either users of ActivityPub or things that are a part of this fediverse?

[00:00:50] Hong: Mastodon is probably the biggest one out there. And you know what's interesting? Meta's Threads has actually started implementing ActivityPub this summer. So this is still pretty much a one way street right now. In East Asia, especially Japan, there's this really popular microblogging platform called Misskey.
It's got so many forks that people actually joke around and call them forkeys. But it's not just about Twitter style microblogging, there's Pixelfed which is kind of like Instagram, but for the fediverse. And those same folks recently launched Loops, which is basically doing what TikTok does, but in the Fediverse. Then you've got stuff like Lemmy and which are doing the Reddit thing up in the Fediverse.

[00:02:00] Jeremy: Oh like Reddit.

[00:02:01] Hong: Yeah. There is so much more out there that I haven't even mentioned. Um, most of it is open source, which is pretty cool.

[00:02:13] Jeremy: So the first few examples you gave, Mastodon and Meta's Threads, they're very similar to, to Twitter, right? So that's what you were calling the, the microblogging applications. And I think what you had said, which is a little bit interesting is you had said Meta's Threads is only one way. So could you kind of describe like what you mean by that?

[00:02:37] Hong: Currently Meta's Threads only can be followed by other ActivityPub applications but you cannot follow other people in the fediverse.

[00:02:55] Jeremy: People who are using another microblogging platform like Mastodon can follow someone on Meta's Threads platform. But the other way is not true. If you're on Threads, you can't follow someone on Mastodon.

[00:03:07] Hong: Yes, that's right.

[00:03:09] Jeremy: And that's not a limitation of the protocol itself. That's a design decision or a decision made by Meta.

[00:03:17] Hong: Yeah. They are slowly implementing ActivityPub and I hope they will implement complete ActivityPub in the future.

Interoperability through Activities

[00:03:27] Jeremy: And then the other examples you gave, one is I believe it was Pixelfed is very similar to Instagram. And then the last examples you gave was I think it was Lemmy, you said it's similar to Reddit.
Because you mentioned the term Fediverse before and you mentioned that these all use ActivityPub and since these seem like different kinds of applications, what does it mean for them to interact? Because with Mastodon and Threads I can kind of understand because they're both similar to Twitter. So you're posting messages and replying, but, but what does it mean, for example, for someone on Mastodon to interact with someone on Lemmy which is like Reddit because they seem very different.

[00:04:16] Hong: People in Lemmy and Mastodon are called actors and can follow each other. They have interactions between them called activities. And there are several types of activities like create, follow, undo, like, and so on. So, ActivityPub applications tend to use this vocabulary to implement their features. So, for example, Lemmy uses like activities for upvoting and dislike activities for downvoting and it's translated to likes in Mastodon. So if you submit a post on Lemmy, it shows up on your Mastodon timeline. If you like that post, (it) is upvoting in Lemmy.

[00:05:36] Jeremy: And probably similarly with Pixelfed, which you said is like Instagram, if you follow someone's Pixelfed account in Mastodon and they post a photo in Pixelfed, they would see it as a post in Mastodon natively and they could give it a like there.

Adding activities or properties

[00:05:56] Jeremy: And these activities that you mentioned -- So the like and the dislike are those part of ActivityPub itself?

[00:06:05] Hong: Yes, and this vocabulary can be extended.

[00:06:10] Jeremy: So you can add additional actions (activities) or are you adding information (properties) to the existing actions?

[00:06:37] Hong: It is called activity vocabulary, and there are things like accept, add, arrive, block, create, delete, dislike, flag, follow, ignore, invite, join, and so on.
So, basically, almost everything you need to build social media is already there in the vocabulary, but if you want to extend some more, you can define your own vocabulary.

[00:06:56] Jeremy: Most of the things that an Instagram or a Twitter, or a Reddit would need is already there. But you're saying that you can have your own vocabulary. So if there's an action or an activity that is not covered by the specification, you can create one yourself.

[00:07:13] Hong: Yes. For example, Misskey and Pleroma defined EmojiReact to represent emoji reactions.

[00:07:25] Jeremy: Because the systems can extend the vocabulary, what are some other examples of cases where Mastodon or any other of these systems has found that the existing vocabulary is not enough? What are some other examples of applications extending it?

[00:07:45] Hong: For example, uh, Mastodon defined suspended -- suspended property. They are not activities, but they are properties in the activity. ActivityPub consists of several types of objects and there are activities and normal objects like article. They can have properties and there are several existing properties, but they can be also extended. So Mastodon extended some properties they need. So for example, they define suspended or discoverable.

[00:08:44] Hong: Suspended is to tell if an actor is suspended by moderators. Discoverable tells if an actor itself wants to be searched and indexed, and there are many more properties Mastodon extended.

Actors

[00:09:12] Jeremy: And these are, these are properties of the actor. These are properties of the user?

[00:09:19] Hong: Yes. Actors.

[00:09:21] Jeremy: Cause I think earlier you mentioned that the concept of a user is an actor, and it sounds like what you're saying is an actor can have all these properties.
There's probably a, a username and things like that, but Mastodon has extended the properties so that you can have a property on whether you wanna be searched or indexed, you can have a property that says you're suspended. So I guess your account is still there, but can't be used anymore. Something we should probably talk about then is, so you have these actors, you have these activities that I'm assuming the actors are performing on one another. What does that data look like and what does the communication look like?

[00:10:09] Hong: Actors have their own dereferenceable URI and when you look up that URI you get all the info about the actor in JSON-LD format.

[00:10:22] Jeremy: JSON-LD?

[00:10:23] Hong: Yeah. JSON-LD. Linked data. (The) Actor has all the stuff you expect to find on a social account: name, bio, URL to the profile page, profile picture, header image and more. And there are five main types of actors: application, group, organization, person and service. And you know how sometimes on Mastodon you will see an account marked as a bot?

[00:10:58] Jeremy: A bot?

[00:10:59] Hong: Yeah. Bot, and that's what an actor of type service looks like. And the ActivityPub spec actually lets you create other types beyond these five. But I haven't seen anyone actually do that yet.

JSON-LD

[00:11:15] Jeremy: And you mentioned that these are all JSON objects, but the LD part, the linked data part, I'm not familiar with. So what's different about the linked data part of the JSON?

[00:11:31] Hong: So JSON-LD is a special way of writing RDF, which was originally used in the semantic web. Usually RDF uses (a) format (that) is called triples.

[00:11:48] Jeremy: Triples?

[00:11:49] Hong: Yeah, subject and predicate and object.

[00:11:55] Jeremy: Subject, predicate, object. Can you give an example of what those three would be?

[00:12:00] Hong: For example, John is a person. It's a triple. John is a subject and "is" is a predicate.

[00:12:11] Jeremy: "Is" is the predicate.

[00:12:12] Hong: Okay.
And person is an object. That's great for showing how things are connected, but it is pretty different from how we usually handle data in REST APIs and stuff. Like normally we say a person object has properties like name, DOB, bio, and so on, not a bunch of subject-predicate-object triples. That's where JSON-LD comes in. It is designed to look more like the JSON we are used to working with, while still being able to represent RDF graphs. RDF graphs are ontologies. It's a way to represent factual data, but it is quite different from how we represent data in a relational database. It's a bunch of triples. Subjects and objects are nodes and predicates connect these nodes.

Semantic Web

[00:13:30] Jeremy: You mentioned the Semantic web, what does that mean? What is the semantic web?

[00:13:35] Hong: It's a way to represent the web in a structural way. It is machine readable so that you can scan the data in the web using scrapers or crawlers.

[00:13:52] Jeremy: Scrapers -- or what was the second one? Crawling.

[00:13:59] Hong: Yeah. Then you can have graph data of the web and you can query information about things from the data.

[00:14:14] Jeremy: So is the web as it exists now, is that the Semantic web or is it something different?

[00:14:24] Hong: I think it is partially semantic web. You have several metadata in your HTML. For example, there are several specifications for semantic web, like OpenGraph metadata.

[00:14:32] Jeremy: Cause when I think about OpenGraph, I think about the metadata on a webpage that, that tells other applications or websites that if you link to this page: show this image or show this title and description. You're saying that specifically you consider part of the semantic web?

[00:15:05] Hong: That's semantic web. To make your website semantic web, your website should be able to provide structural data. And other people can make scrapers to scan structural data from your website.
There are a bunch of attributes and text for HTML to represent metadata. For example you have the relation attribute rel, so if you have a link with rel=me to another social profile of yours, then other people can tell the two web pages represent the same person.

[00:16:10] Jeremy: Oh, I see. So you could have more than one website. Maybe one is your blog and maybe one is your favorite birds or something like that. But you could put a rel tag with information about you as a person so that someone who scrapes both websites could look at that tag and see that both of these websites are by Hong, by this person.

JSON-LD is difficult to implement and not used as intended

[00:16:43] Hong: Yeah. I think JSON-LD is designed for semantic web, but in reality, ActivityPub implementations, most of them are not aware of semantic web.

[00:17:01] Jeremy: The choice of JSON Linked Data, the JSON-LD, by the people who made the specification -- They had this idea that things that implemented ActivityPub would be a part of this semantic web, but the actual implementation of a Mastodon or a Pixelfed, they use JSON-LD because it's part of the specification, but the way they use it, it ends up not really being a part of this semantic web.

[00:17:34] Hong: Yeah, that's exactly..

[00:17:37] Jeremy: You've mentioned that implementing it is difficult. What makes implementing JSON-LD particularly hard?

[00:17:48] Hong: JSON-LD is quite complex, which is why a lot of programming languages don't even have JSON-LD implementations, and it's pretty slow compared to just working with regular JSON. So, what happens is a lot of ActivityPub implementations just treat JSON-LD like (it) is regular JSON without using a proper JSON-LD processor. You can do that, but it creates a source of headaches. In JSON-LD there are weird equivalences like if a property is missing or if it's an empty array, that means the same thing.
Or if a property has one value versus an array with just that one value in it, same thing. So when you are writing code to parse JSON-LD, you've got to keep checking if something's an array, how long it is, and all that. It is super easy to mess up. It's not just reading JSON-LD that's tricky. Creating it is just as bad. Like you might forget to include the right context metadata for a vocabulary and end up with a JSON-LD document that's either invalid or means something totally different from what you wanted. Even the big ActivityPub implementations mess this up pretty often. With Fedify we've got a JSON-LD processor built in and we keep running into issues where major ActivityPub implementations create invalid JSON-LD. We've had to create workarounds for all of them, but it's not pretty and causes kind of a mess.

[00:19:52] Jeremy: Even though there is a specification for JSON-LD, it sounds like the implementers don't necessarily follow it. So you are kind of parsing JSON-LD, but not really. You're parsing something that looks like JSON-LD, but isn't quite it.

[00:20:12] Hong: Yes, that's right.

[00:20:14] Jeremy: And is that true in the, the biggest implementations, Mastodon, for example, are there things that it sends in its activities that aren't valid JSON-LD?

[00:20:26] Hong: Those implementations that have bad JSON-LD tend to fix it as soon as possible. But regressions happen so often. Yeah.

[00:20:45] Jeremy: Even within Mastodon, which is probably one of the largest implementers of ActivityPub, there are cases where it's not valid JSON-LD and somebody fixes it. But then later on there are other messages or other activities that were valid, but aren't valid anymore. And so it's this, it's this back and forth of fixing them and causing new issues it sounds ...

[00:21:15] Hong: Yeah. Yeah. Right.

[00:21:17] Jeremy: Yeah. That sounds very difficult to deal with.
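(Editor's illustration, not part of the conversation.) The equivalences Hong describes, where a missing property equals an empty array and a single value equals a one-element array, are what code has to handle when it reads JSON-LD as plain JSON. The following is a minimal sketch of that defensive normalization. The names `JsonValue`, `asArray`, and `recipients` are hypothetical helpers made up for this example, not part of Fedify or any JSON-LD library, and this is not a substitute for a real JSON-LD processor.

```typescript
// Hypothetical types and helpers for illustration only.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Treat "missing", null, and [] the same way, and wrap a lone value in an array.
function asArray(value: JsonValue | undefined): JsonValue[] {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Read the "to" property of an activity-like object as a list, whatever shape it arrived in.
function recipients(activity: { [key: string]: JsonValue }): JsonValue[] {
  return asArray(activity["to"]);
}

const single = recipients({ type: "Note", to: "https://example.com/users/alice" });
const wrapped = recipients({ type: "Note", to: ["https://example.com/users/alice"] });
const missing = recipients({ type: "Note" });
const empty = recipients({ type: "Note", to: [] });

console.log(single.length, wrapped.length, missing.length, empty.length); // prints "1 1 0 0"
```

Normalizing at the point of reading keeps the array checks in one place. It does nothing about the context handling also mentioned above, which needs an actual JSON-LD processor.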
How instances communicate (Inbox)

[00:21:20] Jeremy: We've been talking about how the messages themselves are this special format of JSON that's very particular, but how do these instances communicate with one another?

[00:21:32] Hong: Most of the time, it all starts with a follow. Like when John follows Alice, then Alice adds both John and John's inbox URI to her followers list, and after John follows Alice, whenever Alice posts something new that activity gets sent to John's inbox behind the scenes. This is just one HTTP POST request. Even though ActivityPub is built on HTTP, it doesn't really care about the HTTP response beyond did it work or not. If you want to reply to an activity, you need to figure out the sender's inbox URI and send a reply activity there.

[00:22:27] Jeremy: If we define all the terms, there's the actor, which is the person, each actor can send different activities. Those activities are in the form of JSON linked data.

[00:22:40] Hong: Yeah.

[00:22:42] Jeremy: And everybody has an inbox. And an inbox is an HTTP URL that people post to.

[00:22:50] Hong: Right.

[00:22:52] Jeremy: And so when you think about that, you had mentioned that if you have a list of followers, let's say you have a hundred followers, would that mean that you have the URLs to all hundred of those followers' inboxes and that you would send one HTTP POST to each inbox every time you had a new message?

[00:23:16] Hong: Pretty much all ActivityPub implementations have a thing called shared inbox. It's exactly what it sounds like. One inbox that all actors on a server share. Private stuff like DMs don't go there, (it) is just for public posts and thoughts.

[00:23:36] Jeremy: I think we haven't really talked about the fact that, when you have multiple users, usually they're on a server, right? That somebody chooses. So you could have tens of thousands, I don't know how many people can fit on the same server.
But, rather than you having to post to each user individually, you can post to the shared inbox on this server. So let's say, of your 100 followers, 50 of them are on the same server, and you have a new post, you only need to post to the shared inbox once.

[00:24:16] Hong: Yes, that's right.

[00:24:18] Jeremy: And in that message you would I assume have links to each of the profiles or actors that you wanted to send that message to.

[00:24:30] Hong: Yeah.

Scaling challenges

[00:24:31] Jeremy: Something that I've seen in the past is there are people who have challenges with scaling their Mastodon instance or their implementations of ActivityPub. As the, the number of followers grow, I've seen a post about Ghost, one of the companies you work with, mentioning that they've had challenges there. What are the challenges there and, and how do you think those can be resolved?

[00:25:04] Hong: To put this in context, when Ghost mentioned the scaling, they were not using a message queue yet. I'm pretty sure using a message queue would help a lot of their scaling problems. That said it is definitely true that a lot of ActivityPub software has trouble with scaling right now. I think part of the problem is that everyone's using this purely event driven approach to sending activities around. One of the big issues is that when delivery fails it's the sender who has to retry and not the receiver. Plus there's all this overhead because the sender has to authenticate itself with HTTP signatures every time. Actually the ActivityPub spec suggests using polling too so I'd love to see more ActivityPub software try using both approaches together.

[00:26:16] Jeremy: You mean the followers would poll who they're following instead of the person posting the messages having to send their posts to everyone's inboxes.

[00:26:29] Hong: Yeah.

[00:26:29] Jeremy: I see.
So that's a part of the ActivityPub specification, but not implemented in a lot of ActivityPub implementations. And so it sounds like maybe that puts a lot of burden on the servers that have people with a lot of followers because they have to post to every single follower server and maybe the server is slow or they can't reach it. And like you said, they have to just keep trying and trying. There could be a lot of challenges there.

[00:27:09] Hong: Right.

Account migration

[00:27:10] Jeremy: We've talked a little bit about the fact that each person, each actor is hosted by a server and those servers can host multiple actors. But if you want to move to another server either because your server is shutting down or you just would like to change servers, what are some of the challenges there?

[00:27:38] Hong: ActivityPub and Fediverse already have the specification for an account move. It's called FEP-7628 Move Actor. First thing you need to do when moving an account is prove that both the old and new accounts belong to the same person. You do this by adding the old account's URI to the new account's alsoKnownAs property. And then the old account notifies all the other instances that it's moving by sending out a move activity. When a server gets this move activity, it checks that both accounts really do belong to the same person, and then it makes all the accounts that were following the old account start following the new one instead. That's how the new account gets to keep all the old account's followers. Pretty much all the major ActivityPub software has this feature built in, for example, Mastodon, Misskey, you name it.

[00:29:04] Jeremy: This is very similar to the post where when you execute a move, the server that originally hosted that actor, they need to somehow tell every single other server that was following that account that you've moved.
And so if there's any issues with communicating with one of those servers, or you miss one, then it just won't recognize that you've moved. You have to make sure that you talk to every single server.

[00:29:36] Hong: That's right.

[00:29:38] Jeremy: I could see how that could be a difficult problem sometimes if you have a lot of followers.

[00:29:45] Hong: Yeah.

Fedify

[00:29:46] Jeremy: You've created a TypeScript library, Fedify, for building ActivityPub powered applications. What was the reason you decided to create Fedify?

[00:29:58] Hong: Fedify is (an) ActivityPub server framework I built for TypeScript. It basically takes away a lot of headaches you'd get trying to implement (an) ActivityPub server from scratch. The whole thing started because I wanted to build Hollo -- a single user microblogging platform I built. But when I tried to implement ActivityPub from (the) ground up it was kind of a nightmare. Imagine trying to write a CGI program in Perl or C back in the late nineties, where you are manually printing HTTP headers and HTML as bytes. There just wasn't any good abstraction layer to go with. There were already some libraries and frameworks for ActivityPub out there but none of them really hit the sweet spot I was looking for. They were either too high level and rigid, like you could only build a Mastodon clone, or they barely did anything at all. Or they were written in languages I didn't really know.

Ghost and Fedify

[00:31:24] Jeremy: I saw that you are doing some work with Ghost. How is Ghost using Fedify?

[00:31:30] Hong: Ghost is an open source publishing platform. They have put some money into Fedify which is why I get to work on it full time now. Their ActivityPub feature is still in private beta but it should be available to everyone pretty soon. We work together to improve Fedify. Basically they are a user of Fedify. They report bugs, request new features to Fedify, then I fix them or implement them, first.
[00:32:16] Jeremy: Ghost to my understanding is a blogging platform and a newsletter platform. So what does it mean for them to implement ActivityPub? What would somebody using Mastodon, for example, get when they follow somebody using Ghost?

[00:32:38] Hong: Ghost will have a fediverse handle for each blog. If you follow them in your Mastodon or something (similar), then when a new post is published, the post will show up (in) your timeline in Mastodon and you can like them or share them. And in the dashboard of Ghost you can see who liked their posts or shared their posts and so on. It is like how Mastodon works but in Ghost.

[00:33:26] Jeremy: I see. So if you are writing a Ghost blog and somebody follows your blog from Mastodon, sort of like we were talking about earlier, they can like your post, and on the blog itself you could show, oh, I have 200 likes. And those aren't necessarily people who were on your Ghost website, they could be people that were liking your post from Mastodon.

[00:33:58] Hong: Yes.

Misskey / Forkey development in Asia

[00:34:00] Jeremy: Something you mentioned at the beginning was there is a community of developers in Asia making forks of I believe of Mastodon, right?

[00:34:13] Hong: Yeah.

[00:34:14] Jeremy: Do you have experience working in that development community? What's different about it compared to the more Western centric community?

[00:34:24] Hong: They are very similar in most ways. The key difference is language of course. They communicate in Japanese primarily. They also accept pull requests in English. But there are tons of comments in Japanese in their code. So you need to translate them into English or your first language to understand what the code does. So I think that makes a barrier for Western developers. In fact, many Western developers that contribute to Misskey or forkeys are able to speak a little Japanese. And many of the developers of Misskey and forkeys are kind of otaku.

[00:35:31] Jeremy: Oh otaku okay.
[00:35:33] Hong: It's not a big deal, but you can see (the) difference in a glance.

[00:35:41] Jeremy: Yeah. You mentioned one of the things that I believe Misskey implemented was the emoji reactions and maybe one of the reasons they wanted that was so that they could react to each other's posts with you know anime pictures or things like that.

[00:35:58] Hong: Yeah, that's right.

[00:36:01] Jeremy: You've mentioned Misskey and forkey. So is Misskey a fork of Mastodon and then is forkey a fork of Misskey?

[00:36:10] Hong: No, Misskey is not a fork of Mastodon. (It) is built from scratch. It's its own implementation. And forkeys are forks of Misskey.

[00:36:22] Jeremy: Oh, I see. But both of those are primarily built by Japanese developers.

[00:36:30] Hong: Yes. Whereas Mastodon (is) written in Ruby. Ruby on Rails. But Misskey is built in TypeScript.

[00:36:40] Jeremy: And because of ActivityPub -- they all implement it. So you can communicate with people between Mastodon and Misskey because they all understand the same activities.

[00:36:56] Hong: Yes.

Backwards compatible activity implementations

[00:36:57] Jeremy: You did mention since there are extensions like Misskey has the emoji reactions. When there is an activity that an implementation doesn't support what happens between the two servers? Do you send it to a server's inbox and then the server just doesn't do anything with it?

[00:37:16] Hong: Some implementers consider backwards compatibility. So they design (it) to work with other implementations that don't support that activity. For example Misskey uses the like activity for emoji reactions. So if you put an emoji on a Mastodon post then in Mastodon you get one like. So it's intended behavior by Misskey developers that they fall back to normal likes. But sometimes ActivityPub implementers introduce entirely new activity types. For example Pleroma introduced EmojiReact.
And if you put an emoji reaction on a Mastodon post from Pleroma, in Mastodon you have nothing to see because Mastodon just ignores them.

[00:38:37] Jeremy: If I understand correctly, both Misskey and Pleroma are independent implementations of ActivityPub, but with Misskey, their message is backwards compatible, where if you don't understand the emoji reaction, it'll be embedded inside of a like message. Whereas with Pleroma they send an activity that Mastodon can't understand at all. So it just doesn't do anything.

[00:39:11] Hong: Yes, right. But Misskey also understands (the) EmojiReact activity. So between Pleroma and Misskey they exchange emoji reactions with no problem.

[00:39:27] Jeremy: Oh, I see. So they, they both understand that activity. They both implement it the same way, but then when Misskey communicates with Mastodon or with an instance that it knows doesn't understand it, it sends something different.

[00:39:45] Hong: Yeah, that's right.

[00:39:47] Jeremy: The servers -- can they query one another to know which activities they support?

[00:39:53] Hong: Usually ActivityPub implementations also implement the NodeInfo specification. It's like a user agent-like thing in the Fediverse. Implementations tell the other instance (if it) is Mastodon or something else. You can query the type of server.

[00:40:20] Jeremy: Okay, so within ActivityPub are each of the servers -- is the term node is that the word they use for each server?

[00:40:31] Hong: Yes. Right.

[00:40:32] Jeremy: You have the nodes, which can have any number of actors and the servers send activities to one another, to each other's inboxes. And so those are the way they all communicate.

[00:40:49] Hong: Yeah.

Building an ActivityPub implementation

[00:40:50] Jeremy: You've implemented ActivityPub with Fedify because you found like there weren't good enough implementations or resources already.
Did you implement it based off of the specification or did you look at existing implementations while you were building your implementation?

[00:41:12] Hong: To be honest, instead of just diving into the spec, I usually start by looking at actual ActivityPub software code first. The ActivityPub spec is so vague that you can't really build something just from reading it. So when we talk about ActivityPub, we are actually talking about a whole bunch of other technical standards too, WebFinger, HTTP signatures and more. So you need to understand all of these as well.

[00:41:47] Jeremy: With the specification alone, you were saying it's too vague and so what ends up being -- I'm not sure if it's right to call it a spec, but looking at the implementations that people have already made that collectively becomes the spec because trying to follow the spec just by itself is maybe too difficult.

[00:42:12] Hong: Yes.

[00:42:14] Jeremy: Maybe that brings up the issues you were talking about before where you have specifications like JSON-LD where they're so complicated that even the biggest implementations aren't quite following it exactly.

[00:42:28] Hong: Yeah.

[00:42:29] Jeremy: If somebody wanted to, to get started with understanding a little bit more about ActivityPub or building something with it where would you recommend they start?

[00:42:44] Hong: I recommend digging into a lot of code from actual implementations first: Mastodon, Misskey, Akkoma and so on. There are some really cool tools that have been so helpful. For example, ActivityPub Academy is this awesome Mastodon server for debugging ActivityPub. It makes it super easy to create a temporary account and see what activities are going back and forth. There is also BrowserPub. BrowserPub is this neat tool for looking up and browsing ActivityPub objects. It's really handy when you want to see how different ActivityPub software handles various features. I also recommend using Fedify.
I've got to mention the Fedify CLI, which comes with some really useful tools.

[00:43:46] Jeremy: So if someone uses Fedify they're writing an application in TypeScript, then it sounds like they have to know the high level concepts. They have to know what are the different activities, what is inside of an actor. But the actual implementation of how do I create and parse JSON linked data, those kinds of things are taken care of by the library.

[00:44:13] Hong: Yes, right.

[00:44:16] Jeremy: So in some ways it seems like it might be good to, like you were saying, use the tools you mentioned to create a test Mastodon account, look at the messages being sent back and forth, and then when you're trying to implement it, starting with something like Fedify might be good because then you can really just focus on the concepts and not worry so much about the, the implementation details.

[00:44:43] Hong: Yes, that's right.

[00:44:45] Jeremy: Is there anything else you wanted to mention or thought we should have talked about?

[00:44:52] Hong: Mm. I want to talk about a lot of stuff about ActivityPub but it's difficult to speak in English for me, so it's a shame to talk about it very little.

[00:45:15] Jeremy: We need everybody to learn Korean right?

[00:45:23] Hong: Yes, please. (laughs)

[00:45:23] Jeremy: Yeah. Well, I wanna thank you for taking the time. I know it must have been really challenging to give an interview in, you know, a language that's not your native one. So thank you for spending the time to talk with me.

[00:45:38] Hong: Thank you for having me.
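(Editor's illustration, not part of the conversation.) The shared inbox idea from the "How instances communicate" part of this episode can be sketched as a small function that works out where one post has to be sent. Public posts go once to each server's shared inbox, while private ones go to individual inboxes. The `Follower` shape and `deliveryTargets` function are hypothetical and simplified for this example. Real actor documents advertise the shared inbox under `endpoints.sharedInbox`, and real delivery also involves HTTP signatures and retries, which are left out here.

```typescript
// Hypothetical, flattened view of a follower; real actor documents are JSON-LD objects.
interface Follower {
  id: string;
  inbox: string;
  sharedInbox?: string; // assumption: undefined when the server offers no shared inbox
}

// Return the distinct inbox URLs that one activity must be POSTed to.
function deliveryTargets(followers: Follower[], isPublic: boolean): string[] {
  const targets = new Set<string>();
  for (const follower of followers) {
    // Only public posts may use the shared inbox; private ones go to the personal inbox.
    const target =
      isPublic && follower.sharedInbox !== undefined
        ? follower.sharedInbox
        : follower.inbox;
    targets.add(target);
  }
  return Array.from(targets);
}

const followers: Follower[] = [
  { id: "https://a.example/users/alice", inbox: "https://a.example/users/alice/inbox", sharedInbox: "https://a.example/inbox" },
  { id: "https://a.example/users/bob", inbox: "https://a.example/users/bob/inbox", sharedInbox: "https://a.example/inbox" },
  { id: "https://b.example/users/carol", inbox: "https://b.example/users/carol/inbox" },
];

console.log(deliveryTargets(followers, true).length);  // prints 2
console.log(deliveryTargets(followers, false).length); // prints 3
```

With many followers on the same server, the set collapses to one request per server for public posts, which is the saving Jeremy describes with the 100 followers example.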
Feb 25, 2025 • 1h 13min

Prefetcher on Building PinkSea on the AT Protocol

Kacper "prefetcher" Staroń created the PinkSea oekaki BBS on top of the AT Protocol. He also made the online multiplayer game MicroWorks with Noam "noam 2000" Rubin. He's currently studying Computer Science at the Lublin University of Technology. We discuss the appeal of oekaki BBSs, why and how PinkSea was created, web design of the early 2000s, flash animations, and building an application on top of the AT Protocol. Prefetcher Bluesky Github Personal site Microworks (Free multiplayer game) PinkSea and Harbor PinkSea PinkSea Bluesky Account PinkSea repository Harbor image proxy repository Harbor post from bnewbold.net imgproxy (Image proxy used by Bluesky) Early web design Web Design Museum Pixel Art in Web Design Kaliber10000 Eboy Assembler 2advanced epuls.pl (Polish social networking site) Wipeout 3 aesthetic Restorativland (Geocities archive) Flash sites and animations My Flash Archive (Run by prefetcher) dagobah Z0r Juicy Panic - Otarie IOSYS - Marisa Stole the Precious Thing Geocities style web hosts Neocities Nekoweb AT Protocol / Bluesky PDS Relay AppViews PLC directory Decentralized Identifier lexicon Jetstream XRPC ATProto scraping (List of custom PDS and did:web) Tools to view PDS data PDSls atp.tools ATProto browser Posters mentioned vertigris (Artist that promoted PinkSea) Mary (AT Protocol enthusiast) Brian Newbold (Bluesky employee) Oekaki drawing applets Tegaki chickenpaint Group drawing canvas Drawpile Aggie Other links Bringing Geocities back with Kyle Drake (Interview with creator of Neocities) firesky.tv (View all bluesky posts) ATFile (Use PDS as a file store) PinkSky (Instagram clone) front page (Hacker news clone) Smoke Signal (Meetup clone) -- Transcript You can help correct transcripts on GitHub. Intro [00:00:00] Jeremy: Today I am talking to Kacper Staroń.  He created an oekaki BBS called PinkSea built on top of the AT protocol, and he's currently studying computer science at the Lublin University of Technology. 
We are gonna discuss the appeal of oekaki BBSs, the web design of the early 2000s, flash animations, and building an application on top of the AT protocol. Kacper, thanks for talking with me today. [00:00:16] Prefetcher: Hello. Thank you for having me on. I'm Kacper Staroń, though you probably know me as Prefetcher online. And as Jeremy's mentioned, PinkSea is an oekaki drawing bulletin board. You log in with your Bluesky account and you can draw and post images. It's styled like a mid to late 2000s website to keep it in the spirit. What's an oekaki BBS? [00:00:43] Jeremy: For someone who isn't familiar with oekaki BBSs what is different about them as opposed to say, a photo sharing website? [00:00:53] Prefetcher: The difference is that on a photo sharing website you have the image already premade be it a photo or a drawing made in a separate application. And you basically log in and you upload that image. For example on Instagram or pixiv for artists even Flickr. But in the case of an oekaki BBS the thing that sets it apart is that oekaki BBSs already have the drawing tools built in. You cannot upload an already pre-made image with there being some caveats. Some different oekaki boards allow you to upload your already pre-made work. But PinkSea restricts you to a tool called Tegaki. Tegaki being a drawing applet that was built for one of the other BBSs and all of the drawing tools are inside of it. So you draw from within PinkSea and you upload it to the atmosphere. Every image that's on PinkSea is basically drawn right on it by the artists. No one can technically upload any images from elsewhere. How PinkSea got started and grew [00:01:56] Jeremy: You released this to the world. How did people find it and how many people are using it? [00:02:02] Prefetcher: I'll actually begin with how I've made it 'cause it kind of ties into how PinkSea got semi-popular. One day I was just browsing through Bluesky sometime in late 2024. 
I was really interested in the AT Protocol and while browsing, one of the artists that I follow vertigris posted a post basically saying they'd really want to see something a drawing canvas like Drawpile or Aggie on AT Protocol or something like an oekaki board. And considering that I was really looking forward to make something on the AT Protocol. I'm like, that sounds fun. I used to be a member of some oekaki boards. I don't draw well but it's an activity that I was thinking this sounds like a fun thing to do. I'm absolutely down for it. From like, the initial idea to what I'd say was the first time I was proud to let someone else use it. I think it was like two weeks. I was posting progress on Bluesky and people seemed eager to use it. That kept me motivated. And yeah. Right as I approached the finish I posted about it as a response to vertigris' posts and people seemed to like it. I sent the early version to a bunch of artists. I basically just made a post calling for them. Got really positive feedback, things to fix, and I released it. And thanks to vertigris the post went semi-viral. The launch I got a lot of people which I would also tie to the fact that it was right after one of the user waves that came to Bluesky from other platforms. The website also seemed really popular in Japan. I remember going to sleep, waking up the next day, and I saw like a Japanese post about PinkSea and it had 2000 reposts and 3000 likes and I was just unable to believe it. Within I think the first week we got like 1000 posts overall which to me is just insane. For a week straight I just kept looking at my phone and clicking, refresh, refresh, refresh, just seeing the new posts flow in. There was a bunch of like really insane talented artists just posting their works. And I just could not believe it. PinkSea got I'd say fairly popular as an alternative AppView. 
People seem to really want oekaki boards back and I saw people going, oh look, it's like one of those 2000s oekaki boards! Oh, that's so cool! I haven't seen them in forever! The art stands out because it's human made [00:04:58] Prefetcher: And it made me so happy every single time seeing it. It's been since November, like four months, give or take. And today alone we got five posts. That doesn't seem like a lot but given that every single post is hand drawn it's still insane. People go on there and spend their time to produce their own original artworks. [00:05:26] Jeremy: This is especially relevant now when you have so much image generation stuff and they're making images that look polished but you're kind of like well... did you draw it? [00:05:39] Prefetcher: Yeah. [00:05:40] Jeremy: And when you see people draw with these oekaki boards using the tools that are there I think there's something very human and very nostalgic about oh... This came from you. [00:05:53] Prefetcher: Honestly, yeah. To me seeing even beginner artists 'cause PinkSea has a lot of really, really talented and popular people (and) also beginner artists that do it as a hobby. Ones that haven't been drawing for a long time. And no matter what you look at you just get like that homely feeling that, oh, that's someone that just spent time. That's someone that just wanted to draw for fun. And at least to me, with generative AI images it really lacks that human aspect to it. You generate an image, you go, oh, that's cool. And it just fades away. But in this case you see people that spent their time drawing it, spent their own personal time. And no matter if it's a masterpiece or not it's still incredibly nice to see people just do it for fun. [00:06:54] Jeremy: Yeah. I think whether it's drawing or writing or anything now more than ever people wanna see something that you made yourself right? They wanna know that a human did this. [00:07:09] Prefetcher: Yeah. Absolutely. 
[00:07:11] Jeremy: So it sounds like, in terms of getting the initial users and the ones that are there now, it really all came out of a single Bluesky post that an existing artist (vertigris) noticed and boosted. And like you said, you were lucky enough to go viral and that carried you all the way to now and then it just keeps going from there. [00:07:36] Prefetcher: Basically if not for vertigris PinkSea (would) just not exist because I honestly did not think about it. My initial idea for making something on ATProto (and maybe in the future I'll do something like that) would be a platform like StumbleUpon -- Something that would just allow you to go on a website, press a button, and it gets uploaded to your repo and your friends would be able to see oh -- you visited that website and there would be an AppView that would just recommend you sites based on those categories. I really liked that idea and I was dead set on making it but then I noticed that post (from vertigris) and I'm like, no, that's better. I really wanna make that. And yeah. So right here I want to give a massive shout out to vertigris 'cause they've been incredibly nice to me. They've even contributed the German translation of PinkSea which was just insane to me. And yeah, massive shout out to every single other artist that reposted it, liked it, used it because it's all just snowballed from there and even recently I've had another wave of new users from the PinkSea account. So there are periods where it goes up and it like goes chill -- and then popular again. Old internet and flash [00:08:59] Jeremy: Yeah. And so something that you mentioned is that some people who came across it they mentioned how it was nostalgic or it looked like the old oekaki BBSs from the early internet. And I noticed that that was something that you posted on your own website that you have an interest in that specifically. I wonder what about that part of the internet interests you? 
[00:09:26] Prefetcher: That is a really good question. Like, to me, even before PinkSea my interests lie in the early internet. I run on Twitter and also on Bluesky now an account called My Flash Archive, which was an archive of very random, like flash animations. And I still do that just not as much anymore 'cause I have a lot of other things to do. I used to on Google just type in Flash and look through the oldest archived random folders just having flash videos. And I would just go over them save all of that or go on like the dagobah or Z0r or swfchan. 'cause the early internet to me, it was really like more explorative. 'cause like now you have, people just concentrated in those big platforms like Twitter, Instagram, Facebook, whatever. And back then at least to me you had more websites that you would just go on, you would find cool stuff. And the designs were like sometimes very minimal, aesthetically pleasing. I'd named here one of my favorite sites, Kaliber10000 which had just fantastic web design. Like, I, I also spend a lot of time on like the web design museum just like looking at old web design and just in awe. My flash archive on Twitter at least got very popular. I kind of abandoned that account, but I think it was sitting at 12,000 followers if not more? And showed that people also yearn for that early internet vibe. And to me it feels really warm. Really different from the internet nowadays. Even with the death of flash you don't really have interactive experiences like it anymore. 'cause flash was supposed to be replaced by HTML5 and JavaScript and whatever but you don't really make interactive experiences that just come packaged in a single file like flash. You need a website and everything. In flash, it just had a single file. It could be shared on multiple sites and just experienced. That kind of propelled my interest. Plus I, I dunno, I just really like the old internet design aesthetics it really warms me (and really close..?) 
Flash loops [00:12:01] Jeremy: The flash one specifically. Were they animations or games or was there a specific type of a flash project that spoke to you? [00:12:15] Prefetcher: Something we call loops. Basically, it's sometimes animations. 'cause, surprisingly while I like flash games they weren't my main collection. What spoke to me more were loops. Basically someone would take a song, find a gif they liked, and they would just pair it together. Something like YTMND did. At least from loops I found some of my favorite musical artists, some of my favorite songs, a lot of interesting series, be it anime or TV or whatever. And you basically saw people make stuff about their favorite series and they would just share it online. I would go over those. For example, a good website as an example is z0r.de, which is surprisingly still active and updated to this day. And you would see people making loops about members of that community or whatever they like. And you would for example see like 10 posts about the same thing. So you would know someone decided to make 10 loops and just upload them at once. And yeah, to me, loops basically were like, I mean, they weren't always the highest quality or the most unique thing, but you would see someone liked something enough that they decided to make something about it. And I always found that really cool. I would late at night just browse for loops and I'm like, oh, oh, this series, I remember it. I liked it (laughs)! But of course flash games as well. I mean, I used to play a lot when I was younger, but specifically loops, even animations and especially like when someone took like their time to animate something like really in depth. My favorite example is the music video to a song by the band Juicy Panic called Otarie. Someone liked that song enough that they made an entire flash animated music video, which was basically vectorized art of various series like Azumanga Daioh or Neon Genesis Evangelion as well, and other things. 
And it was so cool, at least to me, like a lot of these loops just basically have an intense, like immense feeling on me (laughs). I just really liked collecting them. [00:14:38] Jeremy: And in that last example, it sounded more like it was a complete music video, not just a brief loop? [00:14:45] Prefetcher: No, it was like a five minute long music video that someone else made. [00:14:48] Jeremy: Five. Oh my gosh. [00:14:49] Prefetcher: Yeah. You would really see people's creativity shine through on just making those weird things that not a lot of people have seen, but you look at it and it's like, wow. It's different than YouTube (Sharable single file, vectorized) [00:15:01] Jeremy: It's interesting because you can technically do and see a lot of these things on, say, YouTube today, but I think it does feel a little different for some reason. [00:15:16] Prefetcher: It really is. Of course I'm not denying on YouTube you see a lot of creative things and whatever. But first and foremost, the fact that Flash is scalable. You don't lose the quality. So you'd be able to open, I don't know, any of the IOSYS flash music videos for like their Touhou songs and the thing would just scale and you would see it in 4K and it's like, wow. And yeah, the fact that on YouTube you have like a central place where you just like put something and it just stays there. Of course not counting reuploads, but with Flash you just had like this one animation file that you would just be able to share everywhere and I don't know, like the aspect of sharing, just like having those massive collections, you would see this flash right here on this website and on that website and also on this website. And also seeing people's personal collections of flash videos randomly online and you would also see this file and this file that you haven't seen -- it really gives it, it's like explorative to me and that's what I like. 
You put in the effort to like go over all those websites and you just like find new and new cool stuff. [00:16:32] Jeremy: Yeah, that's a good point too that I hadn't thought about. You can open these files and you have basically the primitives of how it was made and since, like you said, it's vector based, there's no, oh, can you please upload it in 1080 p or 4K? You can make it as big as you want. [00:16:53] Prefetcher: Yeah. Web design differences, pixel art, non-responsive [00:16:55] Jeremy: I think web design as well it was very distinct. Maybe because the tools just weren't there, so a lot of people were building things more from scratch rather than pulling a template or using a framework. A lot of people were just making the design theirs I think rather than putting words on a page and filling into some template. [00:17:21] Prefetcher: Honestly, you raise a good point here that I did not think much about. 'cause like nowadays we have all of this tooling to make web design easier and you have design languages and whatnot. And you see people make really, in my opinion, still pretty websites, very usable websites on top of that. But all of them have like the same vibes to them. All of them have like a unified design language and all of them look very similar. And you kind of lose that creativity that some people had. Of course, you still find pretty websites that were made from scratch. But you don't really get the same vibes that you did get like back then. Like my favorite, for example, trend that used to be back on like the old internet is pixel art in web design. For example, Kaliber10000, or going off the top of my head, you had the Eboy or all the sites and then Poland, for example, ... (polish website) those websites use minimal graphics, like pixel graphics and everything to build really interesting looking websites. They had their own very massive charm to them that, I don't know, I don't see a lot in more modern internet. 
And it's also because back then you were limited by screen size, so you didn't have to worry about someone being on a Mac with high DPI or on a 32x9 monitor like I am right now. And just having to scale it up. So you would see people go more for images, like UI elements, images instead of just like building everything from scratch and CSS and whatnot. So, yeah, internet design had to accommodate the change. So we couldn't stay how it was forever 'cause technology changed. Design language has changed, but to me it's really lost its charm. Every single website was different, specific, the web design had like this weird form, at least on websites where it was like. I like to call it futuristic minimalism. They looked very modern and also very minimal and sort of dated. And I dunno, I just really like it. I absolutely recommend checking, on the web design museum fantastic website. I love them and the pixel art in web design sub page. Like those websites to me they just look fantastic. [00:19:52] Jeremy: Yeah, and that's a good point you brought up about the screen sizes where now you have to make sure your website looks good on a phone, on a tablet, on any number of monitor sizes. Back then in the late 90s, early 2000s, I think most people were looking at these websites on their 4x3 small CRT monitors. [00:20:20] Prefetcher: My favorite this website is best viewed with an 800 by 600 monitor. It's like ... what? [00:20:28] Jeremy: Exactly. Even if you open your personal site now the design is very reminiscent of those times and it looks really cool but at the same time on a lot of monitors it's a small box in the middle of the monitor, so it's like -- [00:20:49] Prefetcher: I saw that issue, 'cause I was making it on a 1080p monitor and now I have a 32x9 monitor and it does not scale. I've been working on reworking that website, but, also on the topic of my website, I, I wanna shout out a website from the 2000s that still exists today. 
'cause, my website was really inspired by a website called Assembler. And Assembler, from what I could gather, was like a net art or like internet design collective. And the website still works to this day. You still had like, all of their projects, including the website that my website was based off of. [00:21:28] Jeremy: Yeah, I mean there, there definitely was an aesthetic to that time. And it's probably, like you said, it's probably people seeing someone else's site in this case, what, what did you call it? Assem? Assembler? [00:21:42] Prefetcher: Assembler. [00:21:42] Jeremy: Yeah. You see someone else's website and then maybe you try to copy some of the design language or you look at the HTML and the CSS and I mean, really at the time, these websites weren't being made with a ton of JavaScript. There weren't the minifiers, so you really could view source and just pull whatever you wanted from there. [00:22:06] Prefetcher: We also had those design studios, design agencies, notably 2advanced which check in now, their website still works, and their website is still in the same aesthetic as it was those 20 so years ago just dictating this futuristic design style that people really like. 'cause a lot of people nowadays also really like this old futurism minimalism for example a lot of people still love the Wipeout 3 aesthetic that was designed by one of my favorite studios overall the designers republic. And yeah, it's just hard for me to explain, but it feels so soulful in a way. [00:22:53] Jeremy: I think there are some trade offs. There's what we were talking about earlier with the flexibility of screen size. But there used to be with a lot of websites that used Flash, there used to be these very elaborate intros where the site is loading and there's these really neat animations. But at the same time, it's sort of like, well, to actually get to the content, it's a bit much, but, everything is a trade off. 
[00:23:25] Prefetcher: People had flash at their disposal and they just wanted to make, I have the tooling, I'm going to use all of the tooling and all of it. [00:23:33] Jeremy: Yeah. Yeah. but yeah, I definitely get what you're saying where when I went to make my own website I made it very utilitarian and in some ways boring, right? I think we do kind of miss some of what we used to have. [00:23:54] Prefetcher: I mean, in my opinion, utilitarian websites are just as fine. Like in some cases you don't really need a lot of flashy things and a lot of very modern very CPU intensive or whatever animations. Sometimes it is better to go on a website and just like, see, oh, there's the play button and that's it. [00:24:17] Jeremy: Yeah. Well definitely the animations and the intro and all that stuff. I guess more in terms of the aesthetics or the designs. It's tricky because there's definitely people making very cool things now things that weren't even possible back then. But it does feel like maybe the default is I'll pick this existing style sheet or this existing framework and just go with that. [00:24:47] Prefetcher: A lot of modern websites just go for similar aesthetics, similar designs, which they aren't bad, but they are also very just bland. They, they are futuristic, they are very well designed. But when you see the same website. The same -- five websites have the same feel. And this is especially, at least in my opinion, visible with websites built on top of NextJS or other frameworks. And it just feels corporate kind of dead. Like someone just makes a website that they want to sell something to you and not for fun. [00:25:26] Jeremy: With landing pages especially it's like, wow, this looks the same as every other site, but I guess it must work. [00:25:38] Prefetcher: It works. And it really cuts down on development time. You don't need to think much about it. 
You just already have a lot of well-established design rules that you just follow and you get a cohesive and responsive design system. Designing the PinkSea look and feel [00:25:56] Jeremy: Let's talk about that in connection with PinkSea. What was your thinking when you designed how PinkSea would look and feel? [00:26:06] Prefetcher: Honestly, at first I have to admit I looked at other websites. I looked at Bluesky first and foremost. I looked at, front page. I looked at Smoke Signal, and I thought that I might also build something that's modern and sleek and I sketched it out in an application and I showed it to some friends. One of them suggested I go for more like a 2000 aesthetic. I'm like, yeah, okay. I like that. As the website was built, I just saw more and more of how much I feel this could sit with others. Especially with the fact that it's an oekaki page an oekaki BBS and as you scroll through oekaki has a very distinct style to it. And as you scroll and you see all of those, pixel shaded, all those dithered images, non anti-aliased pens and whatnot. It feels really really cohesive somehow with the design aesthetic. But of course, PinkSea in itself is a modern website. Like if you were to go to my PinkSea repository. It's a modern website built up on top of Vue3, which talks via like XRPC API calls in real time and it's a single page app and whatever. That's kind of the thing I merged the modern way of making sites with a very oldish design language. And I feel, in my opinion, it somehow just really works. And especially it sets PinkSea apart from the other websites. It gives it that really weird aesthetic. You would go on it and you would not be like, oh, this is a modern site that connects with a modern protocol on top of a big decentralized network. This is just someone's weird BBS stuck in the 2000s that they forgot to shut down. (laughs) [00:28:00] Jeremy: Yeah. 
And I think that's a good reminder too, that when people are intentional about design, the tools we have now are so much better than what we used to have. There's nothing stopping us from making websites that when people go to them they really feel like something's different. I know I did not just land on Instagram. [00:28:27] Prefetcher: Yeah. And making PinkSea taught me that it's really easy to fall into that train of thought that every site has to look modern. Because I was like, oh yeah, this is a modern protocol, a modern everything, and it has to look the part. It has to look interesting to people and everything. And after talking with a bunch of friends and other people and just going, huh, maybe the 2000s aren't as bad as I thought. And yeah, the website, especially its design, people seem to just really like it. Me too. I just absolutely love how PinkSea turned out. It is really a reminder that you don't need modernness in web design always. And people really appreciate quirky looking pages, so to say, quirky like interesting. [00:29:23] Jeremy: I interviewed the creator of Neocities which is like kind of a modern version of GeoCities and yeah, that's really one of the aspects that I think makes things so interesting to people from that era, that it really felt like you're creating your own thing, and not just everything looks the same. The term I think he used is homesteading. You're taking care of your place and it can match your sensibilities, your style, your likes, rather than having to, like you said, try to force everything to be this sort of base modern look. The old spirit of the internet is coming back [00:30:08] Prefetcher: I mean Neocities and by extension also Nekoweb are websites that I often when I don't have much to do -- I like just going through them because you see a bunch of people just make their own places. And you see that even in 2025 when we have those big social media sites. 
You have platforms where you can get a ton of followers. You can get a ton of attention and everything. People to some extent still want that aspect of self-expression. They want to be able to make something that's uniquely theirs and you see people just make just really amazing websites build insane things on those old Geocities-like platforms using nothing but a code editor. You see them basically just wanting thing to express, oh, that's mine and no one else has it. So to say that's why. Yeah. I feel like to some extent the old school train of thought when it comes to the internet is slowly coming back. Especially with the advent of protocols like ATProto. And you'll experience more websites that just allow people to make their own homes on the internet. Cause in my opinion, one of the biggest problems is that people do not really want to register on a lot of platforms. 'cause you already have this place where you get all of your followers, you have all of your connections, and then you want to move and then you'll lose all of your connections and everything. But with something like ATProto, you can use the social graph of, for example, Bluesky. I want to add followers on PinkSea. So for example, you have an artist that has like 30,000 followers for example, I can just click import my following from Bluesky. And just like that they would already get all of the artists that they follow on Bluesky already added as followers on PinkSea. And for example, someone else joins and they followed that big artist and they instantly followed them on PinkSea as well. I think that we are slowly coming back to the advent of people owning their place online. PinkSea and ATProto (PDS) [00:32:24] Jeremy: Yeah. So let's talk a little bit more about how PinkSea fits into ATProto. For people who aren't super familiar with ATProto, maybe you could talk about how it's split up. You've got the PDS, the relays, the AppView. What are those and how do those fit into what PinkSea is? 
[00:32:48] Prefetcher: My favorite analogy, ATProto is a massive network, and at least for me, when I saw the initial graph I was just very confused. I absolutely did not know what I was looking at. But let's start with the base building block, something that ATProto wouldn't exist without. And it's the PDS. Think of the PDS as like a filing cabinet. You have a bunch of folders in which you have files, so to say. So you have a filing cabinet with your ID, this is the DID part that sometimes shows up and scares people. It's what we call a decentralized identifier. Basically that identifier is not really tied to the PDS, it just exists somewhere. And the end goal is that every user controls their DID. So for example, if your PDS shuts down, you can always move to somewhere else. Still keep like, for example, that you are prefetcher.miku.place. But in that filing cabinet, the PDS, going back to it, you have your own little zone, your own cabinets, and that has your identifier, it's uniquely yours. Every single application on the AT protocol creates data. They create data and they store the data in a structured format called a record. A record is basically just a bunch of data that explains what that thing is, be it a like, a post on Bluesky, an oekaki on PinkSea, an upvote on front page, or even a pixel on place.blue. And all of those records are organized into folders in your cabinet. And that folder is named with something we call a collection id. So for example, a like is, if I remember correctly, it's app.bsky.feed.like, so you see that it belongs to Bluesky. The app.bsky part. It's a feed thing, and the same way, PinkSea, for example, the oekaki on PinkSea uses com.shinolabs.pinksea.oekaki with com.shinolabs being the collective that I use as a pen name, so to say. PinkSea being, well, PinkSea and oekaki just being the name. It's an oekaki. 
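As a rough illustration of the filing cabinet analogy above, here is a minimal TypeScript sketch that groups one user's records into "folders" keyed by collection id. It is a toy model under stated assumptions, not the real PDS storage format: the `RepoRecord` type, its `rkey` and `value` fields, and the helper names are all hypothetical. Only the two collection ids come from the conversation.

```typescript
// Toy model of the "filing cabinet" analogy; NOT the real PDS storage format.
// RepoRecord and its fields are hypothetical names used for illustration.
type RepoRecord = {
  collection: string;             // the "folder" name, e.g. "app.bsky.feed.like"
  rkey: string;                   // assumed per-record key inside the folder
  value: { [key: string]: unknown };
};

// Split a dotted collection id into its owner prefix and its final name segment.
function splitCollection(collection: string): { owner: string; name: string } {
  const parts = collection.split(".");
  if (parts.length < 3) {
    throw new Error(`not a dotted collection id: ${collection}`);
  }
  return {
    owner: parts.slice(0, -1).join("."),
    name: parts[parts.length - 1],
  };
}

// Group one user's records into folders keyed by collection id.
function toFolders(records: RepoRecord[]): Map<string, RepoRecord[]> {
  const folders = new Map<string, RepoRecord[]>();
  for (const record of records) {
    const folder = folders.get(record.collection) ?? [];
    folder.push(record);
    folders.set(record.collection, folder);
  }
  return folders;
}

// Made-up example data: one like and two oekaki.
const records: RepoRecord[] = [
  { collection: "app.bsky.feed.like", rkey: "like1", value: {} },
  { collection: "com.shinolabs.pinksea.oekaki", rkey: "oekaki1", value: {} },
  { collection: "com.shinolabs.pinksea.oekaki", rkey: "oekaki2", value: {} },
];
const folders = toFolders(records);
```

In this sketch the oekaki folder ends up holding two records and the like folder one, which mirrors how different applications write to different folders in the same cabinet.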
If you want to see that, there are a lot of tools, for example, PDSls or atp.tools or ATProto browser. If you were to go into one of those and type in, for example, prefetcher.miku.place, you would see all of your records, the things that you've created on the AT protocol network. Relay [00:35:19] Prefetcher: So you have a PDS, you have your data, but for example, imagine you have a PDS that you made yourself, you hosted yourself. How will, for example, Bluesky know that you exist? 'cause it won't, it's just a server in the middle of nowhere. That's where we have a relay. A relay is an application that listens to every single server. So every time you create something or you delete something, or for example, you edit a post, you delete an oekaki. You create a new, like -- Your PDS, your filing cabinet generates a record of that. It generates an event, something we call a commit. So, anytime you do something, your PDS goes, Hey, I did that thing. And relays function as big servers that a PDS can connect to. And it's a massive shout box. The PDS goes, Hey, I made this. Then the relay aggregates all of those PDSs into one and creates a massive stream of every single event that's going on the network at once. That's also where the name firehose comes from. 'cause the end result, the stream is like a firehose. It just shoots a lot of data directly at anyone who can connect to it. And the thing that makes AT Protocol open and able to be built on is that anyone can just go, I want to connect to jetstream1.west.bluesky.network. They just make a connection to it and boom they just get everything that's happening. You can, for example, see that via firesky.tv. If you open it in your browser, you see every single Bluesky post being made in real time, right on your computer. So you have the PDSs that store data, you have the relay that aggregates every, like, builds a stream of every single event on the network. 
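To make the relay idea above concrete, here is a small in-memory TypeScript sketch of the fan-in pattern: several PDSs announce commits, and every subscriber sees one merged stream. This is only a toy under stated assumptions. The `ToyRelay` class and the `CommitEvent` fields are hypothetical and do not reproduce the real wire format or any real relay API, and the `did:example:` identifiers are made up.

```typescript
// Toy in-memory relay: many PDSs publish commits, subscribers see one merged stream.
// The event fields below are assumptions for illustration, not the real wire format.
type CommitEvent = {
  did: string;                                  // who made the change
  operation: "create" | "update" | "delete";    // what kind of change it was
  collection: string;                           // e.g. "com.shinolabs.pinksea.oekaki"
  rkey: string;                                 // assumed record key in that collection
};

type Listener = (event: CommitEvent) => void;

class ToyRelay {
  private listeners: Listener[] = [];

  // Anyone may subscribe to the full stream (the "firehose").
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // A PDS announces "hey, I did that thing"; the relay forwards it to every subscriber.
  publish(event: CommitEvent): void {
    for (const listener of this.listeners) {
      listener(event);
    }
  }
}

const relay = new ToyRelay();
const firehose: CommitEvent[] = [];    // a subscriber that keeps everything
const oekakiOnly: CommitEvent[] = [];  // a subscriber that keeps one collection

relay.subscribe((event) => {
  firehose.push(event);
});
relay.subscribe((event) => {
  if (event.collection === "com.shinolabs.pinksea.oekaki") {
    oekakiOnly.push(event);
  }
});

// Two different (hypothetical) PDSs report activity.
relay.publish({ did: "did:example:alice", operation: "create", collection: "app.bsky.feed.like", rkey: "a1" });
relay.publish({ did: "did:example:bob", operation: "create", collection: "com.shinolabs.pinksea.oekaki", rkey: "b1" });
```

The first subscriber receives both events while the second keeps only the oekaki one. The point of the design is that subscribers never talk to individual PDSs; they only need the single merged stream.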
AppViews [00:37:26] Prefetcher: You just get records. You can't interact with it. You can see that someone made a new record with that name, but to a human, you won't really understand what a cid is or what property something else is. That's why you have what we call AppViews. An AppView, or in full an application view is an application that runs on the AT protocol network. It connects to the relay and it transforms the network into a state that it can be used by people. That's why it's called an application view. 'cause it's a, a specialized view into the whole network. So, for example, PinkSea connects, and then it goes, hey, I want to listen on every single thing that's happening to com.shinolabs.pinksea.oekaki, and it sees all of those, new records coming in and PinkSea understands, oh, I can turn it into this, and then I can take this thing, store it in the database, and then someone can connect with a PinkSea front end. And then it can like, transform those things, those records into something that the front end understands. And then the front end can just display, for example, the timeline, the same way Bluesky, for example -- Bluesky gets every single event, every single new file, new record coming in from the network. And it goes. Okay, so this will translate into one more like on this post. And this post is a reply to that post. So I should chain it together. Oh. And this is a new feed, so I should probably display it to the user if they ask for feeds. And it basically just gets a lot of those disjoint records and it makes sense of them all. The end user has a different API to the Bluesky AppView. And then they can get a more specialized view into Bluesky. PinkSea does not store the original images, the PDS does [00:39:26] Jeremy: And so in that example, the PDSs, they can be hosted by Bluesky the company, or they could be hosted by any person. 
And so PinkSea itself, when somebody posts a new oekaki, a new image, they're actually telling PinkSea to go create the image in the user's PDS, right? PinkSea is itself not the the source of truth I guess you could say. [00:40:00] Prefetcher: PinkSea in itself. I don't remember which Bluesky team member said it, but I like the analogy that AppViews are like Google. So in Google, when you search something, Google doesn't have those websites. Google just knows that this thing is on that website. In the same vein, PinkSea, when you create a new oekaki, you tell PinkSea, Hey, go to my PDS and create that record for me. And then the person owns the PDS. So for example, let's say that in a year, of course I won't do it, but hypothetically here, I just go rogue and I shut down PinkSea, I delete the database. You still own the things. So for example, if someone else would clone the PinkSea repository and go here, there's PinkSea 2. They can still use all of those images that were already on the network. So, AppViews in a way basically just work as a search engine for the network. PinkSea doesn't store anything. PinkSea just indexes that a user made a thing on that server. And here I can show you how to get to it somehow. Those images aren't stored by PinkSea, but instead, I know that the image itself is stored, for example, on pds.example.com, and of course to reduce the load, we have a proxy. PinkSea asks the proxy to go to pds.example.com and fetch the image, and then it just returns it to the user. [00:41:37] Jeremy: And so what it sounds like then is if someone were to create oekaki on their own PDS completely independently of Pink Sea the fact that they had created that image would be sent to one of the relays, and then PinkSea would receive an event that says oh, this person created a new image then at that point your index could see, oh, somebody created a new image and they didn't even have to go through the PinkSea website or call the PinkSea APIs. Is that right? 
Sharing PDS records with other applications [00:42:14] Prefetcher: Yep. That is exactly right. For example, someone could now go, Hey, I'm making my own PinkSea-like application. And then they would go, I want to be compatible with PinkSea. So I'm using the same record. Or what we call a lexicon, basically describe how records look like. I forgot to mention that, but every single record has an attached lexicon. And lexicons serve as a blueprint. So a lexicon specifies, oh, this has an image, this has a for example, the tags attached to it, a description of the image. Validate that the record is correct, that you don't get someone just making up random stuff. But yeah, someone could just go, Hey, I'm making another website. Let's call it GreenForest for example. And GreenForest is also an oekaki website, but it uses, for example, chickenpaint instead of tegaki but I want to be able to interoperate with PinkSea. so I'm also gonna use com.shinolabs.pinksea.oekaki the collection, the same record, the same lexicon. And for example, they have their own servers and the servers just create regular oekaki records. So for example, GreenForest gets a new user, they log in, create, draw their beautiful image, and then they click upload it. So GreenForest goes to that person's PDS and tells the PDS, Hey, I want to make a new. com.shinolabs.pinksea.oekaki record. The PDS goes okay, I've done it for you. Let me just inform the relay that I did so, relay gets the notification that someone made that new PinkSea oekaki record. And so the main PinkSea instance, pinksea.art, which is listening in on the relay, gets a notification from the relay going, Hey, there is this new oekaki record. And PinkSea goes, sure, I'll index it. And so PinkSea just gets that GreenForest image directly in itself. And in the same vein, someone at PinkSea could draw something in tegaki -- their own beautiful character. And the same thing would happen with GreenForest. 
GreenForest would get that PinkSea image, that PinkSea record, and index it locally. So the two platforms, despite being completely different, doing completely different things, they would still be able to share images with each other. Bluesky PDS stores other AppView's data but they could stop at anytime [00:44:38] Jeremy: And these images, since they're stored in the PDS, what that would mean is that anybody building an application on ATProto, they can basically use Bluesky's PDS or the user's PDS as their storage. They could put any number of images in there and they could get into gigabytes of images. And that's the responsibility of the PDS and not yourself to keep track of. [00:45:12] Prefetcher: Yes, that can be the case. Of course, there is a hard limit on how big a single upload can be, which is, if I remember correctly, I don't wanna lie, I think it's 50 megabytes, I don't recall there being a hard cap on how big a single repository can be. I know of some people whose repositories are in the single gigabyte digits but this kind of is a thing scares app developers. 'cause you never know when Bluesky the company -- 'cause most people registering, are registering on Bluesky. We don't really know whether Bluesky, the company will want to keep it for free. Forever allow us to do something like that. You already have projects like, for example, ATFile, which just allow you to upload any arbitrary data just to store it, on their servers and they are paying for you. So we'll never know whether Bluesky will decide, okay, our services are only for Bluesky if you want to use PinkSea you have to deal with it. Or whether they go, okay, if you want to use alternative AppViews you have to pay us in order to host them. So, that also leads me to the fact that decentralization is an important part of AT protocol as Bluesky themselves say that they are a potential adversary. You cannot trust them in the long term. 
Right now they are benign right now, they're very nice, but, we never know how Bluesky will end up in a year or two. So if you want to be in the full control of your data, you need to sadly host it by yourself. And it's honestly really easy in order to do so. There is a ton of really useful online content blogs and whatever. I think I've set up my PDS in 10 minutes on a break between classes and university. But to a person that's non-technical that doesn't know much I'd say around an hour to two hours The liability and potential abuse from running a PDS [00:47:14] Jeremy: Yeah, I think the scary thing for a lot of people is technical or not, is even if it's easy to set up, you gotta make sure it keeps running. You gotta have backups. And so it could be a lot. [00:47:30] Prefetcher: Yeah. This is to be expected by the fact that you're in control of your data. Keeping it secure the same way, for your personal photos or your documents, for example, your master's diploma or whatever. And it's on you to keep your Bluesky interaction secure. On one hand, it's easier to get someone to do it, and I expect in the future we'll get people that are hosting public PDSes I sometimes thought of doing that for PinkSea, just like allowing people to register by PinkSea. But, doing so as a person, you also have to be constantly on call for abuse. So if someone decides to register via PinkSea and do some illicit activities, you are solely responsible for it. PDS and AppView moderation liability [00:48:17] Jeremy: So if they were to upload content that's illegal, for example, it's hosted on your servers so then it's your problem. [00:48:27] Prefetcher: Yeah, it is my problem. [00:48:29] Jeremy: At least the way that it works now, the majority of the people, their PDS is gonna be hosted by Bluesky. So if they upload content that's breaks the law, then that's the Bluesky company's problem at least currently. [00:48:44] Prefetcher: Yeah. That is something that Bluesky has to deal with. 
But I do believe that in the future we are going to have, more like independent entities just building infrastructure for ATProto, not even the relay it's just like PDSs for people to be able to join the atmosphere, but not directly via Bluesky. [00:49:06] Jeremy: I'm kind of curious also with the current PDSs, if it's hosted by Bluesky, are they, are they moderating what people upload to their PDSs? [00:49:16] Prefetcher: Good question. Honestly, I don't think they're moderating everything 'cause, it's infeasible for them to, for example, other than moderate Bluesky to also moderate PinkSea and moderate front page and whatnot. So it's the obvious responsibility to moderate itself and to report abuse. I'd say that if someone started uploading illicit material, I do not think, and this is not legal advice, I do not think that they would catch on until some point let's say. [00:49:52] Jeremy: I mean, from what you were describing too, it seems like the AppViews would also, have issues with this because if, let's say someone created a PinkSea record in their PDS directly and the image they put in was not an oekaki image, it's instead something pretty illegal in the country that your AppView is hosted then, Wouldn't that go straight to the PinkSea users viewing the website? [00:50:20] Prefetcher: Yes, sadly, this is something that you have to sign up as you're making an AppView and especially one with images. Sooner or later you are going to get material that you have to moderate and it's entirely on you. That's why, you have to think of moderation while you're working on an AppView. Bluesky has an insanely complicated, at least in my opinion, moderation system, which is composable and everything, which I like. But for smaller AppViews, I think it's too much to build the same level of tooling. So you have to rely more on manual work. Thankfully so far the user base on PinkSea has been nothing but stellar. 
I didn't have to deal with any law breaking stuff, but I am absolutely ready for one day where I'll have to sadly make some drastic moderation issues. [00:51:18] Jeremy: Yeah. I think to me that's the most terrifying thing about making any application that's open to user content. [00:51:29] Prefetcher: I get it, sadly. I'm no stranger to having issues with people, abusing my websites. Because since 2016, my, first major project was a text board based off of, a text board in a video game called DANGER/U/. It was semi-popular, during the biggest spike in activity in like 2017 and 2016, it had in the tens of thousands of monthly visitors. And sadly, yeah, even though it was only text, I've had to deal with a lot of annoying issues. So to say the worst I think was I remember waking up and people are telling me that DANGER/U/ is down. So I log in the activity logs and someone hit me with two terabytes of traffic in a day. There was a really dedicated person that just hated my website and just either spam me with posts or just with traffic. So, yeah, sadly I have experience with that. I know what to expect that's something that you sadly have to sign up for making a website that allows user content. Pinksea is a single server [00:52:42] Jeremy: To my understanding so far, PinkSea is just a single server. Is that right? [00:52:47] Prefetcher: It is a single server. Yeah. [00:52:48] Jeremy: That's kind of interesting in that, I think a lot of people when they make a project, they worry about scaling and things like that. But, was it a case where you just had a existing VPS and you're like, well hopefully this is, this is good enough? [00:53:03] Prefetcher: I actually ordered a new one even though it's not really powerful, but my train of thought was that I didn't expect it to blow up. I didn't expect it to require more than a single VPS with 8 gigabytes of RAM and whatnot. And so far it's handling it pretty well. 
I do not expect ever to reach the amounts of traffic that Bluesky does, so I do not really have to worry about insane scalability and whatever. But yeah. I thought of it always as a toy project until the day I released it and realized that it's a bit more than a toy project at this point. To this day, I just kind of think that that website even if it were popular, I would never expect it to have -- And in the best, most amazing case scenario, like a hundred posts a day. I do not have to deal with the amount of traffic that Bluesky does. So one VPS it is. [00:53:59] Jeremy: Yeah, that makes a lot of sense. I mean the application is also mostly reads, right? Most people are coming to see the posts and like you said, you get a few submissions a day, but all the read stuff can probably be cached. Harbor image proxy [00:54:15] Prefetcher: Yeah. The heaviest, thing that PinkSea requires is the image proxy harbor, and that's something that right now only runs on that server. It's in Luxembourg. I think that's where my coprovider hosts it but yeah, that gets the most reads. 'cause in most cases, PinkSea, all it does, all you get is reads from a database, which is just, it's a solved problem. It's really lightweight. But with something like image proxying, you have this whole new problem. 'cause it's a lot of data, and you somehow have to send it -- it's enough for me to just host it locally on that PinkSea server and just direct people to it. But sooner or later, I can always just put it behind something like Bunny CDN or whatnot to have it be worldwide. [00:55:09] Jeremy: So Harbor is something I think you added recently. How did the images work before and what is Harbor doing in its place? [00:55:18] Prefetcher: Before I did what a lot of us currently do and I just freeload atop of Bluesky CDN 'cause Bluesky CDN is just open so far. But it's something that personally irked me. 'cause, I want PinkSea to be completely independent of Bluesky Corporation. 
I, I wanted to persevere even if Bluesky just decides to randomly, for example, close, the CDN to others or the relay to others or the PLC directory in the worst case scenario. So I wanted to make my own CDN more like proxy. You can't really call it a CDN because it's not worldwide. It's just a single server but let's just say image proxy. So Harbor whenever a person goes to PinkSea, they start loading in all of the images and every single image instead of going to, for example, the PDS or to cdn.bluesky.app. They go to harbor.pinksea.art, you get attached the identifier of the user and what we call a content identifier. Every single, thing uploaded to a PDS has an attached content identifier, which identifies it in a secure way so to say. So Harbor does in reality a really simple set of things. First and foremost, if the user has not seen it, like, not loaded it before first Harbor asks the local cache, do I have this file? If they do, if Harbor does, it just sends the file and it tells the browser, Hey, by the way, please don't ask me about this file for the next day. And in most cases, after one refresh, the user, all of the images load instantly because the web browser just goes, of those files were already sent. And Harbor asked me not to like, ask it more about the same file. So in the case of the image isn't in harbor's local cache, Harbor, first does a lot of those steps to resolve, the users identifier through their PDS, basically resolving that identifier, the DID to a DID document, which is a document basically explaining how that user, what is their, alias, what is their handle and where can we find them, which PDS. So we find the PDS and we then ask the PDS, Hey, send us this file for this user. The PDS sends it or doesn't, in which case we just throw an error and, Harbor just saves it locally and it sends it to the client. It basically just that. But to my knowledge, it's the first non Bluesky image proxy that's deployed for any AppView. 
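The flow described above can be sketched in a few lines of Python. This is not Harbor's actual code: the class name `ImageProxy`, the in-memory dictionary cache, and the two injected callables are assumptions made so the example runs without a network, and sending the one-day `Cache-Control` header on a cache miss as well as a hit is also an assumption.

```python
from typing import Callable, Dict, Tuple

# "Please don't ask me about this file for the next day."
ONE_DAY_CACHE = {"Cache-Control": "public, max-age=86400"}


class ImageProxy:
    """Cache-first blob proxy keyed by (user DID, content identifier)."""

    def __init__(
        self,
        resolve_pds: Callable[[str], str],            # DID -> PDS base URL
        fetch_blob: Callable[[str, str, str], bytes],  # (pds, did, cid) -> bytes
    ) -> None:
        self._resolve_pds = resolve_pds
        self._fetch_blob = fetch_blob
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def get(self, did: str, cid: str) -> Tuple[bytes, Dict[str, str]]:
        key = (did, cid)
        if key not in self._cache:
            # Cache miss: find the user's PDS, then ask it for the blob.
            # Any failure here simply propagates to the caller as an error.
            pds = self._resolve_pds(did)
            self._cache[key] = self._fetch_blob(pds, did, cid)
        return self._cache[key], dict(ONE_DAY_CACHE)
```

Because a content identifier always names the same bytes, the cached copy never goes stale, which is what makes the long browser cache lifetime safe.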
Which also caught the attention of Brian Newbold, one of the Bluesky employees, and made me really happy. DID PLC Lookup [00:58:14] Jeremy: The lookup when you have the user's DID and you wanna find out where their PDS is, that's talking to something called, I think it's the PLC directory? [00:58:25] Prefetcher: Actually there are two different ways. First is PLC directory. PLC originally stood for placeholder, and then Bluesky realized that it's not a placeholder anymore, and they stealthily changed it to public ledger of credentials. So we have PLC and we have web, the most common version is PLC. The document, the DID document is stored on Bluesky controlled servers under the moniker of PLC directory. They expose a web API that basically just allows you to say, Hey, give me the document for did:plc, whatever. And, the directory goes, have it. And this is the less decentralized version. You can host your own PLC directory and you can basically ask (their) PLC directory to just send you every single document and you can just have your local copy, which some people already do. You kind of have to accept the fact that you are not in control of the document. It's still on a centralized server, even if you control the keys. 'cause every single DID document also has a key. And that key is used to sign changes to the document. So technically, if you define your own set of keys, you can prevent anyone else from modifying your document, even Bluesky. 'cause every single document is verifiable back and forth. You can see the previous document and its key is used to sign the next document and the chain of trust is visible and no one can just make random changes to your identity, but yeah, it's still on a Bluesky controlled service and it's a point of contention. Bluesky eventually wants to move it to a nonprofit standards organization, but we have yet to see anything come out of it, sadly. DID WEB lookup The next method is web.
And web instead of -- 'cause in did:plc, you have did:plc, and a random string of characters. [01:00:30] Prefetcher: Web relies on domains. So for example, the domain would already like be the sole authority of where the file is. So for example, if I had did:web:example.com, I would parse the DID and I would see it's hosted at example.com. So I go to example.com, I go to /.well-known/did.json which is the well-known location for the file. And I would have the same DID document as I would have if I used, for example, a PLC DID resolved via the PLC directory. With the web method, you are in control of the document entirely. It's on your server under your domain. While it's the more decentralized version, it's just kind of hard for non-technical people to make them, 'cause it relies on a bunch of things. And also the problem is that if you lose your domain, you also lose your identity. [01:01:23] Jeremy: Yeah. So unlike the PLC where it's not really tied to a specific domain, you can change domains. With the web way, you have to always keep the same domain 'cause it's a part of the DID and yeah, like you said, you can't let your renewal lapse or your credit card not work. 'cause then you just lose everything. [01:01:49] Prefetcher: Yeah. You would still be able to change handles, but you would be tied to that domain to forever send your DID, otherwise you would just lose it forever. [01:01:57] Jeremy: Yeah, I had mostly only seen the PLC and I wasn't too familiar with the web form of identification, but yeah that makes sense. [01:02:06] Prefetcher: I think for web, if I remember correctly, there are slightly over 300 accounts total on the entire network that use it. Mary who is a person on Bluesky that does a lot of like, ATProto related things, has a GitHub repository that basically gives insight into the network. And on her GitHub repository, you can find the list of every single custom PDS and also how many DID webs there are in existence.
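For reference, the did:web lookup described earlier in this exchange can be sketched in Python as two small steps: turn the DID into the URL of its document, then read the PDS address out of that document. Both helper names are hypothetical, the sketch only handles plain-domain identifiers such as did:web:example.com, and the service entry it looks for (an id ending in `#atproto_pds`) is assumed from common ATProto DID documents. No network request is made here.

```python
WELL_KNOWN_PATH = "/.well-known/did.json"


def did_web_document_url(did: str) -> str:
    """Map a plain-domain did:web identifier to its DID document URL."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    host = did[len(prefix):]
    if not host or ":" in host:
        # Path-based or otherwise unusual identifiers are out of scope here.
        raise ValueError(f"unsupported did:web identifier: {did!r}")
    return f"https://{host}{WELL_KNOWN_PATH}"


def pds_endpoint(did_document: dict) -> str:
    """Return the PDS URL listed in a DID document (assumed service id)."""
    for service in did_document.get("service", []):
        if str(service.get("id", "")).endswith("#atproto_pds"):
            return service["serviceEndpoint"]
    raise LookupError("DID document lists no PDS service")


# A made-up document of the shape this sketch expects.
example_document = {
    "id": "did:web:example.com",
    "service": [{
        "id": "#atproto_pds",
        "type": "AtprotoPersonalDataServer",
        "serviceEndpoint": "https://pds.example.com",
    }],
}

print(did_web_document_url("did:web:example.com"))
print(pds_endpoint(example_document))
```

The second helper applies equally to a document fetched from the PLC directory, since both methods end in the same kind of DID document.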
And I think it was slightly over 300. [01:02:38] Jeremy: So are you on that list? [01:02:40] Prefetcher: My PDS Yeah. If you were to scroll down. I don't use a web DID 'cause I registered my account before when I was brand new to ATProto, so I didn't know anything. But if you had to scroll down, you would see pds.ata.moe, which is my custom PDS just running. [01:02:55] Jeremy: Cool. [01:02:57] Prefetcher: Yeah. Harbor image proxy can cache any image blob [01:02:58] Jeremy: So something I noticed about harbor, you take the, I believe you take the DID and then you take the CID, the content identifier. I noticed if you take any of those pairs from the ATProto network, like I go find a image somebody posted on Bluesky, I pass that post DID and CID for the image into harbor. Harbor downloads it and caches it. So it's like, does that mean anybody could technically use you as a ATProto CDN? [01:03:38] Prefetcher: Yes, the same way anyone could use like the Bluesky CDN to for example, run PinkSea like I did. cause I do not know if there is a good way to check if a CID of an image or a blob basically. 'cause files on ATProto are called blobs. I do not think there is a nice way to check if that blob is directly tied to a specific record. But that also allows you to make cool, interesting things. Crossposting to Bluesky talks directly to the PDS [01:04:06] Prefetcher: 'cause for example, PinkSea has that, cross post to Bluesky thing. So when you create an image, You already have an option to cross post it to Bluesky, which a lot of people liked. And it was a suggestion from one of the early users of PinkSea. And the way it works is that when we create a PinkSea record, we upload that image, right? And then PinkSea goes, okay, I'm gonna use that same image, the same content identifier, and just create a Bluesky post. So Bluesky and PinkSea all share the same image. I don't upload it twice, I just upload it once. use it in PinkSea and I also use it in Bluesky. 
And the same way Bluesky its CDN, can just fetch the image. I can also fetch the image from mine, 'cause blobs aren't tied to specific records. They just exist outside of that realm. And you could just query anything. Not even images. You could probably query a video or even a text file. [01:05:04] Jeremy: So when you cross post to Bluesky, you're creating a record directly in the person's PDS, not going through bluesky's API. [01:05:14] Prefetcher: No, I sidestep Bluesky's API completely. And, I basically directly talk to the PDS at all times. I just tell them, Hey, please, for me, create a app.bsky.feed.post record. And you have the image, the text, which also required me to manually parse text into rich text. 'cause like, Bluesky doesn't automatically detect for example, links or tags And you basically get -- like PinkSea creates a record directly with the link to the image. And all of those tags, like the PinkSea tag and whatever, And I completely sidestep. Bluesky's API. If Bluesky, the AppView would cease to exist, PinkSea would still happily create Bluesky crossposts for you. Other applications put metadata into Bluesky posts so they can treat them differently [01:06:02] Jeremy: And since you're creating the records yourself, then you can include additional metadata or fields where you know that this was a PinkSea post, or originally came from PinkSea. [01:06:13] Prefetcher: I could do that. I don't really do that right now 'cause I don't really have much of a reason other than adding a PinkSea hashtag to every single oekaki. But I, noticed, for example, I think it was PinkSky, interesting name, PinkSky, which is like (a) Bluesky Instagram client. Any single time you make a post via PinkSky it uses the Bluesky APIs. It's Bluesky, but it attaches a hidden hashtag like PinkSky underscore some random letters. In its feed building algorithm, it basically detects posts with that hashtag, that specific hashtag, and it builds a PinkSky only timeline. 
'cause it's still a Bluesky post, but it has hidden additional metadata that identifies, Hey, it came from PinkSky. [01:07:02] Jeremy: It's pretty interesting how much control you have over what to put in the PDS. So, I'm sure there's a lot of interesting use cases that people are gonna come up with. [01:07:14] Prefetcher: Yeah, of course. You still lose some of the data when you go through the Bluesky API. 'cause of course it stores the record and it's all in formats and whatnot. But you can attach a lot of metadata that can identify posts and build micro networks within Bluesky itself. I see it like that. Bluesky CDN compression [01:07:37] Jeremy: And I think, this might have been a post from you. I think I saw somebody saying that when you view an image from the CDN that the Bluesky CDN specifically, there's some kind of compression going on that that messes with certain types of art. [01:07:55] Prefetcher: It's especially noticeable artists are complaining about it all the time, left and right. Bluesky is very happy with jpeg compression, by default, their CDN, -- like to every single image it applies a really not good amount of jpeg compression which is especially not small. If you compare an image that's uploaded via PinkSea, view an image on PinkSea, and view the same image, which is, it's the same content id. It's the same blob. And you view it on Bluesky, it loses so much fidelity, it loses so much of that aliasing on the pen. You just see everything become really blurry. And on top of that, when you upload an image via Bluesky itself, if I remember correctly, I don't wanna lie here, but they also downscale the image to 1024 pixels by default. So every single image, not only big ones, and artists usually work with really big canvases, they get, downscaled and also additionally they get jpegified. So for example, PinkSea directly uploads PNG files to the PDS. And for example, Harbor gives back the original file. 
It does no transformations on it, but Bluesky transforms all of them into JPEG compressed images and for photos, it's fine sometimes. 'cause I've also seen people just compare directly downloaded images from the PDS versus images viewed on Bluesky. But for art it's especially noticeable. And people really (do) not like that. [01:09:31] Jeremy: Yeah, that's kind of odd. 'cause if, if I understand correctly, then if you post directly to your PDS and Bluesky pulls it in you'll avoid that, that 1024 resizing. So your images will be higher quality? [01:09:47] Prefetcher: I actually do not know. That's an interesting question. 'Cause I know that maybe their CDN also does that 'cause that's what I've heard from others, that on upload the image gets processed and squashed down. So I don't know if doing it via an alternative AppView would change it or would Bluesky just directly reject this post? Because for example, PinkSea, I have built-in which I think I might change in the future -- PinkSea will reject your post if it's bigger than 800x800. 'cause then it'll notice that something is off. This could not have been made with PinkSea. [01:10:26] Jeremy: Yeah, that's a good point. I suppose we know at the very least, they have some third party and internal moderation tools that they feed the images through, so they can do some automatic content tagging. But yeah, I, I don't know, like you said, whether the resizing and all that stuff is at the CDN level. [01:10:50] Prefetcher: The jpegification is definitely at the CDN level. 'cause, Bluesky is actually running an open source image proxy. It's called imgproxy. Brian Newbold talked about it a bit on that harbor post. And, yeah, so a lot of the compression, the end user things are done via image proxy, but that downscaling, I don't know, you'd have to ask someone who's a bit more intimate with Bluesky's internals. [01:11:19] Jeremy: Cool. Yeah, I think we've, we've covered a lot.
Is there, is there anything else, you wanted to mention or thought we should have talked about? [01:11:26] Prefetcher: Regarding PinkSea I think I've mentioned a ton both the behind the scenes things and, the user things, the design principles. What I'd want to absolutely say, and it will sound cheesy, and, is that I'm eternally grateful to anyone who's actually visited PinkSea. It's definitely grown outta all of my like dreams for the platform, to the point where I'm sitting here just talking about it. I definitely hope that the future will bring us more applications (in) ATProto. I definitely have ideas on how to expand PinkSea, a lot of ideas, a lot of things I want to do, and I'm also a very busy person, so I never get around them. But yeah, think that's it, at least regarding PinkSea. [01:12:15] Jeremy: Cool. Well, if people want to check out PinkSea or see what you're up to, where can they find you? [01:12:22] Prefetcher: So PinkSea is at pinksea.art. That's the website and Bluesky Handle is at pinksea.art and me, well, search prefetcher on Bluesky, you'll probably find me. My tag is at prefetcher.miku.place. all of my socials are probably there. I'm Prefetcher pretty much every single platform except for the platforms that already had someone called Prefetcher. GitHub, github.com/purifetchi because Prefetcher was taken. And, yeah, hit me up. I'm always eager to talk. I don't bite. [01:13:00] Jeremy: Very cool. Well, Kacper thanks. Thanks for taking the time. This was fun. [01:13:04] Prefetcher: Thank you so much, Jeremy, for having me over. It was a pleasure.
Feb 6, 2025 • 1h 2min

Tom MacWright on Shutting down Placemark

Tom MacWright is a prolific contributor in the geospatial open source community. He made geojson.io, Mapbox Studio, and was the lead developer on the OpenStreetMap editor. He's currently on the team at Val Town. In 2021 he bootstrapped a solo business and created the Placemark mapping application. He acquired customers and found steady growth but after spending two years on the project he decided it was financially unsustainable. He open sourced the code and shut down the business. In this interview Tom speaks candidly about why geospatial is difficult, chasing technical rabbit holes, the mental impact of bootstrapping, and his struggles to grow a customer base. If you're interested in geospatial or the good and bad of running a solo business I think you'll enjoy this conversation with Tom. Related Links Tom's blog Placemark Play Placemark GitHub Placemark archive geojson.io Valtown Datawrapper (Visualization tool) Geospatial Companies mentioned Mapbox ArcGIS QGIS Carto -- Transcript You can help correct transcripts on GitHub. [00:00:00] Introduction Jeremy: Today I'm talking to Tom MacWright. He worked at Mapbox as a, a very early employee. He's had a lot of experience in the geospatial community, the open source community. One of his most recent projects was a mapping project called Placemark he started and ran on his own. So I wanted to talk to Tom about his experience going solo and, eventually having to, shut that down. Tom, thanks for agreeing to chat today. Tom: Yeah, thanks for having me. [00:00:32] Tools and Open Source at Mapbox Jeremy: So maybe to give everyone some context on, what your background was before you started Placemark. Um, let's talk a little bit about your experience at, at Mapbox. What did you work on there and, and what would you say are like the big things you learned from that experience? 
Tom: Yeah, so if you include the time that I was at Development Seed, which essentially turned into Mapbox, I kind of signed the paper to get fired from Development Seed and hired at Mapbox within the same 20 seconds. Uh, I was there for eight and a half years. So it was a lifetime in tech years. And the company really evolved from, uh, working for Human Rights Watch and Amnesty International and the World Bank and doing these small, little like micro websites to the point at which I left it. It had raised a lot of money, had a lot of employees. I think it was 350 or so when I left. And yeah, just expanded into a lot of different, uh, try trying to own more and more of the mapping stack. But yeah, I was kind of really focused on the creative and tooling side of it. That's kind of where I see a lot of the, the fun in programming is making these tools where, uh, they can give people the same kind of fun like interaction loop that programming has where you, you know, you do a little bit of math and you see the result and you're able to just play with, uh, what you're working on, letting people have that in other domains. So it was really cool to figure out how to get a map design tool where somebody changes the background color and it just automatically changes that in your browser. And it covered like data editing. It covered, um, map styling and we did, uh, three different versions of that tool over the years. And then Mapbox is also a company that was, it came from, kind of people who are working on the Howard Dean campaign. And so it was pretty ideological and part of the ideology was being pretty hardcore about open source. We hired a lot of people who were working on open source projects before and basically just paid them to work on the open source projects, uh, for their whole time there. And during my time there, I just tried to make as much of my work, uh, open as possible, which was, you know, at the time it was, it was pretty great. 
I think in the long term it's been, o open source has changed a lot. But during the time that we were there, we both kind of, helped things like Leaflet and Mapnik and OpenStreetMap, uh, but also made like some larger contributions to the open source world. Yeah, that, that's kind of like the, the internal company facing side. And also like what I try to create as like a more of a, uh, enduring work. I think the open source stuff will hopefully have more of a, a long term, uh, benefit. [00:03:40] How open source has changed (value capture by large companies) Jeremy: When I was working on a project that needed offline maps, um, we couldn't use Google Maps or any of the, the other publicly available, cloud APIs. So yeah, we actually used a, a tool, called TileMill that I, I hadn't known that you'd worked on, but recently found out you did. So that actually let us pull in OpenStreetMap data and then use this style, uh, language called Carto to, to basically let us choose what the colors would be and how the different, uh, the roads and the buildings would look. What's kind of interesting to me is that it being open source really let us, um, build something we otherwise wouldn't have been able to do. But like, at the same time, we also didn't pay Mapbox any money. (laughs) So I'm, I'm kind of curious, like, if it's changed, like what the thinking was in terms of, you know, we pay for people to build all these things. We make it open source. But then people may just not ever pay us, you know, for all these things we did. Tom: Yeah. Yeah. I think that the main thing that's changed since the era of TileMill is, the dominance of cloud platforms. Like back then, I think, uh, Mapbox was still using, we were using like a little bit of AWS but people were still just on like VPSs and, uh, configuring things in cPanel and sometimes even running their own servers. And the, the danger of people using the product for free was such a small thing for us. 
Especially when TileMill was also funded by the Knight Foundation, so, you know, that at least paid half of my salary for, or, well, sorry, probably, yeah, maybe half of my salary for the first year that I was there and half of three other people's salaries. But that, yeah, so like when we built TileMill, a few companies have really like built on those same tools. Uh, there's a company called Carto coincidentally, they had the same name as CartoCSS, and they built on a lot of the same stack, they built on Mapnik. Um, and it was, was... I mean, I'm not gonna say that it was all like, you know, sunshine and roses, but it was never a thing that we talked about in terms of like this being a brutal competition between us and these other startups. Mapbox eventually closed source some stuff. They made it a source available license. And eventually Mapbox Studio was a closed source product. Um, and that was actually a decision that I advocated for. And that's mostly just because at one point, Esri, Microsoft, Amazon, all had white-labeled versions of Mapbox code, which, uh, hurts a little bit on a personal level and also makes it pretty hard to think about working, almost like, you don't want to go to your scrappy open source company and do unpaid labor for Amazon. Uh, you know, Bezos can afford to pay for the labor himself. That's just kind of my personal, uh, that I'm obviously, I haven't worked there in a long time, so I'm not speaking for the company, but that's kind of how it felt like. And it yeah, kind of changed the arithmetic of open source in this way that it made it less fun and, more risky, um, for people I think. [00:07:11] Don't worry about the small free users Jeremy: Yeah. So it sounds like the thinking was if someone on a small team or an individual, they took the open source software and they used it for their own projects, that was fine. Like you expected that and didn't worry about it. 
It's more that when these really large organizations like a, a Microsoft comes in and, just like you said, white labels the software, and doesn't really contribute significantly back. That's, that's when it, the, the thinking sort of shifted. Tom: Yeah, like a lot of the people who can't pay full price in USD to use your product are great users and they're doing cool stuff. Like when I was working on Placemark and when I was like selling the theme for my blog, I would get emails from like some kid in India and it's like, you know, you're selling this for a hundred dollars, which is a ton of money. And like, you know, why, why should I care? Why shouldn't I like, just send them the zip file for free? It's like nothing to me and a lot to them. And mapping tools are really, really expensive. So the fact that Mapbox was able to create a free alternative when, you know, ArcGIS was $500 a month sometimes, um, depending on your license, obviously. That's, that's good. You're always gonna find a way for, like, your salespeople are gonna find a way to charge the big companies a lot of money. They're great at that. Um, and that's what matters really for your, for the revenue. [00:08:44] ESRI to Google Maps with little in-between Jeremy: That's a good point too about like the, my impression of the, the mapping space, and maybe this has changed more recently, but you had the, probably the biggest player Esri, who's selling things at enterprise prices and then there were, or there are like a few open source options. But they feel like the, the barrier to entry feels a little high. And so, and then I guess you have stuff like Google Maps, right? That's, um, that's very accessible, but it's pretty limited. So there's this big gap, it feels like, right between the, the Esri and the, the Google Maps and open source. It's, it's sort of like, there's almost like there's no sweet spot. 
I guess maybe it's just because people's uses are so different, but I'm, I'm not sure, um, what makes maps so unique in that way. Tom: Yeah, I have come to understand what Esri and QGIS do as like an extension of what CAD is like. And if you've used CAD software recently, it's just as crazy and as expensive and as powerful. And it's really hard to capture like the people who are motivated enough to make a map but don't want to go down the whole rabbit hole. I think that was one of the hardest things about Placemark was trying to be in the middle of those things and half of the people were mystified by the complexity and half the people wanted more complexity. Uh, and I just couldn't figure out how to get it to the right in between spot. [00:10:25] Placemark and its origins in geojson.io Jeremy: Yeah. So let's, let's talk a little bit about Placemark then, in terms of from its start. What was your, your goal with Placemark and, and what was the product itself? Tom: So the seed of the idea for Placemark, uh, is this website called geojson.io, uh, which is still around. And, Chris Fong (correction -- Whong) at, at Mapbox is still, uh, developing it. And that had become pretty useful for a lot of people who I knew in the industry who were in this position of managing geospatial data but not wanting to boot up ArcGIS. Uh, geojson.io is based on, I just tweeted, I was like, why? Why is there not a thing where you can edit data on a map and have a GeoJSON representation and just go back and forth between the two really easily. And it started with that, and then it kind of grew to be a little bit more powerful. And then it was just a tool that was useful for everyone. And my theory was just that I wanted that to be more useful. And I knew just like anything else that you build and you work on for a long time, you know exactly how it could be so much better. And, uh, all the things that you would do better if you did it again. 
And I was, uh, you know, hoping that there was something where like if you make that more powerful and you make it something that's like so essential that somebody's using every day, then maybe there's some some value in that. And so Placemark kind of started as being like, oh, this is the thing where if you're tasking a satellite and you need a bounding box on a specific city, this is the easiest way to do that. Um, and it grew a little bit into being like a tool for collaborating because people were collaborating on it. And I thought that that would be, you know, an interesting thing to support. But yeah, I think it, it like tried to be in that middle of like, not exactly Google My Maps and certainly a lot, uh, simpler than, uh, QGIS or ArcGIS. Jeremy: Something I noticed, so I've actually used geojson.io as well when I was first learning how to put stuff on a map and learning that GeoJSON was a format that a lot of things were using, it was actually really helpful to, to be able to draw, uh, polygons and see, okay, this is how the JSON looks and all that stuff. And it was like just very simple. I think there's something like very powerful about, websites or applications like that where it, it does this one thing and when you go there, you're like, oh, okay, I, I, I know what I'm doing and it's, it's, uh, you know, it's gonna help me do the, this very specific thing I'm trying to do. [00:13:16] Placemark use cases (Farming, Transportation, Interior mapping, Satellite viewsheds) Jeremy: I think with Placemark, so, one question I would have is, you gave an example of, uh, someone, I think you said for a satellite, they're, are they drawing the, the area? What, what was the area specifically for? Tom: The area of interest, the area where they want the, uh, to point the camera. Jeremy: So yeah, with, with Placemark, I mean, were there, what were some of the specific customers or use cases you had in mind? 'cause that's, that's something about. 
Um, placemark as a product I noticed was it's sort of like, here's this thing where you can draw polygons put markers and there's all these like things you can do, but I think unless you already have the specific use case, it's not super clear, who uses it for what. So maybe you could give some examples of what you had in mind. Tom: I didn't have much in mind, but I can tell you what people, what some people used it for. so some of the more interesting uses of it, a bunch of, uh, farming oriented use cases, uh, especially like indoor and small scale farming. Um, there were some people who, uh, essentially had a bunch of flower farms and had polygons on the map, and they wanted to, uh, mark the ones that had mites or needed to be watered, other things that could spread in a geometric way. And so it's pretty important to have that geospatial component to it. and then a few places were using it for basically transportation planning. Um, so drawing out routes of where buses would go, uh, in Luxembourg. And, then there was also a little bit of like, kind of interesting, planning of what to buy more or less. Uh, so something of like, do we want to buy this tract of land or do we wanna buy this tract of land or do we wanna buy access to this one high speed internet cable or this other high speed internet cable? and yeah, a lot of those things were kind of like emergent use cases. Um, there's a lot of people who were doing either architecture or internal or in interior mapping essentially. Jeremy: Interior, you mean, inside of a building Tom: yeah. yeah. Jeremy: Hmm. Okay. Tom: Which I don't think it was the best tool for. Uh, but you know, people used it for that. Jeremy: Interesting. Yeah. I guess, would people normally use some kind of a CAD tool for that, or Tom: Yeah. Uh, there's CAD tools and there are a few, uh, companies that do just, there's a company that just does interior maps especially of airports, and that's their whole business model. 
Um, but it's, it's kind of an interesting, uh, problem because most CAD architecture work is done with like a local coordinate system, and you have like very good resolution of everything, and then you eventually place it in geo geospatial space. Uh, but if you do it all in latitude and longitude, you know, you're, you're moving a door and it's moving the 10th or 12th decimal point, and eventually you have some precision problems. Jeremy: So it's almost like if you start with latitude and longitude, it's hard to go the other way. Right? you have to start more specific and then you can move it into the, the geospatial, uh, area. Tom: Yeah. Uh, that's kind of why we have local projections for towns is that you can do a lot of work just in that local projection. And the numbers are kind of small 'cause your town's small, relatively. Jeremy: yeah, those are kind of interesting. So it sounds like just anytime somebody wants to, like you gave the example of transportation planning or you want to visually see where things are, like your crops or things like that, and that, that kind of makes sense. I mean, I think if you just think about paper maps, if somebody wants to sketch something out and, and sort of track the layout of something, this could serve the same purpose but be editable. and like you said, I think it's also. Collaborative so you can have multiple people editing the same, um, map. that makes sense. I think something that I believe I saw on your website is you said though that it was, it's like an editing tool, but it's not necessarily a visualization tool. Uh, I'm kind of curious what you, what you meant by that. [00:17:39] An editing tool that allows you to export data not a visualization tool Tom: Yeah, I, when you say a map, I think there's, people can interpret that as everything from raw data to satellite imagery and raster data. and then a lot of it is like, can I use this to make a choropleth map of the voter turnout in our, in my country? 
And that Placemark did a little bit, but I think that it was, it was never going to be the, the thing that it did super well. And so, yeah, and also like the, the two things kind of, don't mesh all that well. Like if you have a scaled point map and you have that kind of visualization of it and then you're editing the points at the same time and you're dragging around these like gigantic points because this point means a lot of population, it just doesn't really make that much sense. There are probably ways to square that circle and have different views, but, uh, I felt like for visualizations, I mean partly I just think Datawrapper is kind of great and uh, I had already worked for Observable at that point, which is also, which I think also does like great visualization work. Jeremy: Would that be the case of somebody could make a map inside of Placemark and then they would take the GeoJSON and then import that into another visualization tool? Is that what you were kind of imagining people would do? Tom: Yeah. Yeah, exactly. Jeremy: And I could see from the customer's perspective, a lot of them, they may have that end, uh, visualization in mind. So they might look for a tool that kind of just does both. Right. Tom: Yeah. Yeah. Certain people definitely, wanted that. And yeah, it was an interesting direction to go down. I think that market was going to be a lot different than the people who wanted to manage and edit data. And also, I, one thing that I had in mind a lot, uh, was if Placemark didn't work out, how much would people be burned? And I think if I, if I built it in a way that like everyone was heavily relying on the API and embeds, people would suffer a lot more, if I eventually had to shut it down. Every API that you release is really a, a long-term commitment. And instead for me, like guilt wise, having a product where you can easily export everything that you ever did in any format that you want was like the least lock in, kind of. Jeremy: Yeah. 
And I imagine the, the scope of the project too, you're making it much smaller if you, if you stick to that editing experience and not try to do everything. Tom: Yeah. Yeah. I, the scope was already pretty big. As you can tell from the open source project, it's, it's bigger than I wish it was. The whole time I was really hoping that I could figure out some niche that was much more compact. There's, I forget the name, but there's somebody who has a, an application that's very similar to Placemark in technical terms, but is just a hundred percent focused on planning septic systems. And I'm just like, if I just did this just for septic systems, like would that be a much, would that be 10,000 lines of code instead of 40,000 lines of code? And it would be able to perfectly serve those customers. But you know, that I didn't do enough experimentation to figure that out. Um, I, that's, I think one thing that I wish I had done a lot more was, pivot and do experiments. Jeremy: That septic example, do you know if it's a, a business in and of itself where it can actually support one person or a staff of people? Or is it, is that market just too small? Tom: I think it's still a solo bootstrapped project. Yeah. And it's, it's so hard to tell whether a company's doing well or not. I could ask the person over DM. [00:21:58] Built the base technology before going public Jeremy: So when you were first starting Placemark, you were, you were doing it as a solo, developer. A solo entrepreneur, really. You worked on it for quite a while, I think before you announced, right? Like maybe a year or so? Tom: Yeah, yeah. Almost, almost a year, I think, maybe, maybe 10 months in the dark. Tom: I think that there's, there was a lot of overlap between the different directions that I would eventually go in. And so just building a collaborative editor that can edit map data fairly quickly and checks all the boxes of being able to import and export things, um, that is, was a lot of work. 
and I mean also I, I was, uh, freelancing during part of it, so it wasn't a hundred percent of my time. Tom: But that, that core, I think even now if I were to build something similar, I would probably still use that work. because that, whether you're doing the septic planning application or you're doing a general purpose kind of map editor or some kind of social application, a lot of that stuff will be in common. Um, and so I wanted to really get, like, to figure out that problem space and get a few solutions that I could live with. Jeremy: The base. libraries or technologies you were gonna pick to get the map and have the collaborative aspect. Those are all things you wanted to get settled first. And then you figured, okay, once I have this base, then I can go find the, you know, the, the, the customers or, or find the specifics of what I'm gonna build. Tom: Yeah, exactly. Jeremy: I I think you had said that going forward when you're gonna work on another project, you would probably still start the same way. [00:23:51] Geospatial is a tough industry, no public companies Tom: if I was working on a project in the geospatial space, I would probably heavily reference the work that I already did here. but I don't know if I'll go back to, to maps again. It's a tough industry. Jeremy: Is it because of the, the customer base? Is it because like people don't really understand the market in terms of who actually needs the maps? I'm kind of curious what you feel makes it tough. Tom: I think, well there are no, there are no public mapping companies. Esri is I think one of the 10 largest private companies in the us. but it's not like any of these geospatial companies have ever been like a pure play. And I think that makes it hard. I think maps are just, they're kind of like fonts in a way in which they are this. Very deep well of complexity, which is absolutely fascinating. If you're in it, it's enough fun and engineering to spend an entire career just working on that stuff. 
And then once you're out of it, you talk to somebody and you're just like, oh, I work on this thing. And they're like, oh, that you Google Maps. Um, or, you know, I work at a font type like a, you know, a type factory and it's like, oh, do you make, uh, you know, Courier in, uh, Word. It's really infrastructure, uh, that we mostly take for granted, which is, that's, that means it's good in some ways. But at the same time, I, it's hard to really find a niche in which the mapping component is that, that is that useful. A lot of the companies that are kind of mapping companies, like, I think you could say that like Strava and Palantir are kind of geospatial companies, both of them. But Strava is a fitness company and Palantir is a military company. So if you're, uh, a mapping expert, you kind of have to figure out what, how it ties into the real world, how it ties into the business world and revenue. And then maps might be 50% of the solution or 75% of the solution, but it's probably not going to be, this is the company that makes mapping software. Jeremy: Yeah, it's more like, I have this product that I'm gonna sell and it happens to have a map as a part of it. Versus I'm going to sell you, tools that, uh, you know, help you make your own map. That seems like a, a harder, harder sell. Tom: Yeah. And especially pro tools like the, the idea of people being both invested in terms of paying and invested in terms of wanting to learn the tool. That's, uh, that's a lot to ask out of people. [00:26:49] Knowing the market is tough but going for it anyways Jeremy: I think the things we had just talked about, about mapping being a tough industry and about there being like the low end is taken care of by Google, the high end is taken care of by Esri with ArcGIS. Uh, I think you mentioned in a blog post that when you started Placemark you, you, you knew all this from the start. 
So I'm kind of curious, like, knowing that, what made you decide like, I'm gonna, I'm gonna go for it and, you know, do it anyways. Tom: uh, I, well, I think that having seen, I, like I am a co-founder of val.town now, and every company that I've worked for, I've been pretty early enough to see how the sausage is made and the sausage is made with chaos. Like every company doesn't know what it's doing and is in an impossible fight against some Goliath figure. And the product that succeeds, if it ever does succeed, is something that you did not think of two or three years in advance. so I looked at this, I looked at the odds, and I was like, oh, these are the typical odds, you know, maybe someday I'll see something where it's, uh, it's an obvious open blue water market opportunity. But I think for the, for the most part, I was expecting to grind. Uh, you know, like even, even if, uh, the odds were worse, I probably would've still done it. I think I, I learned a lot. I should have done a lot more marketing and business and, but I have, I have no regrets about, you know, taking, taking a one try at solving a very hard to solve problem. Jeremy: Yeah, that's a good point in that the, the odds, like you said, are already stacked against you. but sometimes you just gotta try it and see how it goes, Tom: Yeah. And I had the, like I was at a time where I was very aware of how my life was set up. I was like, I could do a startup right now and kind of burn money for a little while and have enough time to work on it, and I would not be abandoning an infant child or, you know, like all of the things that, all the life responsibilities that I will have in the near future. Um. So, you know, uh, the, the time was then, I guess, [00:29:23] Being a solo developer Jeremy: And comparing it to your time at Mapbox and the other startups and, and I suppose now at val.town, when you were working on Placemark, you're the sole developer, you're in charge of everything. how did that feel? 
Did you enjoy that experience or was it more like, I, I really wish I had other people to, you know, to kind of go through this with, Tom: Uh, around the end I started to chat with people who, like might be co-founders and I even entertained some chats with, uh, venture capital people. I am fine with the, the day to day of working on stuff alone of making a lot of decisions. That's what I have done in a lot of companies anyway. when you're building the prototype or turning a prototype into something that can be in production, I think that having, uh, having other people there, It would've been better for my mentality in terms of not feeling like it was my thing. Um, you know, like feeling detached enough from the product to really see its flaws and really be open to, taking more radical shifts in approach. whereas when it's just you, you know, it's like you and the customers and your email inbox and, uh, your conscience and your existential dread. Uh, and you know, it's not like a co-founder or, uh, somebody to work with is gonna solve all of that stuff for you, but, uh, it probably would've been maybe a little bit better. I don't know. but then again, like I've also seen those kinds of relationships blow up a lot. and I wanted to kind of figure out what I was doing before, adding more people, more complexity, more money into the situation. But maybe you, maybe doing that at the beginning is kind of the same, you know, like you, other people are down for the same kind of risk that you are. Jeremy: I'm sure it's always different trade offs. I mean, I, I think there probably is a power to being able to unilaterally say like, Hey, this is, this is what I wanna do, so I'm gonna do it. Tom: Yeah. [00:31:52] Spending too much time on multiplayer without a business case Jeremy: You mentioned how there were certain flaws or things you may not have seen because you were so in it. Looking back, what, what were some of those things? 
Tom: I think that, uh, probably the, I I don't think that most technical decisions are all that important, um, that it never seems like the thing that means life or death for companies. And, you know, Facebook is still on PHP, they've fought, fixed, the problem with, with money. But I think I got rabbit holed into a few things where if I had like a business co-founder, then they would've grilled me about like, why are we spending? The, the main thing that comes to mind, uh, is real time multiplayer, real time. It was a fascinating problem and I was so ready to think about that all the time and try to solve it. And I think that took up a lot of my time and energy. And in the long term, most people are not editing a map at the same time. Seeing the cursors move around is a really fun party trick, and it's great for marketing, but I think that if I were to take a real look at that, that was, that was a mistake. Especially when the trade off was things that actually mattered. Like the amount of time, the amount, the amount of data that the, that could be handled at the same time. I could have figured out ways to upload a one gigabyte or two gigabyte or three gigabyte shapefile and for it to just work in that same time, whereas real time made it harder to solve that problem, which was a lot closer to what paying customers cared about and where people's expectations were. Jeremy: When you were working on this realtime collaborative functionality, was this before the product was public? Was this something you, built from the start? Tom: Yeah. I built the whole thing without it and then added it in. Not as like a rewrite, but like as a, as a big change to a lot of stuff. Jeremy: Yeah, I, I could totally see how that could happen because you are trying to envision people using this product, and you think of something like Google Docs, right? It's very powerful to be typing in a document and see the other cursors and, um, see other people typing. 
So, I could see how you, you would make that leap and say like, oh, the map should, should do that too. Yeah. [00:34:29] Financial pressures of bootstrapping, high COL, and healthcare Tom: Yeah. Yeah. Um, and, you know, Figma is very cool. Like the, it's, it's amazing. It's an amazing thing. But Figma was in the dark for way longer than I was, and uh, Evan is a lot smarter than I was. Jeremy: He probably had a big bag of money too. Right. Tom: Yeah. Jeremy: I, I don't actually know the history of Figma, but I'm assuming it's, um, it's VC funded, right? Tom: Uh, yeah, they're, they're kind of famous for just having, I don't think they raised that much in the beginning, but they just didn't hire very much and it was just like the two co-founders, or two or three people and they just kept building for a long time. I feel like it's like well over three years. Jeremy: Oh wow. Okay. I think like in your case, I, I saw a comment from you where you were saying, this was your sole source of income and you gotta pay for your health insurance, and so you have no outside investments. So, the pressures are, are very different I think. Tom: Yeah. Yeah. And that's really something to, to appreciate about venture capital. It gives you the slack in your, in your budget to make some mistakes and not freak out about it. And sadly, the rent is not going down anytime soon in, in Brooklyn, and the health insurance is not going down anytime soon. I think it's, it's kind of brutal to like leave a job and then realize that like, you know, to, to be admitted to a hospital, you have to pay $500 a month. Jeremy: I'm, I'm sure that was like, shocking, right? The first time you had to pay for it yourself. Tom: Yeah. And it's not even good. Uh, we need to fix this, like, if there's anything that we could do to fix entrepreneurship in this country, it's just like, make it possible to do this without already being wealthy. Um, it was, it was a constant stress. 
[00:36:29] Growth and customers Jeremy: As you worked on it, and maybe especially as you, after you had shipped, was there a period where. You know, things were going really well in terms of customers and you felt like, okay, this is really gonna work. Tom: I was, so, like, I basically started out by dropping, I think $5,000 in the business bank account. And I was like, if I break even soon, then I'll be happy. And I broke even in the first month. And that was amazing. I mean, the costs were low and everything, but I was really happy to just be at that point and that like, it never went down. I think that probably somebody with more, uh, determination would've kept going after, after I had stopped. but yeah, like, and also The people who used Placemark, who I actually chatted with, and, uh, all that stuff, they were awesome. I wish that there were more of them. but like a lot of the customers were doing cool stuff. They were supportive. They gave me really informative feedback. Um, and that felt really good. but there was never a point at which like the, uh, the growth scale looked like, oh, we're going to hit a point at which this will be a sustainable business within a year. I think it, according to the growth when I left it, it would've been like maybe three years until I would've been, able to pay my rent and health insurance and, live a comfortable life in, in New York. Jeremy: So when you mentioned you broke even that was like the expenses into the business, but not for actually like rent and health insurance and food and all that. Okay. Okay. can you say like roughly how much was coming in or how many customers you had? Tom: Uh, yeah, the revenue initially I think was, uh, 1500 MRR, and eventually it was like 4,000 or so. Jeremy: And the growth was pretty steady. [00:38:37] Bootstrapping vs fundraising Tom: Um, so yeah, I mean, the numbers where you're just like, maybe I could have kept going. 
but it's, the other weird thing about VCs is just that I think I have this rich understanding of like, if you're, if you're running a business that will be stressful, but be able to pay your bills and you're in control of it, versus running a startup where you might make life changing money and then not have to run a business again. It's like the latter is kind of better. Uh, if stress affects you a lot, and if you're not really wedded to being super independent. so yeah, I don't know between the two ways of like living your life, I, I have some appreciation for, for both. doing what Placemark entailed if I was living cheaply in a, in a cheap city and it didn't stress me out all the time, would've been a pretty good deal. Um, but doing it in Brooklyn with all the stress was not it, it wasn't affecting my life in positive ways and I, I wanted to, you know, go see shows at night with my friends and not worry about the servers going down. Jeremy: Even putting the money aside, I think that's being the only person responsible for the app, right? Probably feels like you can't really take a vacation. Right. Tom: Yeah, I did take a vacation during it. Like I went to visit my partner who was in, uh, Germany at the time, and we were like on a boat, uh, between Germany, across the lake to Switzerland, and like the servers went down and I opened up my laptop and fixed the servers. It's just like, that is, it's a sacrifice that people make, but it is hard. Jeremy: There's, there's on call, but usually it's not just you 24 7. Tom: Yeah. If you don't pick up somebody else [00:40:28] Financial stress and framing money spent as an investment Jeremy: Yeah, yeah, yeah, I guess at what point, because I'm trying to think. You started in 2021 and then maybe wrapped up, was it sometime in 2024? Tom: Uh, I took a job in, uh, I, I mean I joined val.town in the early 2023 and then wrapped up in November, 2023. Jeremy: At what point did you really start feeling the, the stress? 
Like I, I imagine maybe when you first started out, you said you were doing consulting and stuff, so, um, probably things were okay, but once you kind of shifted away from that, is that kind of when the, the, the worries about money started coming in? Tom: Yeah. Um, I think maybe it was like six or eight months, um, in. Just that I felt like I wasn't finding, uh, like a, a way to grow the product without adding lots of complexity to it. And being a solo founder, the idea of succeeding, but having built like this hulking mess of a product felt just as bad as not succeeding. Like ideally it would be something that I could really be happy maintaining for the long term. Uh, but I was just seeing like, oh, maybe I could succeed by adding every feature in QGIS and that's just not, not a, not something that I wanted to commit to. But yeah, I don't, I don't know. I've been, uh, do you know, uh, Ramit Sethi? He's like a, Jeremy: I don't. Tom: an internet money guy. He's less scummy than the rest of them, but still, an internet money guy. Um, but he does just a lot of stuff about like, money psychology. And that has made me realize that a lot of what I thought at the time and even think now is kind of irrational, you know, like, I think one of the main things that I would do differently is just set a budget for Placemark. Like if I had just set aside, like, you know, enough money to live on for a year and put that in, like the, "this is for Placemark" bucket, then it would've felt better to me than having it all be ad hoc, month to month, feeling like you're burning money instead of investing money in a thing. But yeah, nobody told me, uh, how to, how to think about it then. Uh, yeah, you only get experience by experiencing it. Jeremy: You're just seeing your, your bank account shrinking and there's this, psychological toll, right? Where you're not, you're not used to that feeling and it, it probably feels like something's wrong, Tom: Yeah, yeah.
I'm, I think it, I'm really impressed by people who can say, oh, I invested, uh, you know, 50 or a hundred thousand dollars into this business and was comfortable with that risk. And like, maybe it works out, maybe it doesn't. Maybe you just like threw a lot of money down into that. and the people, I think with the healthy, productive, uh, relationship with it. Do think of it as like, oh, I, I paid for kind of a bet on a risk. and that's, that's what I was doing anyway. You know, like I was paying my rent and my health insurance and spending all my time working on the product instead of paying, uh, freelance work. but if you don't frame it that way, it doesn't feel like an investment. It feels like you're making a risky gamble. Jeremy: Yeah. And I think that makes sense to, to actually, I think, like you were saying, have a separate account or a separate thing set aside where you are like, this is, this is this money for this purpose. And like you said, look at it as an investment, which with regular investments can go down. Tom: Yeah, exactly. Yeah. Jeremy: Yeah [00:44:26] In hindsight might have raised money or tried smaller bets Jeremy: Were there, there other things, whether technical or or business wise, that, that if you were to to do it again, you would do differently? Tom: I go back and forth on whether I should have raised venture capital. there are, there's kind of a, an assumption in venture capital that once you're on it, you have to go the whole way. You have to become a billion dollar company, uh, or at least really tell people that you're going to be a billion dollar company and I am not. yeah, I, I don't know. I've seen, I've seen other companies in my space, or like our friends of my current company who are not really targeting that, or ones who were, and then they had somewhere in between the billion dollar and the very small outcome. Uh, and that's a little bit of a point in the favor of accepting a big pile of money from the venture capitalists. 
I'm also a little bit biased right now because val.town has one investor and he's like the, the best venture capitalist that I have ever met. Big fan. Don't quote me on that. If he sacks me in like a year, we'll see. Um, but uh, yeah, there, I, I think that I understand more why people take that approach. Or I've understood more why people take like the venture capital but not taking $300 million from SoftBank approach. Yeah, and I don't know, I think that, trying a lot of things also seems really appealing. Uh, people who do the same kind of, maybe 10 months, but they build four or five different products or three different products instead of just one. I think that, that feels, feels like a good idea to me. Jeremy: And in doing that, would that be more of a, like as a solo entrepreneur or you, you're thinking you would take investment and then say, I'm gonna try all these things with, with your money. Tom: Oh, I've seen both. I, that I, yeah, one friend's company has pivoted like four times between very different ideas and yeah, it, it's one way to do it, but I think in the long term, I would want to do that as a solo developer and try to figure out, you know, something. But yeah, I, I think, uh, so much of it is mindset, that even then if I was working on like three different projects, I think, my qualifications for something being worth, really adopting and spending all my time doing, you just have to accept, uh, a lot of hits and a lot of misses and a lot of like keeping things alive and finding out how to turn them into something. I am really inspired by my friends who like started around the same time that I did and they're not that much further in terms of revenue and they're like still, still doing it because that is what they want to do in life. And if you develop the whole ecosystem and mindset around it, I think that's somewhere that people can stay and, and be happy.
just trying to find, trying to find a company that they own and control and they like. Jeremy: While, while making the the expenses work. Tom: Yeah. Yeah. that's the, that's the hard part, like freelancing on the side also. I probably could have kept that up. I liked my freelance clients. I would probably still work with them as well. but I kind of just wanted the, I wanted the focus, I wanted the motivation of, of being without a net. Jeremy: Yeah, I mean, energy wise, do you think that that would've worked? I mean, I imagine that Placemark took a lot of your time when you were working full time, so you're trying to balance, you know, clients and all your customers and everything you're doing with the software. It just feels like it might be a lot. Tom: Yeah. Yeah. Maybe with different freelance clients. I, I loved my freelance clients because I, after. leaving config. I, I wanted to work on climate change stuff and so I was working for climate change foundations and that is not the way to max out your paycheck. It's the way to feel good about your conscience. And so I still feel great about those projects, but in the future, yeah, I would probably just work for, uh, you know, a hedge fund or something. [00:49:02] Marketing to developers but not potential customers Jeremy: I think something you mentioned in one of your posts is that you maybe could have spent more time or had a different approach with marketing. Maybe you could kind of say what you did do and then what maybe worked and what didn't. Tom: Yeah. So I like my sweet spot is writing documentation and blog posts and technical stuff. And so I did a lot of that and a lot of that like worked in a way that didn't matter. I am at this point, weirdly good at writing stuff that gets on Hacker News. 
I've written a lot of stuff that's gotten to the top of Hacker News and unfortunately, writing about your technical approach and your geospatial project for handling errors, uh, in your JavaScript code is not really a way to get customers. and I think doing a lot of documentation was also great, but it was also, I think that the, the thing that was missing is the thing that I think Mapbox does fairly well now, in which the homepage really pushes you toward use cases immediately. and I should have been saying to each customer who had anything compelling as a use case, like, let's write an article about you and what you're doing, and here's how you use this in your industry. and that probably would've also been like a good, a good way to figure out which of those verticals was the one that was most worth spending all the time on. yeah. So it, it was, it was a lot of good marketing to nerds. and it could have been better in terms of marketing to actual customers and to people who are making the buying decisions. Jeremy: Yeah. Looking at the, the Placemark blog, I can definitely see how as a developer, a lot of the posts are appealing to me, right? It's about how you worked on a technical challenge or decisions you made, but maybe less so to somebody who they wanna. Draw a map to manage their crops. They're like, I don't care about any of this. Right. Tom: Yeah, like the Mapbox blog used to be, just all that stuff as well. We would write about designing protocol buffer layouts, and it was amazing for hiring and amazing for getting nerds in the door. But now it's just, Toyota is launching with, Mapbox Maps or something like that. And that's, that's what you, you should do if you're trying to sell a product. Jeremy: Yeah. And I think the, the sort of technical aspect, it makes sense too. If you're venture funded and you are looking to hire, right? 
You wanna build your team and you just want to increase like, the amount of stuff you're building and not worrying so much about, am I gonna have a paycheck next Tom: Yeah. Yeah. I, I just kind of do it because it's fun, which is not the right reason to do it, but, Yeah, I mean, I still write my blog mostly just because it's, it's a fun thing to do, but it's not the best way to, um, to run a business. Jeremy: Yeah. Well, the fun part is important too though. Tom: Yeah. Yeah. That's, that's maybe the whole thing. May, that's maybe the most important thing, but you can't do it if you don't do the, the money part. [00:52:35] Most customers came from existing audience Jeremy: Right. So the people who did find you, was it mostly word of mouth from people who did identify with the technical posts, or were there places that surprised you, that people found you? Tom: Uh, a lot of it was people who were familiar with the Mapbox ecosystem or with, with me. and then eventually, yeah, a few of the users came in through, um, through Hacker News, but it was mostly, mostly word of mouth also. The geospatial community is like fairly tight and it's, and it's not too hard to be the person who writes the article about some geospatial challenge that everyone finds. Jeremy: Hmm. Okay. Yeah, that's a good point about like being in that community, especially since you've done so much work in geospatial and in open source that you have this little, this built-in audience, I guess. Tom: yeah. Which I appreciate. It makes me nervous, but yeah. [00:53:43] Val.town marketing to developers Jeremy: Comparing that to something like val.town, how is val.town marketing? How is it finding users? 'cause from what I can tell, it's, it's getting a lot of, uh, a lot of people coming in, right? Tom: Yeah. 
Uh, well, right now our, our kind of target user, or the user that we think of is a hobbyist, is somebody who's, sometimes a pro developer or somebody, sometimes just somebody who's really interested in the field. And so writing these things that are just about, you know, programming, does super well. Uh, but it, we have exactly the same problem and that that is kind of being revamped as we speak. Uh, we hired somebody who actually knows marketing and has a good sense for it. And so a lot of that stuff is shifting to show you what you can do with val.town because it, it suffers from the same problem as well. It's an empty text field in which you can type TypeScript code, and it runs. And knowing what you can do with that or what you should do with that is, is hard if you don't have a grasp of TypeScript and web applications. So pretty soon we'll have pages which are like, here's how to connect Linear and GitHub with val.town, or, you know, two nouns, connect them, for all of those companies and to do automations and all these like concrete applications. I think that's, you have to do it. You have to figure it out. Jeremy: Just briefly for someone who hasn't heard of val.town, like what, what does it do? Tom: Uh, val.town is a social website, so it has comments and likes and all of that stuff. But it's for writing these little snippets of TypeScript and JavaScript code that run. So a lot of them are websites, some of them are automations, so they receive emails or send emails or connect one service to another. And yeah, it's, it's like combining some aspects of, GitHub or like a code platform, uh, but with the assumption that every time that you save, everything's instantly deployed. Jeremy: So it's maybe a little bit like, um, like Glitch, I guess? Tom: Uh, yeah. Yeah, it takes a lot of experience, a lot of, uh, inspiration from Glitch. Jeremy: And I, I think, like you had mentioned, you enjoy writing the, the technical blog posts and the documentation.
And so at least with val.town, your audience is developers versus, the geospatial community who probably largely doesn't care about, TypeScript and the, the different technical decisions there. Tom: Yeah, it, it makes it easier, that's for sure. The customer is, is me. [00:56:30] Shifting from solo to in-person teams Jeremy: Nice. Yeah. Looking at, you know, you, you worked as a, a solo developer for Placemark, and then now you've got a team of, is it like maybe five Tom: Uh, it is seven at the moment. Jeremy: Seven people. Okay. Are you all in person or is it, remote Tom: We all sit around two tables in Brooklyn. It's very nice. Jeremy: So how did that feel? Like shifting from, I'm in, I don't know if you worked from home while you were working on Placemark or if you were in coworking spaces, but you're, you're shifting from I'm like in my own head space doing everything myself to, to, I'm in a room with all these people and we're like working on this thing together. I'm kind of curious like how that felt for you. Tom: Yeah, it's been a big difference. And I think that I was just talking with, um, one, one of our, well an engineer at, at val.town about how everyone kind of had, had been working remote for obvious pandemic world reasons. And this kind of privilege of just being around the same table, if that's what you like is, a huge difference in terms of, I just remember having to. Trick myself into going on a walk around the block because I would get into such a dark mental head space of working on the same project for eight hours straight and skipping lunch. and now there's a little bit more structure. yeah, it's, it's been, it's been a overall, an improvement. Some days I wish that I could go on a run at noon 'cause that's the warmest time of the day. but, uh, overall, like it makes things so much easier. 
just reading the emotions in people's faces when they're telling you stuff and being able to, uh, not get into discussions that you don't need to get into because you can talk and just like understand each other very quickly. It's, it's very nice. I don't wanna force everyone to do it, you know, but it it for the people who want it, they, they, uh, really enjoy it. Jeremy: Yeah. I think if you have the right set of people, it's definitely more enjoyable. And um, if you don't, maybe not so Tom: Yeah, we haven't hired any, like, extremely loud chewers yet or anything like that, but yeah, maybe my story will change. Jeremy: No, no one microwaving fish. Tom: No, there's, uh, yeah, thankfully the microwave is outside of the office. Jeremy: Do you live close to the office? Tom: Yeah. Yeah. Like most of the team is within a 20 or 30 minute walk of the office and it's very fortunate. I think there's been something of a mass migration to New York. A lot of us didn't live in New York before four years ago, and now all of us do. it's, it's, uh, it's very comfortable to be here. Jeremy: I think that makes, uh, such a big difference. 'cause I think the majority of people, at least within the US you know, you're, you're getting in your car, you're sitting in traffic. and I know people who, during the pandemic, they actually moved further, right? Because they went, oh, like, uh, I don't need to come into the office. but yeah, if you are close enough where you can walk, yeah, I think that makes a big difference. Tom: Oh yeah. If I had to drive to work, I think my blood pressure would be so much higher. Uh, especially in New York. Oh, I feel so bad for the people who have to drive, whereas I'm just walking with, you know, a bagel in hand, enjoying listening to the birds. Jeremy: Yeah. Yeah. well now they have, what is it, the congestion pricing in Tom: Yeah. Yeah. We're all in Brooklyn, so it doesn't affect us that much, but it's supposedly, it's, it's working great. Um, yeah. 
I hope we can keep it. Jeremy: I've never driven in New York and I, I wouldn't want to Tom: Yeah. It's only for the brave or the crazy. [01:00:37] The value of public writing and work Jeremy: I think that's probably a good place to, to wrap up, but is there any other thoughts you had or things you wanted to mention? Tom: No, I've just, uh, thank you so much. This has been, this has been a lot of fun. You're, you're very good at this as well. I feel like it's, uh, Jeremy: Thank you Tom: It's not easy to, to steer a conversation in a way that makes awkward people sound, uh, normal. Jeremy: I wouldn't say that, but um, what's been actually pretty helpful to me is, you have such a body of work, I guess I would say, in terms of your blogging and, just the amount that you write and the long history of projects that, that there's, you know, there's a lot to talk about and I'm sure it helps, helps your thought process as well. Tom: Yeah. I, I've been lucky to have a lot of jobs where people, where companies were like, cool with publishing everything, you know? so a lot of what I've done is, uh, is public. it's, it's, uh, I'm very, very thankful for like, early on that being a big part of company culture. Jeremy: And you can definitely tell, I think for people who look at the Placemark blog posts or, or now your, your val.town blog posts, like there's, there's a clear difference when somebody like is very intentional and, um, you know, it's good at writing versus you're doing it because, um, it's your corporate responsibility or whatever, like people can tell. Yeah. Tom: Yeah. You can't fake being interested. so you gotta work on things that are interesting. Jeremy: Tom, thanks again for, for agreeing to chat. This was fun. Tom: Yeah thank you so much.
Jan 16, 2025 • 1h 7min

Paul Frazee on Bluesky and ATProto

Paul Frazee, CTO of Bluesky and former developer of the Beaker browser and Secure Scuttlebutt, shares insights into building decentralized social networks. He discusses the journey behind Bluesky and its ATProto, focusing on user autonomy and data integrity. The conversation covers the challenges of content moderation and lessons learned from past peer-to-peer projects. Paul also highlights the importance of scalable architecture and innovative user identification. Tune in for a deep dive into the future of social media!
Nov 30, 2023 • 28min

Mayra Navarro on Getting to RubyConf (RubyConf 2023)

Mayra Navarro is an organizer of WNB.rb and Ruby Perú. Mayra shares how the Ruby community helped her get to RubyConf, going from project manager to developer, and the different ways people learn and communicate. This is the final interview recorded at RubyConf 2023 in San Diego. -- Mayra's Github Peruvian Digital Platform Codeable bootcamp Groups Ruby Perú WNB.rb Atlanta Ruby People Cody Norman Stefanni Brasil of hexdevs Dave Kimura of Drifting Ruby Conferences RubyConf RailsConf -- Transcript You can help correct transcripts on GitHub. [00:00:00] Jeremy: I hope you've been enjoying the conversations from RubyConf. Before we get started. I just want to say thank you to everyone. I met at the conference, all the guests who were so generous with their time. And to Irene from RubyConf for arranging a space and helping me connect with guests. [00:00:16] This final interview is with Mayra Navarro. She's an organizer of the Ruby community in Lima, Peru and a member of the women and non-binary Ruby online community. She's going to tell us how the community pulled together. Both friends of hers and strangers she had never met. To get her to RubyConf this year, we start the story in April where she's just finished attending the Ruby on rails conference, RailsConf in Atlanta. Getting to RubyConf [00:00:44] Mayra: So the thing is, in the last RailsConf, the last day that I was in the US, um, the day that I was returning to Peru, I got fired. (laughs) Yes. So I was with all the stress. [00:01:01] All the luggages that I had to pick or maybe overweight or something like that. And then I received that [00:01:08] Jeremy: Oh my gosh. [00:01:09] Mayra: And I, oh my gosh, what I can do? [00:01:13] Jeremy: That's terrible. [00:01:15] Mayra: It was awful. Since that, I think it was May 1st or something like that. And I was looking for job like everybody else who were fired all these times It was a difficult time for me. 
my plan was just before September I get a job so I have enough money and not using my, my savings for, for going to the RailsConf. Sorry, the RubyConf. but, uh, eventually at the end of September, October, I didn't get anything. [00:01:44] I'm Christian. So I, well, God doesn't want me to come here. [00:01:50] it there must be a reason, but there was something inside me that. I just to have to do something else. and I thank to my mom because she's someone that is always fighting for what she wants. [00:02:02] So I say, okay, I went to sleep that day that I say, okay, maybe I don't want to go. So next day I have the idea. Maybe you didn't use your last card. There is something else. That is something that I have from my mom. I can feel that. I say, what if you ask for money? Well, like a fundraising, I learned about that word later, and I say what, what else could you lose you don't have anything to lose right now. [00:02:30] So I say, okay, I'm going to write something. I asked Cody Norman, that is someone that I really appreciate right now. I asked him about suggestions, if that's a good idea or no, maybe not. He said, yes, you can do it. Uh, and I asked him if he could help me with the speech because I tried to write something and also I'm not good at writing things on Twitter and especially asking for money because I had to be open myself and be vulnerable to do that. [00:03:02] And it was like, uh, the last break for myself [00:03:05] a... I sent the speech to Cody, he helped me to update some things that I have to just improve. And I did it. I, also, I didn't know how to open a GoFundMe campaign because that's only for the US and Mexico. I think it doesn't exist in Peru. [00:03:23] So he said, Oh, there is another page that you can go. [00:03:27] I did it. So I just published that. I didn't open that until three, four hours later, because I was like, no, I don't want to see. And then I, I open it and I started to contact with the people who. 
[00:03:44] Well, who knows me because I like to be connected with a lot of people. I'm part of the FL RB even being in Peru, I am part of the FL RB. I go attend to the Atlanta Ruby Group. I go, I, I know a lot of people because of the conference. I try to help to the woman and non binary community also. I am organizer of the Ruby Perú, but I didn't want to ask them money for them, but I have some close friends from the conference that I, that I go for all. [00:04:13] All these two years. So they helped me to share that. And in two days, I got the money. [00:04:20] It was like a, I can't believe it. It is what, and I'm not good open myself for things like that. I love helping people, but it's difficult when I, you have to help yourself. So. All these people who I could see their names because it's, it's transferred to PayPal. [00:04:39] So I could see their name is like, uh, I really appreciate the thing that they don't know me. Some people, they don't know me, but, but I know them. I know who you are, if you're listening to this and I thank you appreciate for doing that. I also had the opportunity because I need to talk about this. I got a ticket from hexdevs, uh, from Stefanni. [00:05:01] Jeremy: Oh... hexdevs. Yeah. [00:05:02] Mayra: Yes. And, also I applied for the volunteer positions, just in case. But I got the volunteer position. So what I did is, besides all my expense, I mean, that trip and also the hotel, expenses, I don't know, does it work? I, I said, I'm going to give this ticket that I have left to one of the women in the WNB.rb community. [00:05:25] So I did that and say, I have a ticket and also I can share the room. I don't want to say her name because she's trying not to be too connected to social media, but she, she accepted sharing the room with me. [00:05:40] Jeremy: Yeah. [00:05:43] Mayra: So she's already here with me and I feel so happy because people not only helping me, they helping me to help and it feel like, wow. [00:05:53] Yeah.
And that is, that is my story. And I still, well, I accepted to come to talk about this today because I received a job offer in the morning that I accepted. [00:06:06] So I wanted to. Send a happy ending for all my story about this. Yeah. And especially because I know that in the next conference I'll be with my own money, I could say, expenses. Asking for help [00:06:20] Jeremy: So was this all this year, the RailsConf? That was this year where you, you went to the conference, you enjoyed the talks and you were employed. And so it was the day that it was over that you, you found out that you got laid off. Wow. it's this, you have this high, right, you've met all your friends and, you know, you're learning all this stuff and you're really excited. [00:06:42] And then you get this notice and it's like, what, what happened? [00:06:47] Mayra: Yeah. That is what's happening. [00:06:50] Jeremy: then you, you kind of, like you said, you opened yourself up but yeah, it takes courage to say like, Hey, I need, you know, I need some help. [00:06:57] Mayra: Yeah, it's, this is something that I learned about this is always asking for help. This is something that I have been I bring into my life is always asking for help. I know as a woman, uh, as a woman, I have the thing that Try to be strong sometimes. I can do it by myself. I don't need help. [00:07:16] I don't need help, but sometimes you need, as human, you can open yourself. It's not something related to [00:07:23] gender it's more like humanity. That's how I feel right now. That is the feeling that I have and that is what is going to keep with me. For the rest of my life, I know. (laughs) [00:07:33] Jeremy: Yeah. Cause I, I think when, when people don't know, they, they might assume because you're so involved, right. With your, your local community and then the community internationally where people just assume that, Hey, Mayra is doing great. Right? 
She's, she's got no problems, no issues, and, there's just no way for people to know unless, you know, you, you share, and then that way people can help you, [00:07:58] Mayra: Yes. [00:07:59] Jeremy: That's a great story. I'm, I'm, I'm glad that, like, getting laid off is never good, right? That's never fun. But at least... Uh, things positive came out of it in terms of people coming out to support you, but also like you said, being able to support another, you know, another member of our community, [00:08:19] Mayra: And I would do it, and I would do it again. [00:08:21] I know that. Attending conferences [00:08:23] Jeremy: You know, now that you've gotten to come out, how, how has the, the conference been for you? Like, [00:08:29] Mayra: It was really good. I feel less insecure than the last time that I was here. (laughs) Actually, my first RubyConf was RubyConf Mini in Rhode Island. So this is like a, the Ruby real not the real one. It's just my more it's different. [00:08:48] Mayra: But, uh, the same time is. It's closer. That's how I feel it. I mean, this is my fifth conference. My first one was in Colombia. RubyConf Colombia. And I got a ticket as a scholarship. but until now I can say that it's like a, the feeling of the Ruby community, not only Ruby on Rails. Ruby community is like a It's really positive. It has changed my life so much since the first time that I joined to community that it's, I'm so happy to be developer instead of what, you know, everybody switched jobs. [00:09:20] I did too so it's like, uh, I won't regret. From project manager to developer [00:09:24] Jeremy: Hmm. Tell, tell me a little bit about that. How did you get interested in Ruby or, or have been involved with the community? [00:09:31] Mayra: I wasn't, I it was because money. Because it is. [00:09:34] Jeremy: That's a good reason. That's a good reason. [00:09:36] Mayra: Yeah. The thing is, I am graduated, uh, of the university. 
Uh, in Peru, but I was project manager before, well, I've switched a lot of careers because I was looking what I wanted to do and eventually I was project manager also. but I hated me in that position wasn't really good and it wasn't the company. [00:09:59] I knew it was me. I wasn't satisfied with my job and also I didn't like that, uh, working from nine to six every day in an office or something like that. It wasn't for me. So I remember that someone, one friend on Facebook shared something like a bootcamp that was about to start in Peru, that they were teaching Ruby on Rails. [00:10:21] I didn't know what was Ruby on Rails at the time. And then, and also React and JavaScript because, and you have to pay only if you get your first job. [00:10:32] Jeremy: Oh, this is like a bootcamp. [00:10:34] Mayra: Yeah, it's bootcamp, the first one that I met, I know, but that time there was someone called Laboratoria, but it's only related to JavaScript, but this one's a little bit more complete. So I apply, I didn't know that I could make it. I did it and it was an intensive bootcamp, six months from nine to six, but also I remember I didn't leave. [00:10:59] The place until 11 o'clock PM, because we were all 19 people there. So we really wanted to change our future. When everything ends, there was a moment when I, I could feel that Ruby also, especially Ruby on Rails, it was like, uh, something that I really like, uh, the syntax. Things like that. And also our teachers used to say, I can see who could be backend, who's going to be frontend. [00:11:31] I consider myself full stack, but she, she used to say that. I remember that. [00:11:36] Jeremy: So, which one were you? [00:11:38] Mayra: I was the Ruby side. The backend. [00:11:43] Mayra: Then I got an internship in the same companies who that was promoting the, the bootcamp. 
After three months, the, the, the internship ended because it was part of the contract but I wanted really to work in a place that they had Ruby on Rails. [00:12:05] So that's what I got. It took me more time than the rest of my friends, it's maybe it was like. four of us got a job in Ruby on Rails, uh, and I got mine. I remember my first job, full time job for Ruby on Rails was for the government of Peru. Actually, they use, they use Ruby on Rails for, CMS that they manage, that is called gov.pe. So I started working there. So it was a nice experience and I love, and I learned a lot about that experience also. So that is my story how it started. [00:12:44] Jeremy: Yeah. So you had talked about your friend and your friend referred you to the bootcamp, had you ever done any programming or anything related? [00:12:54] Mayra: When I was project manager, I had the opportunity to, to manage developers. I have developers in charge. So, but I had the kind of person that even I was. Your product manager, I try to help you to solve some things, like something that I say is a pseudocode. Instead of coding, I tried to give you the pseudocode that the things that you could do. So with that, maybe I can help. Well, my, my goal wanted to help you to unlock if you, you, you got stuck in something. [00:13:30] So with that, I just have a little bit of knowledge of what to do, but I. I felt that I hadn't the tools or I hadn't the skill to do that. That's why I decided to, to study in a bootcamp because they can teach me about the, that kind of tools because I couldn't study by myself. I couldn't. I can understand how the things goes right now, but at that time I, I was, I was lost. [00:13:56] Jeremy: But that's like, the, the skill that you already had as a project manager, being able to write the pseudocode and, and talk to your developers about the type of problem they're, they're trying to solve. That, that's a really important skill already. 
So I think, like, going into the bootcamp, nothing was totally new. [00:14:15] Right? I, I think that's really great that, that you got to see that beforehand and, and hopefully get a sense for like, that you might. Enjoy this, this sort of thing too, right? [00:14:25] Mayra: I love solving puzzles, so that was puzzle for me. I started with Code Wars. I know everybody started with that, but it's like a resolving puzzles. I need something there. And one of the things I really love I love helping people. I discovered that when I was helpdesk before. eventually all this time, even this time without job, I realized I can bring that. [00:14:47] Oh, I being more. aware about that I can help people just coding. So that's, that's what I've been doing all these months because I try to understand about gems or learning things more, but my focus is always going to be helping people. [00:15:03] Jeremy: I, I think that's really great that that's something that really appeals to you because that's something everybody needs. [00:15:12] Mayra: You know, the word that comes to my mind all the time that I say this is server. If you think about the word server, it's what I do. It receives something and gives you something. It's all the time. It's, you know, I know it's a machine or something like that, but if you think about the word, you are receiving something and giving something. [00:15:36] In all the time you are waiting for a, for a request to give something. So that is the word that it's, is for me, it's kind of helping people. [00:15:45] Jeremy: Hmm. we all serve one another in, in, in one way or another. Yeah. the boot camp, you said it was, uh, six months, and... You were, you were staying till 11 at night. So what, what was that experience like? How different people communicate and learn [00:16:02] Mayra: it's a nightmare, don't do it. (laughs) No, it was fun. that bootcamp changed my life. 
Before that bootcamp, I would say, I'm not going to say I was lack of some of the skills. The thing is, I didn't know how to, how to communicate. And one of the things that I learned besides that you need English or things like that, it's more about how to communicate with people because, there are multiple ways. [00:16:28] You can't talk in a way, for example, there's something that I'm always going to remember about it is you would prefer. WhatsApp, for example, and I would prefer Slack, or maybe voice, voice records instead of writing, or maybe an email. So, there must be a point between you and me. that might help us to communicate in a good way. [00:16:53] I, me, myself, especially, don't have to be forced people to do it in my way. It's a way of two, for two, you and me, and the best way, and try to do the same for the rest of the team, for example, or the rest of the people. Maybe they don't prefer this in this way, maybe another way. I have to be open to that. [00:17:12] Before the bootcamp I didn't know anything about it, but, and I tried to do it in my way. And then, right, thinking about it, it was a little bit selfish, but I need to learn. I need to be aware about [00:17:25] Jeremy: You're, you're referring to the way people. Communicate, or the ways people learn, or... [00:17:31] Mayra: Communicate actually. If we talk about the people, how the people learn. I am the example of, I am bad at listening book, podcasts. [00:17:41] Jeremy: Uh... Oh (laughs)... [00:17:42] Mayra: I'm sorry, I, I have ADHD. So it's hard for me to follow videos and podcasts because I have to. Pause, uh, Rewind, and Play again. And this is something when I miss so many ideas. So I prefer reading blogs or maybe transcriptions because it helps me just do reading. [00:18:04] And then return and continue reading when I can't understand something. it was difficult for me just to understand also that people prefer videos. Yes, it's not only my way to learn, it's their way to learn. 
And also we need to be open to that. Even when I used to, I mentor a couple of... people. So, I had to be open to that also. [00:18:29] I am the kind of person you can write me at 2 a.m. and if I am awake, I'm going to answer you because maybe you are desperate for an answer at the time, but I can understand there are people who are not, who don't like that, so I try to be open to that or maybe improve our communications or maybe give some rules and not to think that everything is personal, right? [00:18:56] It's just, I hope the best of you. And I try that you get the best of me. (laughs) [00:19:03] Jeremy: Yeah, it's understanding their expectations, what they feel comfortable with, so that you know it's okay if I send Mayra a message at 2 AM, but if I send someone else a message at 2 AM, it, it, maybe their phone dings and, you know, now they're distracted and, yeah, so, that, that makes sense. [00:19:26] Mayra: Yeah. If you, for example, I, I, I'm going to ask you because I learned that, can I send you a message at that time? But I, even I have to think about the time zone. [00:19:36] Jeremy: Yeah, oh, that's true, that's true. [00:19:38] Mayra: For example, because before I, I would have to think about just my friends from Peru, but now I have friends all over the world because of the Women and Non-binary community. [00:19:49] So I have to think about things like that when I write. So one of the things that has been useful for me is, for example, is like when you schedule a message, that has been useful for me when I have to ask or send messages.
[00:20:03] Jeremy: It's interesting that you mentioned how with learning you prefer blogs and, and books and things like that because this may be a generalization, but maybe with, newer developers or, or younger people, a lot of them really like the video form Yesterday I was, interviewing, David Copeland, who he, he wrote a book about, sustainable web development with Ruby on Rails, and, yeah, we were talking a little bit about, it's like, so many people want video, is there a, is there still a place for me with my, you know, my, my book? [00:20:39] And stuff like that, and I think it's important to remember that there's people like yourself, and, um, I, I'm partly the same way, like, I like to be able to have the text so that I can read it at my own pace and copy and paste stuff and stuff like that. But you're right that everybody learns differently, so it, it makes sense for there to be the videos, for there to be, podcasts and blog posts. [00:21:05] There's different people who learn different ways [00:21:07] Mayra: And also, some people, including me, needs to pay for something if you want to learn something, because sometimes when it is free, you won't have the value that [00:21:22] Mayra: it has. [00:21:23] Jeremy: I totally understand that, yeah. Accessibility for videos and podcasts [00:21:26] Mayra: And there is something I would like to mention after you talk about this is I open to videos and podcasts, I can't take my time because I have now a lot of friends who, who create this type kind of content, but I like to remember that there are people with other difficult things. [00:21:45] No, it's, it's related to accessibility because when you have deaf people, they need. Transcriptions, they need closed captions. So maybe you are in a place with a lot of noise and it can help you, even if the video has closed captions, so people can read it. So it is something like, it's not me. It's more to be more open to people who are really has a disability. 
[00:22:12] That's a word that was like, so yeah. Or maybe they, their main language is not English and a lot of the content or the majority of the content about coding is in English. So when you have the transcriptions or you have the blog, you can translate it. And it's easy for that is access to them. [00:22:31] Jeremy: Yeah, that's definitely true. And I think even past people who aren't native speakers or have a disability, if you aren't in either of those categories, there's still a lot of people who they want to have a transcript or outside. Ruby or programming, people who watch movies and TV shows, a lot of them turn on the subtitles and they're native English speakers. [00:22:55] The dialogue is in English. They still want to see the subtitles. [00:22:59] Yeah, I think it's becoming very common. So, to your point when you have video having a way for people to still get the information another way, I think is helpful for everyone. Yeah. [00:23:15] Mayra: And it isn't too difficult nowadays because now you have AI or maybe, programs that can get you the, at least the closest words. [00:23:24] Jeremy: It gets you maybe 90 percent of the way there, so it definitely saves time, but I will say it is still work. [00:23:33] Mayra: Yes. Yes. It still work. Actually, it's because it could be a couple of words that need that maybe the, the program needs to improve. You can improve how the program, uh, translated, but it is something behind the meaning that you still can keep. [00:23:49] Jeremy: It being there and not perfect is kind of better than having nothing, I guess, yeah. What's next [00:23:55] Jeremy: Now that you've spent time at RubyConf, like, what are, like what are your plans next? [00:24:02] Mayra: There is a story behind all this, but I'm going to, the TLDR [00:24:06] Jeremy: Okay. [00:24:08] Mayra: Could be easy because I have a plenty of time without working to make a lot of thoughts in my mind. 
So it's just like, uh, I would like to explore more about Ruby, Ruby without Rails, something like that. So one of the things that I would like to do after this is just. [00:24:27] I would like to investigate more about the use of Ruby out of what is a web application. It is something I was talking like a couple of hours ago, because there I found a blog about how is, how are the conference. The Ruby conference in Japan. So it was really interesting. It was, an article that is really old, but it got my attention because I never thought about it because I came from a boot camp and it was like a. [00:24:59] There is something else. So I could see a couple of talks about talking about, for example, Rack. [00:25:06] Mayra: So I will like how, oh, for example, we have another talk about how to create desktop applications with Ruby. So that is something that I would like to investigate. I would like to try also with the Ruby Peru community. [00:25:20] We decided to choose to investigate more about it and prepare talks about it in Spanish, because that is the mission. Our vision of our community is to create content in Spanish. [00:25:34] And also I was planning to give a lightning talk, but I wasn't ready yet to do it because I was nervous about, because I applied for jobs or things like that, I hadn't the time to prepare that, but actually I would like. I dunno if you heard about Dave Kimura and Drifting Ruby? [00:25:52] Jeremy: Uh, yes. Yes. They, they do the videos, right. [00:25:54] Mayra: He has a, blog on how to implement some kind of, uh, when your test fails, the light can change to red or green based on the test that you are running. So it was really interesting. It isn't related to Rails, but it, it is based on Ruby so it's like, wow, I want to learn how to do that. Women and non-binary community (WNB-rb.dev) [00:26:17] Jeremy: Anything else you want to mention or think we should have talked about?
[00:26:22] Mayra: Yeah, because I am a member of the Women and Non-binary community. So if you are a woman or a non-binary person, you're invited to our community, we are open to, to you and we have meetups monthly. Uh, we have book clubs and we are always open to new ideas to share, to help you to do. That's it, I think. Yes. [00:26:47] Jeremy: And where can they find you if they're interested in that? [00:26:51] Mayra: Uh, we have a webpage, [00:26:53] wnb-rb.dev That is dev in English, I think. [00:27:00] Yes. And there you can find us. And also there's a form, where you can give us your, your info. It won't be shared, only... No, it won't be shared, only for the organizers. And that's all. We keep your privacy there. [00:27:17] Jeremy: Very cool. So [00:27:19] wnb-rb.dev [00:27:23] Mayra: Yes, it is. [00:27:26] Jeremy: Well, Mayra, thank you so much for spending time to talk with me today. [00:27:30] Mayra: Thank you and sorry for my English. (laughs) [00:27:34] Jeremy: Your English is good, your English is much better than my Spanish. [00:27:38] Mayra: Okay.
Nov 21, 2023 • 51min

Mike Perham on Keeping it solo (RubyConf 2023)

Mike Perham is the creator of Sidekiq, a background job processor for Ruby. He's also the creator of Faktory a similar product for multiple language environments. We talk about the RubyConf keynote and Ruby's limitations, supporting products as a solo developer, and some ideas for funding open source like a public utility. Recorded at RubyConf 2023 in San Diego. -- A few topics covered: Sidekiq (Ruby) vs Faktory (Polyglot) Why background job solutions are so common in Ruby Global Interpreter Lock (GIL) Ractors (Actor concurrency) Downsides of Multiprocess applications When to use other languages Getting people to pay for Sidekiq Keeping a solo business Being selective about customers Ways to keep support needs low Open source as a public utility Mike Mike's blog mastodon Sidekiq faktory From Employment to Independence Ruby Ractor The Practical Effects of the GVL on Scaling in Ruby Transcript You can help correct transcripts on GitHub. Introduction [00:00:00] Jeremy: I'm here at RubyConf San Diego with Mike Perham. He's the creator of Sidekiq and Faktory. [00:00:07] Mike: Thank you, Jeremy, for having me here. It's a pleasure. Sidekiq [00:00:11] Jeremy: So for people who aren't familiar with, I guess we'll start with Sidekiq because I think that's what you're most known for. If people don't know what it is, maybe you can give like a small little explanation. [00:00:22] Mike: Ruby apps generally have two major pieces of infrastructure powering them. You've got your app server, which serves your webpages and the browser. And then you generally have something off on the side that... It processes, you know, data for a million different reasons, and that's generally called a background job framework, and that's what Sidekiq is. [00:00:41] It, Rails is usually the thing that, that handles your web stuff, and then Sidekiq is the Sidekiq to Rails, so to speak. [00:00:50] Jeremy: And so this would fit the same role as, I think in Python, there's celery. 
and then in the Ruby world, I guess there is, uh, Resque is another kind of job. [00:01:02] Mike: Yeah, background job frameworks are quite prolific in Ruby. the Ruby community's kind of settled on that as the, the standard pattern for application development. So yeah, we've got, a half a dozen to a dozen different, different examples throughout history, but the major ones today are, Sidekiq, Resque, DelayedJob, GoodJob, and, and, and others down the line, yeah. Why background jobs are so common in Ruby [00:01:25] Jeremy: I think working in other languages, you mentioned how in Ruby, there's this very clear, preference to use these job scheduling systems, these job queuing systems, and I'm not. I'm not sure if that's as true in, say, if somebody's working in Java, or C sharp, or whatnot. And I wonder if there's something specific about Ruby that makes people kind of gravitate towards this as the default thing they would use. [00:01:52] Mike: That's a good question. What makes Ruby... The one that so needs a background job system. I think Ruby, has historically been very single threaded. And so, every Ruby process can only do so much work. And so Ruby oftentimes does, uh, spin up a lot of different processes, and so having processes that are more focused on one thing is, is, is more standard. [00:02:24] So you'll have your application server processes, which focus on just serving HTTP responses. And then you have some other sort of focused process and that just became background job processes. but yeah, I haven't really thought of it all that much. But, uh, you know, something like Java, for instance, heavily multi threaded. [00:02:45] And so, and extremely heavyweight in terms of memory and startup time. So it's much more frequent in Java that you just start up one process and that's it. Right, you just do everything in that one process. And so you may have dozens and dozens of threads, both serving HTTP and doing work on the side too. 
Um, whereas in Ruby that just kind of naturally, there was a natural split there. Global Interpreter Lock [00:03:10] Jeremy: So that's actually a really good insight, because... in the keynote at RubyConf, Matz, the creator of Ruby, you know, he mentioned the, the fact that there is this global, interpreter lock, [00:03:23] or, or global VM lock in Ruby, and so you can't, really do multiple things in parallel and make use of all the different cores. And so it makes a lot of sense why you would say like, okay, I need to spin up separate processes so that I can actually take advantage of, of my, system. [00:03:43] Mike: Right. Yeah. And the, um, the GVL is the acronym we use in the Ruby community, or GIL. Uh, that global lock really kind of is a forcing function for much of the application architecture in Ruby, uh, applications because it does limit how much processing a single Ruby process can do. So, uh, even though Sidekiq is heavily multi threaded, you can only have so many threads executing. [00:04:14] Because they all have to share one core because of that global lock. So unfortunately, that's, that's been, um, one of the limiter, limiting factors to Sidekiq scalability is that, that lock and boy, I would pay a lot of money to just have that lock go away, but. You know, Python is going through a very long term experiment about trying to remove that lock and I'm very curious to see how well that goes because I would love to see Ruby do the same and we'll see what happens in the future, but, it's always frustrating when I come to another RubyConf and I hear another Matz keynote where he's asked about the GIL and he continues to say, well, the GIL is going to be around, as long as I can tell. [00:04:57] So it's a little bit frustrating, but. It's, it's just what you have to deal with. Ractors [00:05:02] Jeremy: I'm not too familiar with them, but they, they did mention during the keynote I think there are Ractors or something like that.
There, there, there's some way of being able to get around the GIL but there are these constraints on them. And in the context of Sidekiq and, and maybe Ruby in general, how do you feel about those options or those solutions? [00:05:22] Mike: Yeah, so, I think it was Ruby 3.2 that introduced this concept of what they call a Ractor, which is like a thread, except it does not have the global lock. It can run independent to the global lock. The problem is, is because it doesn't use the global lock, it has pretty severe constraints on what it can do. [00:05:47] And the, and more specifically, the data it can access. So, Ruby apps and Rails apps throughout history have traditionally accessed a lot of global data, a lot of class level data, and accessed all this data in a, in a read only fashion. So there's no race conditions because no one's changing any of it, but it's still, lots of threads all accessing the same variables. [00:06:19] Well, Ractors can't do that at all. The only data Ractors can access is data that they own. And so that is completely foreign to Ruby application, traditional Ruby applications. So essentially, Ractors aren't compatible with the vast majority of existing Ruby code. So I, I, I toyed with the idea of prototyping Sidekiq and Ractors, and within about a minute or two, I just ran into these, these, uh... [00:06:51] These very severe constraints, and so that's why you don't see a lot of people using Ractors, even still, even though they've been out for a year or two now, you just don't see a lot of people using them, because they're, they're really limited, limited in what they can do. But, on the other hand, they're unlimited in how well they can scale. [00:07:12] So, we'll see, we'll see. Hopefully in the future, they'll make a lot of improvements and, uh, maybe they'll become more usable over time.
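Editor's sketch: the constraint Mike describes, that a Ractor can only touch data it owns, can be seen in a few lines of plain Ruby. This is a minimal illustration, not anything taken from Sidekiq. It assumes Ruby 3.0 or newer, where Ractor is available as an experimental feature; the constant and method names (`SETTINGS`, `SHARED_SETTINGS`, `sum_in_ractor`, and so on) are hypothetical and exist only for this example.

```ruby
# Assumes Ruby 3.0+. Ractor prints an "experimental" warning on first use.

# Mutable, process-wide data: the kind of global state many Ruby apps read freely.
SETTINGS = { "queue" => "default" }

# The same data, deeply frozen so that it may be shared between Ractors.
SHARED_SETTINGS = Ractor.make_shareable({ "queue" => "default" })

# Newer Rubies replaced Ractor#take with Ractor#value, so support both.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Works: the array is handed to the Ractor explicitly, so the Ractor owns its copy.
def sum_in_ractor(numbers)
  worker = Ractor.new(numbers) { |list| list.sum }
  ractor_result(worker)
end

# Fails inside the Ractor: SETTINGS is a non-shareable object owned by the main Ractor.
def read_mutable_constant_in_ractor
  worker = Ractor.new do
    begin
      SETTINGS["queue"]
    rescue Ractor::IsolationError => e
      "blocked: #{e.class}"
    end
  end
  ractor_result(worker)
end

# Works: a deeply frozen constant is shareable, so reading it is allowed.
def read_shareable_constant_in_ractor
  worker = Ractor.new { SHARED_SETTINGS["queue"] }
  ractor_result(worker)
end

puts sum_in_ractor([1, 2, 3])
puts read_mutable_constant_in_ractor
puts read_shareable_constant_in_ractor
```

The second method is the interesting one: reading an ordinary, unfrozen constant is routine in threaded Ruby code, but it is rejected inside a Ractor. Code written for threads would need its shared state frozen or passed in explicitly before it could run this way.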
Downsides of multiprocess (Memory usage) [00:07:19] Jeremy: And with the existence of a job queue or job scheduler like Sidekiq, you're able to create additional processes to get around that global lock, I suppose. What are the... downsides of doing so versus another language like we mentioned Java earlier, which is capable of having true parallelism in the same process. [00:07:47] Mike: Yeah, so you can start up multiple Ruby processes to process things truly in parallel. The issue is that you do get some duplication in terms of memory. So your Ruby app maybe take a gigabyte per process. And, you can do copy on write forking. You can fork and get some memory sharing with copy on write semantics on Unix operating systems. [00:08:21] But you may only get, let's say, 30 percent memory savings. So, there's still a significant memory overhead to forking, you know, let's say, eight processes versus having eight threads. You know, you, you, you may have, uh, eight threads can operate in a gigabyte process, but if you want to have eight processes, that may take, let's say, four gigabytes of RAM. [00:08:48] So you, you still, it's not going to cost you eight gigabytes of RAM, you know, it's not like just one times eight, but, there's still a overhead of having those separate processes. [00:08:58] Jeremy: would you say it's more of a cost restriction, like it costs you more to run these applications, or are there actual problems that you can't solve because of this restriction. [00:09:13] Mike: Help me understand, what do you mean by restriction? Do you mean just the GVL in general, or the fact that forking processes still costs memory? [00:09:22] Jeremy: I think, well, it would be both, right? So you're, you have two restrictions right now. You have the, the GVL, which means you can't have parallelism within the same process. And then your other option is to spin up a bunch of processes, which you have said is the downside there is that you're using a lot more RAM. 
[00:09:43] I suppose my question is that Does that actually stop you from doing anything? Like, if you throw more money at the problem, you go like, we're going to have more instances, I'll pay for the RAM, it's fine, can that basically get you out of these situations or are these limitations actually stopping you from, from doing things you could do in other languages? [00:10:04] Mike: Well, you certainly have to manage the multiple processes, right? So you've gotta, you know, if one child process crashes, you've gotta have a parent or supervisor process watching all that and monitoring and restarting the process. I don't think it restricts you. Necessarily, it just, it adds complexity to your deployment. [00:10:24] and, and it's just a question of efficiency, right? Instead of being able to deploy on a, on a one gigabyte droplet, I've got to deploy to a four gigabyte droplet, right? Because I just, I need the RAM to run the eight processes. So it, it, it's more of just a purely a function of how much money am I going to have to throw at this problem. [00:10:45] And what's it going to cost me in operational costs to operate this application in production? When to use other languages? [00:10:53] Jeremy: So during the. Keynote, uh, Matz had mentioned that Rails, is really suitable as this one person framework, like you can have a very small team or maybe even yourself and, and build this product. And so I guess from... Your perspective, once you cross a certain threshold, is like, what Ruby and what Sidekiq provides not enough, and that's why you need to start looking into other languages? [00:11:24] Or like, where's the, turning point, or the, if you [00:11:29] Mike: Right, right. The, it's all about the problem you're trying to solve, right? At the end of the day, uh, the, the question is just what are we trying to solve and how are we trying to solve it? So at a higher level, you got to think about the architecture. 
if the problem you're trying to solve, if the service you're trying to build, if the app you're trying to operate. [00:11:51] If that doesn't really fall into the traditional Ruby application architecture, then you, you might look at it in another language or another ecosystem. something like Go, for instance, can compile down to a single binary, which makes deployment really easy. It makes shipping up a product. on to a user's machine, much simpler than deploying a Ruby application onto a user's desktop machine, for instance, right? [00:12:22] Um, Ruby does have this, this problem of how do you package everything together and deploy it somewhere? Whereas Go, when you can just compile to a single binary, now you've just got a single thing. And it's just... Drop it on the file system and execute it. It's easy. So, um, different, different ecosystems have different application architectures, which empower different ways of solving the same problems. [00:12:48] But, you know, Rails as a, as a one man framework, or sorry, one person framework, It, it, I don't, I don't necessarily, that's a, that's sort of a catchy marketing slogan, but I just think of Rails as the most productive framework you can use. So you, as a single person, you can maximize what you ship and the, the, the value that you can create because Rails is so productive. [00:13:13] Jeremy: So it, seems like it's maybe the, the domain or the type of application you're making. Like you mentioned the command line application, because you want to be able to deliver it to your user easily. Just give them a binary, something like Go or perhaps Rust makes a lot more sense. and then I could see people saying that if you're doing something with machine learning, like the community behind Python, it's, they're just, they're all there. [00:13:41] So Room for more domains in Ruby [00:13:41] Mike: That was exactly the example I was going to use also. 
Yeah, if you're doing something with data or AI, Python is going to be a more, a more traditional, natural choice. that doesn't mean Ruby can't do it. That doesn't mean, you wouldn't be able to solve the problem with Ruby. And, and there's, that just also means that there's more space for someone who wants to come in and make an impact in the Ruby community. [00:14:03] Find a problem that Ruby's not really well suited to solving right now and build the tooling out there to, to try and solve it. You know, I, I saw a talk, from the fellow who makes the Glimmer gem, which is a native UI toolkit. Uh, a gem for building native UIs in Ruby, which Ruby traditionally can't do, but he's, he's done an amazing job at sort of surfacing APIs to build these, um, these native, uh, native applications, which I think is great. [00:14:32] It's awesome. It's, it's so invigorating to see Ruby in a new space like that. Um, I talked to someone else who's doing the Polars gem, which is focused on data processing. So it kind of takes, um, Python and Pandas and brings that to Ruby, which is, is awesome because if you're a Ruby developer, now you've got all these additional tools which can allow you to solve new sets of problems out there. [00:14:57] So that's, that's kind of what's exciting in the Ruby community right now is just bring it into new spaces. Faktory [00:15:03] Jeremy: In addition to Sidekiq, you have, uh, another product called Faktory, I believe. And so does that serve a, a similar purpose? Is that another job scheduling, job queueing system? [00:15:16] Mike: It is, yes. And it's, it's, it's similar in a way to Sidekiq. It looks similar. It's got similar concepts at the core of it. At the end of the day, Sidekiq is limited to Ruby. Because Sidekiq executes in a Ruby VM, it executes the jobs, and the jobs are, have to be written in Ruby because you're running in the Ruby VM. [00:15:38] Faktory was my attempt to bring, Sidekiq functionality to every other language. 
I wanted, I wanted Sidekiq for JavaScript. I wanted Sidekiq for Go. I wanted Sidekiq for Python because A, a lot of these other languages also could use a system, a background job system. And the problem though is that. [00:16:04] As a single man, I can't port Sidekiq to every other language. I don't know all the languages, right? So, Faktory kind of changes the architecture and, um, allows you to execute jobs in any language. it, it replaces Redis and provides a server where you just fetch jobs, and you can use it from it. [00:16:26] You can use that protocol from any language to, to build your own worker processes that execute jobs in whatever language you want. [00:16:35] Jeremy: When you say it replaces Redis, so it doesn't use Redis, um, internally, it has its own. [00:16:41] Mike: It does use Redis under the covers. Yeah, it starts Redis as a child process and, connects to it over a Unix socket. And so it's really stable. It's really fast. from the outside, the, the worker processes, they just talk to Faktory. They don't know anything about Redis at all. [00:16:59] Jeremy: I see. And for someone who, like we mentioned earlier in the Python community, for example, there is, um, Celery. For someone who is using a task scheduler like that, what's the incentive to switch or use something different? [00:17:17] Mike: Well, I, I always say if you're using something right now, I'm not going to try and convince you to switch necessarily. It's when you have pain that you want to switch and move away. Maybe you have Maybe there's capabilities in the newer system that you really need that the old system doesn't provide, but Celery is such a widely known system that I'm not necessarily going to try and convince people to move away from it, but if people are looking for a new system, one of the things that Celery does that Faktory does not do is Celery provides like data adapters for using store, lots of different storage systems, right? [00:17:55] Faktory doesn't do that. 
Faktory is more, has more of the Rails mantra of, you know, Omakase where we choose, I choose to use Redis and that's it. You don't, you don't have a choice for what to use because who cares, you know, at the end of the day, let Faktory deal with it. It's, it's not something that you should even necessarily be concerned about. [00:18:17] Just, just try Faktory out and see how it works for you. Um, so I, I try to take those operational concerns off the table and just have the user focus on, you know, usability, performance, and that sort of thing. But it is, it's, it's another background job system out there for people to try out and see if they like that. [00:18:36] And, and if they want to, um, if they know Celery and they want to use Celery, more power to them. Sidekiq (Ruby) or Faktory (Polyglot) [00:18:43] Jeremy: And Sidekiq and Faktory, they serve a very similar purpose. For someone who, they have a new project, they haven't chosen a job scheduling system, if they were using Ruby, would it ever make sense for them to use Faktory versus use Sidekiq? [00:19:05] Mike: Uh, Faktory is excellent in a polyglot situation. So if you're using multiple languages, if you're creating jobs in Ruby, but you're executing them in Python, for instance, um, you know, if you've, I have people who are creating jobs in PHP and executing them in Python, for instance. That kind of polyglot scenario, Sidekiq can't do that at all. [00:19:31] So, Faktory is useful there. In terms of Ruby, Ruby is just another language to Faktory. So, there is a Ruby API for using Faktory, and you can create and execute Ruby jobs with Faktory. But, you'll find that in the Ruby community, Sidekiq is much more widely used and understood and known. So if you're just using Ruby, I think, I think Sidekiq is the right choice. [00:19:59] I wouldn't look at Faktory. But if you do need, find yourself needing that polyglot tool, then Faktory is there. 
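Editor's note: the polyglot idea Mike describes (one language creates a job, another executes it) comes down to agreeing on a language-neutral job payload. The following is a minimal sketch of that pattern in Python, not Faktory's actual client API. The in-memory queue stands in for the job server, and the names `push_job`, `work_one`, and `HANDLERS`, as well as the payload field names, are hypothetical choices made for illustration.

```python
import json
import uuid
from collections import deque

# In-memory stand-in for a job server. A real deployment would talk to the
# server over the network; every name in this sketch is hypothetical.
queue = deque()


def push_job(jobtype, args):
    """Producer side: serialize a job as language-neutral JSON and enqueue it."""
    job = {"jid": uuid.uuid4().hex, "jobtype": jobtype, "args": args}
    queue.append(json.dumps(job))
    return job["jid"]


# Worker side: map job type names to local handler functions.
HANDLERS = {
    "add": lambda a, b: a + b,
    "shout": lambda text: text.upper(),
}


def work_one():
    """Fetch one job, dispatch on its type, and return (jid, result)."""
    if not queue:
        return None
    job = json.loads(queue.popleft())
    handler = HANDLERS[job["jobtype"]]
    return job["jid"], handler(*job["args"])
```

Because the payload is plain JSON with a job type name and positional arguments, the producer could just as well be written in Ruby or PHP; the worker only needs a handler registered under the same job type name.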
Temporal [00:20:07] Jeremy: And this is maybe one, maybe one layer of abstraction higher, but there's a product called Temporal that has some of this job scheduling, but also this workflow component. I wonder if you've tried that out and how you think about that product? [00:20:25] Mike: I've heard of them. I don't know a lot about the product. I do have a workflow API, the Sidekiq batches, which allow you to fan out jobs and then, and then execute callbacks when all the jobs in that, in that batch are done. But I don't, provide sort of a, a high level. Graphical Workflow Editor or anything like that. [00:20:50] Those to me are more marketing tools that you use to sell the tool for six figures. And I don't think they're usable. And I don't think they're actually used day to day. I provide an API for developers to use. And developers don't like moving blocks of code around in a GUI. They want to write code. And, um, so yeah, temporal, I, like I said, I don't know much about them. [00:21:19] I also, are they a venture capital backed startup? [00:21:22] Jeremy: They are, is my understanding, [00:21:24] Mike: Yeah, that, uh, any, any sort of venture capital backed startup, um, who's building technical infrastructure. I, I would look long and hard at, I'm, I think open source is the right core to build on. Of course I sell commercial software, but. I'm bootstrapped. I'm profitable. [00:21:46] I'm going to be around forever. A VC backed startup, they tend to go bankrupt, because they either get big or they go out of business. So that would be my only comment is, is, be a little bit leery about relying on commercial venture capital based infrastructure for, for companies, uh, long term. Getting people to pay for Sidekiq [00:22:05] Jeremy: So I think that's a really interesting part about your business is that I think a lot of open source maintainers have a really big challenge figuring out how to make it as a living. 
The, there are so many projects that they all have a very permissive license and you can use them freely. One example I can think of is, I, I talked with, uh, David Cramer, who's the CTO at Sentry, and he, I don't think they use it anymore, but they, they were using Nginx, right? [00:22:39] And he's like, well, Nginx, they have a paid product, like Nginx Plus or something. I don't know what the name is, but he was like, but I'm not going to pay for it. Right. I'm just going to use the free one. Why would I, you know, pay for the, um, the paid thing? So I, I, I'm kind of curious from your perspective when you were coming up with Sidekiq both as an open source product, but also as a commercial one, how did you make that determination of like to make a product where it's going to be useful in its open source form, but [00:23:15] I can still convince people to pay money for it? [00:23:19] Mike: Yeah, the, I was terrified, to be blunt, when I first started out. When I started the Sidekiq project, I knew it was going to take a lot of time. I knew if it was successful, I was going to be doing it for the next decade. Right? So I started in 2012, and here I am in 2023, over a decade, and I'm still doing it. [00:23:38] So my expectation was met in that regard. And I knew I was not going to be able to last that long if I was making zero dollars, right? You just, you burn out. Nobody can last that long. Well, I guess there are a few exceptions to that rule, but yeah, money, I tend to think makes things a little more sustainable for sure. [00:23:58] Especially if you can turn it into a full time job solving and supporting a project that you, you love and, and is, is, you know, your, your, your baby, your child, so to speak, your software, uh, uh, creation that you've given to the world. But I was terrified. But one thing I did was at the time I was blogging a lot. [00:24:22] And so I was telling people about Sidekiq. I was telling people what was to come. 
I was talking about ideas, and the one thing that I blogged about was financial experiments. I said bluntly to the, to, to the Ruby community, I'm going to be experimenting with financial stability and sustainability with this project. [00:24:42] So not only did I create this open source project, but I was also publicly saying I need to figure out how to make this work for the next decade. And so eventually that led to Sidekiq Pro. And I had to figure out how to build a closed source Ruby gem, which, uh, there's not a lot of, so I was kind of in the wild there. [00:25:11] But, you know, thankfully all the pieces came together and it was actually possible. I couldn't have done it if it wasn't possible. Like, we would not be talking if I couldn't make a private gem. So, um, but it happened to work out. Uh, and it allowed me to, to gate features behind a paywall effectively. And, and yeah, you're right. [00:25:33] It can be tough to make people pay for software. But I'm a developer who's selling to other developers, not, not just developers, open source developers, and they know that they have this financial problem, right? They know that there's this sustainability problem. And I was blunt in saying, this is my solution to my sustainability. [00:25:56] So, I charge what I think is a very fair price. It's only a thousand dollars a year. To a hobbyist, that may seem like a lot of money. To a business, it's a drop in the bucket. So it was easy for developers to say, Hey, listen, we want to buy this tool for a thousand bucks. It'll ensure our infrastructure is maintained for the next decade. [00:26:18] And it's, and it's relatively cheap. It's way less than, uh, you know, a salary or even a laptop. So, so that's, that's what I did. And, um, it's, it worked out great. People, people really understood. Even today, I talk to people and they say, we, we signed up for Sidekiq Pro to support you. 
So it's, it's, it's really, um, invigorating to hear people, uh, thank me and, and they're, they're actively happy that they're paying me and are customers. [00:26:49] Jeremy: It's sort of, uh, maybe a not super common story, right, in terms of what you went through. Because when I think of open core businesses, I think of companies like, uh, GitLab, which are venture funded, uh, very different scenario there. I wonder, like, in your case, so you started in 2012, and there were probably no venture backed competitors, right? [00:27:19] People saying that we're going to make this job scheduling system and some VC is going to give me five million dollars and build a team to work on this. It was probably at the time, maybe it was Resque, which was... [00:27:35] Mike: There was a venture backed system called IronMQ, [00:27:40] Jeremy: Hmm. [00:27:41] Mike: And I'm not sure if they're still around or not, but they... They took, uh, one or more funding rounds. I'm not sure exactly, but they were VC backed. They were doing background jobs, scheduled jobs, uh, you know, running container, running container jobs. They, they eventually, I think, wound up sort of settling on Docker containers. [00:28:06] They'll basically spin up a Docker container. And that container can do whatever it wants. It can execute for a second and then shut down, or it can run for, for however long, but they would, um, yeah, I, yeah, I'll, I'll stop there because I don't know the actual details of exactly their system, but I'm not sure if they're still around, but that's the only one that I remember offhand that was around, you know, years ago. [00:28:32] Yeah, it's, it's mostly, you know, low level open source infrastructure. And so, anytime you have funded startups, they're generally using that open source infrastructure to build their own SaaS. And so SaaS's are the vast majority of where you see sort of, uh, commercial software. 
[00:28:51] Jeremy: so I guess in that way it, it, it gave you this, this window or this area where you could come in and there wasn't, other than that iron, product, there wasn't this big money that you were fighting against. It was sort of, it was you telling people openly, I'm, I'm working on this thing. [00:29:11] I need to make money so that I can sustain it. And, if you, yeah. like the work I do, then, you know, basically support me. Right. And, and so I think that, I'm wondering how we can reproduce that more often because when you see new products, a lot of times it is VC backed, right? [00:29:35] Because people say, I need to work on this. I need to be paid. and I can't ask a team to do this. For nothing, right? So [00:29:44] Mike: Yeah. It's. It's a wicked problem. Uh, it's a really, really hard problem to solve if you take vc you there, that that really kind of means that you need to be making tens if not hundreds of millions of dollars in sales. If you are building a small or relatively small. You know, put small in quotes there because I don't really know what that means, but if you have a small open source project, you can't charge huge amounts for it, right? [00:30:18] I mean, Sidekiq is a, I would call a medium sized open source project, and I'm charging a thousand bucks for it. So if you're building, you know, I don't know, I don't even want to necessarily give example, but if you're building some open source project, and It's one of 300 libraries that people's applications will depend on. [00:30:40] You can't necessarily charge a thousand dollars for that library. depending on the size and the capabilities, maybe you can, maybe you can't. But there's going to be a long tail of open source projects that just, they can't, they can't charge much, if anything, for them. So, unfortunately, we have, you know, these You kind of have two pathways. [00:31:07] Venture capital, where you've got to sell a ton, or free. 
And I've kind of walked that fine line where I'm a small business, I can charge a small amount because I'm bootstrapped. And, and I don't need huge amounts of money, and I, and I have a project that is of the right size to where I can charge a decent amount of money. [00:31:32] That means that I can survive with 500 or a thousand customers. I don't need to have a hundred million dollars worth of customers. Because I, you know, when I started the business, one of the constraints I said is I don't want to hire anybody. I'm just going to be solo. And part of the, part of my ability to keep a low price and, and keep running sustainably, even with just You know, only a few hundred customers is because I'm solo. [00:32:03] I don't have the overhead of investors. I don't have the overhead of other employees. I don't have an office space. You know, my overhead is very small. So that is, um, you know, I just kind of have a unique business in that way, I guess you might say. Keeping the business solo [00:32:21] Jeremy: I think that's that's interesting about your business as well But the fact that you've kept it you've kept it solo which I would imagine in most businesses, they need support people. they need, developers outside of maybe just one. Um, there's all sorts of other, I don't think overhead is the right word, but you just need more people, right? [00:32:45] And, and what do you think it is about Sidekiq that's made it possible for it to just be a one person operation? [00:32:52] Mike: There's so much administrative overhead in a business. I explicitly create business policies so that I can run solo. you know, my support policy is officially you get one email ticket or issue per quarter. And, and anything more than that, I can bounce back and say, well, you're, you're requiring too much support. [00:33:23] In reality, I don't enforce that at all. And people email me all the time, but, but things like. 
Things like dealing with accounting and bookkeeping and taxes and legal stuff, licensing, all that is, yeah, a little bit of overhead, but I've kept it as minimal as I can. And part of that is I don't want to hire another employee because then that increases the administrative overhead that I have. [00:33:53] And Sidekiq is so tied to me and my knowledge that if I hire somebody, they're probably not going to know Ruby and threading and all the intricate technical detail necessary to build and maintain and support the system. And so really you'll kind of regress a little bit. We won't be able to give as good support because I'm busy helping that other employee. Being selective about customers [00:34:23] Mike: So, yeah, it's, it's a tightrope act where you've got to really figure out how can I scale myself as far as possible without overwhelming myself. The, the overwhelming thing that I have that I've never been able to solve. It's just dealing with billing inquiries, customers, companies, emailing me saying, how do we buy this thing? [00:34:46] Can I get an invoice? Every company out there, it seems wants an invoice. And the problem with invoicing is it takes a lot more. manual labor and administrative overhead to issue that invoice to collect payment on the invoice. So that's one of the reasons why I have a very strict policy about credit card only for, for the vast majority of my customers. [00:35:11] And I demand that companies pay a lot more. You have to have a pretty big enterprise license if you want an invoice. And if the company, if the company comes back and complains and says, well, you know, that's ridiculous. We don't, we don't want to pay that much. We don't need it that much. Uh, you know, I, I say, okay, well then you have two, two things, two, uh, two things. [00:35:36] You can either pay with a credit card or you can not use Sidekiq. Like, that's, that's it. I'm, I don't need your money. 
I don't want the administrative overhead of dealing with your accounting department. I just want to support my, my customers and build my software. And, and so, yeah, I don't want to turn into a billing clerk. [00:35:55] So sometimes, sometimes the, the, the best thing in business that you can do is just say no. [00:36:01] Jeremy: That's very interesting because I think being a solo... Person is what probably makes that possible, right? Because if you had the additional staff, then you might say like, Well, I need to pay my staff, so we should be getting, you know, as much business as [00:36:19] Mike: Yeah. Chasing every customer you can, right. But yeah. [00:36:22] Every customer is different. I mean, I have some customers that just, they never contact me. They pay their bill really fast or right on time. And they're paying me, you know, five figures, 20, a year. And they just, it's a, God bless them because those are, are the. [00:36:40] Best customers to have and the worst customers are the ones who are paying 99 bucks a month and everything that they don't understand or whatever is a complaint. So sometimes, sometimes you, you want to, vet your customers from that perspective and say, which one of these customers are going to be good? [00:36:58] Which ones are going to be problematic? [00:37:01] Jeremy: And you're only only person... And I'm not sure how many customers you have, but [00:37:08] Mike: I have 2000 [00:37:09] Jeremy: 2000 customers. [00:37:10] Okay. [00:37:11] Mike: Yeah. [00:37:11] Jeremy: And has that been relatively stable or has there been growth [00:37:16] Mike: It's been relatively stable the last couple of years. Ruby has, has sort of plateaued. Um, it's, you don't see a lot of growth. I'm getting probably, um, 15, 20 percent growth maybe. Uh, so I'm not growing like a weed, like, you know, venture capital would want to see, but steady incremental growth is, is, uh, wonderful, especially since I do very little. 
[00:37:42] Sales and marketing. you know, I come to RubyConf I, I I tweet out, you know, or I, I toot out funny Mastodon Toots occasionally and, and, um, and, and put out new releases of the software. And, and that's, that's essentially my, my marketing. My marketing is just staying in front of developers and, and, and being a presence in the Ruby community. [00:38:06] But yeah, it, it's, uh. I, I, I see not a, not a huge amount of churn, but I see enough sales to, to, to stay up and keep my head above water and to keep growing, um, slowly but surely. Support needs haven't grown [00:38:20] Jeremy: And as you've had that steady growth, has the support burden not grown with it? [00:38:27] Mike: Not as much because once customers are on Sidekiq and they've got it working, then by and large, you don't hear from them all that much. There's always GitHub issues, you know, customers open GitHub issues. I love that. but yeah, by and large, the community finds bugs. and opens up issues. And so things remain relatively stable. [00:38:51] I don't get a lot of the complete newbie who has no idea what they're doing and wants me to, to tell them how to use Sidekiq that I just don't see much of that at all. Um, I have seen it before, but in that case, generally, I, I, I politely tell that person that, listen, I'm not here to educate you on the product. [00:39:14] It's there's documentation in the wiki. Uh, and there's tons of, of more Ruby, generic Ruby, uh, educational material out there. That's just not, not what I do. So, so yeah, by and large, the support burden is, is not too bad because once people are, are up and running, it's stable and, and they don't, they don't need to contact me. [00:39:36] Jeremy: I wonder too, if that's perhaps a function of the price, because if you're a. new developer or someone who's not too familiar with how to do job processing or what they want to do when you, there is the open source product, of course. 
but then the next step up, I believe is about a hundred dollars a month. [00:39:58] And if you're somebody who is kind of just getting started and learning how things work, you're probably not going to pay that, is my guess. And so you'll never hear from them. [00:40:11] Mike: Right, yeah, that's a good point too, is the open source version, which is what people inevitably are going to use and integrate into their app at first. Because it's open source, you're not going to email me directly, um, and when people do email me directly, Sidekiq support questions, I do, I reply literally, I'm sorry I don't respond to private email, unless you're a customer. [00:40:35] Please open a GitHub issue and, um, that I try to educate both my open source users and my commercial customers to try and stay in GitHub issues because private email is a silo, right? Private email doesn't help anybody else but them. If I can get people to go into GitHub issues, then that's a public record. [00:40:58] that people can search. Because if one person has that problem, there's probably a dozen other people that have that same problem. And then that other, those other 11 people can search and find the solution to their problem at four in the morning when I'm asleep. Right? So that's, that's what I'm trying to do is, is keep, uh, keep everything out in the open so that people can self service as much as possible. Sidekiq open source [00:41:24] Jeremy: And on the open source side, are you still primarily the main contributor? Or do you have other people that are [00:41:35] Mike: I mean, I'd say I do 90 percent of the work, which is why I don't feel guilty about keeping 100 percent of the money. A lot of open source projects, when they look for financial sustainability, they also look for how can we split this money amongst the team. And that's, that's a completely different topic that I've. 
[00:41:55] is another reason why I've stayed solo is if I hire an employee and I pay them 200,000 a year as a developer, I'm meanwhile keeping all the rest of the profits of the company. And so that almost seems a little bit unfair. Because we're both still working 40 hours a week, right? Why am I the one making the vast majority of the, of the profit and the money? [00:42:19] Um, so, uh, I've always, uh, that's another reason why I've stayed solo, but, but yeah, having a team of people working on something, I do get regular commits, regular pull requests from people, fixing a bug that they found or just making a tweak that they saw, that they thought they could improve. [00:42:42] A little more rarely I get a significant improvement or feature as a pull request. But Sidekiq is so stable these days that it really doesn't need a team of people maintaining it. The volume of changes necessary, I can easily keep up with that. So, I'm still doing 90, 95 percent of the work. Are there other Sidekiq-like opportunities out there? [00:43:07] Jeremy: Yeah, so I think Sidekiq has sort of a unique positioning where it's the code base itself is small enough where you can maintain it yourself and you have some help, but primarily you're the main maintainer. And then you have enough customers who are willing to, to pay for the benefit it gives them on top of what the open source product provides. [00:43:36] 'Cause it's, it's, you were talking about how every project people work on, they have, they could have hundreds of dependencies, right? And to ask somebody to, to pay for each of them is, is probably not ever going to happen. And so it's interesting to think about how you have things like, say, you know, OpenSSL, you know, it's a library that a whole bunch of people rely on, but nobody is going to pay a monthly fee to use it. [00:44:06] You have things like, uh, recently there was HashiCorp with Terraform, right? 
They, they decided to change their license because they, they wanted to get, you know, some of that value back, some of the money back, and the community basically revolted. Right? And did a fork. And so I'm kind of curious, like, yeah, where people can find these sweet spots like, like Sidekiq, where they can find this space where it's just small enough where you can work on it on your own and still get people to pay for it. [00:44:43] It's, I'm trying to picture, like, where are the spaces? Open source as a public utility [00:44:48] Mike: We need to look at other forms of financing beyond pure capitalism. If this is truly public infrastructure that needs to be maintained for the long term, then why are we, why is it that we depend on capitalism to do that? Our roads, our water, our sewer, those are not capitalist, right? Those are utilities, that's public infrastructure that we maintain, that the government helps us maintain. [00:45:27] And in a sense, tech infrastructure is similar or could be thought of in a similar fashion. So things like Open Collective, things like, uh, there's a, there's an organization in Europe called NLnet, I think, out of the Netherlands. And they do a lot of grants to various open source projects to help them improve the state of digital infrastructure. [00:45:57] They support, for instance, Mastodon as an open source project that doesn't have any sort of corporate backing. They see that as necessary social media infrastructure, uh, for the long term. And, and I, and I think that's wonderful. I like to see those new directions being explored where you don't have to turn everything into a product, right? [00:46:27] And, and try and market and sell, um, and, and run ads and, and do all this stuff. 
If you can just make the case that, hey, this is, this is useful public infrastructure that so many different, um, Technical, uh, you know, applications and businesses could rely on, much like FedEx and DHL use our roads to the benefit of their own, their own corporate profits. [00:46:53] Um, why, why, why shouldn't we think of tech infrastructure sort of in a similar way? So, yeah, I would like to see us explore more. in that direction. I understand that in America that may not happen for quite a while because we are very, capitalist focused, but it's encouraging to see, um, places like Europe, uh, a little more open to, to trialing things like, cooperatives and, and grants and large long term grants to, to projects to see if they can, uh, provide sustainability in, in, you know, in a new way. [00:47:29] Jeremy: Yeah, that's a good point because I think right now, a lot of the open source infrastructure that we all rely on, either it's being paid for by large companies and at the whim of those large companies, if Google decides we don't want to pay for you to work on this project anymore, where does the money come from? [00:47:53] Right? And on the other hand, there's the thousands, tens of thousands of people who are doing it. just for free out of the, you know, the goodness of their, their heart. And that's where a lot of the burnout comes from. Right. So I think what you're saying is that perhaps a lot of these pieces that we all rely on, that our, our governments, you know, here in the United States, but also around the world should perhaps recognize as this is, like you said, this is infrastructure, and we should be. [00:48:29] Paying these people to keep the equivalent of the roads and, and, uh, all that working. [00:48:37] Mike: Yeah, I mean, I'm not, I'm not claiming that it's a perfect analogy. There's, there's, there's lots of questions that are unanswered in that, right? How do you, how do you ensure that a project is well maintained? 
What does that even look like? What does that mean? you know, you can look at a road and say, is it full of potholes or is it smooth as glass, right? [00:48:59] It's just perfectly obvious, but to a, to a digital project, it's, it's not as clear. So, yeah, but, but, but exploring those new ways because turning everybody into a businessman so that they can, they can keep their project going, it, it, it itself is not sustainable, right? so yeah, and that's why everything turns into a SaaS because a SaaS is easy to control. [00:49:24] It's easy to gatekeep behind a paywall and it's easy to charge for, whereas a library on GitHub. Yeah. You know, what do you do there? You know, obviously GitHub has sponsors, the sponsors feature. You've got Patreon, you've got Open Collective, you've got Tidelift. There's, there's other, you know, experiments that have been run, but nothing has risen to the top yet. [00:49:47] and it's still, it's still a bit of a grind. but yeah, we'll see, we'll see what happens, but hopefully people will keep experimenting and, and maybe, maybe governments will start. Thinking in the direction of, you know, what does it mean to have a budget for digital infrastructure maintenance? [00:50:04] Jeremy: Yeah, it's interesting because we, we started thinking about like, okay, where can we find spaces for other Sidekiqs? But it sounds like maybe, maybe that's just not realistic, right? Like maybe we need more of a... Yeah, a rethinking of, I guess the, the structure of how people get funded. Yeah. [00:50:23] Mike: Yeah, sometimes the best way to solve a problem is to think at a higher level. You know, we, the, the sustainability problem in American Silicon Valley based open source developers is naturally going to tend toward venture capital and, and capitalism. And I, you know, I think, I think that's, uh, extremely problematic on a, on a lot of different, in a lot of different ways. 
[00:50:47] And, and so sometimes you need to step back and say, well, maybe we're, maybe we just don't have the right tool set to solve this problem. But, you know, I, I. More than that, I'm not going to speculate on because it is a wicked problem to solve. [00:51:04] Jeremy: Is there anything else you wanted to, to mention or thought we should have talked about? [00:51:08] Mike: No, I, I, I loved the talk, of sustainability and, and open source. And I, it's, it's a, it's a topic really dear to my heart, obviously. So I, I am happy to talk about it at length with anybody, anytime. So thank you for having me. [00:51:25] Jeremy: All right. Thank you very much, Mike.
Nov 18, 2023 • 44min

Sara Jackson on Teaching in Kanazawa (RubyConf 2023)

Sara Jackson, team lead at thoughtbot, talks about her experience teaching in Kanazawa, differences between students in Japan vs the US, transitioning from Java to Ruby, LAN parties in Rochester, and her closing thoughts on RubyConf.
Nov 17, 2023 • 49min

David Copeland on Medium Sized Decisions (RubyConf 2023)

David was the chief software architect and director of engineering at Stitch Fix. He's also the author of a number of books including Sustainable Web Development with Ruby on Rails and most recently Ruby on Rails Background Jobs with Sidekiq. He talks about how he made decisions while working with a medium sized team (~200 developers) at Stitch Fix. The audio quality for the first 19 minutes is not great but the correct microphones turn on right after that. Recorded at RubyConf 2023 in San Diego. A few topics covered: Ruby's origins at Stitch Fix Thoughts on Go Choosing technology and cloud services Moving off Heroku Building a platform team Where Ruby and Rails fit in today The role of books and how different people learn Large language models' effects on technical content Related Links David's Blog Mastodon Transcript You can help correct transcripts on GitHub. Intro [00:00:00] Jeremy: Today, I want to share another conversation from RubyConf San Diego. This time it's with David Copeland. He was a chief software architect and director of engineering at Stitch Fix. And at the start of the conversation, you're going to hear about why he decided to write the book, Sustainable Web Development with Ruby on Rails. Unfortunately, you're also going to notice the sound quality isn't too good. We had some technical difficulties. But once you hit the 20 minute mark of the recording, the mics are going to kick in. It's going to sound way better. So I hope you stick with it. Enjoy. Ruby at Stitch Fix [00:00:35] David: Stitch Fix was a Rails shop. I had done a lot of Rails and learned a lot of things that worked and didn't work, at least in that situation. And so I started writing them down and I was like, I should probably make this more than just a document that I keep, you know, privately on my computer. Uh, so that's, you know, kind of, kind of where the genesis of that came from and just tried to write everything down that I thought what worked, what didn't work. 
Uh, if you're in a situation like me, working on a product with a medium sized, uh, team, then I think the lessons in there will be useful, at least some of them. Um, and I've been trying to keep it up over, over the years. I think the first version came out a couple years ago, so I've been trying to make sure it's always up to date with the latest stuff in, in Rails and based on my experience and all that.

[00:01:20] Jeremy: So it's interesting that you mention medium sized team because during the, the keynote, just a few moments ago, Matz, the creator of Ruby, was talking about how like, oh, Rails is really suitable for this, this one person team, right? Small, small team. And, uh, he was like, you're not Google. So like, don't worry about, right, can you scale to that level? Yeah. Um, and, and I wonder like when you talk about medium size or medium scale, like what are, what are we talking?

[00:01:49] David: I think probably under 200 developers, I would say. Because when I left Stitch Fix, it was closing in on that number of developers. And so it becomes, you know, hard to... You can kind of know who everybody is, or at least the names sound familiar of everybody. But beyond that, it's just, it's just really hard. But a lot of it was like, I don't have experience at like a thousand developer company. I have no idea what that's like, but I definitely know that Rails can work for like... 200-ish people, how you can make it work, basically. Yeah.

[00:02:21] Jeremy: The decision to use Rails, I'm assuming that was made before you joined?

[00:02:26] David: Yeah, the, um, the CTO of Stitch Fix, he had come in to clean up a mess made by contractors, as often happens. They had used Django, which is like the Python version of Rails. And he, the CTO, he was more familiar with Rails. So the first two developers he hired, also familiar with Rails. There wasn't a lot to maintain with the Django app, so they were like, let's just start fresh, fresh with Rails.
Yeah, but it's funny because a lot of the code in that Rails app was, like, transliterated from Python. So you could, it would, it looked like the strangest Ruby code in the world because it was basically, there were no tests. So they were like, let's just write the Ruby version of this Python just so we know it works. But obviously that didn't, didn't last forever, so.

[00:03:07] Jeremy: So, so what's an example of a, of a tell? Where you're looking at the code and you're like, oh, this is clearly, it came from Python.

[00:03:15] David: You'd see like, very, very explicit, right? Like Python, there's a lot of like single line things. Very like, this sounds like a dig, but it's very simple looking code. Like, like I don't know Python, but I was able to change this Django app. And I had to, I could look at it and you can figure out immediately how it works. Cause there's not much to it. There's nothing fancy. So, like, this, this Ruby code, there was nothing fancy. You'd be like, well, maybe they should have memoized that, or maybe they should have taken that into another class, or you could have done this with a hash or something like that. So there was, like, none of that. It was just, like, really basic, plain code like you would see in any beginning programming language kind of thing. Which is at least nice. You can understand it. But you probably wouldn't have written it that way at first in Ruby.

Thoughts on Go

[00:04:05] Jeremy: Yeah, that's, that's interesting because, uh, people sometimes talk about the Go programming language and how it looks, I don't know if simple is the right word, but it's something where you look at the code and even if you don't necessarily understand Go, it's relatively straightforward. Yeah.
I wonder what your thoughts are on that being a strength versus that being, like,

[00:04:25] David: Yeah, so at Stitch Fix at one point we had a pro, we were moving off of Heroku and we were going to, basically build a deployment platform using ECS on AWS. And so the deployment platform was a Rails app and we built a command line tool using Ruby. And it was fine, but it was a very complicated command line tool and it was very slow. And so one of the developers was like, I'm going to rewrite it in Go. I was like, ugh, you know, because I just was not a big fan. So he rewrote it in Go. It was a bazillion times faster. And then I was like, okay, I'm going to add, I'll add a feature to it. It was extremely easy. Like, it's just like what you said. I looked at it, like, I don't know anything about Go. I know what is happening here. I can copy and paste this and change things and make it work for what I want to do. And it did work. And it was, it was pretty easy. So there's that. I mean, aesthetically it's pretty ugly and it's, I, I, I can't really defend that as a real reason to not use it, but it is kind of gross. I did do Go, I did a small project in Go after Stitch Fix, and there's this vibe in Go about like, don't create abstractions. I don't know where I got that from, but every Go I look at, I'm like, we should make an abstraction for this, but it's just not the vibe. They just don't like doing that. They like it all written out. And I see the value because you can look at the code and know what it does and you don't have to chase abstractions anywhere. But I felt like I was copying and pasting a lot of, a lot of things. Um, so I don't know. I mean, the, the team at Stitch Fix that did this like command line app in Go, they're the platform team. And so their job isn't to write like web apps all day, every day. They're kind of in and out of all kinds of things. They have to try to figure out something that they don't understand quickly to debug a problem.
And so I can see the value of something like Go if that's your job, right? You want to go in and see what the issue is. Figure it out and be done and you're not going to necessarily develop deep expertise in whatever that thing is that you're kind of jumping into. Day to day though, I don't know. I think it would make me kind of sad. (laughs)

[00:06:18] Jeremy: So, so when you say it would make you kind of sad, I mean, what, what about it? Is it, I mean, you mentioned that there's a lot of copy and pasting, so maybe there's code duplication, but are there specific things where you're like, oh, I just don't?

[00:06:31] David: Yeah, so I had done a lot of Java in my past life and it felt very much like that. Where like, like the Go library for making an HTTP call for like, I want to call some web service. It's got every feature you could ever want. Everything is tweakable. You can really, you can see why it's designed that way. To dial in some performance issue or solve some really esoteric thing. It's there. But the problem is if you just want to get some JSON, it's just like a huge production. And I felt like that's all I really want to do and it's just not making it very easy. And it just felt very, very cumbersome. I think that having to declare types also is a little bit of a weird mindset because, I mean, I like to make types in Ruby, I like to make classes, but I also like to just use hashes and stuff to figure it out. And then maybe I'll make a class if I figure it out, but Go, you can't. You have to have a class, you have to have a type, you have to think all that ahead of time, and it just, I'm not used to working that way, so it felt, I mean, I guess I could get used to it, but I just didn't warm up to that sort of style of working, so it just felt like I was just kind of fighting with the vibe of the language, kind of.

[00:07:40] Jeremy: Yeah, so it's more of the vibe or the feel where you're writing it and you're like this seems a little too... explicit.
I feel like I have to be too verbose. It just doesn't feel natural for me to write this.

[00:07:53] David: Right, it's not optimized for what in my mind is the obvious case. And maybe that's not the obvious case for the people that write Go programs. But for me, like, I just want to like get this endpoint and get the JSON back as a map. It's not any easier than any other case, right? Whereas like in Ruby, right? And you can, I think if you include net/http, you can just type get. And it will just return whatever that is. Like, that's amazing. It's optimized for what I think is a very common use case. So it makes me feel really productive. It makes me feel pretty good. And if that doesn't work out long term, I can always use something more complicated. But I'm not required to dig into the Net::HTTP library just to do what in my mind is something very simple.

[00:08:37] Jeremy: Yeah, I think that's something I've noticed myself in working with Ruby. I mean, you have the standard library that's very... comprehensive and the API surface is such that, like you said there, when you're trying to do common tasks, a lot of times they have a call you make and it kind of does the thing you expected or hoped for.

[00:08:56] David: Yeah, yeah. It's kind of, I mean, it's that whole optimized for programmer happiness thing. Like it does. That is the vibe of Ruby and it seems like that is still the way things are. And, you know, I, I suppose if I had a different mindset, I mean, because I work with developers who did not like using Ruby or Rails. They loved using Go or Java. And I, I guess there's probably some psychological analysis we could do about their background and history and mindset that makes that make sense. But, to me, I don't know. It's, it's nice when it's pleasant. And Ruby seems pleasant. (laughs)

Choosing Technology

[00:09:27] Jeremy: As a...
software architect, or as a CTO, when, when you're choosing technology, what are some of the things you look at in terms of, you know?

[00:09:38] David: Yeah, I mean, I think, like, it's a weird criteria, but I think what is something that the team is capable of executing with? Because, like, most, right, most programming languages all kind of do the same thing. Like, you can kind of get most stuff done in most common popular programming languages. So, it's probably not... It's not true that if you pick the wrong language, you can't build the app. Like, that's probably not really the case. At least for like a web app or something. So it's more like, what is the team that's here to do it? What are they comfortable and capable of doing? I worked on a project with... It was a mix of like junior engineers who knew JavaScript, and then some senior engineers from Google. And for whatever reason someone had chosen a Rails app and none of them were comfortable or really yet competent with doing Ruby on Rails and they just all hated it and like it didn't work very well. Um, and so even though, yes, Rails is a good choice for doing stuff, for that team at that moment, not a good choice. Right? So I think you have to go in and like, what, what are we going to be able to execute on so that when the business wants us to do something, we just do it. And we don't complain and we don't say, oh, well we can't because this technology that we chose, blah, blah, blah. Like you don't ever want to say that if possible. So I think that's, that's kind of the, the top thing. I think second would be how widely supported is it? Like you don't want to be the cutting edge user that's finding all the bugs in something really. Like you want to use something that's stable. Postgres, MySQL, like those work, those are fine. The bugs have been sorted out for most common use cases. Some super fancy edge database, I don't know if I'd want to be doing, doing that, you know?
Choosing cloud services

[00:11:15] Jeremy: How do you feel about the cloud specific services and databases? Like are you comfortable saying like, oh, I'm going to use... Google Cloud, BigQuery. Yeah.

[00:11:27] David: That sort of thing. I think it would kind of fall under the same criteria that I was just, just saying. Like, so with AWS it's interesting 'cause when we moved from Heroku to AWS, EC2, RDS, their database thing, uh, S3, those have been around for years. Probably those are gonna work, but they always introduce new things. Like we, we use RabbitMQ and AWS came out with some, I forget what it was, it was a queuing service similar to Rabbit. We were like, oh, maybe we should switch to that. But it was clear that they weren't really ready to support it. So, yeah, so we didn't, we didn't switch to that. So I, you gotta try to read the tea leaves of the provider to see are they committed to, to supporting this thing or is this there to get some enterprise client to move into the cloud. And then the idea is to move off of that transitional thing into what they do support. And it's hard to get a clear answer from them too. So it takes a little bit of research to figure out, are they going to support this or not? Because that's what you don't want: to move everything into some very proprietary cloud system and have them sunset it and say, oh yeah, now you've got to switch again. Uh, that kind of sucks. So, it's a little trickier.

[00:12:41] Jeremy: And what kind of questions or research do you do? Is it purely a function of this thing has existed for X number of years so I feel okay?

[00:12:52] David: I mean, it's kind of similar to looking at like some gem you're going to add to your project, right? So you'll, you'll look at how often does it change? Is it being updated? Uh, what is the documentation? Does it look like someone really cared about the documentation? Does the documentation look updated?
Are there issues with it that are being addressed or, or not? Um, so those are good signals. I think talking to other practitioners too can be good. Like if you've got someone who's experienced, you can say, hey, do you know anybody? Back channeling through, like, everybody knows somebody that works at AWS, you can probably try to get something there. At Stitch Fix, we had an enterprise support contract, and so your account manager will sometimes give you good information if you ask. Again, it's a, they're not going to come out and say, don't use this product that we have, but they might communicate that in a subtle way. So you have to triangulate from all these sources to try to, to try to figure out what, what you want to do.

[00:13:50] Jeremy: Yeah, it kind of makes me wish that there was a, a site like, maybe not quite like Can I Use, right? Can I Use, you can see like, oh, can I use this in my browser? Is there, uh, like an AWS or a Google Cloud, can I trust this? Can I trust this? Yeah. Is this, is this solid or not?

[00:14:04] David: Right, totally. It's like, there's that, that site where you, it has all the Apple products and it says whether or not you should buy it because one may or may not be coming out or they may be getting rid of it. Like, yeah, that would... for cloud services, that would be, that would be nice.

[00:14:16] Jeremy: Yeah, yeah. That's like the Mac Buyer's Guide. And then we, we need the, uh, the technology. Yeah. Maybe not buyers. Cloud Provider Buyer's Guide, yeah. I guess we are buyers.

[00:14:25] David: Yeah, yeah, totally, totally.

[00:14:27] Jeremy: It's interesting that you, you mentioned how you want to see that, okay, this thing is mature. I think it's going to stick around because I interviewed someone who worked on, I believe it was the CloudWatch team. Okay. Daniel Vassallo, yeah. So he left AWS, uh, after I think about 10 years, and then he wrote a book called, uh, The Good Parts of AWS. Oh!
And, if you read his book, most of the services he says to use are the ones that are, like, old. Yeah. He's, he's basically saying, like, S3, you know you're good. Yeah. Right? But then all these, if you look at the AWS webpage, they have who knows, I don't know how many hundreds of services. Yeah. He's, he's kind of like, I worked there and I would not use, you know, all these new services. 'Cause I myself, I don't trust it yet.

[00:15:14] David: Right. And so, and they're working there? Yeah, they're working there. Yeah. No. One of the VPs at Stitch Fix had worked on Google Cloud and so when we were doing this transition from Heroku, he was like, we are not using Google Cloud. I was like, really? He's like, AWS is far ahead of the game. Do not use Google Cloud. I was like, all right, I don't need any more info. You work there. You said don't. I'm gonna believe you.

[00:15:36] Jeremy: So what, what was his, did he have like a core point?

[00:15:39] David: Um, so he never really had anything bad to say about Google per se. Like I think he enjoyed his time there and I think he thought highly of who he worked with and what he worked on and that sort of thing. But his, where he was coming from was like AWS was so far ahead of Google on anything that we would use, he was like, there's, there's really no advantage to, to doing it. AWS is a known quantity, right? It's probably still the case. It's like, you know, you've heard the nobody ever got fired for using IBM or using Microsoft or whatever the thing is. Like, I think that's, that was kind of the vibe. And he was like, moving all of our infrastructure right before we're going to go public. This is a serious business. We should just use something that we know will work. And he was like, I know this will work. I'm not confident about Google, uh, for our use case. So we shouldn't, we shouldn't risk it. So I was like, okay, I trust you because I didn't know anything about any of that stuff at the time.
I knew Heroku and that was it. So, yeah.

[00:16:34] Jeremy: I don't know if it's good or bad, but like you said, AWS seems to be the default choice. Yeah. And I mean, there's people who use Azure. I assume it's mostly primarily Microsoft. Yeah. And then there's Google Cloud. It's not really clear why you would pick it, unless there was a specific service or something that only they had.

[00:16:55] David: Yeah, yeah. Or you're invested in Google, you know, you want to keep everything there. I mean, I don't know. I haven't really been at that level to make that kind of decision, and I would probably choose AWS for the reasons discussed, but, yeah.

Moving off Heroku

[00:17:10] Jeremy: And then, so at Stitch Fix, you said you moved off of Heroku?

[00:17:16] David: Yeah. Yeah, so we were heavy into Heroku. I think that we were told that at one point we had the biggest Heroku Postgres database on their platform. Not a good place to be, right? You never want to be the biggest customer person, usually. But the problem we were facing was essentially we were going to go public. And to do that, you're under all this scrutiny about many things, including the IT systems and the security around there. So, like, by default, a Postgres, a Heroku Postgres database is, like, on the internet. It's only secured by the password. All their services are on the internet. So, not, not ideal. They were developing their private cloud service at that time. And so that would have given us, in theory, on paper, it would have solved all of our problems. And we liked Heroku and we liked the developer experience. It was great. But... Heroku Private Spaces, it was still early. There's a lot of limitations that when they explained why those limitations, they were reasonable. And if we had started from scratch on Heroku Private Spaces, it probably would have worked great, but we hadn't. So we just couldn't make it work.
So we were like, okay, we're going to have to move to AWS so that everything can be basically off the internet. Like our public website needs to be on the internet and that's kind of it. So we need to, so that's basically was the, was the impetus for that. But it's too bad because I love Heroku. It was great. I mean, they were, they were a great partner. They were great. I think if Stitch Fix had started life a year later, Private Spaces... Now it's, it's, it's way different than it was then. Cause it's been, it's a mature product now, so we could have easily done that, but you know, the timing didn't work out, unfortunately.

[00:18:50] Jeremy: And that was a compliance thing to,

[00:18:53] David: Yeah. And compliance is weird cause they don't tell you what to do, but they give you some parameters that you need to meet. And so one of them is like how you control access. So, so going public, the compliance is around the financial data and ensuring that the financial data is accurate. So a lot of the systems at Stitch Fix were storing the financial data. We, you know, the warehouse management system was custom made. Uh, all the credit card processing was all done, like it was all in some databases that we had running in Heroku. And so those needed to be subject to stricter security than we could achieve with just a single password that we just had to remember to rotate when someone like left the team. So that was, you know, the kind of, the kind of impetus for, for all of that.

[00:19:35] Jeremy: When you were using Heroku, Salesforce would have already owned it then. Did you, did you get any sense that you weren't really sure about the future of the platform while you're on it or,

[00:19:45] David: At that time, no, it seemed like they were still innovating. So like, Heroku has a Redis product now. They didn't at the time. We wished that they did. They told us they're working on it, but it wasn't ready. We didn't like using the third parties. Kafka was not a thing.
We very much were interested in that. We would have totally used it if it was there. So they were still, like, doing bigger innovations then than it seems like they are now. I don't know. It's weird. Like they're still there. They still make money, I assume, for Salesforce. So it doesn't feel like they're going away, but they're not innovating at the pace that they were kind of back in the day.

[00:20:20] Jeremy: It used to feel like when somebody's asking, I want to host a Rails app, then you would say like, well, use Heroku because it's basically the easiest to get started. It's a known quantity and it's, it's expensive, but it seemed for, for most people, it was worth it. And then now if I talk to people, it's like, not what people suggest anymore.

[00:20:40] David: Yeah, because there's, there's actual competitors. It's crazy to me that there was no competitors for years, and now there's like, Render and Fly.io seem to be the two popular alternatives. Um, I doubt they're any cheaper, honestly, but... You get a sense, right, that they're still innovating, still building those platforms, and they can build with, you know, all of the knowledge of what has come before them, and do things differently that might, that might help. So, I still use Heroku for personal things just because I know it, and I, you know, sometimes you don't feel like learning a new thing when you just want to get something done, but, yeah, I, I don't know if we were starting again, I don't know, maybe I'd look into those things. They, they seem like they're getting pretty mature and Heroku's resting on its laurels, still.

[00:21:26] Jeremy: I guess I never quite got the mindset, right? Where you, you have a platform that's doing really well and people really like it and you acquire it and then it just... It seems like you would want to keep it rolling, right? (laughs)

[00:21:38] David: Yeah, it's, it is wild, I mean, I guess... Why did you, what was Salesforce thinking they were going to get?
Uh, who knows, maybe the person at Salesforce that really wanted to purchase it isn't there. And so no one at Salesforce cares about it. I mean, there's all these weird company politics that like, who knows what's going on and you could speculate all day. What's interesting is like, there's definitely some people in the Ruby community who work there and still are working there. And that's like a little bit of a canary for me. I'm like, all right, well, if that person's still working there, that person seems like they're on the level and, and, and, and seems pretty good. They're still working there. It, it's gotta be still a cool place to be or still doing something, something good. But, yeah, I don't know. I would, I would love to know what was going on in all the Salesforce meetings about acquiring that, how to manage it. What are their plans for it? I would love to know that stuff.

[00:22:29] Jeremy: Maybe you had some experience with this at Stitch Fix, but I've heard with Heroku some of their support staff, at least in the past, they would, to some extent, actually help you troubleshoot, like, what's going on with your app. Like, if your app is, like, using a whole bunch of memory, and you're out of memory, um, they would actually kind of look into that for you, which is interesting, because it's like, that's almost more like a services thing than it is just a platform.

[00:22:50] David: Yeah. I mean, they, their support, you would get, you would get escalated to like an engineer sometimes, like who worked on that stuff and they would help figure out what the problem was. Like you got the sense that everybody there really wanted the platform to be good and that they were all sort of motivated to make sure that everybody, you know, did well and used the platform. And they also were good at, like a thing that trips everybody up about Heroku is that your app restarts every day. And if you don't know anything about anything, you might think that is stupid.
Why, why would I want that? That's annoying. And I definitely went through that and I complained to them a lot. And I'm like, if you only could not restart. And they very patiently and politely explained to me why it needed to do that, they weren't going to remove that, and how to think about my app given that reality, right? Which is great because like, what company does that, right? From the engineers that are working on it, like, no, nobody does that. So, yeah, no, I haven't escalated anything to support at Heroku in quite some time, so I don't know if it's still like that. I hope it is, but I'm not really, not really sure.

Building a platform team

[00:23:55] Jeremy: Yeah, that, uh, that reminds me a little bit of, I think it's Rackspace? There's, there's, like, another hosting provider that was pretty popular before, and they... used to be famous for that type of support, where like your, your app's having issues and somebody's actually, uh, SSHing into your box and trying to figure out like, okay, what's going on? Which if, if that's happening, then I, I can totally see where the, the price is justified. But if the support is kind of like dropping off to where it's just, they don't do that kind of thing, then yeah, I can see why it's not so much of a, yeah,

[00:24:27] David: We used to think of Heroku as like they were the platform team before we had our own platform team and they, they acted like it, which was great.
[00:24:35] Jeremy: Yeah, I don't have, um, experience with Render, but I, I, I did talk to someone from there, and it does seem like they're, they're trying to fill that role, um, so, yeah, hopefully, they and, and other companies, I guess like Vercel and things like that, um, they're, they're all trying to fill that space,

[00:24:55] David: Yeah, cause, cause building our own internal platform, I mean it was the right thing to do, but it's, it's a, you can't just, you have to have a team on it, it's complicated, getting all the stuff in AWS to work the way you want it to work, to have it be kind of like Heroku, like it's not trivial. If I'm a one person company, I don't want to be messing around with that particularly. I want to just have it, you know, push it up and have it go and I'm willing to pay for that. So it seems logical that there would be competitors in that space. I'm glad there are. Hopefully that'll light a fire under, under everybody.

[00:25:26] Jeremy: So in your case, it sounds like you moved to having your own platform team and stuff like that, uh, partly because of the compliance thing where you're like, we need our, we need to be isolated from the internet. We're going to go to AWS. If you didn't have that requirement, do you still think like that would have been the time to, to have your own platform team and manage that all yourself?

[00:25:46] David: I don't know. We, we were thinking, an issue that we were running into when we got bigger, um, was that, I mean, Heroku, it, it's obviously not as flexible as AWS, but it is still very flexible. And so we had a lot of internal documentation about this is how you use Heroku to do X, Y, and Z. This is how you set up a Stitch Fix app for Heroku. Like there was just the way that we wanted it to be used to sort of just make it all manageable. And so we were considering having a team spun up to sort of add some tooling around that to sort of make that a little bit easier for everybody.
So I think there may have been something around there. I don't know if it would have been called a platform team. Maybe we call, we thought about calling it like developer happiness or because you got developer experience or something. We, we probably would have had something there, but I do wonder how easy it would have been to fund that team with developers if we hadn't had these sort of business constraints around there. Yeah, um, I don't know. You get to a certain size, you need some kind of manageability and consistency no matter what you're using underneath. So you've got to have, somebody has to own it to make sure that it's, that it's happening.

[00:26:50] Jeremy: So even at your, your architect level, you still think it would have been a challenge to, to come to the executive team and go like, I need funding to build this team.

[00:27:00] David: You know, certainly it's a challenge because everybody, you know, right? Nobody wants to put developers in anything, right? They're, they're a commodity and I mean, that is kind of the job of like, you know, the staff engineer or the architect at a company is you don't have, you don't have the power to put anybody on anything. You, you have the power to schedule a meeting with a VP or the CTO and they will listen to you. And that's basically, you've got to use that power to convince them of what you want done. And they're all reasonable people, but they're balancing 20 other priorities. So it would, I would have had to, it would have been a harder case to make that, hey, I want to take three engineers and have them write tooling to make Heroku easier to use. What? Heroku is not easy to use. Why aren't, you know, so you really, I would, it would be a little bit more of a stretch to walk them through it. I think a case could be made, but definitely would take some more, more convincing than, than what was needed in our case.

[00:27:53] Jeremy: Yeah.
And I guess if you're able to contrast that with, you were saying, oh, I need three people to help me make Heroku easier. Your actual platform team on AWS, I imagine was much larger, right?

[00:28:03] David: Initially it was, there was, it was three people did the initial move over. And so by the time we went public, we'd been on this new system for, I don't know, six to nine months. I can't remember exactly. And so at that time the platform team was four or five people, and I, I mean, so percentage wise, right, the engineering team was maybe almost 200, 150, 200. So percentage wise, maybe a little small, I don't know. But it kind of gets back to the power of like the Rails and the one person framework. Like everything we did was very much the same. And so the Rails app that managed the deployment was very simple. The, the command line app, even the Go one with all of its verbosity, was very, very simple. So it was pretty easy for that small team to manage. But, yeah, so it was sort of like for redundancy, we probably needed more than three or four people because you know, somebody goes out sick or takes a vacation. That's a significant part of the team. But in terms of like just managing the complexity and building it and maintaining it, like it worked pretty well with, you know, four or five people.

Where Rails fits in vs other technology

[00:29:09] Jeremy: So during the keynote today, they were talking about how companies like GitHub and Shopify and so on, they're, they're using Rails and they're, they're successful and they're fairly large. But I think the thing that was sort of unsaid was the fact that these companies, while they use Rails, they use a lot of other technology as well. And, and, and kind of increasing amounts as well. So, I wonder from your perspective, either from your experience at Stitch Fix or maybe going forward, what is the role that, that Ruby and Rails plays?
Like, where does it make sense for that to be used versus like, Okay, we need to go and build something in Java or, you know, or Go, that sort of thing? [00:29:51] David: right. I mean, I think for like your standard database backed web app, it's obviously great. especially if your sort of mindset bought into server side rendering, it's going to be great at that. so like internal tools, like the customer service dashboard or... You know, something for like somebody who works at a company to use. Like, it's really great because you can go super fast. You're not going to be under a lot of performance constraints. So you kind of don't even have to think about it. Don't even have to solve it. You can, but you don't have to, where it wouldn't work, I guess, you know, if you have really strict performance. Requirements, you know, like a, a Go version of some API server is going to use like percentages of what, of what Rails would use. If that's meaningful, if what you're spending on memory or compute is, is meaningful, then, then yeah. That, that becomes worthy of consideration. I guess if you're, you know, if you're making a mobile app, you probably need to make a mobile app and use those platforms. I mean, I guess you can wrap a Rails app sort of, but you're still making, you still need to make a mobile app, that does something. yeah. And then, you know, interestingly, the data science part of Stitch Fix was not part of the engineering team. They were kind of a separate org. I think Ruby and Rails was probably the only thing they didn't use over there. Like all the ML stuff, everything is either Java or Scala or Python. They use all that stuff. And so, yeah, if you want to do AI and ML with Ruby, you, it's, it's hard cause there's just not a lot there. You really probably should use Python. It'll make your life easier. so yeah, those would be some of the considerations, I guess. 
[00:31:31] Jeremy: Yeah, so I guess in the case of, ML, Python, certainly, just because of the, the ecosystem, for maybe making a command line application, maybe Go, um, Go or Rust, perhaps, [00:31:44] David: Right. Cause you just get a single binary. Like the problem, I mean, I wrote this book on Ruby command line apps and the biggest problem is like, how do I get the Ruby VM to be anywhere so that it can then run my like awesome scripts? Like that's kind of a huge pain. (laughs) So [00:31:59] Jeremy: and then you said, like, if it's Very performance sensitive, which I am kind of curious in, in your experience with the companies you've worked at, when you're taking on a project like that, do you know up front where you're like, Oh, the CPU and memory usage is going to be a problem, or is it's like you build it and you're like, Oh, this isn't working. So now I know. [00:32:18] David: yeah, I mean, I, I don't have a ton of great experience there at Stitch Fix. The biggest expense the company had was the inventory. So like the, the cost of AWS was just de minimis compared to all that. So nobody ever came and said, Hey, you've got to like really save costs on, on that stuff. Cause it just didn't really matter. at the, the mental health startup I was at, it was too early. But again, the labor costs were just far, far exceeded the amount of money I was spending on, on, um, you know, compute and infrastructure and stuff like that. So, Not knowing anything, I would probably just sort of wait and see if it's a problem. But I suppose you always take into account, like, what am I actually building? And like, what does this business have to scale to, to make it worthwhile? And therefore you can kind of do a little bit of planning ahead there. But, I dunno, I think it would kind of have to depend. [00:33:07] Jeremy: There's a sort of, I guess you could call it a meme, where people say like, Oh, it's, it's not, it's not Rails that's slow, it's the, the database that's slow. 
And, uh, I wonder, is that, is that accurate in your experience, or, [00:33:20] David: I mean, most of the stuff that we had that was slow was the database, because like, it's really easy to write a crappy query in Rails if you're not, if you're not careful, and then it's really easy to design a database that doesn't have any indexes if you're not careful. Like, you, you kind of need to know that. But of course, those are easy to fix too, because you just add the index, especially if it's before the database gets too big, where adding indexes is problematic. But, I think those are just easy performance mistakes to make. Uh, especially with Rails because you're not, I mean, a lot of the Rails developers at Stitch Fix did not know SQL at all. I mean, they had to learn it eventually, but they didn't know it at all. So they're not even knowing that what they're writing could possibly be problematic. It's just, you're writing it the Rails way and it just kind of works. And at a small scale, it does. And it doesn't matter until, until one day it does. [00:34:06] Jeremy: And then in, in the context of, let's say, using ActiveRecord and instantiating the objects, or, uh, the time it takes to render templates, that kind of thing, at least in your experience, that wasn't as much of an issue. [00:34:20] David: No, and it was always, I mean, whenever we looked at why something was slow, it was always the database and like, you know, you're iterating over some active records and then, and then, you know, you're going into there and you're just following this object graph. I've got a lot of the, a lot of the software at Stitch Fix was like internal stuff and it was visualizing complicated data out of the database. And so if you didn't think about it, you would just start dereferencing and following those relationships and you have this just massive view and like the HTML is fine. It's just that to render this div, you're digging into some active record super deep. 
and so, you know, that was usually the, the, the problems that we would see and they're usually easy enough to fix by making an index, or sometimes you do some caching or something like that. And that solved most of the, most of the issues. The different ways people learn [00:35:09] Jeremy: So you're also the author of the book, Sustainable Web Development with Ruby on Rails. And when you talk to people about like how they learn things, a lot of them are going on YouTube, they're going on, uh, you know, looking for blogs and things like that. And so as an author, what do you think the role is of, of books now? [00:35:29] David: Yeah, I have thought about this a lot, because I, when I first got started, I'm pretty old, so books were all you had, really. Um, so they seem very normal and natural to me, but... does someone want to sit down and read a 400 page technical book? I don't know. So Dave Thomas, who runs Pragmatic Bookshelf, he was on a podcast and was asked the same question and basically his answer, which is my answer, is like a long form book is where you can really lay out your thinking, really clarify what you mean, really take the time to develop sometimes nuanced examples or nuanced takes on something that are pretty hard to do in a short form video or in a blog post. Because the expectation is, you know, someone sends you an hour long YouTube video, you're probably not going to watch that. A two minute YouTube video, sure, but you can't, you can't get into so much, kind of nuanced detail. And so I thought that was, was right. And that was kind of my motivation for writing. I've got some thoughts. They're too detailed. It's, it's too much set up for a blog post. There's too much of a nuanced element to like, really get across. So I need to like, write more. And that means that someone's going to have to read more to kind of get to it. But hopefully it'll be, it'll be valuable. 
One of the sessions that we're doing later today is Ruby content creators, where it's going to be me and Noel Rappin and Dave Thomas representing the old school dudes that write books and probably a bunch of other people that do, you know, podcasts and videos. It'd be interesting to see, I really want to know how do people learn stuff? Because if no one reads books to learn things, then there's not a lot of point in doing it. But if there is value, then, you know, it should be good and should be accessible to people. So, that's why I do it. But I definitely recognize maybe I'm too old and, uh, I'm not hip with the kids or, or whatever, whatever the case is. I don't know. [00:37:20] Jeremy: It's tricky because, I think it depends on where you are in the process of learning that thing. Because, let's say, you know a fair amount about the technology already. And you look at a book, in a lot of cases it's, it's sort of like taking you from nothing to something. And so you're like, well, maybe half of this isn't relevant to me, but then if I don't read it, then I'm probably missing a lot still. And so you're in this weird in-between zone. Another thing is that a lot of times when people are trying to learn something, they have a specific problem. And, um, I guess with, with books, it's, you kind of don't know for sure if the thing you're looking for is going to be in the book. [00:38:13] David: I mean, so my, so my book, I would not say as a beginner, it's not a book to learn how to do Rails. It's like you already kind of know Rails and you want to like learn some comprehensive practices. That's what my book is for. And so sometimes people will ask me, I don't know Rails, should I get your book? And I'm like, no, you should not. But then you have the opposite thing where like Agile Web Development with Rails is like the beginner version. And some people are like, Oh, it's being updated for Rails 7. Should I get it? 
I'm like, probably not, because how to go from zero to Rails hasn't changed a lot in years. There's not that much that's going to be new. But, how do you know that, right? Hopefully the table of contents tells you. I mean, the first book I wrote with Pragmatic, they basically were like, the table of contents is the only thing the reader, potential reader is going to have to have any idea what's in the book. So, you need to write the table of contents with that in mind, which may not be how you'd write the subsections of a book, but since you know that it's going to serve these dual purposes of organizing the book, but also being promotional material that people can read, you've got to keep that in mind, because otherwise, how does anybody, like you said, how does anybody know what's, what's going to be in there? And they're not cheap, I mean, these books are 50 bucks sometimes, and that's a lot of money for people in the U.S. For people outside the U.S., that's a ton of money. So you want to make sure that they know what they're getting and don't feel ripped off. [00:39:33] Jeremy: Yeah, I think the other challenge is, at least what I've heard, is that... When people see a video course, for whatever reason, they, they set, like, a higher value to it. They go, like, oh, this video course is 200 dollars and it's, like, seems like a lot of money, but for some people it's, like, okay, I can do that. But then if you say, like, oh, this, this book I've been researching for five years, uh, I want to sell it for a hundred bucks, people are going to be, like, no. No way. [00:40:00] David: Yeah. Right. A hundred bucks for a book. There's no way. That's a, that's a lot. Yeah. I mean, producing video, I've thought about doing video content, but it seems so labor intensive. Um, and it's kind of like, it's sort of like a performance. Like I was mentioning before we started that I used to play in bands and like, there's a lot to go into making an even mediocre performance. 
And so I feel like, you know, video content is the same way. So I get that it like, it does cost more to produce, but, are you getting more information out of it? I, that, I don't know, like maybe not, but who knows? I mean, people learn things in different ways. So, [00:40:35] Jeremy: It's just like this perception thing, I think. And, uh, I'm not sure why that is. Um, [00:40:40] David: Yeah, maybe it's newer, right? Maybe books feel older so they're easier to make and video seems newer. I mean, I don't know. I would love to talk to engineers who are like... young out of college, a few years into their career to see what their perception of this stuff is. Cause I mean, there was no, I mean, like I said, I read books cause that's all there was. There was no, no videos. You, you go to a conference and you read a book and that was, that was all you had. so I get it. It seems a whole video. It's fancier. It's newer. yeah, I don't know. I would love to hear a wide variety of takes on it to see what's actually the, the future, you know? [00:41:15] Jeremy: sure, yeah. I mean, I think it probably can't just be one or the other, right? Like, I think there are... Benefits of each way. Like, if you have the book, you can read it at your own pace without having to, like, scroll through the video, and you can easily copy and paste the, the code segments, [00:41:35] David: Search it. Go back and forth. [00:41:36] Jeremy: yeah, search it. So, I think there's a place for it, but yeah, I think it would be very interesting, like you said, to, to see, like, how are people learning, [00:41:45] David: Right. Right. Yeah. Well, it's the same with blogs and podcasts. Like I, a lot of podcasters I think used to be bloggers and they realized that like they can get out what they need by doing a podcast. And it's way easier because it's more conversational. You don't have to do a bunch of research. You don't have to do a bunch of editing. 
As long as you're semi coherent, you can just have a conversation with somebody and sort of get at some sort of thing that you want to talk about or have an opinion about. And. So you, you, you see a lot more podcasts and a lot less blogs out there because of that. So it's, that's kind of like the creators I think are kind of driving that a little bit. yeah. So I don't know. [00:42:22] Jeremy: Yeah, I mean, I can, I can say for myself, the thing about podcasts is that it's something that I can listen to while I'm doing something else. And so you sort of passively can hopefully pick something up out of that conversation, but... Like, I think it's maybe not so good at the details, right? Like, if you're talking code, you can talk about it over voice, but can you really visualize it? Yeah, yeah, yeah. I think if you sit down and you try to implement something somebody talked about, you're gonna be like, I don't know what's happening. [00:42:51] David: Yeah. [00:42:52] Jeremy: So, uh, so, so I think there's like these, these different roles I think almost for so like maybe you know the podcast is for you to Maybe get some ideas or get some familiarity with a thing and then when you're ready to go deeper You can go look at a blog post or read a book I think video kind of straddles those two where sometimes video is good if you want to just see, the general concept of a thing, and have somebody explain it to you, maybe do some visuals. that's really good. but then it can also be kind of detailed, where, especially like the people who stream their process, right, you can see them, Oh, let's, let's build this thing together. You can ask me questions, you can see how I think. I think that can be really powerful. at the same time, like you said, it can be hard to say, like, you know, I look at some of the streams and it's like, oh, this is a three hour stream and like, well, I mean, I'm interested. 
I'm interested, but yeah, it's hard enough for me to sit through a, uh, a three hour movie, [00:43:52] David: Well, then that, and that gets into like, I mean, we're, you know, we're at a conference and they, they're doing something a little, like, there are conference talks at this conference, but there's also like. sort of less defined activities that aren't a conference talk. And I think that could be a reaction to some of this too. It's like I could watch a conference talk on, on video. How different is that going to be than being there in person? maybe it's not that different. Maybe, maybe I don't need to like travel across the country to go. Do something that I could see on video. So there's gotta be something here that, that, that meets that need that I can't meet any other way. So it's all these different, like, I would like to think that's how it is, right? All this media all is a part to play and it's all going to kind of continue and thrive and it's not going to be like, Oh, remember books? Like maybe, but hopefully not. Hopefully it's like, like what you're saying. Like it's all kind of serving different purposes that all kind of work together. Yeah. [00:44:43] Jeremy: I hope that's the case, because, um, I don't want to have to scroll through too many videos. [00:44:48] David: Yeah. The video's not for me. Large Language Models [00:44:50] Jeremy: I, I like, I actually do find it helpful, like, like I said, for the high level thing, or just to see someone's thought process, but it's like, if you want to know a thing, and you have a short amount of time, maybe not the best, um, of course, now you have all the large language model stuff where you like, you feed the video in like, Hey, tell, tell, tell me, uh, what this video is about and give me the code snippets and all that stuff. I don't know how well it works, but it seems [00:45:14] David: It's gotta get better. 
Cause you go to a support site and they're like, here's how to fix your problem, and it's a video. And I'm like, can you just tell me? But I'd never thought about asking the AI to just look at the video and tell me. So yeah, it's not bad. [00:45:25] Jeremy: I think that's probably where we're going. So it's, uh, it's a little weird to think about, but, [00:45:29] David: Yeah, yeah. I was just updating, uh, you know, like I said, I try to keep the book updated when new versions of Rails come out, so I'm getting ready to update it for Rails 7.1, and Amazon's Kindle Direct Publishing is their sort of backend for where you, you know, publish like a Kindle book and stuff, and so they added a new question, was AI used in the production of this thing or not? And if you answer yes, they want you to say how much. And I don't know what they're gonna do with that exactly, but I thought it was pretty interesting, cause I would be very disappointed to pay $50 for a book that the AI wrote, right? So it's good that they're asking that. [00:46:02] Jeremy: Yeah. I think the problem Amazon is facing is where people wholesale have the AI write the book, and the person either doesn't review it at all, or maybe looks at a little, a little bit. And, I mean, the, the large language model stuff is very impressive, but if you have it generate a technical book for you, it's not going to be good. [00:46:22] David: Yeah. And I guess, cause cause like Amazon, I mean, think about like Amazon scale, like they're not looking at the book at all. Like I, I can go click a button and have my book available and no person's going to look at it. They might scan it or something, maybe looking for bad words. I don't know, but there's no curation process there. So I could, yeah. I could see where they could have that, that kind of problem. 
And like you as the, as the buyer, you don't necessarily, if you want a book on something really esoteric, there are a lot of topics I wish there was a book on that there isn't. And if someone generated one and put it on Amazon, I could see a lot of people buying it, not realizing what they're getting and feeling ripped off when it was not good. [00:47:00] Jeremy: Yeah, I mean, I, I don't know if it's an issue with the, the technical stuff. It probably is. But I, I know they've definitely had problems where, in fiction, they have people just generating hundreds, thousands of books, submitting them all, just flooding it. [00:47:13] David: Seeing what happens. [00:47:14] Jeremy: And, um, I think that's probably... That's probably the main reason why they ask you, cause they want you to say like, uh, yeah, you said it wasn't. And so now we can remove your book. [00:47:24] David: Right. Right. Yeah. Yeah. [00:47:26] Jeremy: I mean, it's, it's not quite the same, but it's similar to, I don't know what Stack Overflow's policy is now, but, when the large language model stuff started getting big, they had a lot of people answering the questions that were just pasting the question into the model [00:47:41] David: Which, because they got it from [00:47:42] Jeremy: and then [00:47:43] David: The model got it from Stack Overflow. [00:47:45] Jeremy: and then pasting the answer into Stack Overflow and the person is not checking it. Right. So it's like, could be right, could not be right. Um, cause, cause to me, it's like, if, if you generate it, if you generate the answer and the answer is right, and you checked it, I'm okay with that. [00:48:00] David: Yeah. Yeah. [00:48:01] Jeremy: But if you're just like, I, I need some karma, so I'm gonna, I'm gonna answer these questions with, with this bot, I mean, then maybe [00:48:08] David: I could have done that. You're not adding anything. Yeah, yeah. [00:48:11] Jeremy: It's gonna be a weird, weird world, I think. 
[00:48:12] David: Yeah, no kidding. No kidding. [00:48:15] Jeremy: That's a, a good place to end it on, but is there anything else you want to mention? [00:48:19] David: No, I think we covered it all. Just, yeah, you could find me online. I'm Davetron5000 on ruby.social Mastodon. I occasionally post on Twitter, but not that much anymore. So Mastodon's a place to go. [00:48:31] Jeremy: David, thank you so much. [00:48:32] David: All right. Well, thanks for having me.
Nov 15, 2023 • 44min

ChaelCodes on The Joy of Programming Games and Streaming (RubyConf 2023)

Episode Notes Rachael Wright-Munn (ChaelCodes) talks about her love of programming games (games with programming elements in them, not how to make games!), starting her streaming career with regex crosswords, and how streaming games and open source every week led her to a voice acting role in one of her favorite programming games. Recorded at RubyConf 2023 in San Diego. mastodon twitch Personal website Programming Games mentioned: Regex Crossword SHENZHEN I/O EXAPUNKS 7 Billion Humans One Dreamer Code Rom@ntic Bitburner Transcript You can help edit this transcript on GitHub. Jeremy: I'm here at RubyConf San Diego with Rachael Wright-Munn, and she goes by ChaelCodes online. Thanks for joining me today. Rachael: Hi, everyone. Hi, Jeremy. Really excited to be here. Jeremy: So probably the first thing I'll ask about is on your web page, and I've noticed you have streams, you say you have an interest in not just regular games, but programming games, so. Rachael: Oh my gosh, I'm so glad you asked about this. Okay, so I absolutely love programming games. When I first started streaming, I did it with Regex Crossword. What I really like about it is the fact that you have this joyful environment where you can solve puzzles and work with programming, and it's really focused on the experience and the joy. Are you familiar with Zach Barth of Zachtronics? Jeremy: Yeah. So, I've tried, what was it? There's TIS-100. And then there's the, what was the other one? He had one that's... Rachael: Opus Magnum? Shenzhen I/O? Jeremy: Yeah, Shenzhen I/O. Rachael: Oh, my gosh. Shenzhen I/O is fantastic. I absolutely love that. The whole conceit of it, which is basically that you're this electronics engineer who's just moved to Shenzhen because you can't find a job in the States. And you're trying to like build different solutions for these like little puzzles and everything. 
It was literally one of the, I think that was the first programming game that really took off just because of the visuals and everything. And it's one of my absolute favorites. I really like what he says about it in terms of like testing environments and the developer experience. Cause it's built based on assembly, right? He's made a couple of modifications. Like he's talked about it before where it's like The memory allocation is different than what it would actually look like in assembly and the way the registers are handled I believe is different, I wouldn't think of assembly as something that's like fun to write, but somehow in this game it is. How far did you get in it? Jeremy: Uh, so I didn't get too far. So, because like, I really like the vibe and sort of the environment and the whole concept, right, of you being like, oh, you've been shipped off to China because that's the only place that these types of jobs are, and you're working on these problems with bad documentation and stuff like that. And I like the whole concept, but then the actual writing of the software, I was like, I don't know. Rachael: And it's so hard, one of the interesting things about that game is you have components that you drop on the board and you have to connect them together and wire them, but then each component only has a specific number of lines. So like half the time I would be like, oh, I have this solution, but I don't have enough lines to actually run it or I can't fit enough components, then you have to go in and refactor it and everything. And it's just such a, I don't know, it's so much fun for me. I managed to get through all of the bonus levels and actually finish it. Some of them are just real, interesting from both a story perspective and interesting from a puzzle perspective. I don't wanna spoil it too much. You end up outside Shenzhen, I'll just say that. Jeremy: OK. That's some good world building there. Rachael: Yeah. 
Jeremy: Because in your professional life, you do software development work. So I wonder, what is it about being in a game format where you're like, I'm in it. I can do it more. And this time, I'm not even being paid. I'm just doing it for fun. Rachael: I think for me, software development in general is a very joyful experience. I love it. It's a very human thing. If you think about it like math, language, all these things are human concepts and we built upon that in order to build software in our programs and then on top of that, like the entire purpose of everything that we're building is for humans, right? Like they don't have rats running programs, you know what I mean? So when I think about human expression and when I think about programming, these two concepts are really closely linked for me and I do see it as joyful, But there are a lot of things that don't spark joy in our development processes, right? Like lengthy test suites, or this exhausting back and forth, or sometimes the designs, and I just, I don't know how to describe it, but sometimes you're dealing with ugly code, sometimes you're dealing with code smells, and in your professional developer life, sometimes you have to put up with that in order to ship features. But when you're working in a programming game, It's just about the experience. And also there is a correct solution, not necessarily a correct solution, but like there's at least one correct solution. You know for a fact that there's, that it's a solvable problem. And for me, that's really fun. But also the environment and the story and the world building is fun as well, right? So one of my favorite ones, we mentioned Shenzhen, but Zachtronics also has Exapunks. And that one's really fun because you have been infected by a disease. And like a rogue AI is the only one that can provide you with the medicine you need to prevent it. 
And what this disease is doing is it is converting parts of your body into like mechanical components, like wires and everything. So what you have to do as an engineer is you have to write the code to keep your body running. Like at one point, you were literally programming your heart to beat. I don't have problems like that in my day job. In my day job, it's like, hey, can we like charge our customers more? Like, can we put some banners on these pages? Like, I'm not hacking anybody's hearts to keep them alive. Jeremy: The stakes are a little more interesting. Yeah, yeah. Rachael: Yeah, and in general, I'm a gamer. So like having the opportunity to mix two of my passions is really fun. Jeremy: That's awesome. Yeah, because that makes sense where you were saying that there's a lot of things in professional work where it's you do it because you have to do it. Whereas if it's in the context of a game, they can go like, OK, we can take the fun problem solving part. We can bring in the stories. And you don't have to worry about how we're going to wrangle up issue tickets. Rachael: Yeah, there are no Jira tickets in programming games. Jeremy: Yeah, yeah. Rachael: I love what you said there about the problem solving part of it, because I do think that that's an itch that a lot of us as engineers have. It's like we see a problem, and we want to solve it, and we want to play with it, and we want to try and find a way to fix it. And programming games are like this really small, compact way of getting that dopamine hit. Jeremy: For sure. Yeah, it's like. Sometimes when you're doing software for work or for an actual purpose, there may be a feeling where you want to optimize something or make it look really nice or perform really well. And sometimes it just doesn't matter, right? It's just like we need to just put it out and it's good enough. Whereas if it's in the context of a game, you can really focus on like, I want to make this thing look pretty. 
I want to feel good about this thing I'm making. Rachael: You can make it look good, or you can make it look ugly. You don't have to maintain it. After it runs, it's done. Right, right, right. There's this one game. It's 7 Billion Humans. And it's built by the creators of World of Goo. And it's like this drag and drop programming solution. And what you do is you program each worker. And they go solve a puzzle. And they pick up blocks and whatever. But they have these shredders, right? And the thing is, you need to give to the shredder if you have like a, they have these like little data blocks that you're handing them. If you're not holding a data block and you give to the shredder, the worker gives themself to the shredder. Now that's not ideal inside a typical corporate workplace, right? Like we don't want employees shredding themselves. We don't want our workers terminating early or like anything like that. But inside the context of a game, in order to get the most optimal solution, they have like a lines of code versus fastest execution, and sometimes in order to win on lines of code, you just kind of have to shred all your workers at the end. When I'm on stream and I do that, I'm always like, okay, everybody close your eyes. Jeremy: That's pretty good. Yeah, I mean, cuz like in the context of the game, I think I've seen where they're like little, they're like little gray people with big eyes. Rachael: Yes, yes, yes, yes. Jeremy: Yeah, so it's like, sorry, people. It's for the good of the company, right? Rachael: It's for my optimal lines of code solution. I always draw like a, I always write a humane solution before I shred them. Jeremy: Oh, OK. So it's, you know, I could save you all, but I don't have to. Rachael: I could save you all, but I would really like the trophy for it. There's like a dot that's going to show up in the elevator bay if I shred you. Jeremy: It's always good to know what's important. 
But so at the start, you mentioned there was a regular expression crossword or something like that. Is that how you got started with all this? Rachael: My first programming game was Regex Crossword. I absolutely loved it. That's how I learned Regex. Rachael: I love it a lot. I will say one thing that's been kind of interesting is I learned Regex through Regex Crossword, which means there's actually these really interesting gaps in my knowledge. What was it? at Link Tech Retreat, they had like a little Regex puzzle, and it was like forward slash T and then a plus, right? And I was like, I have no idea what that character is, right? Like, I know all the rest of them. But the problem is that forward slash T is tab, and Regex crossword is a browser game. So you can't have a solution that has tab in it. And have that be easy for users. Also, the idea of like greedy evaluation versus lazy evaluation doesn't apply, because you're trying to find a word that satisfies the regex. So it's not necessarily about what the regex is going to take. So it's been interesting finding those gaps, but I really think that some of the value there was around how regex operates and the rules underlying it and building enough experience that I can now use the documentation to fill in any gaps. Jeremy: So the crossword, is it where you know the word and you have to write a regular expression to match it? Or what's the? Rachael: They give you regex. And there's a couple of different versions, right? The first one, you have two regex patterns. There's one going up and down, and there's one going left and right. And you have to fill the crossword block with something that matches both regular expressions. Rachael: Then we get into hexagonal ones. Yeah, where you have angles and a hexagon, and you end up with like three regular expressions. 
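[Editor's sketch] The square-grid rule described above (each row and each column has its own pattern, and the filled-in letters must satisfy all of them) can be illustrated roughly as follows. This is a minimal Python sketch, not the site's actual implementation; the function name `solves` and the small 2x2 puzzle are assumptions chosen for illustration. It uses a whole-string match, which is why greedy versus lazy quantifiers make no difference here: the entire row or column either fits the pattern or it does not.

```python
import re


def solves(grid, row_patterns, col_patterns):
    """Return True if every row and every column fully matches its pattern.

    `grid` is a list of equal-length strings, one per row (hypothetical format).
    """
    rows = list(grid)
    # zip(*grid) transposes the grid so each tuple holds one column's letters.
    cols = ["".join(column) for column in zip(*grid)]
    rows_ok = all(re.fullmatch(p, s) for p, s in zip(row_patterns, rows))
    cols_ok = all(re.fullmatch(p, s) for p, s in zip(col_patterns, cols))
    return rows_ok and cols_ok


# Illustrative 2x2 puzzle: two patterns reading across, two reading down.
row_patterns = [r"HE|LL|O+", r"[PLEASE]+"]
col_patterns = [r"[^SPEAK]+", r"EP|IP|EF"]

good = solves(["HE", "LP"], row_patterns, col_patterns)  # satisfies all four
bad = solves(["HE", "LL"], row_patterns, col_patterns)   # second column "EL" fails

# The gap mentioned above: in regex syntax the tab escape is written \t,
# so \t+ means "one or more tab characters".
tabs_match = re.fullmatch(r"\t+", "\t\t") is not None
```

A hexagonal puzzle would follow the same idea with three sets of patterns instead of two, so each cell is constrained from three directions.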
What's kind of interesting about that one is I actually think that the hexagonal regex crosswords are a little bit easier because you have more rules and constraints, which are more hints about what goes in that box. Jeremy: Interesting. OK, so it's the opposite of what I was thinking. They give you the regex rules, and then you put in a word that's going to satisfy all the regex you see. Rachael: Exactly. When I originally did it, they didn't have any sort of hints or anything like that. It was just empty. Now it's like you click a box, and then they've got a suggestion of five possible letters that could go in there. And it just breaks my heart. I liked the old version that was plainer, and didn't have any hints, and was harder. But I acknowledge that the new version is prettier, and probably easier, and more friendly. But I feel like part of the joy that comes from games, that comes from puzzles, It comes from the challenge, and I miss the challenge. Jeremy: I guess someone, it would be interesting to see people who are new to it, if they had tried the old way, if they would have bounced off of it. Rachael: I think you're probably right. I just want them to give me a toggle somewhere. Jeremy: Yeah, oh, so they don't even let you turn off the hints, they're just like, this is how it is. Rachael: Yep. Jeremy: Okay. Well, we know all about feature flags. Rachael: And how difficult they are to maintain in perpetuity. Jeremy: Yeah, but no, that sounds really cool because I think some things, like you can look up a lot of stuff, right? You can look up things about regex or look up how to use them. But I think without the repetition and without the forcing yourself to actually go through the motion, without that it's really hard to like learn and pick it up. Rachael: I completely agree with you. I think the repetition, the practice, and learning the paradigm and patterns is huge. 
Because like even though I didn't know what forward slash t plus was, I knew that forward slash t was going to be some sort of character type. Jeremy: Yeah, it kind of reminds me of, there was, I'm not sure if you've heard of Vim Adventures, but... Rachael: I did! I went through the free levels. I had a streamerversary and my chat had completed a challenge where I had to go learn Vim. So I played a little bit of Vim Adventures. Jeremy: So I guess it didn't sell you. Rachael: Nope, I got Vim Extensions turned on. Jeremy: Oh, you did? Rachael: Yeah, I have the Vim extension turned on in VS Code. So I play with a little bit of sprinkling of Vim in my everyday. Jeremy: It's kind of funny, because I am not a Vim user in the sense that I don't use it as my daily editor or anything like that. But I do the same thing with the extensions in the browser. I like being able to navigate with the keyboard and all that stuff. Rachael: Oh, that is interesting. That's interesting. You have a point like memorizing all of the different patterns when it comes to like Keyboard navigation and things like that is very similar to navigating in Vim. I often describe writing code in Vim is kind of like solving a puzzle in order to write your code So I think that goes back to that Puzzle feeling that puzzle solving feeling we were having we were talking about before. Jeremy: Yeah, I personally can't remember, but whenever I watch somebody who's, really good at using Vim, it is interesting to see them go, oh, yes, I will go to the fifth word, and I will swap out just this part. And it's all just a few keystrokes, yeah. Rachael: Very impressive. Can be done just as well with backspace and, like, keyboard, like, little arrows and everything. But there is something fun about it and it is... Faster-ish. 
Jeremy: Yeah, I think it's like I guess it depends on the person, but for some people it's like they, they can think and do things at the speed that they type, you know, and so for them, I guess the the flow of, I'm doing stuff super fast using all these shortcuts is probably helpful to them. Rachael: I was talking to someone last night who was saying that they don't even think about it in Vim anymore. They just do it. I'm not there yet. (laughs) Jeremy: Yeah, I'll probably never be there (laughs) But yeah, it is something to see when you've got someone who's really good at it. Rachael: Definitely. I'm kind of glad that my chat encouraged and pressured me to work with Vim. One of the really cool things is when I'm working on stuff, I'll sometimes be like, oh, I want to do this. Is there a command in Vim for that? And then I'll get multiple suggestions or what people think, and ideas for how I can handle things better. Someone recently told me that if you want to delete to the end of a line, you can use capital D. And this whole time I was doing lowercase d dollar sign. Jeremy: Oh, right, right, right. Yeah. Yeah, it's like there's so many things there that, I mean, we should probably talk about your experiences streaming. But that seems like a really great benefit that you can be working through a problem or just doing anything, really. And then there's people who they're watching, and they're like, I know how to do it better. And they'll actually tell you, yeah. Rachael: I think that being open to that is one of the things that's most important as a streamer. A lot of people get into this cycle where they're very defensive and where they feel like they have to be the expert. But one of the things that I love about my chat is the fact that they do come to me with these suggestions. And then I can be open to them, and I can learn from them. And what I can do is I can take those learnings from one person and pass it on to the other people in chat. 
I can become a conduit for all of us to learn. Jeremy: So when you first decided to start streaming, I guess what inspired you to give it a shot? Like, what were you thinking? Rachael: That's a great question. It's also kind of a painful question. So the company that I was working for, I found out that there were some pay issues with regards to me being a senior, promotion track, things like that. And it wasn't the first time this had happened, right? Like, I often find that I'm swapping careers every two to three years because of some miserable experience at the company. Like you start and the first year is great. It's fantastic. It's awesome. But at the end of it, you're starting to see the skeletons, and then two to three years later you're burnt out. And what I found was that every two to three years I was losing everything, right? Like all of my library of examples, the code that I would reference, like that's in their private repo. When it came to my professional network, the coworkers that liked and respected me, we had always communicated through the workplace Slack. So it's really hard to get people to move from the workplace Slack to like Instagram or Twitter or one of those other places if that's not a place where you're already used to talking to them. And then the other thing is your accomplishments get wiped out, right? Like when you start at the next company and you start talking about promotion and things like that, the work that you did at previous companies doesn't matter. They want you to be a team lead at that company. They want you to lead a massive project at that company, and that takes time. It takes opportunities. And eventually, I decided that I wanted to exist outside my company. Like I wanted to have a reputation that went beyond that, and that's what originally inspired me to stream. And it's pretty hard to jump from, like, oh,
I got really frustrated and burnt out at my company, to, I've got it, I'm gonna do some Regex Crossword on stream. But honestly, that's what it was, right? I just wanted to slowly build this reputation in this community outside of my company, and it's been enormously valuable in terms of my confidence, in terms of my opportunities. I've been able to pick up some really interesting jobs and I'm able to leverage some of those experiences in really clear professional ways, and it's really driven me to contribute more to open source. I mentioned that I have a lot of people like giving me advice and suggestions and feedback. That's enormously helpful when you're going out there and you're trying to like get started in open source and you're trying to build that confidence and you're trying to build that reputation. I often talk about having a library of examples, right? Like your best code that you reference again and again and again. If I'm streaming on Twitch, everything that I write has to be open source because I'm literally showing it on video, right? So it's really encouraged me to build that out. And now when I'm talking to my coworkers and companies, I can be like, oh, we need to talk about single table inheritance. I did that in Hunter's Keepers. Why don't we go pull that up and we'll take a look at it. Or are we building a Docker image? I did that in Hunter's Keepers and Conf Buddies. Why don't we look at these, compare them, and see if we can get something working here, right? Like I have all of these examples, and I even have examples from other apps as well. Like I added Twitch clips to Forem. So when I want to look at how to build a liquid tag, because Jekyll uses liquid tags as well, I can hop to those examples and hop between them, and I'm never going to lose access to them.
Jeremy: Yeah, I mean, that's a really good point where I think a lot of people, they do their work at their job and it's never going to be seen by anyone and you can sort of talk about it, but you can't actually show anybody what you did. So it's like, and I think to that point too, is that there's some knowledge that is very domain specific or specific to that company. And so when you're actually doing open source work, it's something that anybody can pick up and use and has utility way beyond just your company. And the whole point of creating this record, that makes a lot of sense too, because if I wanna know if you know how to code, I can just see like, wow, she streams every Thursday. She's clearly she knows what she's doing and you know, you have these also these open source contributions as well So it's it's sort of like it's not this question of if I interview you It's it's not I'm just going off of your word that and I believe what you're saying. But rather it's kind of the proof is all it's all out there. Rachael: Oh, definitely if I were to think about my goals and aspirations for the future I've been doing this for four years still continuing But I think I would like to get to the point where I don't really have to interview. Where an interview is more of a conversation between me and somebody who already knows they want to hire me. Jeremy: Have you already started seeing a difference? Like you've been streaming for about four years I think Rachael: I had a really interesting job for about eight months doing developer relations with New Relic. That was a really interesting experience. And I think it really pushed the boundaries of what I understood myself to be capable of because I was able to spend 40 hours a week really focused on content creation, on blogging, on podcasting, on YouTube videos and things like that. Obviously there was a lot of event organization and things like that as well. 
But a lot of the stuff that came out of that time is some of my best work. Like, I'm trying to remember exactly what I did while I was at New Relic, but I saw a clear decrease afterwards. But yeah, I think that was probably close to the tipping point. I don't for sure know if I'm there yet, right? Like you never know if you're at the point where you don't have to interview anymore until you don't have to interview. But the last two jobs, no, I haven't had to interview. Jeremy: So, doing it full-time, how did you feel about that versus having a more traditional lead or software developer role? Rachael: It was definitely a trade-off. So I spent a lot less time coding and a lot more time with content, and I think a little bit of it was me trying to balance the needs and desires of my audience against the needs and desires of my company. For me, and this is probably going to hurt my chances of getting one of those jobs where I don't have to interview in the future, but my community comes first, right? They're the people who are gonna stick with me when I swap between jobs. But that was definitely something that I constantly had to think about: how do I balance what my company wants from me with the responsibility that I have to my community? But also, like, my first talk, "Your First Open Source Contribution," which was at RubyConf Denver, that was written while I was at New Relic. Like, would I have had the time to work on a talk in addition to the streaming schedule and everything else? Um, for a period of time, I was hosting Ruby Galaxy, which was a virtual meetup. It didn't last very long, and we have deprecated it. Um, I deprecated it before I left the company because I wanted to give it, like, a good, clean ending versus, um, necessarily having it, like, linger on and be a responsibility for other people. But I don't think I would have done those if I was trying to balance it with my day job. So, I think that that was an incredible experience.
That said, I'm very glad it's over. I'm very glad that the only people I'm beholden to are my community now. Jeremy: So, is it the sheer amount that you had to do that was the main issue? Or is it more that tension between, like you said, serving your audience and your community versus serving your employer? Rachael: Oh, a lot of it was tension. A lot of it was hectic, event management in general. I think if you're like planning and organizing events, that's a very challenging thing to do. And it's something that kind of like goes down to the deadline, right? And it's something where everybody's trying to like scramble and pull things together and keep things organized. And that was something that I don't think I really enjoyed. I like to have everything like nice and planned out and organized and all that sort of stuff, and I don't think that that's something that happens very often in event management, at least not from my experience. Jeremy: So these were like in-person events, or what types of events? Rachael: I actually skipped out before the in-person events. They would have been in-person events. We had FutureStack at New Relic, which is basically like this big gathering where you talk about things you can do with New Relic and that sort of stuff. We all put together talks for that. We put together an entire, like, oh gosh, I'm trying to remember the tool that we use, but it was something similar to gather round where you like interact with people. And there's just a lot that goes into that, from marketing to event planning to coordinating with everyone. I'm grateful for my time at New Relic and I made some incredible friends and some incredible connections and I did a lot, but yeah, I'm very glad I'm not in DevRel anymore. If you ask any DevRel, they'll tell you it's hectic, they'll tell you it's chaotic, and they'll tell you it's a lot of work. Jeremy: Yeah.
So it sounds like maybe the streaming and podcasting or recording videos, talks, that part you enjoy, but it's the "I'm responsible for planning this event for all these people" part where you're like, OK, maybe not for me. Rachael: Yeah, kind of. I describe myself as like a content creator because I like to just like dabble and make things, right? Like I like to think about like, what is the best possible way to craft this tweet or this post, or like to sit there and be like, okay, how can I structure this blog post to really communicate what I want people to understand? When it comes to my streams, what I actually do is I start with the hero's journey as a concept. So every single stream, we start with an issue in the normal world, right? And then what we do is we get drawn into the chaos realm as we're like debugging and trying to build things and going back and forth, and there's code flying everywhere and the tests are red and then they're green and then they're red and then they're green. And then finally at the end we come back to the normal world as we create this PR and submit it and either merge it or wait for maintainer feedback. And for me that story arc is really key. I'm a little bit of an artist. I like the artistry of it. I like the artistry of the code, and I like the artistry of creating the content. I think I've had guests on the show before, and sometimes it's hard to explain to them, like, no, no, no, this is a code show. We can write code, and that's great, but that's not what it's about. It's not just about the end product. It's about bringing people along with us on the journey. And sometimes it's been three hours, and I'm not doing a great job of bringing people along on the journey. So like, you know, I'm tooting my own horn a little bit here, but that is important to me.
Jeremy: So when you're working through a problem, when you're doing it on stream versus you're doing it by yourself, what are the key differences in how you approach the problem or how you work through it? Rachael: I think it's largely the same. It's like almost exactly the same. What I always do is, when I'm on stream, I pause, I describe the problem, I build a test for it, and then I start working on trying to fix what's wrong. I'm a huge fan of test-driven development. The way I see it, you want that bug to be reproducible, and a test gives you the easiest way to reproduce it. For me, it's about being easy as much as it is about it being the right way or not. But yeah, I would say that I approach it largely in the same way. I was in the content creator open space a little bit earlier, and I had to give them a bit of a confession. There is one small difference when I'm doing something on stream versus when I'm doing something alone. Sometimes, I have a lot of incredible senior staff, smart, incredible people in my chat. I'll describe the problem in vivid detail, and then I'll take my time writing the test, and by the time I'm done writing the test, somebody will have figured out what the problem is, and talk back to me about it. I very rarely do that. It's more often when it's an ops or an infrastructure or something like that. A great example of this is like the other day I was having an issue, I mentioned the Vim extensions. If I do command P on the code section, Vim extensions was capturing that, and so it wasn't opening the file. So one of my chatters was like, oh, you know, you can fix that if you Google it. I was like, oh, I don't know. I mean, I could Google it, but it will take so long and distract from the stream. Literally less than 15 minutes later, a chatter had replied with like, here's exactly what to add to your VS Code extension, and I knew that was gonna happen. So that's my little secret confession.
That's the only difference when I'm debugging things on stream is sometimes I'll let chat do it for me. Jeremy: Yeah, that's a superpower right there. Rachael: It is, and I think that happens because I am open to feedback and I want people to engage with me and I support that and encourage that in my community. I think a lot of people sometimes get defensive when it comes to code, right? Like when it comes to the languages or the frameworks that we use, right? There's a little bit of insecurity because you dive so deep and you gain so much knowledge that you're kind of scared that there might be something that's just as good because it means you might not have made the right decision. And I think that affects us when it comes to code reviews. I think it affects us when we're like writing in public. And I think, yeah, and I think it affects a lot of people when they're streaming, where they're like, if I'm not the smartest person in the room, and why am I the one with a camera and a microphone? But I try to set that aside and be like, we're all learning here. Jeremy: And when people give that feedback, and it's good feedback, I think it's really helpful when people are really respectful about it and kind about it. Have you had any issues like having to moderate that or make sure it stays positive in the context of the stream? Rachael: I have had moderation issues before, right? Like, I'm a woman on the internet, I'm going to have moderation issues. But for me, when it comes to feedback and suggestions, I try to be generous with my interpretation and my understanding of what they're going with. Like people pop in and they'll say things like, Ruby is dead, Rails is dead. And I have commands for that to like remind them, no, actually Twitch is a Rails app. So like, no, it's definitely not dead. You just used it to send a message. 
But like, I try to be understanding of where people are coming from and to meet them where they are, even if they're not being the most respectful. And I think what I've actually noticed is that when I do that, their tone tends to change. So I have two honorary trolls in my chat, Kego and John Sugar, and they show up and they troll me pretty frequently. But I think that that openness, that honesty, like that conversation back and forth it tends to defuse any sort of aggressive tension or anything. Jeremy: Yeah, and it's probably partly a function of how you respond, and then maybe the vibe of your stream in general probably brings people that are. Rachael: No, I definitely agree. I think so. Jeremy: Yeah. Rachael: It's the energy, you get a lot of the energy that you put out. Jeremy: And you've been doing this for about four years, and I'm having trouble picturing what it's even like, you know, you've never done a stream and you decide I'm gonna turn on the camera and I'm gonna code live and, you know, like, what was kind of going through your mind? How did you prepare? And like, what did, like, what was that like? Rachael: Thank you so much. That's a great question. So, actually, I started with Regex Crossword because it was structured, right? Like, I didn't necessarily know what I wanted to do and what I wanted to work on, but with Regex Crossword, you have a problem and you're solving it. It felt very structured and like a very controlled environment, and that gave me the confidence to get comfortable with, like, I'm here, I have a moderator, right? Like we're talking back and forth, I'm interacting with chatters, and that allowed me to kind of build up some skills. I'm actually a big fan of Hacktoberfest. I know a lot of people don't like it. I know a lot of people are like, oh, there are all these terrible spam PRs that show up during Hacktoberfest and open source repositories. But I'm a really big fan because I've always used it to push my boundaries, right? 
Like every single year, I've tried to take a new approach on it. So the first year that I did it, I decided that what I wanted to do to push my boundaries was to actually work on an application. So this one was called Hunter's Keepers. It was an app for managing characters in Monster of the Week, and it was a Rails app because that's what I do professionally and that's what I like to work on. So I started just building that for Hacktoberfest and people loved it. It got a ton of engagement, way more than Regex Crossword. Like, those open source streams continue to do better than the programming games, but I love the programming games so much that I don't wanna lose them. But that's where it kind of started, right? It was me sitting there and saying like, oh, I wanna work on these Rails apps. The Hacktoberfest after that one, I was like, OK, I worked on my own app in the open, and I've been doing that for basically a year. I want to work on somebody else's app. So I pushed myself to contribute to four different open source repositories. One of the ones I pushed myself to work on was Forem. They did not have Twitch clips as embeds. They had YouTube videos and everything else. And I looked into how to do it, and I found out how liquid tags work, and I had a ton of other examples. I feel like extensions like that are really great contributions to open source because it's an easy way with a ton of examples that you can provide value to the project, and it's the sort of thing where, like, if you need it, other people probably need it as well. So I went and I worked on that, and I made some Twitch clips. And that was like one of my first like external open source project contributions. And that kind of snowballed, right? Because I now knew how to make a liquid tag. So when I started working on my Jekyll site, and I found out that they had liquid tags that were wrapped in gems, I used that as an opportunity to learn how to build a gem.
And like how to create a gem that's wrapped around a liquid tag. And that exists now and is a thing that I've done. And so it's all of these little changes and moments that have stacked on top of each other, right? Like it's me going in and saying, OK, today I'd like to customize my alerts. Or like, today I'd like to buy a better microphone and set it up and do these changes. It's not something that changed all at once, right? It's just this small putting in the time day by day, improving. I say like the content gears are always grinding. You always need something new to do, right? And that's basically how my stream has gone for the last four years, is I'm just always looking for something new to do. We haven't talked about this yet, but I'm a voice actress in the programming video game, One Dreamer. And I actually collaborated with the creator of another one, Compressor, who like reached out to me about that Steam key. But the reason that I was able to talk to these people and I was able to reach out to them is rooted in Regex Crossword, right? Cause I finished Regex Crossword and Thursday night was like my programming game stream. And I loved them, so I kept doing them. And I kept picking up new games to play, and I kept exploring new things. So at the end of it, I ended up in this place where I had this like backlog in knowledge and history around programming games. So when Compressor was developed, I think the creator, Charlie Brej, is like a VP at Arm or something. And okay, I should back up a little bit. Compressor is this game where you build CPUs with steam. So it's like steampunk, like, electrical engineering components. Ah, it's so much fun. And like, the characters are all cool, because it's like you're talking to Nikola Tesla, and like Charles Babbage, and Ada Lovelace, and all this sort of stuff. It's just super fun. But the reason he reached out to me was because of that reputation, that backlog, that feedback.
Like, when you think about how you became a developer, right, it's day by day, right? when you develop your experience. There's a moment where you look back and you're like, I just have all of these tools in my toolkit. I have all of these experiences. I've done all these things, and they just stack to become something meaningful. And that's kind of how it's gone with my stream, is just every single day I was trying to push, do something new. Well, not every day. Sometimes I have a lazy day, but like, but like I am continuously trying to find new ground to tread. Jeremy: Yeah, I mean that's really awesome thinking about how it went from streaming you solving these regex crosswords to all the way to ending up in one of these games that you play. Yeah, that's pretty pretty cool. Rachael: By the way, that is my absolute favorite game. So the whole reason that I'm in the game is because I played the demo on stream. Jeremy: Oh, nice. Rachael: And I loved it. Like I immediately was like, I'm going to go join the creators discord. This is going to be my game of the year. I can't wait to like make a video on this game. What's really cool about this one is that it uses programming as a mechanic and the story is the real driver. It's got this emotional impact and story. The colors are gorgeous and the way you interact with the world, like it is a genuine puzzle game where the puzzles are small, little, simple programming puzzles. And not like I walk up to this and like I solve a puzzle and the door opens. No, it's like you're interacting with different components in the world and wiring them together in order to get the code working. The whole premise is that there's an indie game developer who's gone through this really traumatic experience with his game, and now he's got the broken game, and he's trying to fix it in time for a really important game demo. I think it's like, it's like Vig something. Video game indie gaming. 
But what happened is I started following the creator, and I was super interested in them. And then he actually reached out to me about like the Steam Workshop, and then he was looking for people to voice act, and I was like, me, please, yes. So yeah, that's how I got involved with it. Jeremy: Yeah, that's awesome. It's like everything came full circle, I guess. It's like where you started and... Rachael: Yeah, no, absolutely. It's amazing. Jeremy: And so what was that experience like, the voice acting bit? I'm assuming you didn't have professional experience with that before. Rachael: No, no, no, no. I had to do a lot of research into like how to voice act. My original ones were tossed out. OK, so there's one line in it. This is so embarrassing. I can't believe I'm saying this on a podcast. There's one line that's like, it's a beautiful day to code. Because I'm an NPC, right? So like you can keep interacting with me, and one of the like cycling ones is like, it's a beautiful day to code. Well, I tried to deliver it wistfully. Like I was staring out a window and I was like, it's a beautiful day to code. And every single person who heard it told me that it sounded like somewhat sensual, sexy. And I was dying because I had just sent this to this like indie game developer that like I appreciated, and he replies back and he's like, I'm not sure if there was an audio issue with some of these, but could you like rerecord some of these? So I was very inexperienced. I did a lot of practicing, a lot of vocal exercises, but I think that it turned out well. Jeremy: That's awesome. So you kind of just kept trying and sending samples, or did they have anybody like try and coach you? Rachael: No, I just kept sending samples. I did watch some YouTube videos from like real voice actors to try and like figure out what the vocal exercises were. One of the things that I did at first was I sent him like one audio, like the best one in my opinion.
And he replied back being like, no, just record this like 10, 20 times. Send it to me and I'll chop the one I want. Jeremy: So anytime you did that, the one they picked, was it ever the one you thought was the best one? Rachael: Oh gosh, wow, I don't think I've gone back over the recordings to figure out which one I thought was the best one. Or like checked which one he picked out of the ones that I recorded. Oh, that's interesting. I'm going to have to do that after this. Jeremy: You're going to listen to all the, it's a beautiful day to code. Rachael: The final version is like a nice, neutral like, it's a beautiful day to code. One of the really cool things about that, though, is my character actually triggers the end of game scene, which is really fun. You know how you get a little hint that's like, oh, this is where the end of the game is? My character gets to do that. Jeremy: That's a big responsibility. Rachael: It is. I was so excited when I found out. Jeremy: That's awesome. Cool. Well, I think that's probably a good place to wrap it up on. But is there anything else you want to mention, or any games you want to recommend? Rachael: Oh, I think I mentioned all of them. I think if you look at Code Romantic, EXAPUNKS. Bitburner is an idle JavaScript game that can be played in the browser where you write the custom files and build it and you're going off and hacking servers and stuff like that. It's a little light on story. One Dreamer, yeah. I think if you look at those four to five games, you will find one you like. Oh, it's 7 Billion Humans. Jeremy: Oh, right, yeah. Rachael: I haven't written the blog post yet, but that's my five programming video games that you should try if you've never done one before. 7 Billion Humans is on mobile, so if you've got a long flight back from RubyConf, it might be a great choice. Jeremy: Oh, there you go. Rachael: Yeah.
Other than that, I can be found at chael.codes, chael.codes/links for the socials, chael.codes/about for more information about me. And yeah, thank you so much for having me. This has been so much fun. Jeremy: Awesome. Well, Rachael, thank you so much for taking the time. Rachael: Thank you.
Sep 20, 2023 • 1h 1min

Daniel Zingaro and Leo Porter on learning to program with LLMs

Dr. Daniel Zingaro and Dr. Leo Porter are co-authors of the book Learn AI-Assisted Python Programming. Leo will teach an introductory computer science course this quarter at UCSD using this book. We discuss how tools like GitHub Copilot let people new to programming focus on breaking down problems instead of language syntax. Dr. Zingaro is an Associate Professor of Computer Science at University of Toronto Mississauga and Dr. Porter is an Associate Professor at University of California San Diego. This episode was originally posted on Software Engineering Radio. Topics covered: Making programming more accessible Teaching problem decomposition instead of language syntax The importance of reading and testing untrusted generated code The rise of throwaway or one-off code Concerns about relying on commercial tools Rethinking how to assess students Related Links Learn AI-Assisted Python Programming Leo Porter Daniel Zingaro GitHub Copilot Transcript You can help edit this transcript on GitHub. Note the timestamps and audio for this transcript will not completely match. Intro [00:00:00] Jeremy: Today I'm talking to Dr. Leo Porter. He's an associate teaching professor of computer science at the University of California San Diego, and he co-founded the computing education research laboratory there. I'm also joined by Dr. Daniel Zingaro who is an associate teaching professor of computer science at the University of Toronto. And he's also the author of the book, Learn to Code by Solving Problems and the book, Algorithmic Thinking. They are co-authors of the book, Learn AI-Assisted Python Programming. Leo and Dan, welcome to Software Engineering Radio. [00:00:37] Leo: Thank you for having us, Jeremy. I really appreciate your podcast, so thanks. Great to be here. [00:00:41] Dan: Thanks Jeremy. Writing a book for Leo's CS1 class [00:00:43] Jeremy: The first thing we could start with is, is why this book? And, and why now?
How did you decide on like, okay, this is the thing we need to do now. [00:00:51] Leo: So, uh, this is Dan. Uh, so Dan, um, like really early when LLMs first kind of were coming out and being seen on the scene for programming, uh, he started playing with them, uh, for programming projects. And I think Dan really quickly realized that they'd had this, a big impact on how we teach programming. so he reached out to me, uh, and said, I really need to give em a try. And, uh, after I played with them for a little while, I had the exact same realization that this is gonna change, uh, how we teach programming, uh, in a pretty dramatic way. So having realized that, having realized that we had to change our, uh, introductory CS1 courses, we knew we needed to do that, but in order to teach that class, we'd have to have a book that we could assign our students that that would go along with the class. And so we knew we had to change the class, but we also knew we had to have a book for it. And given the, the timeline to write books, we started in the book first. Um, and so that's how it got started. LLMs for Syntax, Humans for breaking down problems [00:01:45] Dan: I guess we figured out that our course had to change first, before we knew exactly, um, how it had to change. One thing we, um, learned early on was that the kinds of assignments we give in our introductory courses, they're just solved by, by these tools like ChatGPT and copilot. So, uh, we knew something had to change, and then it is just a matter of figuring out what. And so we spent, um, quite a bit of time with these tools and we started to realize that what's gonna change is the skills that our students need to learn, uh, to be effective using these tools. So like b before these tools, we would spend a lot of time teaching syntax. Um, and students struggle quite a bit with learning syntax, which I mean, it's very, it's, it's very frustrating, right? Cuz you can't even do anything until you get the syntax right? 
And you're getting all these errors like missing colons and, you know, mismatched braces and stuff like that. Uh, so it's actually good, that, the LLMs are doing the syntax for the students. But you know, just because that skill's, uh, not needed as much, uh, doesn't mean that there aren't still skills for students to learn. So instead of syntax, other things become more important. Uh, so for example, uh, Leo and I, realize that reading code is gonna be extremely important even more so than before. I think if, if that, if that's even possible. Uh, and that's because sometimes you're gonna get back code that just doesn't work. And so we realized that students are gonna need to be able to read, the response that they get to see if the code looks reasonable, or not, right? And then if the code, uh, I is unreasonable, then they need to read more code, uh, and look at other solutions, right, that they get from the, uh, LLM. Uh, there are other, uh, things they can do as well, like messing around with the prompt and so on. But they're gonna need to be able to read code, uh, throughout the process. And then, so we just kind of kept on using these tools and documenting the skills that students are gonna need. And we just kinda realized that all the skills students are gonna need are skills we would want to teach anyway. So like, uh, one more example is testing, right? So, students may now not have, uh, an understanding of every last detail of, you know, the Python language like they would before. And so then that makes testing even more important, right? Than it was they need to verify that the code they're getting is correct. And so they have to be very good at writing test cases. and, and, you know, similar, similar for debugging, we need our students to have strong debugging skills, again, even potentially stronger than before, right? Because if the code isn't working, they need to first determine what the code is doing to be able to fix it. 
And then I guess one more I'll mention is problem decomposition. And this is a big one. I think this is gonna come up a couple times probably in our talk today, but LLMs struggle when you give them tasks that are too large and students need to know how to break problems down into small components so that, that, LLM can solve each one and, you know, have a good chance of getting it right. [00:04:56] Leo: Yeah, I, I think, um, kind of to, to piggyback off of that, you, you may be hearing these skills and saying, oh, these are absolutely essential skills. Every software engineer should know, uh, these are being taught right now. Right? Um, and the answer is not really, like these aren't core topics in a lot of introductory CS classes because so much time is spent on syntax. And so fairly early on when we kind of realized these skills would be so essential, Uh, we got really excited because these are skills we want to teach in our classes, and the LLMs are now giving us the ability to do that more. [00:05:27] Dan: Mm-hmm. [00:05:28] Jeremy: I think that's interesting about the syntax comment because you were saying how reading is gonna be more important than ever because you have LLM generating the code. Um, and you need to understand that code that's being generated and understand that it does what it, uh, you think it does. And so I wonder if when you say you spend less time on syntax, is it because you feel like they're gonna generate this code and they're sort of organically gonna pick up syntax that way versus having to focus on it at the start? I'm just trying to picture what you see changing there. [00:06:05] Dan: Yeah, Jeremy. So, uh, I, I was, I guess speaking specifically about syntax errors, which don't generally happen when you're using LLMs, and I also agree with you, you need to know what the code is doing, but, um, you can do that without worrying about each specific piece of syntax. 
Like, um, you're gonna need to know what the keywords do for sure, but, missing, you know, brackets and colons and, uh, oh, there needs to be like a blank line here, indentation, uh, a lot of this kind of thing is done, for the most part, correctly by the LLMs. So yeah, I agree with you. You need to be able to identify the structures. So in our, in our book actually, Leo and I have, um, a couple of chapters on reading code and, I don't think we ever break down a line of code into its individual tokens. We do talk about the main structures, like ifs and loops and functions and all that. But compared to other books, I, I think or other, uh, other ways of teaching where you would focus on the micro level, we try to focus on the line level now, cuz we want our students to be able to grasp what each line is doing, I guess more than each token. [00:07:27] Leo: Yeah, maybe to, to add to that a bit, it's almost, uh, if you think about the advent of block-based languages, it was to make sure that the, essentially the, the author can't make syntax mistakes, right? Is the whole purpose of kind of block-based languages. And they're, they're huge for introductory programming, especially in like K through 12. In a sense, LLMs do this because they'd never give you back wrong syntax, or they almost, almost never give you back wrong syntax. And so it takes away that kind of cognitive burden of making sure you handle the, the token level, as, uh, Dan was saying. LLM generated code needs test cases to catch logical errors [00:08:00] Jeremy: I, I'm curious, so you said the syntax is correct, but what are the, the typical mistakes you see coming back from these LLMs? Is it a, a logical mistake or is it ever something that actually doesn't compile? I'm, I'm kind of curious what your experience has been. [00:08:19] Leo: I think the, uh, more common errors that we've been seeing are logical. So it misinterprets the prompt that you're giving it.
It essentially tries to solve a problem that's different than what you're trying to solve. It may have bugs in it, so it is in fact trying to solve the right problem, but it, it's off by one, um, or is maybe replicating some mistake that it found in, in the large code base. And so for most mistakes, you need to write test cases and run it. That mistake is then gonna show up when the test cases catch it, and then you'll have to try to fix it. If the students can read the code, uh, if we train them well to read the code, often you'll look at the response. And if the response is just not even trying to solve the right problem, you can usually pick that up pretty quick. Uh, and I think, I think the students will learn to do that and then they can just say, okay, this is clearly not the right answer. And, and use the different tools in, say, VS Code to find another answer, and then pick one that's right or change their prompt to get a response that's right. Go through that whole flow. But then at some point or other it will give an answer that looks right. And then I think all of us as software engineers know that even if the code looks right, it may not be. And so then they have to actually write the test cases, get some level of confidence that it's actually working right before they'll know. And so sometimes, sometimes you know really quick that it's just clearly wrong and solving the wrong problem. And sometimes it looks right, but it actually has some bugs that need to be fixed. [00:09:49] Dan: I guess one thing that struck me is how much a change in the prompt can, can matter. Uh, Leo, you know, um, we've, we've seen this over and over again where we'll write a prompt. It seems fine to us. And then we'll realize, oh, there are actually two different ways of interpreting this. And, uh, the ambiguity of, of English strikes again, right? And so it's just amazing to me how clarifying the prompts, how many times that fixes the code. Not always.
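To illustrate the off-by-one case Leo mentions, here is a minimal sketch of hand-written test cases catching a bug in code that looks right. The function names and cases are hypothetical and are not taken from the book or from any actual LLM output.

```python
def count_up_to_buggy(n):
    """Intended to return [1, 2, ..., n], but stops one short (off by one)."""
    return list(range(1, n))


def count_up_to(n):
    """Return [1, 2, ..., n]."""
    return list(range(1, n + 1))


def failing_cases(func):
    """Run func on a few hand-picked inputs and return the cases that fail.

    Each failure is reported as (input, expected, actual).
    """
    cases = [(0, []), (1, [1]), (3, [1, 2, 3])]
    failures = []
    for arg, expected in cases:
        actual = func(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    return failures


if __name__ == "__main__":
    # The buggy version passes the empty case but fails the other two.
    print(failing_cases(count_up_to_buggy))
    # The corrected version produces no failures.
    print(failing_cases(count_up_to))
```

The edge case `n = 0` passes for both versions, which is why a couple of ordinary inputs are needed alongside the edge cases.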
We definitely have examples where that's not the case, but, um, more, more often than not, in my experience, changing the prompt, uh, appropriately has a bigger than, um, anticipated effect on the, on the code. It's amazing. [00:10:36] Leo: And for thinking of the prompt, uh, in terms of like doc strings for functions, uh, adding the test cases certainly helps. Um, it is surprising sometimes that you can add the test cases to the prompt and it'll still give you back code that does not actually pass that test case because VS Code and Copilot don't actually run the code that comes back from the LLM. Uh, but I do find the test cases do tend to help with the quality of the response you get back. [00:11:01] Jeremy: As a part of your prompt, you're asking it to implement some functionality, and you're also asking it to write these tests for that same functionality? [00:11:11] Leo: Oh no, sorry. I, I, it's more the, um, doc test kind of format. So it, it, um, you're writing, let's say you, you've written your function signature and then you have the description of the function in a doc string. And then towards the end of the doc string, I'm articulating the test cases that I intend to use. Um, and articulating the test cases that I intend to use helps it come back with a better response. Um, I haven't found it to be great at writing test cases. I haven't spent a ton of time with this, but the time that I have spent, it tends to want to do almost like a brute force search of all possible inputs, uh, as opposed to doing, okay, well, here's a couple common cases, here are the edge cases. Now I can feel fairly good about it. It doesn't seem to have that, um, intuition yet. [00:11:55] Leo: For the most part, we're writing the test cases ourselves, and we're gonna be teaching the students how to write the test cases themselves [00:12:01] Dan: Yeah, Yeah.
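As a rough illustration of the doctest-style prompt Leo describes (a signature, a description in the docstring, and the intended test cases at the end of the docstring), here is a minimal sketch. The function `count_above` and its cases are hypothetical. The body stands in for what an LLM might return, and Python's standard `doctest` module is used to actually run the examples, since the editor does not.

```python
import doctest


def count_above(values, threshold):
    """Return how many numbers in values are strictly greater than threshold.

    >>> count_above([1, 5, 10], 4)
    2
    >>> count_above([], 4)
    0
    >>> count_above([4, 4], 4)
    0
    """
    # Stand-in for the generated body; the docstring above is the prompt.
    return sum(1 for value in values if value > threshold)


if __name__ == "__main__":
    # Run the docstring examples as tests and report failures, if any.
    print(doctest.testmod())
```

Writing the cases by hand keeps them independent of the generated code: a common case, an empty input, and a boundary value where "strictly greater" matters.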
So Leo and I have actually made a conscious decision to have students write test cases from scratch. Even though you could play around with the LLM and have it, you know, try to generate test cases, whether it's flawed or not, we still want students to do this from scratch. We think that writing test cases is a skill we want our students to have. [00:12:23] Jeremy: Sometimes what these models will generate, like you were saying, has logical errors. And hopefully if you're writing the test cases, you've put some thought into 'em, and your test cases are actually checking the correct behavior. So then you have the LLM generate the implementation. It's running against tests where you know what the correct answer should be. And so if it generates something that's incorrect, you've, you've kind of caught it. You're not totally relying on it. Telling you everything is, is good, you know? Um, It's confidence in something that's like you personally can't see. It's just what the machine gave you. [00:13:05] Dan: Maybe it takes away one layer of uncertainty too, Jeremy, right? Like, so the code could be wrong, right? And then if it generates test cases, okay, the test cases could be wrong too. And maybe you get unlucky and two wrongs make a right and then your test cases pass for the wrong reason. So yeah, we really wanna hone this skill in our students. And, and like Leo said earlier, these intro courses used to be so full of low level syntax concerns that we, we didn't do testing properly. I mean, you know, we all try to cover testing, but I think we're gonna be able to cover it a lot more, detailed now. LLMs could encourage students to test more since their output is untrusted [00:13:41] Leo: And I, I think we're enthusiastic about, uh, how students will approach testing when you're working with the LLM is what we. This is fairly anecdotal, but uh, when they interact with us talking about testing, often students aren't testing their code because they wrote it. 
And so of course it's Right. Right. This is like this really famous, uh, kind of bug in human thinking, right? Is that if you write it, of course the computer's gonna interpret what you're saying, right? Um, and so students tend to trust their code in a way that professional software engineers never would. and I think because it's coming from this third party that you know is wrong, it's coming from the LLM that can, that can often make mistakes. I think they're gonna be more inclined to actually engage in those testing practices. Uh, kind of knowing about the fallibility of the LLM, [00:14:27] Jeremy: You're shifting the order. I mean, there is test driven development that some people practice, but I feel like probably what's most common is you write the implementation yourself and then, then you'll go and see like, oh, did this thing I, I wrote. Did it do what I thought it should do? Um, whereas this is kind of flipping it, where it's the large language model is gonna write my code, so I'm just gonna start with the test and then I'll ask it to, to write me the code. And maybe that will kind of make test driven development be the default. [00:15:02] Leo: So yeah, I, I, I think that students may wanna engage more in kind of test driven development because they wanna think more about, uh, what exactly should this function be doing? Uh, how should behave, what kind of inputs and output should it expect? And then it can kind of write the prompt to co-pilot or whatever LLM is using, uh, to express those inputs and outputs. Well, they're more apt to get good answer from the LLM and they've kind already got their test cases worked out as well, so they can immediately just go right into the testing agency if the prompt came back right. Using LLMs at the function level instead of a broader scope [00:15:35] Jeremy: And you mentioned writing a prompt to implement a specific function. Have you found that they work well at the function level? 
But if you try to ask it to build something more broad, that that's kind of when it has problems? [00:15:53] Dan: So, I think in general, LLMs do work best at the function level. We have tried to get it to generate bigger apps, collections of functions, and it can work, but sometimes it does, uh, it does do worse. But also we want students to do the problem decomposition for themselves and break up the problem into individual functions. Even though maybe the LLM could work, uh, with, uh, bigger chunks of code, we want students to do it. And one reason is so that they can customize what they get from the LLM. So, in the book, we have a bunch of examples where you could probably just throw it at the LLM and get an answer and, you know, eventually get it to work. But I think at that point, making changes to it might be trickier than it would be if you knew, uh, the architecture of what you were, what you were building. So in the book, we have a bunch of top-down design diagrams, and we want students to understand what they're building at that level, like at the function level instead of, like we said earlier, instead of like at the token level or the line level. Potential issues with outsourcing high level design to an LLM [00:17:03] Jeremy: And so like in this example, you're thinking more from a, a learning perspective. You want the student to look at the big picture, figure out, okay, what are all the different functions or parts of my application? Break that down and then feed those individually. To, um, these large language models. I, I'm wondering from like, let's say you're a, a professional software engineer and your interest is more in I want to make the thing and less so, in I want to learn how to make the thing. 
In that case, do you feel like you could feel confident in, in giving the large language model a larger piece of the design, or do you still feel like it's good to have that overall structure done by the, the developer and then just be very targeted about how you use the large language model? [00:18:03] Leo: I think that's a tricky question because we haven't worked with these tools heavily in a professional programming setting. I think often when we're thinking about large design of software, you're gonna be working on teams, talking with other members of the team about the interfaces and things like that. And so I'd be pretty hesitant to outsource that, that thinking to the, the LLM, cuz the communication between the teams still has to happen. Uh, even if it weren't for that, um, I kinda think of it as probabilities. So essentially whenever you ask Copilot or any of these LLMs to, to do a task, the more it has to get right, the more likely it's gonna make a mistake. Um, and so, uh, that's kind of why I like the function level. Partially because it's not that much code that it tends to write. Um, so you help to avoid kinda the probabilistic problem, but also because it's learned on a huge code base that has lots and lots of functions that have been implemented. It tends to do well at that, that solving the function kind of task. [00:19:10] Jeremy: Yeah. And I, I think the way you put it as outsourcing that design, that decision is, is interesting because yeah, if you are working on a team and whether it's in code review or just in a discussion, often people will ask, well, well, why did you do it this way? Or why, why is this the, you know, the good way to design it? And if you kind of handed that off to an LLM, maybe your answer is, I don't know. It's just what it told me, which (laughs) [00:19:39] Dan: Yeah. [00:19:42] Leo: That isn't an answer I want to use talking to my boss. Right.
Well the chat GPT told me I should have it this way. That doesn't seem like a good answer. Choosing GitHub Copilot for CS1 [00:19:50] Jeremy: I think we, we've kind of been talking in more a general sense of working with LLMs and you've mentioned how you're gonna be teaching introductory computer science courses this coming, quarter or semester. And so when you teach these classes, what tools are you gonna recommend your students use? And yeah, maybe you could go into that a bit. [00:20:13] Leo: Absolutely. So we're gonna be recommending, um, At least, at least for my class, I'm gonna be recommending that they use, uh, vs code with copilot. Um, I just like the integration of the IDE with the, uh, interactions with the LLM uh, I think it avoids just a whole bunch of copy pasting from another interface into your IDE to then, uh, run it. I think it also reduces the barrier of them kinda immediately getting the code and then testing it right there in the environment. I'm sure any of the other tools would work, it's just, that seems to have worked well for us, uh, when we were writing the book. And that's, that's actually the technique we recommend in the book as well. Um, so that would be the primary tool for the students writing the code. In addition to having them using copilot with, uh, in the IDE for a lot of the code generation, depending on where things are at with copilot x, um, which is right now, um, available through wait list. Uh, if that's, that's available publicly, I think we're gonna be recommending that because it has a copilot chat feature, uh, which can be really nice to interact with. And, uh, the main use that, that we're gonna be encouraging students to use, whether it be co-pilot chat or a ChatGPT is in just a conversation with the LLM about, particularly modules and libraries. 
So if you are diving into, merging PDFs, which, uh, Dan did a great job in one of the chapters in our book talking about, if you wanna dive into that, well, what libraries should we be using in Python for that. Uh, and we found that the LLMs do a really good job at this, of actually saying, here are the different libraries you could use. Here are the pros and cons of them. These are the ones that, uh, actually need an additional install. Or these are the ones that come with vanilla Python. They're actually really good at kind of giving you the what you should use for the various libraries. Um, and so that's, that's one other way that we were gonna be encouraging the students to use the LLM. Types of questions to ask the LLM [00:22:07] Dan: Yeah. So whenever the students or the junior programmer, doesn't know how or doesn't think they can, uh, do something in base Python, we have them interact with the chat and, and ask. So another example that comes to mind from the book is we have a chapter on writing some games. And so for most games, including the two that, uh, we've got in the book, you need to be able to generate random numbers, right? So how do you do that? And so in the past you would've used a search engine, Stack Overflow, or something, and you would've found some sample code and you would've pasted it in to your file and made variable name changes and things like that. And so what we do now is we ask chat, okay, I need to generate some random numbers. How do I do it? And then it will come back to you with a few options, and then you can systematically work through those options if you like. Uh, and you can ask, okay, is this one built into Python or not? And then it will tell you, oh, this one's not. We don't need to memorize API docs [00:23:11] Dan: And you say, oh, well, okay, so like, how do I install this? And then, does it work on all OSes or just Windows? Right?
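For the random-number question Dan uses as an example, one option that is built into Python is the standard library's `random` module, which needs no extra install. Here is a minimal sketch. The `roll_dice` helper is a hypothetical name for illustration, not code from the book's games chapter.

```python
import random


def roll_dice(num_dice, rng=random):
    """Roll num_dice six-sided dice and return the individual results.

    rng can be the random module itself or a random.Random instance,
    which makes it possible to pass a seeded generator in tests.
    """
    return [rng.randint(1, 6) for _ in range(num_dice)]


if __name__ == "__main__":
    # Different on every run.
    print(roll_dice(2))
    # A seeded generator gives the same rolls every run.
    print(roll_dice(2, rng=random.Random(42)))
```

Passing the generator in as a parameter is a design choice for testability: a game can use the default, while a test case can pass a seeded `random.Random` and get repeatable results.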
So, uh, we guide the reader through these questions that you could have, uh, to help you make a decision. Um, and I think what I like the most about this is not having to learn APIs, like yet another API. Like I don't, I don't think I have room, you know, in my, like, brain for any more APIs. And, and what's cool is I, I've forgotten like every API that, uh, we've used in the book. So we have like examples of merging PDFs and, uh, removing duplicate images from directories, uh, from like people's phones, and, and stuff like that. And I don't know, I don't know which library it's using. Uh, and I'm, I'm totally okay with that, right? Like I just, I, I wanted to get the job done. I wanted to write a tool, and the tool got written and it used some sort of library and it worked great. And I didn't have to look through the documentation for that library and figure out like, which functions do I have to call and things like that. So, I, I know it, it can be fun, you know, it could be fun to really learn an API well, but a lot of people, they don't want to program for programming sake. Like, they just wanna get work done, right? So, you know, while I, I, I fully admit to, enjoying programming just for the sake of programming. I do a lot of competitive programming problems just for fun. You know, it's like Sunday morning and it's like, Hey, yeah, I got like an hour and I got an hour to work on something. Let me work on this little competitive programming problem. But, uh, a lot of people, they're not motivated by that. They're motivated by consequences of code. And this is one thing about LLMs that I'm very excited about, is you can just, make a lot more progress, without having to learn what these, people may believe is just useless knowledge, right? Like, does it really matter how I should invoke this API, right, to merge PDF files? I mean, the answer for many people is no. Like, they just want the result to happen.
And I love how we can kinda match what they, uh, deem important, right? With the LLMs, it's like a new level of abstraction, for for many people. LLMs make building software possible for more people [00:25:28] Leo: There's a couple of audiences that come to our introductory classes, and what Dan's talking about here is one of the things I'm most excited about with this, and that's the students who come and take just one. Programming class. I know it's probably a different audience than, uh, a lot of the people listening right now. Um, but the people who just take one programming class, it's required for, for their major. They, I just wanted to explore it a little bit, but they, they don't go into this as a, as a career. I think a lot of those students right now, uh, if you ask them a year later to program something, do any of these tasks that we're talking about right now, I doubt they're able to, even if they did really well in that class. Uh, and that's really disappointing, right? If they've taken a programming class, they should be able to, to do something with that, a year or even five years later. And I really believe that if you teach them the skills of interacting with these LLMs, they'll be able to do these tasks later. They'll be able to come back and go, you know, I don't remember any of the Python syntax. I don't remember, uh, even how to get started with this. But you know what, I'm just gonna ask, uh, copilot, how do, how do I go about merging these PDFs, having this directory? And then, uh, the copilot chat comes back and says, oh, you might use this and that. And then they go, oh, I remember, I remember how to, how to write these functions. And I just said, you have to go over a prompt. I think they could really do it. And that, that's a bit of a game changer, right? That means a larger portion of our society will be able to, uh, write code and using a useful way. And I'm just really excited about that. 
I think it's gonna be really nice, uh, after the changes happen. More people might stick with Computer Science [00:26:58] Jeremy: I can totally see in the context of someone who's, not seeing it as a career, or someone who is like, hasn't done it in a while. It could be. These tools can be incredibly useful, right? Or it can even get you interested in this field at all, right? Like a lot of people, they, they struggle through the syntax and then they decide like, oh, this is not for me. Even though like they had something really cool they wanted to build and, and maybe these kind of tools can, can get them over that hump. [00:27:31] Leo: Exactly. I think there's a population of students, um, and it varies a bit by demographics, who come to computer science, with really the best motives in mind, right? Their goals in their life are to make the world a better place, and they want to achieve those goals. And if you spend the first three quarters or three semesters working with them and all they're seeing is syntax and they're not actually solving anything meaningful, um, it starts to create this disconnect of what their goals are for their life and what they think the goals of, of our career are. Of course as, as, as a computer scientist, I wanna say, stick it out. You know, if you, if you go into the fourth, fifth class, you'll start seeing how these are really useful tools that can make society a better place. But it'd be really nice to front load that and have them solving useful problems much earlier and seeing that, uh, computer science, uh, can be used in really nice ways. Efficiency can be taught later [00:28:26] Jeremy: And, and so within the, the context of people who are studying computer science, who may eventually become professional software developers, things like that. Something more long term where it becomes more of a craft, the, the code that comes back from these large language models.
Sometimes it could be something that's like not maybe the most easy to read or it may be doing something inefficiently. And I'm wondering from your perspective how users of these tools should, should think about that and, and recognize when that's a problem. [00:29:06] Dan: We in, in, in the first couple of courses, typically in the CS program, um, we don't spend much time on efficiency. the reason is that there's just so much to learn early on, and, um, we worry about overwhelming people with, know, too much, for them to, to process it at once. And we don't wanna prevent students from becoming interested, by. Giving them all of these requirements early on. So typically we, you know, we push efficiency, down the, down the road into like a data structures course, for example. But your question points to another reason why, we've decided to teach some of the skills we teach early on. So if, if a student, you know, came up to Leo or, or me and said, Hey, you know, like I wanna generate efficient code, how do I do it? My answer would, would be, so like, get, get familiar with programming first, but you are learning the skills necessary where you'll be able to look at code later because you know how to read it still, right? It's not, uh, something that you don't understand. You're gonna, you're gonna know it. We're gonna spend lots of time on code reading, and so later I think we can just teach efficiency the way we always did. Um, so, you know, doing, uh, time complexity analysis on, on the code and they're still gonna understand what the code is doing. So, um, I, I, I don't think this is going to, this is going to change much in, in the earliest courses. LLMs can expose students to different types of code [00:30:35] Leo: To the, to the point about code readability, I might add that, uh, certainly they're gonna get back some, some code that's maybe not the best style and it may not be as readable. 
Uh, but what's kinda interesting is that students aren't exposed to a lot of different styles kind of in our existing courses, right? They, they see the code that they write and they see the code that the professor writes and gives them, and there's not much else. And so, I mean, we're gonna need data and we're gonna need research to, to, to know this for sure, but it, it, I suspect them seeing lots of different code styles and having to read those different code styles may actually inform them better than we do now about what makes code more readable. Uh, and then they might be able to employ that as they go forward. [00:31:21] Jeremy: And, and when you're saying they're gonna read different styles and things like that, are you referring to code they're gonna see from the LLM or are you talking about them reading just other code bases in their classes or their professional work? [00:31:39] Leo: Oh, I'm sorry. Yeah, I was referring to the code they'll see from the LLM, right? [00:31:43] Jeremy: Oh I see [00:31:43] Leo: LLM will come back in all these different ways. They'll have different styles and they'll, uh, have different approaches to solving it. Right? Sometimes they'll, uh, come back with like this one line lambda expression thing that solves it, and they'll have no idea how that works. And they'll, they'll ask for a different answer and they'll get, uh, a much more, uh, user-friendly first, uh, first programming experience kind of code back. And they'll be able to understand that and go, okay, this is the kind of code that I wanna see. Not this thing that was completely non-readable. [00:32:11] Dan: Yeah, Leo, I just thought of something. So, uh, so you know, by default you can get it to give you 10, uh, code segments to solve the problem, right? So it'd be kind of cool, if we ask students about each of them, right?
Each of the 10, which ones are right, which ones have bugs, which ones have good style, which ones have bad style, it's like a built-in learning opportunity right there. So yeah. [00:32:34] Leo: Oh, it's true. Yeah. And, and so the 10 things that, uh, Dan was referring to is if you do Control+Enter in VS Code when you're working with Copilot, it'll give you back 10 possible responses. And you're totally right Dan. You could just say of these 10, how readable are they? Are they right? Um, there's lots of fun things you can do to ask students questions. [00:32:51] Dan: and often many of them are right with just subtly different ways of, of, of, of solving the problem. I mean, I'll, I'll admit to having some fun looking through all of the suggestions just to kind of see what the variability is and when there's a lot of variability. I really like it because, uh, like Leo said, it exposes people to different styles they may not have seen before. And, um, it may, it may, um, encourage you to ask questions, right? Like, why does this one work? Right? I've tested it. It doesn't look like it should work. Why does it work? I feel like that's the beginning of a pretty powerful learning experience right there. [00:33:30] Jeremy: Yeah, that makes sense to me because I, I think about how when a lot of people are doing software development before all these LLMs, they will search on the internet and go, okay, what's an existing answer for this thing I'm trying to do? They'll find a post on Stack Overflow and they'll find the accepted answer and it'll be like, okay, this is it. This is the solution. Whereas, at least in this case, it seems like you can go like, okay, well here's, here's 10, 10 potential solutions, and at least you get a little bit more exposure to, um, what are the different ways you could do it. [00:34:06] Leo: Exactly, and, and it's nice for 'em to see these different options.
And I think there is, for professional software engineers seeing that stack overflow post, like, here's the accepted answer, integrating that into your code isn't a big jump for, for a lot of us. Um, but I do wanna stress that for the intro students, it often is a really big jump. Uh, just the, oh, how do I change around this? Oh, this was the interface for this function, but I'm been asked to have this other interface with a function and, and they really can struggle in that domain. And so I think copilot and these LLMs are nice in that they give back answers that are more tailored to the existing code that they're working with, um, and will reduce that barrier of them trying to incorporate the answer. Optimization can come later, most code is straightforward [00:34:50] Jeremy: So it seems kind of overall, when you're talking about people who are using programming in a more professional capacity, the code style and efficiency that will probably be taught very similarly to however it is now, where you basically have to get exposed to different styles and types of code, get exposed to the algorithms and and that will allow you to read the answers you get back better. So the answers you get back from the LLM with the knowledge you gain from these later courses, you'll be able to tell like, oh, okay, this is, this. Level of complexity, or this has like, you know, exponential, performance implications, that kind of thing. [00:35:43] Leo: So I think the performance piece is really important. Um, and I appreciate your, you bringing it up. I think, I'm, I'm kind of curious, uh, uh, what percentage of the time professional programmers are really spent, uh, are spending optimizing, uh, the code that they write? Um, I suspect a lot of the code that's written, uh, is pretty straightforward. Uh, you, you already know how to work with the database you're working with. You already know how to write the queries for that. 
You're, you're, you're just, uh, you're still doing something that, that's certainly thought provoking, but it's not the hard work of, oh, how am I gonna write design the right algorithm for this to get the exact best runtime? And so I think there are some times that that does matter, but those may be the times that the LLMs aren't as helpful and there's still gonna be a, a pretty big need for programmers who know how to do that, uh, themselves. [00:36:33] Jeremy: Yeah. I mean, I, I think that of course this is gonna vary from industry to industry, but Dan, you were talking about learning APIs and I feel like a lot of jobs are learning APIs and gluing them together. [00:36:49] Dan: Yeah. Um, I would agree, but I wonder what can happen if some of that's automated. Right? So maybe, people who are gluing APIs together will be able to. Get even more done, right? Incorporate even more, APIs in the same amount of time that they've been doing it. Now, I don't, I don't know if that job changes as dramatically as it, it seems, um, I guess there's this tension between people, having to change jobs or become more efficient in the current job. And, you know, obviously I, I hope it's the latter and there is some recent evidence that it could end up being, the latter, just more productive people overall, building, know, bigger software in incorporating more APIs than, than before and, and not overloading yourself. So, we'll, we'll see, you know, how it, how it all, um, how it all turns out. But I'm, I'm hopeful that we'll just be doing our jobs better. Reading code as a skill [00:37:51] Jeremy: In that, that context, sometimes people will say that the, the reading of code and comprehending code can sometimes be more difficult than writing the, the code. And in fact, can sometimes take you more time, like, let's say you've built out a project and now you need to add new features. Well, to add the feature, you have to understand the, the code base that existed before and so. 
When we talk about LLMs and the context of not programming, but just general writing, people talk about the fact that it's easy to generate more writing, right? We can generate more documents, blog posts, more articles, that sort of thing. And with code, it sounds like it'll be similar, right? Where it'll be easier for us to write more code, generate more code. Um, but I wonder if either of you have thought or, or think it's a concern that we'll be generating so much code that now we'll have so much we won't be able to even have the time to understand all of it, [00:38:55] Leo: I haven't thought much about the generating so much code that you can't understand. I mean, I think if, if we're generating code, I, I'm really hoping someone's testing and making sure it works right and stuff. And so I guess it depends on what kind of, uh, what level of the interface are we, we looking at. Um, but I have thought about a fair bit about the, the, what you described early on in your question, which was. Diving into a big code base, figuring out what needs to be changed and changing it, that is a really common task, especially for like new software engineers, uh, in their, their first jobs. Right. And it is also one that's really well documented in the, the education literature, uh, education literature, uh, that we aren't teaching them to do. Like we almost always are giving them, uh, right, these functions are really well defined or, uh, write the code all from yourself, but we rarely ever give them large code bases to learn from. Now I don't think diving into a large code base and trying to understand how it works is the right thing for like an intro class. And then we're mainly talking about, uh, students first learning your program here. Uh, but I am encouraged that we are teaching code reading as kind of a first level skill when I think current programming courses teach code reading right? In parallel with writing. 
So a lot of the writing's happening very early before they even know how to read well. Um, and so I think there's some optimism here that if we teach code reading first and make it a core skill, they'll be better set up in the later classes to maybe take on those large projects where they tackle the exact problem you're describing, which is also the exact thing they're gonna have to do when they get to, to their jobs. The amount of code we throw away may increase exponentially [00:40:37] Jeremy: Yeah, it, it also kind of, I wonder sometimes when you're writing code, you'll write it in a certain way because it's tedious to write a lot of code, right? Like you'll, you'll make something generic in such a way where you can reuse it, and maybe reduce the amount of lines of code. But then when you have something generate that code, maybe it'll be a solution that is a lot more code than you would've written personally, and it works. But, by nature, the fact that it was easy to generate, you chose that solution versus one that, that maybe was more generic and um, had less code. I, I'm not sure if that makes sense, but I'm kind of curious if the use of these models will sort of change maybe how we write code [00:41:30] Dan: I'm kind of wondering if the amount of code we throw away is going to increase exponentially. Because, because, um, you spend time working on something, you're probably gonna keep it. But I, I wonder because, uh, Jeremy, like what you said, it's, it's so easy to generate code now. So I, I've had this thought where, what, not sure how, how, um, how much I believe myself here, but, uh, should we be storing the, the prompt, like not the dot py file, right? Like just store the prompt and then if you do have to regenerate the code later, maybe you gotta make some tweaks or something. You just change the prompt and then, and then rerun it. So, because, because, because code is, um, it's not there yet, but it's, it's becoming free, right?
It's becoming, you can generate as much of it as you want. And so I, I wonder how much, how much of it is, so there's, there's a lot of code already that you write once, and you run it once and then, and then you get rid of it or lose it or whatever. And I wonder if that, that practice will increase. So it's like, okay, you know, I wanna do this data analysis. Okay. So you write a prompt, you get some code, you generate some graph, and then you just don't even think about it. You just get rid of it, and then maybe later you want another similar analysis and you just do it again. Right. So I kind of wonder, because there's maybe less ownership now of code, right? You didn't like sweat as much to write the code. So maybe, maybe more of it gets thrown away. [00:43:03] Leo: I, I completely see what you're saying, Dan. So you have the prompt and you had it perform some form of data analysis and you wanna tweak it to do a slightly different data analysis. Uh, I wouldn't go into the, I mean, right now if I wrote the code from scratch, I would go into the code and find that one spot that I need to change and I would tweak it. But if I'm just generating the code, I would just tweak the prompt and then get a new piece of code that does exactly what I want there without having to, to [00:43:26] Dan: yeah. You know, how, how, it can take a, a long time to re-familiarize yourself with a program that you wrote six months ago. You know, it's like, oh, I, I called this variable temp one. Like, what's this for again? Right. you know, maybe, yeah, [00:43:41] Leo: Wait, I think we've all been there. Keeping the prompt instead of the code [00:43:43] Dan: Uh, but yeah, I don't know. It's just, just a thought I've been having. It's like, it, so, so when, when, now when, when I hear people talking about code maintenance, for example, like using, you know, good variable names and consistent style and stuff, in my head I'm thinking, well, you know, is, is the code the artifact now? 
Is it still the artifact? And right now, you know, of course it is. But, um, but, you know, fast forward a little while, maybe, maybe some of what I just said, uh, sort of becomes true eventually. [00:44:11] Leo: That's getting to perhaps kind a larger issue about what is the interface that we're, we work with as programmers. I've been thinking about this a lot, uh, just because I, I teach my, my background's. I have a PhD in computer architecture, and so I teach the classes that do machine code and assembly code, and they're, they're, they're core classes for computer scientists because you need to know how computers work. And, um, I think that's a core component, understanding that, But we don't start by teaching the students machine code. Like no one wants to learn how to program a machine. Um, at least I can't imagine anyone wanting to learn that. Um, and we've kind of cognitively picked Python or Java right now, the most common two programming language to learn from. Because they're easy to learn, they're easy to, to read. The code tends to be more understandable when you read it. It tends to be a little bit more forgiving when you write it. Um, and so we picked these because we think they're nice interfaces. They're, they're convenient for programmers and they're convenient for, for new learners. And it just seems to make sense that the LLM may be that next step of interface that we start choosing. The, the catch is because it can be wrong. It's not like a compiler. A compiler is deterministic. It's gonna be, uh, shy of that. Maybe one time in your career you find a compiler bug, like the compiler's always right. This time the LLM isn't always right and so I, I'm not sure how this is all gonna play out. Um, you can imagine the LLM as the new interface and all we ever store is, is code prompts and we don't ever even see the code, perhaps as one scenario. And the other is we, we do in fact still interact with the LLMs and still interact after the code. 
Um, but I think it's too early to kind of know where this is all gonna fall. But, um, we could see some big shifts, I think, in the field over the next few years. [00:45:52] Jeremy: Yeah, I think that's pretty interesting to think about what, what Dan had mentioned where yeah, you could check in your prompt and maybe a set of test cases for the app that's supposed to come out and yeah, maybe that's your alternative to the actual source code. Um, especially for things that, like you were saying, are, are used not that frequently or maybe you only use it once and so the, um, the quality of the actual code is. Maybe less so important in terms of readability and things like that. And as long as you can reliably reproduce that thing, yeah, maybe, maybe that does make sense. [00:46:39] Leo: The reliable reproduction could be the tricky part. And you there may be even saying that you, you start doing where you tag don't, don't try to reproduce this. Like, we actually spend a whole bunch of time on this. It's super optimized. Like, don't think the LLMs gonna give you this answer again. So, uh, keep the code along with the prompt. Keep the code too. Don't, don't scratch that because the LLMs not gonna do better. Um, and then in some cases you're like, yeah, the LLM's gonna do a pretty good job on this and [00:47:07] Dan: Yeah. Leo, maybe we have to Maybe we have to distinguish between code that you can just get out of an LLM no problem. And code that people have spent time working on. I like that. Yeah. Yeah, [00:47:21] Leo: some you're like, hashtag don't change. [00:47:23] Dan: Humans were here. [00:47:25] Leo: exactly. The concerns about relying on commercial tools [00:47:27] Jeremy: Yeah. this is the 30th iteration of this code we generated and we verified that this one's good. So just, just, it's a interesting, interesting future. 
We, we might be heading into, so, so one thing you, you mentioned a little bit earlier is that the tools that you're gonna recommend to your students, it sounds like it's primarily going to be GitHub copilot and GitHub copilot X for the, the chat interface. And one thing about these tools is these are tools by commercial companies, right? These are tools by OpenAI and Microsoft. They're tools that you have to pay a subscription fee to use. You have to send your code to a commercial server. And I wonder if that aspect concerns you at all. The, the fact that the foundations that our students are learning on is kind of reliant on these companies and these cloud services. [00:48:31] Leo: I think it's an amazing question. Uh, I think to some degree these are the tools that professional software engineers are using, and so we need, there's, there's a bit of an obligation as instructors to teach them the tools that they're gonna be using as professionals going forward. I think right now they're free. Uh, to use for, for education's sake. and so as long as that stays the case, I'm a little, more comfortable with it. If it started to move to a pay model for education, I think there could be some really big problems with equity. and I think it's not just true for, for computer science, but I'll start with computer science. I mean, if it's computer science and we start making it where you would have to pay to get access to these models or use these models, then whether we tell the students they can use it or not, they still can use them. And so there's gonna be some students that, the wealthier students who may have access to these, who are being able to learn better from these, being able to solve better homeworks with these, that's super scary. And you could imagine the same thing for even just K through 12 education, right? 
If you're thinking about them writing essays for homeworks or anything else, if it's a pay model, then the students who have, uh, the money will pay for it and get access to these tools. And the students who don't, won't. You could imagine the, all these kind of socioeconomic, uh, divides that already exist, only being exacerbated by these tools if they switch to this pay model. Um, so that has me very worried. Um, and there's some real ethical issues we have to think about when we're, we're using them. Yeah. Um, the other ethical issue I kinda wanna mention is just the, the copyright and the notion of ownership. Um, and I think it's important for us as instructors to engage students in the conversation about what it means to create content and intellectual property and how these models are built and what they're building off of. Um, and just engage in that ethical conversation with the students. I don't think we as a society have figured this out. I don't, I think there's gonna be some time both legally and ethically before we have the right answers. but at the very least, you need to talk to the students about, uh, these challenges so they know what's going on and they can engage in the debate. [00:50:45] Dan: Yeah, just to underscore that, Leo, this is the reason we're doing research on the first version of the course that Leo's teaching. We need research on the impact of LLMs, on students. especially, we need to know if students benefit from this, in what ways they benefit. How are these benefits distributed across demographic groups? We have a long and sad history in, computer science of inequities, in who takes our courses, who succeeds in our courses. we're very aware of this and it's, uh, unacceptable to make that situation, uh, worse than it already is. So, um, we're, we're gonna be carefully doing our research on this, uh, first offering of the course. 
A downside is students might bypass fundamentals [00:51:30] Jeremy: So we've mostly been talking about the benefits of using these tools in classes and in education. we just mentioned the possible inequities if you don't have access to those things, I, I wonder if from either of you, if there are negatives you see to this technology, whether that's the impact on what people learn or in anything else. Like are there downsides you see to the use of this technology? [00:52:04] Dan: Yeah. So in addition to, uh, the important, uh, inequity concerns that, uh, we just talked about, I have a concern about students using the tools in ways that. Don't help them learn the skills we think they need. So it's a, it's a, it's a power tool and you can, uh, you can get pretty far, I think with, without, um, being systematic in, in how you work with it and without testing, without debugging, um, it's, you know, it's, it's kind of magic right now. And so I can imagine, a lot of students just taking off at, you know, a hundred miles an hour. and so I'm one, one of, one of, uh, the things we have to worry about in these initial courses is, convincing students that there really are principles to using this technology. You can't just type something and get an answer and then go party. and, and, and so that, that is one of my concerns. That's one of the negatives. It's super powerful. And, like, like, so before you, you can't just type some Python and make it work and, but now you can sort of type in whatever you want and kind of get something back. and so part of our job as educators is to help students use these tools, in in a way that. Will ensure their long-term success with, with these tools, right? So, I, I'm not saying that they can't just do whatever they want and, and make some of their first assignments work. I, I think they could, I think they could be like un principled with the prompts and just throw it in there and get code and, you know, submit that, submit that code. 
But, uh, we're, we're going, you know, we're going for longer term, uh, effectiveness here, right? We have students who may not take another CS course. We need to keep them in mind. We have students who are gonna wanna eventually be software engineers, uh, security experts, PhDs in computer science, right? So we have a number of audiences that we're talking about, and we think they all need to know the fundamental skills of programming still. Even though, you know, they have this, this power tool at their expense now. [00:54:07] Leo: Speaking of the fundamental skills for programming, I, because of my, my hardware background, I'm this huge fan of teaching mental models in classes. Like what is the mental model of computation? Like, how, how do you imagine the computer is executing as you write the code? And, uh, ideally a professional computer scientist should be able to take, okay, well this is kind of the, my interpretation, this is my mental model for when I'm working at Python. If I really, really wanna drill this down, I can turn that into assembly. And if I really had to and turn to machine and even think about how this is working within the cache subsystems and virtual memory and all these things, I want 'em to be able to play those things out. We are changing the first class, and I think the first class is gonna be doing some things much better than before, like teaching problem decomposition and things like that. I'll, I'll mention that in a second, but, we are doing some things better. but we may not be teaching at how is the computer working as well. And so you can't just change one course and think the rest of the curriculum's gonna work. And so I think the entire curriculum's gonna need to adjust some, um, in, in a way of just adapting to these LLMs.
Rethinking how to assess students [00:55:10] Leo: Um, the second piece for things getting potentially more challenging, uh, is instructors, we're in a good place right now as instructors, uh, in terms of how we assign and grade homework. Um, so grading, uh, this probably isn't gonna be a shock, is not one of our favorite things to do as faculty. I mean, it's actually really important. Uh, it's, it's central to us understanding how our students have learned, but it's generally not the most favorite thing that we do. And what a lot of instructors have done, myself included, is for much the introductory sequence. We have created assignments that can be entirely auto grade. So we define functions incredibly well. Like, write a really good description, this is exactly what it needs to do, and the students write that one piece of code and, uh, whether we like it or not. That is exactly when copilot does very well, and the LLMs do really well. And so the LLMs are gonna solve those very easily already. So we have to fix our assignments just like it, it's a given. Um, but it means that we're probably gonna have to rethink how we do assessment. Um, and so we're probably gonna be writing assignments that are much more open-ended and we're probably gonna have to be grading those, uh, putting more care and time integrating those potentially by hand. Uh, but I think these are all good things for the community and for the field. Um, but you can imagine how it's gonna be a bit of a, a shift for faculty and, and may take some time, uh, to be adopted as a result. [00:56:41] Jeremy: And, and so if you're shifting to homework that is more broad in scope, has more code, needs more human eyes on it, how how does that scale within the educators side? Right. You were, you were talking about how you've got, um, things that could be auto graded before and then now you're letting somebody generate this whole project. How does that work from your end? 
[00:57:09] Leo: I, I think there's a few things that are at play. Um, we, at, at large institutions like Dan and I are at, we have kind of armies of, uh, instructor assistants, instructional assistants that help us, uh, and so we can engage 'em in, in various tasks. And so, uh, one of the roles they heavily have now is helping students in the labs solve these auto grade assignments. and so you can imagine they will still be in the labs helping the students with these creative assignments, but now they're gonna have to have potentially a larger role in assessing the success of those. Um, there's been some really creative work, uh, in, in assessment and so I'll, I'll, I'll mention a couple of the ones, but there's, I, I'm sure I'm gonna be omitting some. But, uh, one is, Students could complete their project, and then they have to record a short video of them explaining the code that was in their project and how it worked, and you actually assess them on that video and their explanation of the code and how it works. Right? Because those can be perhaps shorter than trying to go through a really big project and, and see how it works. Um, there's a tool out of a UIUC, um, called Prairie Learn that helps with, um, uh, these are still auto graded, but uh, it helps with the, the test setting where you can write questions and have them, uh, graded kinda in a, in a exam or homework setting. the, the neat feature of that is that it can be randomized and so you don't have to worry as much about students kind of leaking information to each other about, test content from quarter to quarter. And so, because the randomization, they have to learn, actually learn the skills, and so you can, um, kinda engage with 'em in these test centers. And so right now a big grading burden on, on faculty is exams. And so you can actually give more exams, give more frequent feedback to the students and with, without the same grading burden. 
and so that, that's the other kinda exciting assessment piece. Current assessment is not effective [00:59:01] Jeremy: In the different types of assessments, like the example of the video you gave, I'm just thinking to myself, well, the person could ask copilot or ChatGPT to give 'em a script, right? And they can rehearse that when they, um, send you a video. [00:59:18] Leo: I think, but I think that's, um, I think this is a philosophical shift in assessment that's kind of been gaining momentum over the years and that's that the assignments are all formative and they should all be pretty low stakes and the students should be doing them for the process of learning. and then, and, and it's unfair in some ways. There's a, there's a lot of things right now where you kind of grade them on, were you present at this time? Did you, did you meet this deadline at this time? Which if you're thinking about the, a diverse population of students, like you can imagine like a, a working mother who's also trying to do this, grading them on were you here at this time doesn't feel very equitable to me. And so there's this whole movement for grading for equity that shifts much of the assessment onto the exams. And so, yeah, the students could, uh, find multiple ways to cheat on the homeworks, but that's not the point of the homework and the homework's just to learn. It's a small scale, the grade, so. But you still then have those kinda controlled environments where they're taking these tests and that's where the grade actually comes from. Um, it's gonna take some time to make that shift, at, at, at least at a number of schools, my own included, assess that those take home assignments are a huge portion of the grade. And students will love that because they can get all this help. And they can, especially with the auto graders, that they don't even write their own test cases. They just use the auto graders, the test cases. Right. Um, which is really depressing.
Um, and they go to the, the, the instructional staff. The instructional staff tends to, to give away the answers. That's actually a paper that we, uh, published a few years ago. Um, and so the students love this high stakes, but tons of help version of assessment, but that may not actually measure their, their level of knowledge. And so it's gonna take a little bit of adjustment, for students and for faculty to do the shift, uh, to where the, as assess the, the exams are the Give students something interesting to build and don't worry about cheating [01:01:09] Dan: Yeah. Also, I'm, I'm not convinced that cheating is gonna be a problem here. it's very possible, for example, that students cheat on our previous assignments because the assignments were not authentic. Um, you know, in industry you're never going to, no one's gonna come up to you and say, Hey, like, from scratch, you know, write this exact function, takes two lists and determines, you know, how many values are equal between them. It's like, it's like, that's not gonna happen, right? You're gonna be doing something that has some sort of business purpose. And I kind of wonder, um, and this, this will, you know, this will play out, um, one way or another in the next, in the next, uh, few months. But I kind of wonder if we give students authentic tasks. Now you're cheating yourself right out of doing some, some something of value, right? Like before you were. You were probably cheating yourself out of a learning opportunity, but how, how can, you know, how can students know that? Right. The assignments boring, right? It's like, write all these functions and then something, something happens because of the magic, you know, starter glue code we wrote. So I don't, I don't know. I feel like if you give students opportunities to learn what they want to learn, um, there's, I don't, I don't know. I don't, I just don't think there's a reason to cheat. 
And, and also, I mean, um, I, I've been much happier in my career recently when I don't worry about it. So it's like, okay, I've got a bunch of students, some of them are gonna cheat, some of them are not. And I'm here to talk to the ones who, who wanna learn. So, I don't know. A lot of people, we're on some email lists, for example, and a lot of people seem to be panicking about it. And I, I kind of think, you know, buddy, you had a huge cheating problem before. I don't think it's gonna become worse now that you're giving students authentic work to do. Right? They, they all want to be using, uh, you know, programming to, you know, to do their jobs better or make their lives better, or the world better. They don't wanna waste their own time. But if you give them a decontextualized task, it's like, it's super tempting to just cheat, right? Because what's the point? Right? And so, um, I, I, I'm, I'm very hopeful. I, I, I am not convinced that, that cheating is gonna be a problem.

[01:03:23] Jeremy: Yeah, that's a good point, and I think it's very motivating for any student or anybody who's learning a thing to, to be able to see a clear connection to, like, an actual thing that I made, versus, I'm writing functions to pass these test cases, which is like not very, not very interesting, uh, intellectually. So I think if you structure the, the projects where it's like, oh, am I gonna actually make this thing that does this thing? That seems pretty cool. Then yeah, that's definitely more motivating to, to actually go through with it.

[01:04:00] Dan: Like, just off the top of my head, imagine if every student had to make a landing page, like a website. Who's gonna cheat? Like, what? I want a landing page. Like, I, I want that. And, and students are gonna want that too. And so it's like, well, okay. Like I, I, I may as well make it, right? Like this has a, this has a purpose. So, Leo. Leo, I'm curious, you've been, you've been, uh, patiently listening to, to that.
I'm curious what you think about

[01:04:29] Leo: Oh, I, I, I, I can't agree more. I think the, um, I mean, we can leverage the research, right? Computing in context is kinda this well established thing, that if you teach computing in a context that's meaningful to the students, they tend to learn more and engage more, and wanna stay in the major more. Um, and I think we're just gonna be able to do this, right? We, for convenience's sake, and because of the scale of the number of students that we've had in our classes, we've kind of moved away from that and gone to these auto graded, not-so-exciting assignments. And I think we're, this is the impetus we need to go back to fun, creative, interesting assignments that the students are gonna put time and care into because they want to, not because they have to.

Problem Decomposition

[01:05:10] Jeremy: So it, it sounds like through our discussion, you're, you're really excited about bringing large language models into the classroom and kind of what that means for you and your, your students. And I wonder if there's anything we didn't really touch on or maybe something that was unexpected that you think is gonna make a really big difference to you and your students.

[01:05:33] Leo: I think one of the things that we haven't touched on yet that I'm, I'm really excited about is the piece of problem decomposition. And so over the years, because of this trend towards auto grading, uh, what's happened is, all the cognitive work of taking a, a big computing task and breaking it into smaller pieces, deciding what classes should exist, what functions should exist, all those interfaces, all that work that I think is really interesting and exciting, it is now done for students, because the auto grading structure just makes it so you have to have these functions and they just code the functions.
And so I think that's really concerning just from a software engineering perspective, that students are, are learning how to program without learning those core abilities as, as software engineers to take a large problem, break it down, figure out what the right interfaces are, and that's a lot of, that's actually more art than science, I'd argue. And so the more time you have to practice it, the better. And I am incredibly excited that LLMs are kind of forcing our hand to make us step back, give larger programming tasks to them, and teach them the process of problem decomposition explicitly in a way that, in a way that we've never really, never done before.

Jeremy: I think that's, uh, that's a good place to, to wrap it up on. So if people want to hear more about your upcoming book or maybe even enroll in, in your class, Leo, where can they get some more information?

Leo: Both Dan and I have active LinkedIn pages and we're happy to have folks, uh, follow us there. Manning Publishing is the publisher for our book. Um, and so we have that book out on early access right now. Um, it should be available, uh, entirely electronically by August, in time for the start of the fall quarter. Um, and then it should be out in print, uh, shortly thereafter.

[01:07:25] Jeremy: Cool. Well, this has been an interesting discussion. I mean, large language models are kind of, that's the thing right now. Everybody's trying to, to stuff it into every single product. And I think getting both of your perspectives on where it fits in in education has been, has been very interesting. So thank you. Thank you very much for coming on the show.

[01:07:46] Leo: Thank you, Jeremy, for inviting us and for running such a great podcast. We really appreciate it.

[01:07:52] Dan: Thanks, Jeremy.
