The Bike Shed

thoughtbot
undefined
Mar 12, 2024 • 43min

418: Mental Models For Reduce Functions

Joël talks about his difficulties optimizing queries in ActiveRecord, especially with complex scopes and unions, resulting in slow queries. He emphasizes the importance of optimizing subqueries in unions to boost performance despite challenges such as query duplication and difficulty reusing scopes. Stephanie discusses upgrading a client's app to Rails 7, highlighting the importance of patience, detailed attention, and the benefits of collaborative work with a fellow developer. The conversation shifts to Ruby's reduce method (inject), exploring its complexity and various mental models to understand it. Joël and Stephanie discuss when it's preferable to use reduce over other methods like each, map, or loops and the importance of understanding the underlying operation you wish to apply to two elements before scaling up with reduce. The episode also touches on monoids and how they relate to reduce, suggesting that a deep understanding of functional programming concepts can help simplify reduce expressions. Rails 7 EXPLAIN ANALYZE Blocks, symbol to proc, and symbols arguments for reduce Ruby tally Performance considerations for reduce in JavaScript Persistant data structures Avoid passing a block to map and reduce Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire monoids iteration anti-patterns Joël’s talk on “constructor replacement” Transcript:  STEPHANIE: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn. JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way. STEPHANIE: So, Joël, what's new in your world? JOËL: I've been doing a bunch of fiddling with query optimization this week, and I've sort of run across an interesting...but maybe it's more of an interesting realization because it's interesting in the sort of annoying way. And that is that, using ActiveRecord scopes with certain more complex query pieces, particularly unions, can lead to queries that are really slow, and you have to rewrite them differently in a way that's not reusable in order to make them fast. In particular, if you have sort of two other scopes that involve joins and then you combine them using a union, you're unioning two sort of joins. Later on, you want to change some other scope that does some wares or something like that. That can end up being really expensive, particularly if some of the underlying tables being joined are huge. Because your database, in my case, Postgres, will pull a lot of this data into the giant sort of in-memory table as it's, like, building all these things together and to filter them out. And it doesn't have the ability to optimize the way it would on a more traditional relation. A solution to this is to make sure that the sort of subqueries that are getting unioned are optimized individually. And that can mean moving conditions that are outside the union inside. So, if I'm chaining, I don't know, where active is true on the outer query; on the union itself, I might need to move that inside each of the subqueries. So, now, in the two or three subqueries that I'm unioning, each of them needs to have a 'where active true' chained on it. STEPHANIE: Interesting. I have heard this about using ActiveRecord scopes before, that if the scopes are quite complex, chaining them might not lead to the most performant query. That is interesting. By optimizing the subqueries, did you kind of change the meaning of them? Was that something that ended up happening? JOËL: So, the annoying thing is that I have a scope that has the union in it, and it does some things sort of on its own. And it's used in some places. There are also other places that will try to take that scope that has the union on it, chain some other scopes that do other joins and some more filters, and that is horribly inefficient. So, I need to sort of rewrite the sort of subqueries that get union to include all these new conditions that only happen in this one use case and not in the, like, three or four others that rely on that union. So, now I end up with some, like, awkward query duplication in different call sites that I'm not super comfortable about, but, unfortunately, I've not found a good way to make this sort of nicely reusable. Because when you want to chain sort of more things onto the union, you need to shove them in, and there's no clean way of doing that. STEPHANIE: Yeah. I think another way I've seen this resolved is just writing it in SQL if it's really complex and it becoming just a bespoke query. We're no longer trying to use the scope that could be reusable. JOËL: Right. Right. In this case, I guess, I'm, like, halfway in between in that I'm using the ActiveRecord DSL, but I am not reusing scopes and things. So, I sort of have the, I don't know, naive union implementation that can be fine in all of the simpler use cases that are using it. And then the query that tries to combine the union with some other fancy stuff it just gets its own separate implementation different than the others that it has optimized. So, there are sort of two separate paths, two separate implementations. I did not drop down to writing raw SQL because I could use the ActiveRecord DSL. So, that's what I've been working with. What's new in your world this week? STEPHANIE: So, a couple of weeks ago, I think, I mentioned that I was working on a Rails 7 upgrade, and we have gotten it out the door. So, now the client application I'm working on is on Rails 7, which is exciting for the team. But in an effort to make the upgrade as incremental as possible, we did, like, back out of a few of the new application config changes that would have led us down a path of more work. And now we're kind of following up a little bit to try to turn some of those configs on to enable them. And it was very exciting to kind of, like, officially be on Rails 7. But I do feel like we tried to go for, like, the minimal amount of work possible in that initial big change. And now we're having to kind of backfill a little bit on some of the work that was a little bit more like, oh, I'm not really sure, like, how big this will end up being. And it's been really interesting work, I think, because it requires, like, two different mindsets. Like, one of them is being really patient and focused on tedious work. Like, okay, what happens when we enable this config option? Like, what changes? What errors do we see? And then having to turn it back off and then go in and fix them. But then another, I think, like, headspace that we have to be in is making decisions about what to do when we come to a crossroads around, like, okay, now that we are starting to see all the changes that are coming about from enabling this config, is this even what we want to do? And it can be really hard to switch between those two modes of thinking. JOËL: Yeah. How do you try to balance between the two? STEPHANIE: So, I luckily have been pairing with another dev, and I've actually found that to be really effective because he has, I guess, just, like, a little bit more of that patience to do the more tedious, mundane [laughs] aspects of, like, driving the code changes. And I have been riding along. But then I can sense, like, once he gets to the point of like, "Oh, I'm not sure if we should keep going down this road," I can step in a little bit more and be like, "Okay, like, you know, I've seen us do this, like, five times now, and maybe we don't want to do that." Or maybe being like, "Okay, we don't have a really clear answer, but, like, who can we talk to to find out a little bit more or get their input?" And that's been working really well for me because I've not had a lot of energy to do more of that, like, more manual or tedious labor [chuckles] that comes with working on that low level of stuff. So yeah, I've just been pleasantly surprised by how well we are aligning our superpowers. JOËL: To use some classic business speech, how does it feel to be in the future on Rails 7? STEPHANIE: Well, we're not quite up, you know, up to modern days yet, but it does feel like we're getting close. And, like, I think now we're starting to entertain the idea of, like, hmm, like, could we be even on main? I don't think it's really going to happen, but it feels a little bit more possible. And, in general, like, the team thinks that that could be, like, really exciting. Or it's easier, I think, once you're a little bit more on top of it. Like, the worst is when you get quite behind, and you end up just feeling like you're constantly playing catch up. It just feels a little bit more manageable now, which is good. JOËL: I learned this week a fun fact about Rails 7.1, in particular, which is that the analyze method on ActiveRecord queries, which allowed you to sort of get SQL EXPLAIN statements, now has the ability to pass in a couple of extra parameters. So, there are symbols, and you can pass in things like analyze or verbose, which allows you to get sort of more data out of your EXPLAIN query, which can be quite nice when you're debugging for performance. So, if you're in the future and you're on Rails 7.1 and you want sort of the in-depth query plans, you don't need to copy the SQL into a Postgres console to get access to the sort of fully developed EXPLAIN plan. You can now do it by passing arguments to EXPLAIN, which I'm very happy for. STEPHANIE: That's really nice. JOËL: So, we've mentioned before that we have a developers' channel on Slack here at thoughtbot, and there's all sorts of fun conversations that happen there. And there was one recently that really got me interested, where people were talking about Ruby's reduce method, also known as inject. And it's one of those methods that's kind of complicated, or it can be really confusing. And there was a whole thread where people were talking about different mental models that they had around the reduce method and how they sort of understand the way it works. And I'd be curious to sort of dig into each other's mental models of that today. To kick us off, like, how comfortable do you feel with Ruby's reduce method? And do you have any mental models to kind of hold it in your head? STEPHANIE: Yeah, I think reduce is so hard to wrap your head around, or it might be one of the most difficult, I guess, like, functions a new developer encounters, you know, in trying to understand the tools available to them. I always have to look up the order of the arguments [laughs] for reduce. JOËL: Every time. STEPHANIE: Yep. But I feel like I finally have a more intuitive sense of when to use it. And my mental model for it is collapsing a collection into one value, and, actually, that's why I prefer calling it reduce rather than the inject alias because reduce kind of signals to me this idea of going from many things to one canonical thing, I suppose. JOËL: Yeah, that's a very common use case for reducing, and I guess the name itself, reducing, kind of has almost that connotation. You're taking many things, and you're going to reduce that down to a single thing. STEPHANIE: What was really interesting to me about that conversation was that some people kind of had the opposite mental model where it made a bit more sense for them to think about injecting and, specifically, like, the idea of the accumulator being injected with values, I suppose. And I kind of realized that, in some ways, they're kind of antonyms [chuckles] a little bit because if you're focused on the accumulator, you're kind of thinking about something getting bigger. And that kind of blew my mind a little bit when I realized that, in some ways, they can be considered opposites. JOËL: That's really fascinating. It is really interesting, I think, the way that we can take the name of a method and then almost, like, tell ourselves a story about what it does that then becomes our way of remembering how this method works. And the story we tell for the same method name, or in this case, maybe there's a few different method names that are aliases, can be different from person to person. I know I tend to think of inject less in terms of injecting things into the accumulator and more in terms of injecting some kind of operator between every item in the collection. So, if we have an array of numbers and we're injecting plus, in my mind, I'm like, oh yeah, in between each of the numbers in the collection, just inject a little plus sign, and then do the math. We're summing all the items in the collection. STEPHANIE: Does that still hold up when the operator becomes a little more complex than just, you know, like, a mathematical operator, like, say, a function? JOËL: Well, when you start passing a block and doing custom logic, no, that mental model kind of falls apart. In order for it to work, it also has to be something that you can visualize as some form of infix operator, something that goes between two values rather than, like, a method name, which is typically in prefix position. I do want to get at this idea, though: the difference between sort of the block version versus passing. There are ways where you can just do a symbol, and that will call a method on each of the items. Because I have a bit of a hot take when it comes to writing reduce blocks or inject blocks that are more accessible, easier to understand. And that is, generally, that you shouldn't, or more specifically, you should not have a big block body. In general, you should be either using the symbol version or just calling a method within the block, and it's a one-liner. Which means that if you have some complex behavior, you need to find a way to move that out of this sort of collection operation and into instance methods on the objects being iterated. STEPHANIE: Hmm, interesting. By one-liner do you mean passing the name of the method as a proc or actually, like, having your block that then calls the method? Because I can see it becoming even simpler if you have already extracted a method. JOËL: Yeah, if you can do symbol to proc, that's amazing, or even if you can use just the straight-up symbol way of invoking reduce or inject. That typically means you have to start thinking about the types of objects that you are working with and what methods can be moved onto them. And sometimes, if you're working with hashes or something like that that don't have domain methods for what you want, that gets really awkward. And so, then maybe that becomes maybe a hint that you've got some primitive obsession happening and that this hash that sort of wants a domain object or some kind of domain method probably should be extracted to its own object. STEPHANIE: I'll do you with another kind of spicy take. I think, in that case, maybe you don't want a reduce at all. If you're starting to find that...well, okay, I think it maybe could depend because there could be some very, like, domain-specific logic. But I have seen reduce end up being used to transform the structure of the initial collection when either a different higher-order function can be used or, I don't know, maybe you're just better off writing it with a regular loop [laughs]. It could be clearer that way. JOËL: Well, that's really interesting because...so, you mentioned the idea that we could use a different higher-order function, and, you know, higher-order function is that fancy term, just a method that accepts another method as an argument. In Ruby, that just means your method accepts a block. Reduce can be used to implement pretty much the entirety of enumerable. Under the hood, enumerable is built in terms of each. You could implement it in terms of reduce. So, sometimes it's easy to re-implement one of the enumerable methods yourself, accidentally, using reduce. So, you've written this, like, complex reduce block, and then somebody in review comes and looks at it and is like, "Hey, you realize that's just map. You've just recreated map. What if we used map here?" STEPHANIE: Yeah. Another one I've seen a lot in JavaScript land where there are, you know, fewer utility functions is what we now have in Ruby, tally. I feel like that was a common one I would see a lot when you're trying to count instances of something, and I've seen it done with reduce. I've seen it done with a for each. And, you know, I'm sure there are libraries that actually provide a tally-like function for you in JS. But I guess that actually makes me feel even more strongly about this idea that reduce is best used for collapsing something as opposed to just, like, transforming a data structure into something else. JOËL: There's an interesting other mental model for reduce that I think is hiding under what we're talking about here, and that is the idea that it is a sort of mid-level abstraction for dealing with collections, as opposed to something like map or select or some of those other enumerable helpers because those can all be implemented in terms of reduce. And so, in many cases, you don't need to write the reduce because the library maintainer has already used reduce or something equivalent to build these higher-level helpers for you. STEPHANIE: Yeah, it's kind of in that weird point between, like, very powerful [chuckles] so that people can start to do some funky things with it, but also sometimes just necessary because it can feel a little bit more concise that way. JOËL: I've done a fair amount of functional programming in languages like Elm. And there, if you're building a custom data structure, the sort of lowest-level way you have of looping is doing a recursion, and recursions are messy. And so, what you can do instead as a library developer is say, "You know what, I don't want to be writing recursions for all of these." I don't know; maybe I'm building a tree library. I don't want to write a recursion for every different function that goes over trees if I want to map or filter or whatever. I'm going to write reduce using recursion, and then everything else can be written in terms of reduce. And then, if people want to do custom things, they don't need to recurse over my tree. They can use this reduce function, which allows them to do most of the traversals they want on the tree without needing to touch manual recursion. So, there's almost, like, a low-level, mid-level, high-level in the library design, where, like, lowest level is recursion. Ideally, nobody touches that. Mid-level, you've got reducing that's built out on top of recursion. And then, on top of that, you've got all sorts of other helpers, like mapping, like filtering, things like that. STEPHANIE: Hmm. I'm wondering, do you know of any performance considerations when it comes to using reduce built off a recursion? JOËL: So, one of the things that can be really nice is that writing a recursion yourself is dangerous. It's so easy to, like, accidentally introduce Stack Overflow. You could also write a really inefficient one. So, ideally, what you do is that you write a reduce that is safe and that is fast. And then, everybody else can just use that to not have to worry about the sort of mechanics of traversing the collection. And then, just use this. It already has all of the safety and speed features built in. You do have to be careful, though, because reduce, by nature, traverses the entire collection. And if you want to break out early of something expensive, then reduce might not be the tool for you. STEPHANIE: I was also reading a little bit about how, in JavaScript, a lot of developers like to stick to that idea of a pure function and try to basically copy the entire accumulator for every iteration and creating a new object for that. And that has led to some memory issues as well. As opposed to just mutating the accumulator, having, especially when you, you know, are going through a collection, like, really large, making that copy every single time and creating, yeah [chuckles], just a lot of issues that way. So, that's kind of what prompted that question. JOËL: Yeah, that can vary a lot by language and by data structure. In more functional languages that try to not mutate, they often have this idea of what they call persistent data structures, where you can sort of create copies that have small modifications that don't force you to copy the whole object under the hood. They're just, like, pointers. So, like, hey, we, like, are the same as this other object, but with this extra element added, or something like that. So, if you're growing an array or something like that, you don't end up with 10,000 copies of the array with, like, a new element every time. STEPHANIE: Yeah, that is interesting. And I feel like trying to adopt different paradigms for different tools, you know, is not always as straightforward as some wish it were [laughs]. JOËL: I do want to give a shout-out to an academic paper that is...it is infamously dense. The title of it is Functional Programming with Bananas, Lenses, and Barbed Wire. STEPHANIE: It doesn't sound dense; it sounds fun. Well, I don't about barbed wire. JOËL: It sounds fun, right? STEPHANIE: Yeah, but certainly quirky [laughs]. JOËL: It is incredibly dense. And they've, like, created this custom math notation and all this stuff. But the idea that they pioneered there is really cool, this idea that kind of like I was talking about sort of building libraries in different levels. Their idea is that recursion is generally something that's unsafe and that library and language designers should take care of all of the recursion and instead provide some of these sort of mid-level helper methods to do things. Reducing is one of them, but their proposal is that it's not the only one. There's a whole sort of family of similar methods that are there that would be useful in different use cases. So, reduce allows you to sort of traverse the whole thing. It does not allow you to break out early. It does not allow you to keep sort of track of a sort of extra context element if you want to, like, be traversing a collection but have a sort of look forward, look back, something like that. So, there are other variations that could handle those. There are variations that are the opposite of reduce, where you're, like, inflating, starting from a few parameters and building a collection out of them. So, this whole concept is called recursion schemes, and you can get, like, really deep into some theory there. You'll hear fancy words like catamorphisms and anamorphisms. There's a whole world to explore in that area. But at its core, it's this idea that you can sort of slice up things into this sort of low-level recursion, mid-level helpers, and then, like, kind of userland helpers built on top of that. STEPHANIE: Wow. That is very intense; it sounds like [chuckles]. I'm happy not to ever have to write a recursion ever again, probably [laughs]. Have you ever, as just a web developer in your day-to-day programming, found a really good use case for dropping down to that level? Or are you kind of convinced that, like, you won't really ever need to? JOËL: I think it depends on the paradigm of the language you're working in. In Ruby, I've very rarely needed to write a recursion. In something like Elm, I've had to do that, eh, not infrequently. Again, it depends, like, if I'm doing more library-esque code versus more application code. If I'm writing application code and I'm using an existing, let's say, tree library, then I typically don't need to write a recursion because they've already written traversals for me. If I'm making my own and I have made my own tree libraries, then yes, I'm writing recursions myself and building those traversals so that other people don't have to. STEPHANIE: Yeah, that makes sense. I'd much rather someone who has read that paper [laughs] write some traversal methods for me. JOËL: And, you know, for those who are curious about it, we will put a link to this paper in the description. So, we've talked about a sort of very academic mental model way of thinking about reducing. I want to shift gears and talk about one that I have found is incredibly practical, and that is the idea that reduce is a way to scale an operation that works on two objects to an operation that works on sort of an unlimited number of objects. To make it more concrete, take something like addition. I can add two numbers. The plus operator allows me to take one number, add another, get a sum. But what if I want to not just add two numbers? I want to add an arbitrary number of numbers together. Reduce allows me to take that plus operator and then just scale it up to as many numbers as I want. I can just plug that into, you know, I have an array of numbers, and I just call dot reduce plus operator, and, boom, it can now scale to as many numbers as I want, and I can sum the whole thing. STEPHANIE: That dovetails quite nicely with your take earlier about how you shouldn't pass a block to reduce. You should extract that into a method. Don't you think? JOËL: I think it does, yes. And then maybe it's, like, sort of two sides of a coin because I think what this leads to is an approach that I really like for reducing because sometimes, you know, here, I'm starting with addition. I'm like, oh, I have addition. Now, I want to scale it up. How do I do that? I can use reduce. Oftentimes, I'm faced with sort of the opposite problem. I'm like, oh, I need to add all these numbers together. How do I do that? I'm like, probably with a reduce. But then I start writing the block, and, like, I get way too into my head about the accumulator and what's going to happen. So, my strategy for writing reduce expressions is to, instead of trying to figure out how to, like, do the whole thing together, first ask myself, how do I want to combine any two elements that are in the array? So, I've got an array of numbers, and I want to sum them all. What is the thing I need to do to combine just two of those? Forget the array. Figure that out. And then, once I have that figured out, maybe it's an existing method like plus. Maybe it's a method I need to define on it if it's a custom object. Maybe it's a method that I write somewhere. Then, once I have that, I can say, okay, I can do it for two items. Now, I'm going to scale it up to work for the whole array, and I can plug it into reduce. And, at that point, the work is already basically done, so I don't end up with a really complex block. I don't end up, like, almost ending in, like, a recursive infinite loop in my head because I do that. STEPHANIE: [laughs]. JOËL: So, that approach of saying, start by figuring out what is the operation you want to do to combine two elements, and then use reduce as a way to scale that to your whole array is a way that I've used to keep things simple in my mind. STEPHANIE: Yeah, I like that a lot as a supplement to the model I shared earlier because, for me, when I think about reducing as, like, collapsing into a value, you kind of are just like, well, okay, I start with the collection, and then somehow I get to my single value. But the challenge is figuring out how that happens [laughs], like, the magic that happens in between that. And I think another alias that we haven't mentioned yet for reduce that is used in a lot of other languages is fold. And I actually like that one a lot, and I think it relates to your mental model. Because when I think about folding, I'm picturing folding up a paper like an accordion. And you have to figure out, like, what is the first fold that I can make? And just repeating that over and over to get to your little stack of accordion paper [laughs]. And if you can figure out just that first step, then you pretty much, like, have the recipe for getting from your initial input to, like, your desired output. JOËL: Yeah. I think fold is interesting in that some languages will make a distinction between fold and reduce. They will have both. And typically, fold will require you to pass an initial value, like a starting accumulator, to start it off. Whereas reduce will sort of assume that your array can use the first element of the array as the first accumulator. STEPHANIE: Oh, I just came up with another visual metaphor for this, which is, like, folding butter into croissant pastry when the butter is your initial value [laughs]. JOËL: And then the crust is, I guess, the elements in the array. STEPHANIE: Yeah. Yeah. And then you get a croissant out of it [laughs]. Don't ask me how it gets to a perfectly baked, flaky, beautiful croissant, but somehow that happens [laughs]. JOËL: So, there's an interesting sort of subtlety here that I think happens because there are sort of two slightly different ways that you can interact with a reduce. Sometimes, your accumulator is of the same type as the elements in your array. So, you're summing an array of numbers, and your accumulator is the sum, but each of the elements in the array are also numbers. So, it's numbers all the way through. And sometimes, your accumulator has a different type than the items in the array. So, maybe you have an array of words, and you want to get the sum of all of the characters and all the words. And so, now your accumulator is a number, but each of the items in the array are strings. STEPHANIE: Yeah, that's an interesting distinction because I think that's where you start to see the complex blocks being passed and reduced. JOËL: The complex blocks, definitely; I think they tend to show up when your accumulator has a different type than the individual items. So, maybe that's, like, a slightly more complicated use case. Oftentimes, too, the accumulator ends up being some, like, more complex, like, hash or something that maybe would really benefit from being a custom object. STEPHANIE: I've never done that before, but I can see why that would be really useful. Do you have an example of when you used a custom object as the accumulator? JOËL: So, I've done it for situations where I'm working with objects that are doing tally-like operations, but I'm not doing just a generic tally. There's some domain-specific stuff happening. So, it's some sort of aggregate counter on multiple dimensions that you can use, and that can get really ugly. And you can either do it with a reduce or you can have some sort of, like, initial version of the hash outside and do an each and mutate the hash and stuff like that. All of these tend to be a little bit ugly. So, in those situations, I've often created some sort of custom object that has some instance methods that allow to sort of easily add new elements to it. STEPHANIE: That's really interesting because now I'm starting to think, what if the elements in the collection were also a custom object? [chuckles] And then things could, I feel like, could be really powerful [laughs]. JOËL: There's often a lot of value, right? Because if the items in the collection are also a custom object, you can then have methods on them. And then, again, the sort of complexity of the reduce can sort of, like, fade away because it doesn't own any of the logic. All it does is saying, hey, there's a thing you can do to combine two items. Let's scale it up to work on a collection of items. And now you've sort of, like, really simplified what logic is actually owned inside the reduce. I do want to shout out for those listeners who are theory nerds and want to dig into this. When you have a reduce, and you've got an operation where all the values are of the same type, including the accumulator, typically, what you've got here is some form of monoid. It may be a semigroup. So, if you want to dig into some theory, those are the words to Google and to go a deep dive on. The main thing about monoids, in particular, is that monoids are any objects that have both a sort of a base case, a sort of empty version of themselves, and they have some sort of combining method that allows you to combine two values of that type. If your object has these things and follows...there's a few rules that have to be true. You have a monoid. And they can then be sort of guaranteed to be folded nicely because you can plug in their base case as your initial accumulator. And you can plug in their combining method as just the value of the block, and everything else just falls into place. A classic here is addition for numbers. So, if you want to add two numbers, your combining operator is a plus. And your sort of empty value is a zero. So, you would say, reduce initial value is zero, array of numbers. And your block is just plus, and it won't sum all of the numbers. You could do something similar with strings, where you can combine strings together with plus, and, you know, your empty string is your base case. So, now you're doing sort of string concatenation over arbitrary number of strings. Turns out there's a lot of operations that fall into that, and you can even define some of those on your custom object. So, you're like, oh, I've got a custom object. Maybe I want some way of, like, combining two of them together. You might be heading in the direction of doing something that is monoidal, and if so, that's a really good hint to know that it can sort of, like, just drop into place with a fold or a reduce and that that is a tool that you have available to you. STEPHANIE: Yeah, well, I think my eyes, like, widened a little bit when you first dropped the term monoid [laughs]. I do want to spend the last bit of our time talking about when not to use reduce, and, you know, we did talk a lot about recursion. But when do you think a regular old loop will just be enough? JOËL: So, you're suggesting when would you want to use something like an each rather than a reduce? STEPHANIE: Yeah. In my mind, you know, you did offer, like, a lot of ways to make reduce simpler, a lot of strategies to end up with some really nice-looking syntax [chuckles], I think. But, oftentimes, I think it can be equally as clear storing your accumulator outside of the iteration and that, like, is enough for me to understand. And reduce takes a little bit of extra overhead to figure out what I'm looking at. Do you have any thoughts about when you would prefer to do that? Or do you think that you would usually reach for something else? JOËL: Personally, I generally don't like the pattern of using each to iterate over a collection and then mutate some external accumulator. That, to me, is a bit of a code smell. It's a sign that each is not quite powerful enough to do the thing that I want to do and that I'm probably needing some sort of more specialized form of iteration. Sometimes, that's reduce. Oftentimes, because each can suffer from the same problem you mentioned from reduce, where it's like, oh, you're doing this thing where you mutate an external accumulator. Turns out what you're really doing is just map. So, use map or use select or, you know, some of the other built-in iterators from the enumerable library. There's a blog post on the thoughtbot blog that I continually link to people. And when I see the pattern of, like, mutating an external variable with each, yeah, I tend to see that as a bit of a code smell. I don't know that I would never do it, but whenever I see that, it's a sign to me to, like, pause and be like, wait a minute, is there a better way to do this? STEPHANIE: Yeah, that's fair. I like the idea that, like, if there's already a method available to you that is more specific to go with that. But I also think that sometimes I'd rather, like, come across that pattern of mutating a variable outside of the iteration over, like, someone trying to do something clever with the reduce. JOËL: Yeah, I guess reduce, especially if it's got, like, a giant block and you've got then, like, things in there that break or call next to skip iterations and things like that, that gets really mind-bending really quickly. I think a case where I might consider using an each over a reduce, and that's maybe generally when I tend to use each, is when I'm doing side effects. If I'm using a reduce, it's because I care about the accumulated value at the end. If I'm using each, it's typically because I am trying to do some amount of side effects. STEPHANIE: Yeah, that's a really good call out. I had that written down in my notes, and I'm glad you brought it up because I've seen them get conflated a little bit, and perhaps maybe that's the source of the pain that I'm talking about. But I really like that heuristic of reduce as, you know, you're caring about the output, as opposed to what's going on inside. Like, you don't want any unexpected behavior. JOËL: And I think that applies to something like map as well. My sort of heuristic is, if I'm doing side effects, I want each. If I want transformed values that are sort of one-to-one in the collection, I want map. If I want a single sort of aggregate value, then I want reduce. STEPHANIE: I think that's the cool thing about mixing paradigms sometimes, where all the strategies you talked about in terms of, you know, using custom, like, objects for your accumulator, or the elements in your collection, like, that's something that we get because, you know, we're using an object-oriented language like Ruby. But then, like, you also are kind of bringing the functional programming lens to, like, when you would use reduce in the first place. And yeah, I am just really excited now [chuckles] to start looking for some places I can use reduce after this conversation and see what comes out of it. JOËL: I think I went on a bit of an interesting journey where, as a newer programmer, reduce was just, like, really intense. And I struggled to understand it. And I was like, ban it from code. I don't want to ever see it. And then, I got into functional programming. I was like, I'm going to do reduce everywhere. And, honestly, it was kind of messy. And then I, like, went really deep on a lot of functional theory, and I think understood some things that then I was able to take back to my code and actually write reduce expressions that are much simpler so that now my heuristic is like, I love reduce; I want to use it, but I want as little as possible in the reduce itself. And because I understand some of these other concepts, I have the ability to know what things can be extracted in a way that will feel very natural, in a way that myself from five years ago would have just been like, oh, I don't know. I've got this, you know, 30-line reduce expression that I know is complicated, but I don't know how to improve. And so, a little bit of the underlying theory, I don't think it's necessary to understand these simplified reduces, but as an author who's writing them, I think it helps me write reduces that are simpler. So, that's been my journey using reduce. STEPHANIE: Yeah. Well, thanks for sharing. And I'm really excited. I hope our listeners have learned some new things about reduce and can look at it from a different light. JOËL: There are so many different perspectives. And I think we keep discovering new mental models as we talk to different people. It's like, oh, this particular perspective. And there's one that we didn't really dig into but that I think makes more sense in a functional world that's around sort of deconstructing a structure and then rebuilding it with different components. The shorthand name of this mental model, which is a fairly common one, is constructor replacement. For anyone who's interested in digging into that, we'll link it in the show notes. I gave a talk at an Elm meetup where I sort of dug into some of that theory, which is really interesting and kind of mind-blowing. Not as relevant, I think, for Rubyists, but if you're in a language that particularly allows you to build custom structures out of recursive types or what are sometimes called algebraic data types, or tagged unions, or discriminated unions, this thing goes by a bajillion names, that is a really interesting other mental model to look at. And, again, I don't think the list that we've covered today is exhaustive. You know, I would love it for any of our listeners; if you have your own mental models for how to think about folding, injecting, reducing, send them in: hosts@bikeshed.fm. We'd love to hear them. STEPHANIE: And on that note, shall we wrap up? JOËL: Let's wrap up. STEPHANIE: Show notes for this episode can be found at bikeshed.fm. JOËL: This show has been produced and edited by Mandy Moore. STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show. JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter. STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email. JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week. ALL: Byeeeeeeee!!!!!!! AD: Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us. More info on our website at: tbot.io/referral. Or you can email us at referrals@thoughtbot.com with any questions.Support The Bike Shed
undefined
Mar 5, 2024 • 40min

417: Module Docs

Stephanie shares about her vacation at Disney World, particularly emphasizing the technological advancements in the park's mobile app that made her visit remarkably frictionless. Joël had a conversation about a topic he loves: units of measure, and he got to go deep into the idea of dimensional analysis with someone this week. Together, Joël and Stephanie talk about module documentation within software development. Joël shares his recent experience writing module docs for a Ruby project using the YARD documentation system. He highlights the time-consuming nature of crafting good documentation for each public method in a class, emphasizing that while it's a demanding task, it significantly benefits those who will use the code in the future. They explore the attributes of good documentation, including providing code examples, explaining expected usage, suggesting alternatives, discussing edge cases, linking to external resources, and detailing inputs, outputs, and potential side effects. Multidimensional numbers episode YARD docs New factory_bot documentation Dash Solargraph Transcript:  JOËL: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Joël Quenneville. STEPHANIE: And I'm Stephanie Minn, and together, we're here to share a bit of what we've learned along the way. JOËL: So, Stephanie, what's new in your world? STEPHANIE: So, I recently was on vacation, and I'm excited [chuckles] to tell our listeners all about it. I went to Disney World [laughs]. And honestly, I was especially struck by the tech that they used there. As a person who works in tech, I always kind of have a little bit of a different experience knowing a bit more about software, I suppose, than just your regular person [laughs], citizen. And so, at Disney World, I was really impressed by how seamlessly the like, quote, unquote, "real life experience" integrated with their use of their branded app to pair with, like, your time at the theme park. JOËL: This is, like, an app that runs on your mobile device? STEPHANIE: Yeah, it's a mobile app. I haven't been to Disney in a really long time. I think the last time I went was just as a kid, like, this was, you know, pre-mobile phones. So, I recall when you get into the line at a ride, you can skip the line by getting what's called a fast pass. And so, you kind of take a ticket, and it tells you a designated time to come back so that you could get into the fast line, and you don't have to wait as long. And now all this stuff is on your mobile app, and I basically did not wait in [laughs] a single line for more than, like, five minutes to go on any of the rides I wanted. It just made a lot of sense that all these things that previously had more, like, physical touchstones, were made a bit more convenient. And I hesitate to use the word frictionless, but I would say that accurately describes the experience. JOËL: That's kind of amazing; the idea that you can use tech to make a place that's incredibly busy also feel seamless and where you don't have to wait in line. STEPHANIE: Yeah and, actually, I think the coolest part was it blended both your, like, physical experience really well with your digital one. I think that's kind of a gripe I have as a technologist [laughs] when I'm just kind of too immersed in my screen as opposed to the world around me. But I was really impressed by the way that they managed to make it, like, a really good supplement to your experience being there. JOËL: So, you're not hyped for a future world where you can visit Disney in VR? STEPHANIE: I mean, I just don't think it's the same. I rode a ride [laughs] where it was kind of like a mini roller coaster. It was called Expedition Everest. And there's a moment, this is, like, mostly indoors, but there's a moment where the roller coaster is going down outside, and you're getting that freefall, like, drop feeling in your stomach. And it also happened to be, like, drizzling that day that we were out there, and I could feel it, you know, like, pelting my head [laughs]. And until VR can replicate that experience [chuckles], I still think that going to Disney is pretty fun. JOËL: Amazing. STEPHANIE: So, Joël, what's new in your world? JOËL: I'm really excited because I had a conversation about a topic that I like to talk about: units of measure. And I got to go deep into the idea of dimensional analysis with someone this week. This is a technique where you can look at a calculation or a function and sort of spot-check whether it's correct by looking at whether the unit for the measure that would come out match what you would expect. So, you do math on the units and ignore the numbers coming into your formula. And, you know, let's say you're calculating the speed of something, and you get a distance and the amount of time it took you to take to go that distance. And let's say your method implements this as distance times time. Forget about doing the actual math with the numbers here; just look at the units and say, okay, we've got our meters, and we've got our seconds, and we're multiplying them together. The unit that comes out of this method is meters times seconds. You happen to know that speeds are not measured in meters times seconds. They're measured in meters divided by seconds or meters per second. So, immediately, you get a sense of, like, wait a minute, something's wrong here. I must have a bug in my function. STEPHANIE: Interesting. I'm curious how you're representing that data to, like, know if there's a bug or not. In my head, when you were talking about that, I'm like, oh yeah, I definitely recall doing, like, math problems for homework [laughs] where I had, you know, my meters per second. You have your little fractions written out, and then when you multiply or divide, you know how to, like, deal with the units on your piece of paper where you're showing your work. But I'm having a hard time imagining what that looks like as a programmer dealing with that problem. JOËL: You could do it just all in your head based off of maybe some comments that you might have or the name of the variable or something. So, you're like, okay, well, I have a distance in meters and a time in seconds, and I'm multiplying the two. Therefore, what should be coming out is a value that is in meters times seconds. If you want to get fancier, you can do things with value objects of different types. So, you say, okay, I have a distance, and I have a time. And so, now I have sort of a multiplication of a distance and a time, and sort of what is that coming out as? That can sometimes help you prevent from having some of these mistakes because you might have some kind of error that gets raised at runtime where it's like, hey, you're trying to multiply two units that shouldn't be multiplied, or whatever it is. You can also, in some languages, do this sort of thing automatically at the type level. So, instead of looking at it yourself and sort of inferring it all on your own based off of the written code, languages like F# have built-in unit-of-measure systems where once you sort of tag numbers as just being of a particular unit of measure, any time you do math with those numbers, it will then tag the result with whatever compound unit comes from that operation. So, you have meters, and you have seconds. You divide one by the other, and now the result gets tagged as meters per second. And then, if you have another calculation that takes the output of the first one and it comes in, you can tell the compiler via type signature, hey, the input for this method needs to be in meters per second. And if the other calculation sort of automatically builds something that's of a different unit, you'll get a compilation error. So, it's really cool what it can do. STEPHANIE: Yeah, that is really neat. I like all of those built-in guardrails, I suppose, to help you, you know, make sure that your answer is correct. Definitely could have used that [chuckles]. Turns out I just needed a calculator to take my math test with [laughs]. JOËL: I think what I find valuable more than sort of the very rigorous approach is the mindset. So, anytime you're dealing with numbers, thinking in your mind, what is the unit of this number? When I do math with it with a different number, is it the same unit? Is it a different unit? What is the unit of the thing that's coming out? Does this operation make sense in the domain of my application? Because it's easy to sometimes think you're doing a math operation that makes sense, and then when you look at the unit, you're like, wait a minute, this does not make sense. And I would go so far as to say that, you know, you might think, oh, I'm not doing a physics app. I don't care about units of measure. Most numbers in your app that are actually numbers are going to have some kind of unit of measure associated to them. Occasionally, you might have something where it's just, like, a straight-up, like, quantity or something like that. It's a dimensionless number. But most things will have some sort of unit. Maybe it's a number of dollars. Maybe it is an amount of time, a duration. It could be a distance. It could be all sorts of things. Typically, there is some sort of unit that should attach to it. STEPHANIE: Yeah. That makes sense that you would want to be careful about making sure that your mathematical operations that you're doing when you're doing objects make sense. And we did talk about this in the last episode about multidimensional numbers a little bit. And I suppose I appreciate you saying that because I think I have mostly benefited from other people having thought in that mindset before and encoding, like I mentioned, those guardrails. So, I can recall an app where I was working with, you know, some kind of currency or money object, and that error was raised when I would try to divide by zero because rather than kind of having to find out later with some, not a number or infinite [laughs] amount of money bug, it just didn't let me do that. And that wasn't something that I had really thought about, you know, I just hadn't considered that zero value edge case when I was working on whatever feature I was building. JOËL: Yeah, or even just generally the idea of dividing money. What does that even mean? Are you taking an amount of money and splitting it into two equivalent piles to split among multiple people? That kind of makes sense. Are you dividing money by another money value? That's now asking a very different kind of question. You're asking, like, what is the ratio between these two, I guess, piles of money if we want to make it, you know, in the physical world? Is that a thing that makes sense in your application? But also, realize that that ratio that you get back is not itself an amount of money. And so, there are some subtle bugs that can happen around that when you don't keep track of what your quantities are. So, this past week, I've been working on a project where I ended up having to write module docs for the code in question. This is a Ruby project, so I'm writing docs using the YARD documentation system, where you effectively just write code comments at the sort of high level covering the entire class and then, also, individual documentation comments on each of the methods. And that's been really interesting because I have done this in other languages, but I'd never done it in Ruby before. And this is a piece of code that was kind of gnarly and had been tricky for me to figure out. And I figured that a couple of these classes could really benefit from some more in-depth documentation. And I'm curious, in your experience, Stephanie, as someone who's writing code, using code from other people, and who I assume occasionally reads documentation, what are the things that you like to see in good sort of method-level docs? STEPHANIE: Personally, I'm really only reading method-level docs when, you know, at this point, I'm, like, reaching for a method. I want to figure out how to use it in my use case right now [laughs]. So, I'm going to search API documentation for it. And I really am just scanning for inputs, especially, I think, and maybe looking at, you know, some potential various, like, options or, like, variations of how to use the method. But I'm kind of just searching for that at a glance and then moving on [laughs] with my day. That is kind of my main interaction with module docs like that, and especially ones for Ruby and Rails methods. JOËL: And for clarity's sake, I think when we're talking about module docs here, I'm generally thinking of, like, any sort of documentation that sort of comments in code meant to document. It could be the whole modular class. It could be on a per-method level, things like RDoc or YARD docs on Ruby classes. You used the word API docs here. I think that's a pretty similar idea. STEPHANIE: I really haven't given the idea of writing this kind of documentation a lot of thought because I've never had to do too much of it before, but I know, recently, you have been diving deep into it because, you know, like you said, you found these classes that you were working with a bit ambiguous, I suppose, or just confusing. And I'm wondering what kind of came out of that journey. What are some of the most interesting aspects of doing this exercise? JOËL: And one of the big ones, and it's not a fun one, but it is time-consuming. Writing good docs per method for a couple of classes takes a lot of time, and I understand why people don't do it all the time. STEPHANIE: What kinds of things were you finding warranted that time? Like, you know, you had to, at some point, decide, like, whether or not you're going to document any particular method. And what were some of the things you were looking out for as good reasons to do it? JOËL: I was making the decisions to document or not document on a class level, and then every public method gets documentation. If there's a big public API, that means every single one of those methods is getting some documentation comments, explaining what they do, how they're meant to be used, things like that. I think my kind of conclusion, having worked with this, is that the sort of sweet spot for this sort of documentation is for anything that is library-like, so a lot of things that maybe would go into a Rails lib directory might make sense. Anything you're turning into a gem that probably makes sense. And sometimes you have things in your Rails codebase that are effectively kind of library-like, and that was the case for the code that I was dealing with. It was almost like a mini ORM style kind of ActiveRecord-inspired series of base classes that had a bunch of metaprogramming to allow you to write models that were backed by not a database but a headless CMS, a content management system. And so, these classes are not extracted to the lib directory or, like, made into a gem, but they feel very library-esque in that way. STEPHANIE: Library-like; I like that descriptor a lot because it immediately made me think of another example of a time when I've used or at least, like, consumed this type of documentation in a, like, SaaS repo. Rather, you know, I'm not really seeing that level of documentation around domain objects, but I noticed that they really did a lot of extending of the application record class because they just had some performance needs that they needed to write some, like, custom code to handle. And so, they ended up kind of writing a lot of their own ORM-like methods for just some, like, custom callbacks on persisting and some just, like, bulk insertion functionality. And those came with a lot of different ways to use them. And I really appreciated that they were heavily documented, kind of like you would expect those ActiveRecord methods to be as well. JOËL: So, I've been having some conversations with other members at thoughtbot about when they like to use the style of module doc. What are some of the alternatives? And one that kept coming up for different people that they would contrast with this is what they would call the big README approach, and this could be for a whole gem, or it could be maybe some directory with a few classes in your application that's got a README in the root of the directory. And instead of documenting each method, you just write a giant README trying to answer sort of all of the questions that you anticipate people will ask. Is that something that you've seen, and how do you feel about that as a tool when you're looking for help? STEPHANIE: Yes. I actually really like that style of documentation. I find that I just want examples to get me started, especially; I guess this is especially true for libraries that I'm not super familiar with but need to just get a working knowledge about kind of immediately. So, I like to see examples, the getting started, the just, like, here's what you need to know. And as I start to use them, that will get me rolling. But then, if I find I need more details, then I will try to seek out more specific information that might come in the form of class method documentation. But I'm actually thinking about how FactoryBot has one of the best big README-esque [laughs] style of documentation, and I think they did a really big refresh of the docs not too long ago. It has all that high-level stuff, and then it has more specific information on how to use, you know, the most common methods to construct your factories. But those are very detailed, and yet they do sit, like, separately from inline, like, code documentation in the style of module docs that we're talking about. So, it is kind of an interesting mix of both that I think is helpful for me personally when I want both the “what do I need to know now?” And the, “like, okay, I know where to look for if I need something a little more detailed.” JOËL: Yeah. The two don't need to be mutually exclusive. I thought it was interesting that you mentioned how much examples are valuable to you because...I don't know if this is controversial, but an opinion that I have about sort of per-method documentation is that you should always default to having a code example for every method. I don't care how simple it is or how obvious it is what it does. Show me a code example because, as a developer, examples are really, really helpful. And so, seeing that makes documentation a lot more valuable than just a couple of lines that explain something that was maybe already obvious from the title of the method. I want to see it in action. STEPHANIE: Interesting. Do you want to see it where the method definition is? JOËL: Yes. Because sometimes the method definition, like, the implementation, might be sort of complex. And so, just seeing a couple of examples, like, oh, you call with this input, you get that. Call with this other input; you get this other thing. And we see this in, you know, some of the core docs for things like the enumerable methods where having an example there to be like, oh, so that's how map works. It returns this thing under these circumstances. That sort of thing is really helpful. And then, I'll try to do it at a sort of a bigger level for that class itself. You have a whole paragraph about here's the purpose of the class. Here's how you should use it. And then, here's an example of how you might use it. Particularly, if this is some sort of, like, base class you're meant to inherit from, here's the circumstances you would want to subclass this, and then here's the methods you would likely want to override. And maybe here are the DSLs you might want to have and to kind of package that in, like, a little example of, in this case, if you wanted a model that read from the headless CMS, here's what an example of such a little model might look like. So, it's kind of that putting it all together, which I think is nice in the module docs. It could probably also live in the big README at some level. STEPHANIE: Yeah. As you are saying that, I also thought about how I usually go search for tests to find examples of usage, but I tend to get really overwhelmed when I see inline, like, that much inline documentation. I have to, like, either actively ignore it, choose to ignore it, or be like, okay, I'm reading this now [laughs]. Because it just takes up so much visual space, honestly. And I know you put a lot of work into it, a lot of time, but maybe it's because of the color of my editor theme where comments are just that, like, light gray [laughs]. I find them quite easy to just ignore. But I'm sure there will be some time where I'm like, okay, like, if I need them, I know they're there. JOËL: Yeah, that is, I think, a downside, right? It makes it harder to browse the code sometimes because maybe your entire screen is almost taken up by documentation, and then, you know, you have one method up, and you've got to, like, scroll through another page of documentation before you hit the next method, and that makes it harder to browse. And maybe that's something that plays into the idea of that separation between library-esque code versus application code. When you browse library-esque code, when you're actually browsing the source, you're probably doing it for different reasons than you would for code in your application because, at that point, you're effectively source diving, sometimes being like, oh, I know this class probably has a method that will do the thing I want. Where is it? Or you're like, there's an edge case I don't understand on this method. I wonder what it does. Let me look at the implementation. Or even some existing code in the app is using this library method. I don't know what it does, but they call this method, and I can't figure out why they're using it. Let me look at the source of the library and see what it does under the hood. STEPHANIE: Yeah. I like the distinction of it is kind of a different mindset that you're reading the code at, where, like, sometimes my brain is already ready to just read code and try to figure out inputs and outputs that way. And other times, I'm like, oh, like, I actually can't parse this right now [chuckles]. Like, I want to read just English, like, telling me what to expect or, like, what to look out for, especially when, like you said, I'm not really, like, trying to figure out some strange bug that would lead me to diving deep in the source code. It's I'm at the level where I'm just reaching for a method and wanting to use it. We're writing these YARD docs. I think I also heard you mention that you gave some, like, tips or maybe some gotchas about how to use certain methods. I'm curious why that couldn't have been captured in a more, like, self-documenting way. Or was there a way that you could have written the code for that not to have been needed as a comment or documented as that? And was there a way that method names could have been clear to signal, like, the intention that you were trying to convey through your documentation? JOËL: I'm a big fan of using method names as a form of documentation, but they're frequently not good enough. And I think comments, whether they're just regular inline comments or more official documentation, can be really good to help avoid sort of common pitfalls. And one that I was working with was, there were two methods, and one would find by a UID, so it would search up a document by UID. And another one would search by ID. And when I was attempting to use these before I even started documenting, I used the wrong one, and it took me a while to realize, oh wait, these things have both UIDs and IDs, and they're slightly different, and sometimes you want to use one or the other. The method names, you know, said like, "Find by ID" or "Find by UID." I didn't realize there were both at the time because I wasn't browsing the source. I was just seeing a place where someone had used it. And then, when I did find it in the source, I'm like, well, what is the difference? And so, something that I did when I wrote the docs was sort of call out on both of those methods; by the way, there is also find by UID. If you're searching by UID, consider using the other one. If you don't know what the difference is, here's a sentence summarizing the difference. And then, here's a link to external documentation if you want to dive into the nitty gritty of why there are two and what the differences are. And I think that's something you can't capture in just a method name. STEPHANIE: Yeah, that's true. I like that a lot. Another use case you can think of is when method names are aliased, and it's like, I don't know how I would have possibly known that until I, you know, go through the journey of realizing [laughs] that these two methods do the same thing or, like, stumbling upon where the aliasing happens. But if that were captured in, like, a little note when I'm in, like, a documentation viewer or something, it's just kind of, like, a little tidbit of knowledge [laughs] that I get to gain along the way that ends up, you know, being useful later because I will have just kind of...I will likely remember having seen something like that. And I can at least start my search with a little bit more context than when you don't know what you don't know. JOËL: I put a lot of those sorts of notes on different methods. A lot of them are probably based on a personal story where I made a mistaken assumption about this method, and then it burned me. But I'm like, okay, nobody else is going to make that mistake. By the way, if you think this is what the method does, it does something slightly different and, you know, here's why you need to know that. STEPHANIE: Yeah, you're just looking out for other devs. JOËL: And, you know, trying to, like, take my maybe negative experience and saying like, "How can I get value out of that?" Maybe it doesn't feel great that I lost an hour to something weird about a method. But now that I have spent that hour, can I get value out of it? Is the sort of perspective I try to have on that. So, you mentioned kind of offhand earlier the idea of a documentation viewer, which would be separate than just reading these, I guess, code comments directly in your code editor. What sort of documentation viewers do you like to use? STEPHANIE: I mostly search in my browser, you know, just the official documentation websites for Rails, at least. And then I know that there are also various options for Ruby as well. And I think I had mentioned it before but using DuckDuckGo as my search engine. I have nice bang commands that will just take me straight to the search for those websites, which is really nice. Though, I have paired with people before who used various, like, macOS applications to do something similar. I think Alfred might have some built-in workflows for that. And then, a former co-worker used to use one called Dash, that I have seen before, too. So, it's another one of those just handy just, like, search productivity extensions. JOËL: You mentioned the Rails documentation, and this is separate from the guides. But the actual Rails docs are generated from comments like this inline in code. So, all the different ActiveRecord methods, when you search on the Rails documentation you're like, oh yeah, how does find_by work? And they've got a whole, like, paragraph explaining how it works with a couple of examples. That's this kind of documentation. If you open up that particular file in the source code, you'll find the comments. And it makes sense for Rails because Rails is more of, you know, library-esque code. And you and I search these docs pretty frequently, although we don't tend to do it, like, by opening the Rails gem and, like, grepping through the source to find the code comment. We do it through either a documentation site that's been compiled from that source or that documentation that's been extracted into an offline tool, like you'd mentioned, Dash. STEPHANIE: Yeah, I realized how conflicting, I suppose, it is for me to say that I find inline documentation really overwhelming or visually distracting, whereas I recognize that the only reason I can have that nice, you know, viewing experience is because documentation viewers use the code comments in that format to be generated. JOËL: I wonder if there's like a sort of...I don't know what this pattern is called, but a bit of a, like, middle-quality trap where if you're going to source dive, like, you'd rather just look at the code and not have too much clutter from sort of mediocre comments. But if the documentation is really good and you have the tooling to read it, then you don't even need to source dive at all. You can just read the documentation, and that's sufficient. So, both extremes are good, but that sort of middle kind of one foot in each camp is sort of the worst of both worlds experience. Because I assume when you look for Rails documentation, you never open up the actual codebase to search. The documentation is good enough that you don't even need to look at the files with the comments and the code. STEPHANIE: Yeah, and I'm just recalling now there's, like, a UI feature to view the source from the documentation viewer page. JOËL: Yes. STEPHANIE: I use that actually quite a bit if the comments are a little bit sparse and I need just the code to supplement my understanding, and that is really nice. But you're right, like, I very rarely would be source diving, unless it's a last resort [laughs], let's be honest. JOËL: So, we've talked about documentation viewers and how that can make things nice, and you're able to read documentation for things. But a lot of other tooling can benefit from this sort of model documentation as well, and I'm thinking, in particular, Solargraph, which is Ruby's language server protocol. And it has plugins for VS Code, for Vim, for a few different editors, takes advantage of that to provide all sorts of things. So, you can get smart expansion of code and good suggestions. You can get documentation for what's under your cursor. Maybe you're reading somebody else's code that they've written, and you're like, why are they calling this parameterized method here? What does that even do? Like, in VS Code, you could just hover over it, and it will pop up and show you documentation, including the, like, inputs and return types, and things like that. That's pretty nifty. STEPHANIE: Yeah, that is cool. I use VS Code, but I've not seen that too much yet because I don't think I've worked in enough codebases with really comprehensive [laughs] YARD docs. I'm actually wondering, tooling-wise, did you use any helpful tools when you were writing them or were you hand-documenting each? JOËL: I was hand-documenting everything. STEPHANIE: Class. Okay. JOËL: The thing that I did use is the YARD gem, which you don't need to have the gem to write YARD-style documentation. But if you have the gem, you can run a local server and then preview a documentation site that is generated from your comments that has everything in there. And that was incredibly helpful for me as I was trying to sort of see an overview of, okay, what would someone who's looking at the docs generated from this see when they're trying to look for what the documentation of a particular method does? STEPHANIE: Yeah, and that's really nice. JOËL: Something that I am curious about that I've not really had a lot of experience with is whether or not having extra documentation like that can help AI tools give us better suggestions. STEPHANIE: Yeah, I don't know the answer to that either, but I would be really curious to know if that is already something that happens with something like Copilot. JOËL: Do better docs help machines, or are they for humans only? STEPHANIE: Whoa, that's a very [laughs] philosophical question, I think. It would make sense, though, that if we already have ways to parse and compile this kind of documentation, then I can see that incorporating them into the types of, like, generative problems that AI quote, unquote "solves" [chuckles] would be really interesting to find out. But anyone listening who kind of knows the answer to that or has experience working with AI tools and various types of code comment documentation would be really curious to know what your experience is like and if it improves your development workflow. So, for people who might be interested in getting better at documenting their code in the style of module docs, what would you say are some really great attributes of good documentation in this form? JOËL: I think, first of all, you have to write from the motivation of, like, if you were confused and wanting to better understand what a method does, what would you like to see? And I think coming from that perspective, and that was, in my case, I had been that person, and then I was like, okay, now that I've figured it out, I'm going to write it so that the next person is not confused. I have five or six things that I think were really valuable to add to the docs, a few of which we've already mentioned. But rapid fire, first of all, code example. I love code examples. I want a code example on every method. An explanation of expected usage. Here's what the method does. Here's how we expect you to use this method in any extra context about sort of intended use. Callouts for suggested alternatives. If there are methods that are similar, or there's maybe a sort of common mistake that you would reach for this method, put some sort of call out to say, "Hey, you probably came here trying to do X. If that's what you were actually trying to do, you should use method Y." Beyond that, a discussion of edge cases, so any sort of weird ways the method behaves. You know, when you pass nil to it, does it behave differently? If you call it in a different context, does it behave differently? I want to know that so that I'm not totally surprised. Links to external resources–really great if I want to, like, dig deeper. Is this method built on some sort of, like, algorithm that's documented elsewhere? Please link to that algorithm. Is this method integrating with some, like, third-party API? You know, they have some documentation that we could link to to go deeper into, like, what these search options do. Link to that. External links are great. I could probably find it by Googling myself, but you are going to make me very happy as a developer if you already give me the link. You'd mentioned capturing inputs and outputs. That's a great thing to scan for. Inputs and outputs, though, are more sometimes than just the arguments and return values. Although if we're talking about arguments, any sort of options hash, please document the keys that go in that because that's often not obvious from the code. And I've spent a lot of time source diving and jumping between methods trying to figure out like, what are the options I can pass to this hash? Beyond the explicit inputs and outputs, though, anything that is global state that you rely on. So, do you need to read something from an environment variable or even a global variable or something like that that might make this method behave differently in different situations? Please document that. Any situations where you might raise an error that I might not expect or that I might want to rescue from, let me know what are the potential errors that might get raised. And then, finally, any sorts of side effects. Does this method make a network call? Are you writing to the file system? I'd like to know that, and I'd have to, like, figure it out by trial and error. And sometimes, it will be obvious in just the description of the method, right? Oh, this method pulls data from a third-party API. That's pretty clear. But maybe it does some sort of, like, caching in the background or something to a file that's not really important. But maybe I'm trying to do a unit test that involves this, and now, all of a sudden, I have to do some weird stubbing. I'd like to know that upfront. So, those are kind of all the things I would love to have in my sort of ideal documentation comment that would make my life easier as a developer when trying to use some code. STEPHANIE: Wow. What a passionate plea [laughs]. I was very into listening to you list all of that. You got very animated. And it makes a lot of sense because I feel like these are kind of just the day-to-day developer issues we run into in our work and would be so awesome if, especially as the, you know, author where you have figured all of this stuff out, the author of a, you know, a method or a class, to just kind of tell us these things so we don't have to figure it out ourselves. I guess I also have to respond to that by saying, on one hand, I totally get, like, you want to be saved [chuckles] from those common pitfalls. But I think that part of our work is just going through that and playing around and exploring with the code in front of us, and we learn all of that along the way. And, ultimately, even if that is all provided to you, there is something about, like, going through it yourself that gives you a different perspective on it. And, I don't know, maybe it's just my bias against [laughs] all the inline text, but I've also seen a lot of that type of information captured at different levels of documentation. So, maybe it is a Confluence doc or in a wiki talking about, you know, common gotchas for this particular problem that they were trying to solve. And I think what's really cool is that, you know, everyone can kind of be served and that people have different needs that different styles of documentation can meet. So, for anyone diving deep in the source code, they can see all of those examples inline. But, for me, as a big Googler [laughs], I want to see just a nice, little web app to get me the information that I need to find. I'm happy having that a little bit more, like, extracted from my source code. JOËL: Right. You don't want to have to read the source code with all the comments in it. I think that's a fair criticism and, yeah, probably a downside of this. And I'm wondering, there might be some editor tooling that allows you to just collapse all comments and hide them if you wanted to focus on just the code. STEPHANIE: Yeah, someone, please build that for me. That's my passionate plea [laughs]. And on that note, shall we wrap up? JOËL: Let's wrap up. STEPHANIE: Show notes for this episode can be found at bikeshed.fm. JOËL: This show has been produced and edited by Mandy Moore. STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show. JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter. STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email. JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week. ALL: Bye. AD: Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us. More info on our website at: tbot.io/referral. Or you can email us at referrals@thoughtbot.com with any questions.Support The Bike Shed
undefined
Feb 27, 2024 • 40min

416: Multi-Dimensional Numbers

Joël discusses optimizing slow SQL queries in a non-Rails app. Stephanie shares canary deploys in a Rails upgrade. The episode focuses on multidimensional numbers, treating data structures as mathematical objects. They explore simplifying operations on T-shirt sizes by defining custom classes for complex data types.
undefined
Feb 6, 2024 • 31min

415: Codebase Calibration

The hosts discuss a delightful and cute Ruby resource for understanding common errors. They delve into exploring new codebases, the benefits of new team members, and finding balance between change and pragmatism.
undefined
Jan 30, 2024 • 32min

414: Spike Tasks

Joël talks about using Turbo, a JavaScript framework for interactivity on websites. Stephanie shares her goal of working from the office more and enjoying watching chickadees. They discuss the concept of 'spikes' in software development, explaining how they help with decision-making and prioritization in complex codebases.
undefined
Jan 23, 2024 • 34min

413: Developer Tales of Package Management

Stephanie talks about retiring an internal link-shortening app, while Joël discusses debugging in ActiveRecord. They delve into package management nuances, comparing system and language packages like Homebrew and RubyGems. The hosts explore the complexities and implications for developers.
undefined
Jan 16, 2024 • 32min

412: Vertical Slices

The podcast discusses a unique bug that only crashes a page in January. They also talk about structuring work-from-home environments and the concept of 'vertical slice' development. They share insights on collaborative work, iterative development, and strategies for efficient software engineering.
undefined
Dec 19, 2023 • 39min

411: Celebrating and Recapping 2023!

In this episode, Stephanie hosts a holiday cookie swap while Joël talks about a hackathon. The hosts recap their favorite episodes, books, and blog posts of 2023. Topics include software writing heuristics, discrete math, Sandi Metz's rules, domain modeling, sequence diagrams, and more. They also discuss sustainable web development, belonging in software teams, and preemptive pluralization.
undefined
Dec 12, 2023 • 32min

410: All About Documentation

The podcast discusses challenges with handling JSON in a Postgres database using ActiveRecord, bookmarklets like 'Check This Out' for Libby, and the importance of accurate documentation, highlighting the pitfalls of using constants in code for system behavior.
undefined
Dec 5, 2023 • 28min

409: Support & Maintenance and Rotating Developers

Stephanie recommends 'Blue Eye Samurai' and a new ceramic pot for cooking. She discusses her shift to a part-time support role at thoughtbot, emphasizing communication and workplace flexibility. The podcast explores client-centric approaches and team collaboration in support & maintenance teams.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app