Aaron VonderHaar returns to the show for a deep dive on automated tests, test-driven development, and elm-program-test, a new high-level test framework for Elm.
Thank you to our sponsor, Culture Amp.
Special thanks to Xavier Ho (@Xavier_Ho) for editing and production of this episode!
Recording date: 2 April 2020
Guest
Aaron VonderHaar (@avh4)
Show Notes
00:00:00 Intro
Elm Town 50 – My favorite thing is when they don't even notice
00:01:42 TDD and automated tests at NoRedInk
RSpec
00:02:19 elm-program-test
elm-program-test
elm-test
Capybara
Selenium
Test.Html.Query (was elm-html-test)
Test.Html.Event (was elm-html-test)
00:04:36 Why write automated tests
00:06:44 Test-driven development
00:08:55 Tests vs types
00:11:33 Test-driven development (continued)
00:13:25 Red, green, refactor
TDD (Test-Driven Development) Traffic Light
00:16:18 Test-driven development (continued)
“Make Data Structures” by Richard Feldman
“Making Impossible States Impossible” by Richard Feldman
00:20:23 Testing at the right level
00:24:53 Testing culture in a team
00:26:43 The need for elm-program-test
Robolectric
00:30:22 The elm-program-test API
elm-program-test API documentation
00:32:12 Standing in for the Elm runtime
00:35:34 Testing Elm commands
Elm Town 46 – You Get All Of The Chapters
00:37:49 Standing in for the Elm runtime (continued)
elm-testable
00:39:46 Resolving and asserting on HTTP requests
00:43:08 Other supported effects
00:45:12 Modelling the user interface in your test suite
00:47:18 Smart DOM matchers
00:49:05 Keyboard focus tracking
00:49:48 elm-program-test vs non-Elm alternatives
00:51:53 Stability and feature-completeness
00:53:00 elm-program-test at NoRedInk
00:54:42 Testing the interface with the back end
Pact
00:55:58 Related talks by NoRedInk colleagues
“Writing Testable Elm” by Tessa Kelly
“A Month of Accessible Elm” by Brooke Angel
00:56:53 Sign-off and outro
Transcript
[00:00:00] Kevin Yank: Hello and welcome back to Elm Town. It's your old friend Kevin, and I am rejoined by Aaron VonderHaar, back for his second episode. Welcome back, Aaron.
[00:00:09] Aaron VonderHaar: Hey, Kevin, it's great to be back.
[00:00:10] Kevin Yank: Aaron, you were with us a few weeks back now to talk about your work on elm-format, among other things, including a new refactoring tool that, we'll be releasing very soon, I imagine, by the time our listeners hear this. If you haven't caught that episode,
[00:00:28] go back two episodes and have a listen to that first chat we had with Aaron, because we will be picking up on the threads of that conversation here today with a focus on elm-program-test, and testing in general for Elm. Richard Feldman is not the only person at NoRedInk who cares about testing.
[00:00:48] Am I to understand that right, Aaron?
[00:00:50] Aaron VonderHaar: That's right. Test driven, and having a lot of automated tests, has been pretty core at NoRedInk, for everybody there. So, yeah, we all get into it now.
[00:00:58] Kevin Yank: I'm actually interested in hearing about the background of that, cause I would say that's maybe not true of every company that's using Elm today. But I'm aware that NoRedInk started as a Ruby on Rails application, where in that ecosystem, in the Ruby ecosystem, test driven development and automated testing in general is very strongly embedded in the culture of that programming community.
[00:01:23] Is there a sense that NoRedInk's investment in testing and belief in testing flowed out of that background in Rails or, not?
[00:01:30] Aaron VonderHaar: Yeah, I'd say that is definitely fair to say. RSpec is the big testing framework there, and that was definitely encouraged by Rails, the framework as well.
[00:01:40] Kevin Yank: And so I guess a lot of developers who are used to having a strong testing framework on the back end, they are invested in the idea of having that on the front end with Elm. And that leads to people like Evan (sic.) working on elm-test and yourself working on this new package, elm-program-test.
[00:01:59] Aaron VonderHaar: I mean, web apps in general these days, testing is highly thought of across a lot of JavaScript platforms as well. So it's not unique to Elm, but Elm kind of being a totally separate language, even though it works in the JavaScript ecosystem, kind of needs its own tools and its own way of doing things that supports the language itself.
[00:02:19] In a nutshell, elm-program-test is an Elm package that you can use alongside elm-test, and it lets you write tests that are maybe at a higher level of testing than what you can do with elm-test out of the box. So the types of tests you can write in elm-program-test are similar to maybe what you might write in tools like Capybara in Rails or Selenium where you write tests saying, okay, the user is loading this page,
[00:02:48] the user is going to click this button, fill in text in the input with this label, navigate to this other page, submit the form, and then check that certain things appear on the page afterwards. So writing tests at that high level of the user perspective, is what program-test is designed for. And I like to think I've made a pretty nice and easy-to-use API to let you do that in a powerful way.
[00:03:10] Kevin Yank: So if elm-test by itself is for unit testing, elm-program-test is for the other kinds of tests you're going to write.
[00:03:20] Aaron VonderHaar: Yeah. So certainly if you have a function or a module with pure functions in it, using the built in stuff in elm-test is the place to start. After a while, Noah Hall developed the elm-html-test, which is now incorporated into the elm-test library itself, that lets you check things on the HTML values that get produced by your views. elm-program-test goes a step further and actually lets you incorporate your init function, your update function and your view function and all the messages and everything into a single unit that you can just run user steps on and inspect the output of the view and kind of interact with your program and check what the user's going to see afterwards.
[00:04:03] Kevin Yank: So tests at the program level. Thus the name.
[00:04:05] Aaron VonderHaar: It took a while actually to get to that name, but I think it reflects the purpose now.
[00:04:10] Kevin Yank: I can imagine, cause as you were describing the type of tests it's for, those tests I've heard called acceptance tests and feature tests and end-to-end tests. And I imagine all of those terms were on the table to name this framework. And in the end, you landed on, “Well, actually, what do we call the thing we're testing here in Elm?
[00:04:29] “We call it a program.” So yeah, it makes sense once I understand what it does.
[00:04:34] Aaron VonderHaar: Yup. Exactly.
[00:04:36] Kevin Yank: Hmm. Well, let's take a step back, here. We're going to get to the details of elm-program-test by the end, here, but I'd like to come at this, from kind of a first principles, “What are you trying to achieve and why?”
[00:04:48] question here, cause I dare say there are a lot of people who use Elm, who are listening to this, who use it as a hobbyist or, as a first language of this type, are exploring the space and are maybe, maybe not necessarily feeling the need for a test framework as the first thing they want to invest their energy in as they're exploring Elm.
[00:05:11] So tell me a little bit about your philosophy around testing. When should you start testing a piece of Elm code, and what should that look like?
[00:05:20] Aaron VonderHaar: If you're working on a production system where you have a job that depends on code that you're writing working with other code, and many people are touching it, and it's a system that's used by real people in that environment, there's certainly value in testing in that it protects you from bugs, it protects you from future regressions, that sort of thing.
[00:05:41] But there's another aspect to testing, which, I think last time we talked about some of my background, previously working at Pivotal Labs doing agile consulting, and the way I learned to use testing in development then, and specifically the test driven development practice, is an approach to kind of integrating writing tests and thinking about tests
[00:06:05] into the way that you actually write the code. And you can do that through test driven development in a way that helps you think about the design process for designing the code, both the architecture of your code and the implementation. So I'd say if you're working on a hobby project that maybe doesn't have the correctness requirements of a paid job, it's certainly up to the individual.
[00:06:30] But personally, I find a lot of benefit in the process of using tests to think about and educate and even help me find correct or even better solutions faster. So I kind of use it as a thinking tool as well.
[00:06:44] Kevin Yank: Right. So for someone who hasn't practiced TDD, or maybe is hearing the term for the first time, how would you describe that process and the experience of using it as a thinking tool, as you put it?
[00:06:56] Aaron VonderHaar: When you get into kind of the details of test driven development, one of the big things is to think about the context of the caller, the person calling your code if you're writing an API, or the user that's using your site if you are building an application. So the way it tends to play out for me is I'll have some feature that I want to implement.
[00:07:19] Rather than just looking at it as a specific list of things to implement and make work, I want to look at that feature and think, okay, who is going to be using this feature? In the case of NoRedInk, we have a lot of teachers and students, so we might have, like, a student leaderboard for a class. We think about who is a teacher who's going to be using this feature, and what's going through their mind when they come to interact with it?
[00:07:44] And then from that you kind of walk through a scenario of a real teacher in your mind, actually using the feature and try to write tests from that. Kind of the process of TDD is you think about, okay, if you tried to go through this scenario as a user, what's the first thing that the site currently doesn't do that you would run into in trying to go through the workflow and that's your first test.
[00:08:10] You write a test saying, okay, as a teacher I want to see the leaderboards for my students. That's your first test, and you'd say, okay, what page is that going to be on? The body of the test would be load the page where it's supposed to be, stub out some data about the class and maybe the grades in the class, and then check that certain things appear on the leaderboard, which are the things that the teacher would be looking for,
[00:08:35] like maybe the name of the first student and the number of points that student has.
[00:08:39] Kevin Yank: Assuming that page doesn't exist yet, that test will immediately fail.
[00:08:43] Aaron VonderHaar: Yup, exactly. And in the case of a strictly typed language like Elm, it may not even compile yet. So, you can often think of compiler errors as a sort of test failure in a way.
[00:08:55] Kevin Yank: That does get at something, which I've heard said and said myself a number of times over the years, which is having a strongly typed language like Elm means you have to write fewer tests. Is that something you believe?
[00:09:09] Aaron VonderHaar: I would say that is sort of true. Yeah, I definitely know that argument of types versus tests. There was an interesting way I saw of thinking about this, where in a functional language functions are types of values in a sense, like you can have a variable that contains a function. So when you're writing tests, you can think of that as reducing the number of possible functions that exist that could be implementing the type annotation that you've defined for that function.
[00:09:42] So in that sense, there's some similarities between tests and types in that by changing the type annotation, you could restrict what kind of inputs the function can take or what kind of outputs it can produce, that reduces the number of possible implementations you could have for that function, which hopefully reduces the number of possible incorrect implementations you could have.
[00:10:04] And the same is true for tests. By adding a test, you are in a different way and using different tools, reducing the number of possible valid implementations, for whatever the function is.
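[Editor's sketch] One way to picture this idea, in Python rather than Elm and not taken from the episode: a single type annotation admits many functions, and each added test rules some of them out. The candidate functions and the two checks below are invented purely for illustration.

```python
from typing import Callable, Dict, List

# Every candidate has the same type: List[int] -> int.
# The type alone cannot tell a correct "total score" function from the others.
Candidate = Callable[[List[int]], int]

CANDIDATES: Dict[str, Candidate] = {
    "always_zero": lambda scores: 0,
    "first": lambda scores: scores[0] if scores else 0,
    "largest": lambda scores: max(scores, default=0),
    "total": lambda scores: sum(scores),
}


# Each test is a yes/no check applied to an implementation.
def empty_is_zero(f: Candidate) -> bool:
    return f([]) == 0


def adds_up(f: Candidate) -> bool:
    return f([2, 3]) == 5


def surviving(tests: List[Callable[[Candidate], bool]]) -> List[str]:
    """Names of the candidates that pass every test in the list."""
    return sorted(
        name for name, f in CANDIDATES.items() if all(t(f) for t in tests)
    )
```

With no tests, all four candidates survive. The first check happens to rule out none of them, and adding the second leaves only `total`. A tighter type would shrink the candidate set in a similar way, before any test runs.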
[00:10:15] Kevin Yank: All right. And so if your goal is to, first narrow the possibility space of what this code might do, if you're using a type to do that, maybe that's one less test that you have to write. But speaking for myself, those would be the, probably the lowest value tests I would be writing in another language.
[00:10:34] And the higher value tests are still worth writing.
[00:10:36] Aaron VonderHaar: Yes, exactly. So like an example is in JavaScript, let's say, you might want to write a test saying that when given these inputs, the particular function does not return null. It always returns some default value or something like that, in case of an error. In Elm, that's an example of something where the type can handle that.
[00:10:58] You can enforce in the types that whatever the function will never return null. But also on the other side of the coin, in JavaScript, you aren't always writing a test that your function does not return null. For every function, certain functions, it makes sense because you know it's a risk depending on the inputs, but other functions, you know, okay, it's never going to come up,
[00:11:19] or if it does come up, it's obvious what's wrong, in which case you don't necessarily need to write those tests. But certainly in Elm, there is a lot of safety that you can get basically for free or just for thinking about the types. Whereas in JavaScript, you don't have that protection.
[00:11:33] Kevin Yank: So back to this TDD scenario, we've created a test where the teacher wants to go to the page and see the scoreboard and the compiler fails, or if you happen to luck out and the compiler passes, the test fails because the page doesn't exist. What next?
[00:11:50] Aaron VonderHaar: I tend to think about different levels. So the test we just talked about is a fairly high level test. Maybe you heard the words high level when I talked about elm-program-test.
[00:12:00] Kevin Yank: Yeah. I was going to say, this sounds like the kind of test you would write with elm-program-test.
[00:12:04] Aaron VonderHaar: Right, exactly. And that's in fact kind of why I wrote elm-program-test to fill that gap. But even if you aren't writing that test, you can think about that as a test case anyway and say, okay, I'm going to start my local development server. I'm going to login.
[00:12:19] I'm going to try to go to that page. You see it fail. In a sense, you can think of that as a manual test case. In either case, the next step would be, okay, why does it fail? We've already diagnosed that it's because the page doesn't exist. Okay, we'll go create the page. If there's any logic related to the creation of the page that's non-trivial,
[00:12:39] we might want to write a test for a lower level detail. For instance, maybe we need to decode some page flags, for loading the page or something like that. This is getting a little bit contrived as a first page for a new test, but I think about it in layers where I write this kind of high level test describing the user behavior,
[00:12:59] then I see what's failing. I think about why it's failing, and then I figure out, okay, what code needs to exist for that first step to be able to move on. And sometimes at that point I go down and write a lower level or a more unit test on some specific module maybe to implement some function that I'm going to need to wire up the test at the higher level.
[00:13:24] Kevin Yank: Often when talking about TDD, we talk about the red, green, refactor cycle. Is that something that is still a real part of your workflow today or is that more like something you learn on your first day of TDD, but you kinda let go of the details of that process over time?
[00:13:40] Aaron VonderHaar: It's definitely something I keep in mind, and I feel like I follow that pretty consistently, but also it's not like I have a stoplight on my desk, like, to remind me—
[00:13:51] Kevin Yank: What mode am I in? Yeah.
[00:13:53] Aaron VonderHaar: Right. Yeah, I think it's something I tend to maybe reflect back on if I am in a position where I'm getting stuck and I can think, okay,
[00:14:04] let me run through kind of my checklist of what I need to do. Do I understand what behavior I'm trying to implement right now? Am I trying to take too big of a bite right now? Am I getting lost in the weeds? Is there some simpler version of the problem that I could focus on making work right now?
[00:14:23] An example that comes up for me a lot is, okay, maybe I'll have a failing test and I know that to make that test pass, it's going to require both maybe writing some information into the database and then reading it out later. I guess this would be in the case of a Ruby test where we're on the back end. But often you'll have a test failure where to make it work, you have to do something to both change what's getting stored internally and something that's being used, reading that internal state to produce some outcome. And often what I'll do in that case is I will make the test pass by just hard-coding the output value somewhere in the code, which the way that helps is that it helps me find the exact location in the code where I'm going to need to have some logic to decide what to output.
[00:15:14] And then often that'll make it obvious to me what specific value I need stored in the internal state that I need to be able to use to make that determination. So it kind of breaks the overall problem into the output and the input step, and you can tackle those separately.
[00:15:30] Kevin Yank: For those who may not have heard of red, green, refactor, the idea is you first write a failing test and your tests are then red because they fail. You then write the necessary implementation to make that test pass, and the tests turn green, and then having that passing test suite, you are then allowed to refactor your code to try to make the structure of it a better expression or a more maintainable version of the set of features that your tests guarantee will exist.
[00:16:01] And as long as your tests stay green, you know you haven't broken anything, and when you're happy with the shape of your code, it is time to write another failing test before you make your code do anything new. And so you go through that red, green, refactor cycle all the way to success. Now, in your example of the scoreboard for the teacher, you mentioned hard-coding
[00:16:23] a value. And the thing that came to mind was, okay, as we are writing this scoreboard screen, we might start with the empty state. So no students have taken this test. So there are no scores yet. And so the first test I might write is I go to the scoreboard screen and it says “no scores”.
[00:16:44] And the easiest way to make that test that is currently failing pass is to hard-code a view that outputs the HTML “There are no scores” – at all times – and your test turns green. And the idea of red, green, refactor is before you then go on to add more features to this page, you need to add another test that says, oh, and if there happens to be some data in the database, then I get a different result on the page.
[00:17:13] And you add that feature while ensuring that the old empty state continues to work.
[00:17:19] Aaron VonderHaar: Yup. Exactly. And you were asking about how strictly I follow that. An interesting optimization for the example you just laid out is something that actually I learned to do from a product manager. But the idea is that, okay, the empty state seems like a natural starting case, but if you try to think about, okay, can I make my first test be a test
[00:17:44] that is the most common test that's going to happen in real life. So in this case, okay, most of the time teachers are going to have students. So if you make your first test be having actual data, it can kind of be a shortcut in a sense because it forces you to drive out more of the implementation upfront,
[00:18:02] and also gives you some flexibility from the product side to maybe descope the fine tuning of the empty state later, if you want to get a feature released sooner.
[00:18:11] Kevin Yank: One of my pet peeves is opening a test file and having to scroll past two pages of really artificial, edge-casey tests before I get to the meat of what does this thing actually do,
[00:18:23] so yeah, I like that approach. When I find myself leveraging the rigorous version of TDD is when I'm writing a piece of code that I don't necessarily fully understand yet, or I can't quite hold it in my head.
[00:18:42] Richard Feldman has done a number of talks about how, you want to get your data representation right: he's given a talk about, you know, focus on designing data structures; he's given another talk on making impossible states impossible. And all of those things come back to getting the data structure right and everything flows out of there.
[00:19:03] But every now and then I am solving a problem where I go, wow, I honestly don't know what the right data structure is going to be for this. There's just one too many features to support with a single Elm type definition, and how am I going to implement those constraints? How many of them will be provided by the type system?
[00:19:23] How many of them will be enforced by the functions that act on that opaque type? And if I don't know, that's when I break out TDD and I start going, all right. One requirement at a time, here. Start with what is the most important thing this has to do. Get a green test that guarantees it will always do that, and then start thinking through edge cases,
[00:19:43] start adding constraints one at a time. And invariably my implementation code starts to get ugly at that point, and forcing myself to wait until my test suite is green before I start messing with alternative approaches is really valuable. It kind of forces me to slow down, to prove to myself that an idea I'm experimenting with, actually satisfies all of the constraints that I can't hold in my head all at once yet.
[00:20:12] Aaron VonderHaar: Yeah, that's a great example. And a situation that is easy to run into when you aren't quite sure what data structure or what algorithm you should be using, internally, a lot of times it's easy to do what I was describing earlier and jump down to testing at the lower level too soon.
[00:20:30] And you might start writing unit tests that are very specific to the data structure that you chose, in which case, if you later decide you want to totally change that data structure, you now have a whole bunch of tests that were tied to that old data structure that you now have to throw out or rewrite in terms of the new one.
[00:20:50] So that's an example where if you have that uncertainty, it's good to be able to identify that and have your testing at a little bit higher of a level. Think about kind of the high level API that you're trying to implement and what behaviors and outcomes are being observed on the system as a whole.
[00:21:08] So if you write your tests at that level, it gives you the flexibility to entirely change how it's all implemented, but still have those tests providing coverage without any risk that the tests will be invalid.
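[Editor's sketch] As a rough illustration of testing at the level of the public API, again in Python and with invented class and method names: the two leaderboards below store their data differently, yet one behavior-level check covers both because it never looks at the internal structure.

```python
from typing import Dict, List, Tuple


class ListLeaderboard:
    """First attempt: keep every recorded score in a list."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, int]] = []

    def record(self, name: str, points: int) -> None:
        self._entries.append((name, points))

    def top(self, n: int) -> List[str]:
        totals: Dict[str, int] = {}
        for name, points in self._entries:
            totals[name] = totals.get(name, 0) + points
        # Highest total first; ties broken alphabetically.
        return sorted(totals, key=lambda name: (-totals[name], name))[:n]


class DictLeaderboard:
    """Later rewrite: keep only running totals in a dict."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def record(self, name: str, points: int) -> None:
        self._totals[name] = self._totals.get(name, 0) + points

    def top(self, n: int) -> List[str]:
        return sorted(
            self._totals, key=lambda name: (-self._totals[name], name)
        )[:n]


def check_behaviour(board) -> bool:
    """Uses only record() and top(), so it survives the rewrite unchanged."""
    board.record("Ada", 10)
    board.record("Grace", 30)
    board.record("Ada", 25)
    return board.top(1) == ["Ada"] and board.top(2) == ["Ada", "Grace"]
```

A unit test that asserted on `_entries` directly would have to be thrown away when the list became a dict, whereas `check_behaviour` applies to either class as written.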
[00:21:20] Kevin Yank: So what I'm hearing is you want to work from the outside in. Before you start defining the criteria for a tiny module in the middle of your program, you want to start by justifying its existence at the outer levels. And so everything starts by an elm-program-test test that says, this feature should do this thing for this user under these conditions.
[00:21:43] You work your way in to the point where you are writing a single module in support of that.
[00:21:49] Aaron VonderHaar: Yeah. Be able to think about and identify the different levels that you could test at, and make a choice that's the correct one for whatever situation you're in, about how many tests at which level you want to do. We actually have some variance at NoRedInk. We have, at the moment, two different teams that are focused on building user facing features, and do most of the Elm work at the moment.
[00:22:14] I'm on one of those, and my team tends to, on the one hand, have less involved features. Like we do a lot of user interface work for the teachers and changing the overall not very deep features, but features across the site. Let me give an example. So like, our team is responsible for like the dashboard that shows all the assignments or the grade book that shows a summary of all the grades, which can have a lot of Elm code, but it's not a particularly algorithm-heavy code.
[00:22:48] Whereas the other feature team has been working on, like our new self-reviews feature, which is this entire interactive assignment with multiple steps. And you kind of work through an essay section by section where there's a little quiz and you can highlight stuff. So there's kind of a state machine involved on the back end.
[00:23:07] We're using a rich text editor that we had to integrate with Elm, be able to merge updates if there's conflicting changes in multiple browser tabs, things like that. So that's kind of like a deeper feature, and that team ends up doing a lot more of the more focused unit tests compared to my team where we have a lot more features that are—
[00:23:27] We only need to go as deep as kind of thinking about the user experience because the code to wire it up isn't that complex from an algorithmic sense.
[00:23:36] Kevin Yank: You were saying that informs the testing practice of your respective teams.
[00:23:41] Aaron VonderHaar: Yeah, I like to always start by thinking about what's the highest level test, but there are tradeoffs on the spectrum too. Like if you test too high, one thing is your test can be slow if, for instance, you're using Capybara or Selenium or something like that.
[00:23:56] Luckily on the Elm side, even elm-program-test is very fast even for high level tests. But the other downside of high level tests is it can be harder to debug what is exactly wrong when a test fails. You might get a failure like, “Oh, the, the button with the submit text doesn't exist on the page.”
[00:24:15] But it doesn't tell you why it doesn't exist. Whereas having some lower level unit tests gives you some brittleness in that a lower level interface is now protected by tests, and if you need to change the API of that module, your tests will break or have to be rewritten, but it gives you the benefit that if one of those lower level tests fails, it's going to be a much clearer idea of like where exactly the problem is in your code.
[00:24:43] So there's kind of a balance you need to make between how many of each kind of test are appropriate for whatever your situation is and the experience level of the people on your team, and so forth.
[00:24:53] Kevin Yank: How do you and your team think of and talk about that balance? Are there established norms? If you're reviewing a code change that deviates from those norms, it will raise your eyebrows and cause you to question it. Or is there any tooling around things like code coverage and things like that?
[00:25:11] Aaron VonderHaar: Yeah, that's an interesting question. I believe there are one or two Elm code coverage tools, but I haven't personally played with them, and we don't use them consistently at NoRedInk.
[00:25:22] Kevin Yank: The thinking behind the question is, for me, this is one area where front end tests and back end tests differ significantly. Maybe it comes back again to the difference between a dynamic language like Ruby or a statically typed language
[00:25:37] like Elm. But on the back end, it is way more common in my experience to have that tooling to say the code you are contributing must have a minimum level of test coverage before it is considered acceptable. Whereas on the front end, at least at Culture Amp, we kind of trust our engineers to think about and make good decisions about that and we trust them to hold each other accountable to that in code reviews.
[00:26:05] But it is a much more informal process that is difficult to set clear expectations around in a growing team.
[00:26:11] Aaron VonderHaar: Yes. Yeah. NoRedInk is definitely on the informal side there. Personally, my recommendation is just that aligning a team on that is pretty much like any other team process, it's going to require good communication, as much communication as possible, typically more than you think is going to be necessary.
[00:26:33] And ideally pair programming is a great way to really get people to talk about differences of opinion that they didn't realize they had until they started working together.
[00:26:43] Kevin Yank: Let's start to talk about elm-program-test a bit. I said at the beginning that this was new to me, but obviously this framework has been around a little while to get to a third major version, unless, you know, that is just semantic versioning at play and you had to put a couple of breaking changes in there, early in its life.
[00:27:02] But, was there a time when NoRedInk was shipping Elm code without the ability to write program level tests in Elm?
[00:27:12] Aaron VonderHaar: Even today, we still have a fair amount of our front end code that does not have tests at that level. html-test itself was kind of an early attempt at answering the question of what kind of tests are appropriate and necessary to write for front end code in Elm. That turned out, in my opinion, to not be particularly useful for applications because you ended up unit testing your view function, which
[00:27:40] by itself, the view function is very declarative. You tend to not have a lot of complicated logic in your view function. So where Elm programs get interesting is the interaction between the messages that can get produced by the view or through other means and how the update function processes those, and in particular when you start having a sequence of interactions, how the state of the program evolves over time, as a sequence of messages gets played back.
[00:28:10] So being able to kind of test the view function and the update function together as a unit, and have a single test. So in our case, the origins of elm-program-test came when we were redesigning the assignment form at NoRedInk where teachers can create assignments and we have seven different kinds of assignments, all with different kinds of contents.
[00:28:32] So the page itself was pretty complicated. And having only unit tests kind of available in elm-test at the time, you tend to get a very confusing set of tests when you have a test that reads like, okay, given that the teacher has already selected these 10 options in this way, then we're rendering a view that has a button that when clicked, produces this message, and then you have a separate test saying that, okay, given that you have this initial state, when this message is processed, here's the next state.
[00:29:06] And it's just very hard to read and hard to understand and just grows very, very fast to write tests that way when you have a complex page.
[00:29:14] Kevin Yank: You end up with a bunch of isolated tests about a single transition from one state to the next, and the big picture of what is the system supposed to do gets lost. And bugs can fall into those cracks.
[00:29:26] Aaron VonderHaar: Yup. Exactly.
[00:29:28] There was kind of a choice of like, okay, do we just reduce and slow down on the amount of tests we're writing and kind of step away from having as much coverage of this page? Or do we continue to write tests that way? And to me there was the other solution of how can we write a testing API that makes tests that we want to write and that are easy to write and read easier to produce? And actually my background at Pivotal Labs, I've worked on a bunch of test framework APIs in the past. There's the Robolectric framework for Android, and we did an experimental one while I was there for iOS apps. That was kind of a similar type of high level API where you could specify interactions with the user interface and see what kind of effects it produced.
[00:30:15] So having that background, it was natural to me of like, oh, we just need to start writing a new API to, for this.
[00:30:22] Kevin Yank: So let's talk about that API. How do elm-program-test test suites look different from the test suites that people might already be writing with elm-test.
[00:30:31] Aaron VonderHaar: In elm-program-test, you are typically going to define a top-level definition called start, I like to call it, which basically sets up your program. It needs to specify the init function, your update function, your view function – and we'll talk about commands and how to test commands and subscriptions in a minute –
[00:30:52] but you set that all up in the first place and that produces a value of the type called ProgramTest, which basically represents your program and also its current state in the test world. And it also tracks other things like any pending HTTP requests or other effects that are being simulated by elm-program-test. So your typical test case is gonna say start, and then the next line you'll pipe it to some test program command, typically clickButton or fillIn and you'll say like, oh, clickButton "next", or fillIn and you'll give the label of the input field to fill in and the value to put in. And then there's another set of functions that all start with the word expect, or there's an ensure version if you need to do multiples.
[00:31:42] So you can say, ensure page has, and then give some html-test selectors, like ensure that the page has a certain piece of text on it or ensure that there's a heading with certain text in it or ensure that there's a button with this class on it, and so forth. So you basically just string a bunch of those together,
[00:32:00] and if you want, you can alternate – click some things, check things on the page, click other things – and kind of go through an entire workflow as much as you feel is appropriate for whatever particular test you're trying to set up.
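As a rough illustration of the test shape described here, the following is a minimal sketch, assuming elm-program-test 3.x and elm-test. The tiny program (`Model`, `Msg`, `init`, `update`, `view`) is hypothetical and exists only so the example is self-contained. In the published package the "ensure"/"expect" functions for checking the page are named `ensureViewHas` and `expectViewHas`.

```elm
module SignupTest exposing (all)

import Html exposing (Html)
import Html.Attributes exposing (for, id, value)
import Html.Events exposing (onClick, onInput)
import ProgramTest exposing (ProgramTest)
import Test exposing (Test, test)
import Test.Html.Selector exposing (text)


-- A hypothetical program under test (not from the episode).


type alias Model =
    { name : String, submitted : Bool }


type Msg
    = NameChanged String
    | Next


init : Model
init =
    { name = "", submitted = False }


update : Msg -> Model -> Model
update msg model =
    case msg of
        NameChanged name ->
            { model | name = name }

        Next ->
            { model | submitted = True }


view : Model -> Html Msg
view model =
    Html.div []
        [ Html.label [ for "name" ] [ Html.text "Name" ]
        , Html.input [ id "name", value model.name, onInput NameChanged ] []
        , Html.button [ onClick Next ] [ Html.text "Next" ]
        , if model.submitted then
            Html.text ("Welcome, " ++ model.name)

          else
            Html.text ""
        ]


{-| The top-level `start` definition: wires up init, update and view.
-}
start : ProgramTest Model Msg ()
start =
    ProgramTest.createSandbox { init = init, update = update, view = view }
        |> ProgramTest.start ()


all : Test
all =
    test "greets the user after they fill in the form" <|
        \() ->
            start
                -- fillIn takes the input's id, its label text, and the new value
                |> ProgramTest.fillIn "name" "Name" "Ada"
                -- "ensure" checks return the ProgramTest so the pipeline can continue
                |> ProgramTest.ensureViewHas [ text "Next" ]
                |> ProgramTest.clickButton "Next"
                -- the final "expect" check ends the pipeline with an Expectation
                |> ProgramTest.expectViewHas [ text "Welcome, Ada" ]
```

The single pipeline covers both the view (which message does the button produce?) and the update (what does that message do to the state?), which is the combination the separate unit tests made hard to follow.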
[00:32:11] Kevin Yank: It sounds to me like there would have been a lot of work in modeling the outer layer of that system that you are testing there. With most Elm code that we write, we get to assume there is an Elm runtime there doing all of the things that we ask it to do by returning it commands, and we also assume there is a browser there that is, you know, maintaining the DOM and sending us messages when the user interacts with it.
[00:32:40] And in a way to write the kind of tests you write, your testing framework kind of needs to step outside of that system and poke at it from the outside. So does that mean you have had to basically emulate the Elm runtime itself and at least a simplified version of a web browser?
[00:33:00] Aaron VonderHaar: Yes, to some degree. And a big part of the answer to that is something I learned from working on the Android test framework I mentioned, Robolectric: you can kind of take a TDD approach here, where your test framework only needs to implement the features that people actually need for their tests that exist so far.
[00:33:20] So there's a lot of, for instance, web APIs that are available in Elm that elm-program-test doesn't support yet, but it's kind of set up in a way that that can be added, hopefully if people like this and have unusual effects that they need to simulate and so forth. Hopefully people will start contributing some PRs to add and flesh out the API.
[00:33:42] But it's definitely been an incremental approach in developing it. And I think towards the end of last year, I decided, okay, I'm going to make an effort to add good documentation to that, clean up the API, and it was actually version three that was the first that I started publicizing it.
[00:33:57] Kevin Yank: I'm trying to get my head around some aspect of it that I think you're going to clarify when you explain how you test commands and update functions and subscriptions. But the thing that's going through my head is, if my program inside of an elm-program-test suite makes an HTTP request, does that HTTP request actually occur by default? Are those things mocked or stubbed by default or by exception?
[00:34:23] Aaron VonderHaar: Oh yeah, that's an interesting question. It comes down to a subtlety in Elm where when you use the HTTP library in Elm and you call get or post or whatever, you get back a command. What's interesting to note is that the commands in Elm, when you call a function that returns a command, it hasn't actually done anything yet.
[00:34:48] The command basically represents a piece of data that when the Elm runtime receives that command from your update function, it interprets that and does the actual effect. So, stemming from that, when you run elm-test, there's basically no mechanism for commands to get to the Elm runtime. Kind of, the test runner is hiding all of that.
[00:35:09] So to answer that question, any commands you produce, HTTP requests, when you run them in tests, there's in fact no way for them to actually get executed and actually performed as real HTTP requests.
[00:35:23] Kevin Yank: Okay, so elm-program-test isn't breaking any new rules. There are no runtime environment exceptions specifically for this test framework is what I'm hearing.
[00:35:34] Aaron VonderHaar: So being able to test commands, program-test provides a way to do it, but it is certainly not the ideal. So the way it works now, and actually, on the elm-program-test documentation, I kind of wrote a guidebook of walking through step by step how to do what I'm about to describe.
[00:35:51] But basically at the moment, Elm has command as an opaque type, and there's really no way in Elm code to be able to take a command and look at it and figure out what it represents – only the Elm runtime can do that. So out of necessity at the moment, the way elm-program-test works is that if you're going to use it to simulate commands or subscriptions, you basically have to refactor your program to define a new data type that lists out all the possible commands that your program can produce.
[00:36:31] I tend to call that type Effect and it's just a union type that you would define. And then you have a separate function that takes values of that new type and turns them into the real commands. And then on the testing side, you have to provide a separate function that basically looks the same, that takes your new type and turns it into the simulated commands that elm-program-test can read.
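The refactoring described here can be sketched roughly as follows. This is a minimal illustration, assuming elm-program-test 3.x and elm/http. The `Msg` constructor, the `FetchGreeting` effect and the URL are hypothetical names chosen for the example.

```elm
module EffectExample exposing (Effect(..), Msg(..), perform, simulate)

import Http
import ProgramTest exposing (SimulatedEffect)
import SimulatedEffect.Cmd
import SimulatedEffect.Http


type Msg
    = GotGreeting (Result Http.Error String)


{-| Every side effect this hypothetical program can request, as plain data
that ordinary Elm code (including tests) can inspect.
-}
type Effect
    = NoEffect
    | FetchGreeting


{-| Production side: turn an Effect into a real command for the Elm runtime.
-}
perform : Effect -> Cmd Msg
perform effect =
    case effect of
        NoEffect ->
            Cmd.none

        FetchGreeting ->
            Http.get
                { url = "https://example.com/api/greeting"
                , expect = Http.expectString GotGreeting
                }


{-| Test side: turn the same Effect into a simulated effect that
elm-program-test can track and resolve.
-}
simulate : Effect -> SimulatedEffect Msg
simulate effect =
    case effect of
        NoEffect ->
            SimulatedEffect.Cmd.none

        FetchGreeting ->
            SimulatedEffect.Http.get
                { url = "https://example.com/api/greeting"
                , expect = SimulatedEffect.Http.expectString GotGreeting
                }
```

With this in place, `update` would return `( Model, Effect )` rather than `( Model, Cmd Msg )`. The real program applies `perform` at its outer edge (for example with `Tuple.mapSecond perform`), while the test setup passes `simulate` to `ProgramTest.withSimulatedEffects`.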
[00:36:57] So that's the way it works now, but I believe Richard actually talked about this in the episode he was in, about looking at ways for elm-test itself to provide some JavaScript code that would be able to eliminate that step and allow elm-test to be able to inspect commands,
[00:37:15] which would support features of elm-program-test and be able to do everything you can do now just with a bit less boilerplate and without having to refactor your current program before you can test commands and so forth.
[00:37:27] Kevin Yank: For listeners who want to go back and hear the last time Richard was on talking about elm-test, that's Elm Town episode 46, in September 2019. He was on to talk about the release of Elm 0.19.1, and we talked a little bit about what was coming in elm-test and it definitely is forming a big picture now, seeing elm-program-test landing.
[00:37:49] Aaron VonderHaar: You asked earlier about the complexity of simulating the Elm runtime. With the current solution, it's relatively straightforward. Basically, I have a union type defined internally in elm-program-test that defines all the possible commands that can be simulated at the moment, which is a growing number, but basically HTTP, the sleep timer and a couple of others. I guess you can deal with tasks and things like that.
[00:38:17] But actually previously, as another precursor to elm-program-test, I was separately looking into how to test commands and had this thing called elm-testable, where I was actually trying to write native code that could inspect the internals of commands and figure out what they were and destructure them.
[00:38:36] And in that version of testing, I actually went through the work of writing Elm code that simulated the effect managers that Elm has internally and, like, queuing up all of the events that would happen in the event manager queue, and processing those. So that was actually a lot of complexity, but ended up not really being needed
[00:38:59] with this new approach, as long as you provide a way to tell elm-program-test what commands you're trying to produce. It's relatively straightforward to implement that internally now.
[00:39:10] Kevin Yank: I like that pattern of kind of replacing all of your commands with your own custom inspectable commands that get converted to real Elm runtime commands kind of at the outer layer of your program. I'm understanding that you are able to make these kind of transparent-to-you versions of Elm commands, and that leads you then to a process of stubbing out the responses to HTTP requests, for example. What does that end up looking like in practice? What are the kind of features that you need to provide test versions of to a typical Elm program?
[00:39:46] Aaron VonderHaar: So inside of the Elm program state, or like the internal state of a running program, there are a couple of things that are tracked now. So one is kind of a list of, or a dictionary of all of the outstanding HTTP requests. So whenever your update function produces an effect that maps to an HTTP command, all elm-program-test does at that moment is track it as a new pending request.
[00:40:15] What that means is then after you say, click a button that results in your update function posting to an HTTP endpoint, then after that, at any point – and you don't have to resolve it right away – but at any point after that, in your test, you can assert that a request was made to a particular endpoint. If you want to, you can assert on things in that request, if you want to assert on the body of the request or the headers, something like that. And there's also a mechanism for stubbing a response. So you can either say that the request errored with some particular HTTP error, or you can provide an okay value, with the status code, with the response body,
[00:40:58] and basically at any moment after that. So if you have multiple requests in flight, you could have a test that resolves them in different orders, if that's the case you want to test for, and so on.
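To make the pending-request flow concrete, here is a minimal sketch, assuming elm-program-test 3.x. The program, the `Effect` type, the button label and the URL are all hypothetical and defined inline so the example stands alone.

```elm
module GreetingTest exposing (all)

import Html exposing (Html)
import Html.Events exposing (onClick)
import Http
import ProgramTest exposing (ProgramTest, SimulatedEffect)
import SimulatedEffect.Cmd
import SimulatedEffect.Http
import Test exposing (Test, test)
import Test.Html.Selector exposing (text)


-- A hypothetical program whose update returns inspectable Effect values.


type alias Model =
    String


type Msg
    = FetchClicked
    | GotGreeting (Result Http.Error String)


type Effect
    = NoEffect
    | FetchGreeting


init : () -> ( Model, Effect )
init () =
    ( "Not loaded", NoEffect )


update : Msg -> Model -> ( Model, Effect )
update msg _ =
    case msg of
        FetchClicked ->
            ( "Loading", FetchGreeting )

        GotGreeting (Ok greeting) ->
            ( greeting, NoEffect )

        GotGreeting (Err _) ->
            ( "Something went wrong", NoEffect )


view : Model -> Html Msg
view model =
    Html.div []
        [ Html.button [ onClick FetchClicked ] [ Html.text "Fetch greeting" ]
        , Html.p [] [ Html.text model ]
        ]


simulate : Effect -> SimulatedEffect Msg
simulate effect =
    case effect of
        NoEffect ->
            SimulatedEffect.Cmd.none

        FetchGreeting ->
            SimulatedEffect.Http.get
                { url = "https://example.com/api/greeting"
                , expect = SimulatedEffect.Http.expectString GotGreeting
                }


start : ProgramTest Model Msg Effect
start =
    ProgramTest.createElement { init = init, update = update, view = view }
        |> ProgramTest.withSimulatedEffects simulate
        |> ProgramTest.start ()


all : Test
all =
    test "shows the greeting once the request resolves" <|
        \() ->
            start
                |> ProgramTest.clickButton "Fetch greeting"
                -- the request is now tracked as pending; nothing has resolved it yet
                |> ProgramTest.ensureHttpRequestWasMade "GET" "https://example.com/api/greeting"
                |> ProgramTest.ensureViewHas [ text "Loading" ]
                -- resolve it whenever the test chooses, here with a 200 and a body
                |> ProgramTest.simulateHttpOk "GET" "https://example.com/api/greeting" "Hello!"
                |> ProgramTest.expectViewHas [ text "Hello!" ]
```

Because resolution is an explicit step, a test with several requests in flight can call the simulate functions in whatever order it wants to exercise.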
[00:41:07] Kevin Yank: Do you need to explicitly handle requests after they have been queued or is there the ability to, for example, set up some request handlers that say, if I get a request that goes to this URL, it will receive that response. I don't care in any given test if that has actually occurred or not,
[00:41:27] I care about the effect of that response having come back that the user can observe. That's a way I've seen these kinds of tests written in the past is you set up the system first and then you knock it down by playing at the user's interactions and expectations.
[00:41:41] Aaron VonderHaar: Yeah, that's a good question. That was actually the alternate API that was considered when I was first looking at what kind of API to design to expose this. In looking into it, I actually determined that that type of API, where you define your handlers upfront, can actually be built on top of the API that currently exists, where you let things get queued up and then you can resolve them later.
[00:42:08] So the API you're describing doesn't exist right now. It could. There's some interest in it. I think I have probably a GitHub issue, tracking that. So that's something I'd like to build in the future. But, it does not currently exist, but totally could and could be built in a particular test helper, for an individual project if you needed.
[00:42:26] Kevin Yank: Yeah. If I really wanted that, I could model a set of request handlers and then at a point in my test suite, say resolve all requests or something like that.
[00:42:36] Aaron VonderHaar: The tricky part in making an API for it tends to be that it needs to be flexible enough to allow, for instance, dynamic responses depending on content in the request, but not everyone needs that level of complexity. So how do you make an API that's simple for regular usage, but also is flexible enough?
[00:42:56] So that's kind of the API design issue there in making a general solution. But yeah, certainly I'd encourage anyone that wants that, it could be done, as you just described, using helper functions within your project.
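One way such a project-level helper might look is sketched below. This is an assumption about how it could be layered on the existing queue-then-resolve API, not something elm-program-test provides; `Stub` and `resolveAll` are hypothetical names.

```elm
module HttpStubs exposing (Stub, resolveAll)

import ProgramTest exposing (ProgramTest)


{-| A canned response for one endpoint (hypothetical helper type).
-}
type alias Stub =
    { method : String
    , url : String
    , body : String
    }


{-| Resolve each listed request, in order, with a 200 response and the given body.
-}
resolveAll : List Stub -> ProgramTest model msg effect -> ProgramTest model msg effect
resolveAll stubs programTest =
    List.foldl
        (\stub -> ProgramTest.simulateHttpOk stub.method stub.url stub.body)
        programTest
        stubs
```

A test would call something like `|> resolveAll [ { method = "GET", url = "https://example.com/api/greeting", body = "Hello!" } ]` after the interactions that trigger the requests. This sketch only covers static responses, and to my understanding `simulateHttpOk` fails the test when the matching request is not pending, so every listed stub must correspond to a request that was actually made. Dynamic responses that depend on the request content would need a richer design, which is the difficulty mentioned above.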
[00:43:08] Kevin Yank: I'm sort of mentally going down my internal list of effects that I know that the Elm runtime can handle. And is an HTTP request an example of a particularly complex one? Are there harder ones that you've had to support? I'm thinking, okay. There's random numbers. There's ports. There's viewport queries now in Elm 0.19. Are you having to work through those one at a time, or is the same pattern applicable to supporting all of those?
[00:43:39] Aaron VonderHaar: Well at a high level, the same pattern's gonna work in that there's some internal state tracked with the running of the program. And then there's some new simulated commands that manipulate that internal state. And then there's a couple of new public functions on elm-program-test that lets you write assertions
[00:43:58] that inspect that internal state or produce new messages based on the internal state. HTTP is the most complicated one that's been implemented so far. The only other big one I have at the moment is Process.sleep or delay, I forget which it's called in Elm. And I've started implementing a couple of the browser navigation ones like back and forward and having a back stack and integrating that with the Program.Application
[00:44:28] like onRouteChange-type stuff. So all that works. So that, and the HTTP have been the most complicated so far. So, I do have a list of possible APIs, like for instance, even for HTTP, doing stuff like tracking the progress of ongoing requests and canceling in-flight requests is something that isn't implemented yet. But it's on the to-do list. Random number generation, I think would be relatively easy.
[00:44:54] The internal runtime state for that is just that there's a seed value somewhere that would be used and updated. So yeah, I think probably the depth of complication has been tested out. I think it's been proved as a concept, but certainly there's a lot more APIs that still could be added to support more things.
[00:45:12] Kevin Yank: I have not used elm-program-test myself, but I have written extensive test suites in something equivalent like Capybara in Ruby, and something that has happened in a couple of projects is that as soon as you start building up a nontrivial number of these tests, you start to want for a more abstract representation of your application's user interface, from the user's point of view. I've gone through the process of modeling a couple of times in my career with varying levels of satisfaction as to the result.
[00:45:49] Is that something that you have had to do at NoRedInk in your test suites, and how does that go with Elm?
[00:45:55] Aaron VonderHaar: NoRedInk is still largely a bunch of individual Elm apps. So pretty much every page on our site for the most part is still a separate Elm program, which means there's a limited number of interactions you can want to go through.
[00:46:10] You kind of start at the loading point of that page and only do actions within the page in any given elm-program-test. We do have a couple of things like within a container, like we have these little, raised rounded corner containers that can have things in them,
[00:46:27] so we have a helper that like checks and you can say, okay, look within the container that has this title, and then click the submit button within the container that says "class1" or whatever. So we have a couple of those, but I guess at NoRedInk, we're still relatively lean, building new features kind of fast. We haven't shifted yet to focusing on having a comprehensive design system across the site that's hard to work with.
[00:46:55] Kevin Yank: What's got me smiling is I'm thinking back on those libraries of abstract descriptions of pages and their features and I'm realizing they were very object-oriented code bases. The pattern that I've heard this called is the Page Objects pattern. And just the fact that it has object in the name makes me wonder… I suddenly want to explore what that API could and should look like. Were there any other unique challenges in getting elm-program-test to do what it needed to do to be useful?
[00:47:25] Aaron VonderHaar: Well, on the lines of UI components, elm-program-test actually does a lot of stuff behind the scenes, to work with whatever DOM structure you have. I've tried to make the API as simple to use and in the user terms as much as possible. So for instance, the clickButton function that we keep mentioning, it actually can check for a lot of different situations.
[00:47:48] It can find an input of type button. It can find a button with text in it. It can find a button with an image in it where the alt text is the text that you want. It can find a div that has an ARIA label and ARIA role button. There's some functions to help you write your own custom DOM matchers if you want to,
[00:48:09] but the built-in ones actually do a lot of extra work and also help provide nicer error messages in a lot of cases.
[00:48:16] Kevin Yank: User interfaces get refactored as well. It's not uncommon to start with something as a text link and then decide it should have been a button, for example
[00:48:26] and having to rewrite your tests when you make that change would be tedious.
[00:48:30] Aaron VonderHaar: Yup. Exactly. So I've tried to also help encourage good accessibility practices as well, when possible in the default helpers. But implementing that stuff has been kind of a pain. Like the way elm-html-test is implemented at the moment, it doesn't give you a lot of flexibility, for instance to find an element and then find another element nearby, or that has the same ID, or that is the parent.
[00:48:58] So I've kind of had to do a bunch of hacks internally to make the selectors work, to do some of those more sophisticated matchings.
[00:49:05] Kevin Yank: I imagine things like keyboard interactions are really hard to model and reproduce as well, especially since they can vary from browser to browser as well, what user input does in a particular widget.
[00:49:19] Aaron VonderHaar: Yeah. Well, that reminds me that doing focus tracking is something that's not implemented. That's going to be a pain, someday when somebody gets interested in it.
[00:49] Kevin Yank: So right now, do you sort of fudge it by saying, “and this thing gets focus”?
[00:49:35] Aaron VonderHaar: Uh, I believe at the moment, if you care about focus, you'll just have to do that manually. So you'll have to simulate a focus event, then do the fill in to do the change event, and then do the blur event if you want to.
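The manual focus, fill in, blur sequence could be wrapped in a small helper along these lines. This is a sketch, assuming elm-program-test 3.x together with the `Test.Html.Event`, `Test.Html.Query` and `Test.Html.Selector` modules from elm-test; `fillInWithFocus` is a hypothetical name, not part of the package.

```elm
module FocusHelpers exposing (fillInWithFocus)

import ProgramTest exposing (ProgramTest)
import Test.Html.Event as Event
import Test.Html.Query as Query
import Test.Html.Selector as Selector


{-| Simulate focusing a field, typing into it, then leaving it.
Arguments mirror ProgramTest.fillIn: the input's id, its label text, the new value.
-}
fillInWithFocus :
    String
    -> String
    -> String
    -> ProgramTest model msg effect
    -> ProgramTest model msg effect
fillInWithFocus fieldId label newValue programTest =
    let
        -- Locate the input by its id attribute for the raw DOM events.
        findField =
            Query.find [ Selector.id fieldId ]
    in
    programTest
        |> ProgramTest.simulateDomEvent findField Event.focus
        |> ProgramTest.fillIn fieldId label newValue
        |> ProgramTest.simulateDomEvent findField Event.blur
```

Note that simulating an event on an element with no handler for it is reported as a test failure, so a helper like this only suits inputs that actually listen for focus and blur.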
[00:49:47] Kevin Yank: What would you say to someone who is already testing their Elm at this level, using another language like Capybara or something like that, and they're just going, all right, spin up my actual Elm app in an actual headless browser and interact with it. What is the value of moving these kinds of tests into Elm itself?
[00:50:09] Aaron VonderHaar: Yeah. So the reason would be speed. And fast compile time, and getting the type safety within your test as well. But speed's the main reason. I guess the reason to still use Capybara would be for certain cases where you're using ports to interoperate with JavaScript and you want to actually test the interaction between Elm and JavaScript.
[00:50:33] Or with a real server backing your code instead of just a simulated server. But yeah, I would say at NoRedInk, our CI builds are now around 24 minutes, but most of that is taken up by Capybara tests. So if we could convert all those to elm-tests, or most of them, that would cut a huge percentage off of our test time.
[00:50:51] Kevin Yank: That's about the same amount of time it takes the builds at Culture Amp. I would be very interested in seeing that bell curve, for companies of a certain maturity, what is the tolerated slowness of CI, past which engineers revolt and make the necessary changes,
[00:51:11] but below which they kind of live with it?
[00:51:13] Aaron VonderHaar: Yeah. Which I think, in my mind, that's why it's good to get started with program tests kind of right off the bat so that you get used to learning, okay, when do you actually need a higher level test that's slower and flakier? But I think given that Elm is a very stable language and that the Elm architecture is very well designed, and easy to understand,
[00:51:38] most things you're doing in Elm, you don't need Capybara tests because you can trust that Elm is going to compile to working code. So if you have tests and you feel confident that if your tests pass, your page is going to work, then that's the ideal state.
[00:51:53] Kevin Yank: Last time you were on we talked briefly about the version number of elm-format and how there were a couple of, kind of, must have features that were still on the roadmap before you would call it a 1.0. elm-program-test is at version 3.2. Do those version numbers line up with each other?
[00:52:11] Is that number an expression of the maturity of elm-program-test? And I guess, would you advise people to go out there and pick it up today?
[00:52:19] Aaron VonderHaar: Yeah, I would say that the 3.0 release was— A lot of effort went into that to improve the documentation, to publish some guide books about how to pick it up, how to test commands, how to test ports if you need to. And I got a couple of my coworkers to help on reviewing that.
[00:52:36] Kevin Yank: As a fan of good documentation, seeing a section called Guidebooks with multiple links to fully worked examples, that makes me smile when I'm encountering a new package.
[00:52:47] Aaron VonderHaar: So 3.0 definitely stable, but there's definitely some missing APIs, certain commands you might be using that aren't testable, but that shouldn't get in your way of testing the things that are currently supported at the moment.
[00:53:00] Kevin Yank: This is a tool that is part of the everyday at NoRedInk? It's not unusual to be adding some program tests to the feature you're shipping?
[00:53:10] Aaron VonderHaar: Yeah. I mentioned we have several different teams now. My team uses it pretty consistently for every new feature that has front end code. The other teams use it to some degree, and I'm starting to roll out some internal documentation about what tests look like at different levels and how to think about choosing what level is appropriate for your particular feature, to kind of have more consistent approach to that across the company.
[00:53:35] Kevin Yank: If you can share that, even if parts of it only apply or are the right tradeoff for NoRedInk, I think just seeing how a company of your size thinks through those tradeoffs would be super valuable to the community.
[00:53:49] Aaron VonderHaar: Yeah, I'm sure I can get some version of that available publicly.
[00:53:54] Kevin Yank: Your point on the speed of the way this is implemented is especially interesting to me here because we long ago got to the point where we said, okay, we can't afford to write a Capybara test for every single thing these screens do because the test suite would run forever. And so we need to test the critical features that we want to make sure never break.
[00:54:19] And so that is often the happy path and any especially sensitive error states that we want to make sure remain error states. But, it means that, you know, in practice we have less than 50% of the actual features of our product covered by these or described by these tests. And it feels like you could get much closer to 100% with this.
[00:54:42] Aaron VonderHaar: Yeah. I mean, the concern you will have switching to elm-program-test is that the interface between your front end and back end is not implicitly covered. So for instance, at NoRedInk, we do this by having some Rails controller tests that actually output the JSON that they would be producing and save that in a file.
[00:55:04] And then our elm-tests— Well actually we have a script that like converts those JSON test fixtures into Elm files that just contain that JSON as a string and we load those in the elm-test.
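The fixture approach described here might look roughly like the following on the Elm side. This is a minimal sketch, assuming elm-test and elm/json; the JSON shape, the `Assignment` record and the decoder are hypothetical. In the setup described, a script would generate the fixture string into its own Elm module from a file saved by a back-end controller test; it is inlined here only to keep the example self-contained.

```elm
module AssignmentFixtureTest exposing (all)

import Expect
import Json.Decode as Decode exposing (Decoder)
import Test exposing (Test, test)


{-| Stand-in for a generated fixture: JSON captured from the back end, stored as a string.
-}
assignmentsFixture : String
assignmentsFixture =
    """{"assignments":[{"id":1,"title":"Commas"}]}"""


type alias Assignment =
    { id : Int
    , title : String
    }


{-| The decoder the front end would use for this (hypothetical) endpoint.
-}
assignmentsDecoder : Decoder (List Assignment)
assignmentsDecoder =
    Decode.field "assignments"
        (Decode.list
            (Decode.map2 Assignment
                (Decode.field "id" Decode.int)
                (Decode.field "title" Decode.string)
            )
        )


all : Test
all =
    test "the front-end decoder accepts the JSON the back end produced" <|
        \() ->
            Decode.decodeString assignmentsDecoder assignmentsFixture
                |> Expect.equal (Ok [ { id = 1, title = "Commas" } ])
```

The same fixture string could also be passed as the response body when simulating an HTTP response in a program test, so the front-end tests run against output the back end really produced.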
[00:55:17] Kevin Yank: I'm smiling because I've seen that wheel reinvented so many times now. There's the behemoth that is Pact, which is this collection of frameworks for different languages that say, “Okay, we are the framework for testing that contract in two directions between your front end and back end.”
[00:55:36] But in my experience, it is so heavy and so featureful that it ends up being no fun to use. And at the other end of the spectrum, there are these, like homegrown, quick and dirty solutions of like, let's script together some JSON. And that tends to make people happier. But, the programmer in me hates to see that wheel reinvented over and over again.
[00:55:58] Aaron VonderHaar: Well, I did want to give a shout out to, uh, two of my coworkers who both did some Strange Loop talks in the past about testing. Tessa Kelly has one specifically about kind of designing your Elm code to be more testable. And she actually works on the other feature team I mentioned here at NoRedInk that tends to do more unit testing in their Elm code, compared to my team.
[00:56:22] And the other is Brooke Angel, who gave a talk about accessibility in Elm and kind of using some testing approaches. And she actually worked with me on the assignment form project that I mentioned earlier where we started the initial prototype of building what became elm-program-test into NoRedInk's code.
[00:56:41] Kevin Yank: You're doing my scouting for future guests for me. I've got to sign them both up to do an episode in the future. But, in the meantime, those talks will be linked in the show notes of this episode. So go have a watch. I'm definitely going to be doing so.
[00:56:53] Well, all right. Thank you, Aaron. It was great to have you on a second time in so short a period. I feel like we have wrung the sponge that is your brain of everything that you've got going on right now. And I'm looking forward to seeing what it refills with next. And in the meantime, keep an eye out for elm-refactor, which, if it's not out by the time you hear this, will be out soon. And as Aaron said last time, he's eager to hear from people who might want to test drive it in the meantime.
[00:57:24] Aaron VonderHaar: Absolutely. I've been doing some work in the past couple of weeks, so it's getting closer.
[00:57:28] Kevin Yank: All right. Well, thanks again, Aaron, and thank you, listener, for joining us once again in Elm Town. It's always great to have you here. Especially in these isolating times, it's nice to send a message in a bottle out into the ether and know that it's finding your ears. Please hit me up
[00:57:45] on @elmtown on Twitter or on the #elm-town channel on the Elm Slack. Let me know what you're thinking of this new batch of episodes, and if there's anything going on in the universe of Elm that you would love to hear from the person who created it,
[00:58:02] please let me know. I'm filling my calendar with new recordings as we speak, so, eager to hear what you want to hear about. Until then, I'm Kevin Yank and this is Aaron VonderHaar. Bye for now.