Maybe Don't Write That Test

2019-09-19

Testing seems to be like going to gym. Everyone feels like “yeah. I should be testing. I should be going to the gym everyday.”

Kaushik Gopal, Fragmented, “Episode 13,” 12:01


Remember those gimmicky fitness products that made you think you could “get fit” without actually going to the gym/dieting/etc.? Because I live in Orlando and have seen the Carousel of Progress at the Magic Kingdom a bunch of times, the first example of this kind of gimmicky product that comes to mind is a thing called an “exercise belt.” It’s the thing on the right:

[image: the “exercise belt” from the Carousel of Progress]

I also remember a product that came out in the ’90s that would stimulate your muscles with electricity so that you could just watch TV while you got buff. I guess these are still around:

[image: an electric muscle stimulation belt]

If testing is like going to the gym, then many of us are doing the testing equivalent of using gimmicky fitness products that don’t actually work. Using gimmicky fitness products probably feels good. We think:

“Fitness is important, and stimulating my muscles and getting my body moving will get me to fitness!”

Wrong. Not all muscle stimulation and body movement is actually helpful in achieving fitness. Some movement and stimulation can actually be harmful. Similarly, writing tests often feels good. We think:

“Having confidence in my code is important and writing tests will get me there!”

Also wrong. Not all tests are equally helpful in achieving confidence in our code. Some can actually be harmful. It took me a while to figure this out. I’ve been saying this for a while now, but whenever I say it, people always look at me like I’m nuts. Maybe breaking down the argument in writing will help.

The basic idea here is that tests have a cost and that sometimes their cost can outweigh their benefit even if we do a good job writing our tests. If that seems obvious to you, you’ll want to skip out on the rest of this. It’s only going to get less interesting from here. If that idea seems outrageous to you, stick around. I’m going to try to convince you that not all tests are worth writing1 and try to show you that a few of the programmers who popularized automated testing probably think this is true too.

Cracks in the Surface

If we look closely at the folks who have popularized automated testing, we already see hints of a challenge to the idea that all tests are helpful.

Let’s start with a story from Gerard Meszaros, author of xUnit Test Patterns, a book that has been blessed by Kent Beck:

While the time spent writing new tests and writing production code seemed to be staying more or less constant, the amount of time spent modifying existing tests was increasing…When a developer asked me to pair on a task and we spent 90% of the time modifying existing tests to accommodate a relatively minor change, I knew we had to change something…2

Eventually, Meszaros figured out why this was happening and wrote a book about how to avoid this sort of situation. The whole premise of the book and his consultancy is that it’s possible for the cost of tests to outweigh their value. He’s sad when people don’t call him in early enough because if he’s too late, “the client will likely muddle through less than satisfied with how TDD and automated unit testing worked–and the word goes out that automated testing is a waste of time.”

Meszaros is mostly concerned with lowering the costs of tests with patterns, reasonable abstractions, and testing infrastructure, but even when we take care to create good testing infrastructure, writing certain tests can still be a bad move for actually improving confidence in the codebase.

More on that later. For now, another quote, this one from Dave Thomas, author of The Pragmatic Programmer and an OG Agile Manifesto signatory, from his “Agile is Dead” talk:

Testing sounds like something you should do, but I am going to tell you right now that I mostly don’t test…and I’ve done measurements; I don’t actually have any more bugs. I will still test complex algorithms, but I won’t test the whole piece.

🎤⬇️. Nuff said there, I think.

Let’s do one more. This one is from Kent Beck, and it’s his thoughts on the subject of what we should test:

The simple answer…“write tests until fear is transformed into boredom.” This is a feedback loop, though, and requires that you find the answer yourself. Since you came to this book for answers, not questions (in which case you’re already reading the wrong section, but enough of the self-referential literary recursion stuff…), try this list. You should test:

  • Conditionals
  • Loops
  • Operations
  • Polymorphism3

I want to highlight how playful and non-specific Beck is here on this subject. His answer is not “test everything,” and the answer that he does give is pretty chill and ad hoc.4

Here Beck doesn’t say that writing tests when you’re “not afraid” or “bored” can actually be harmful, but if we try our hand at a more precise answer to the question of what to test, we’ll see how this can be true. To do that, we’ll need to get clearer on the cost and the value of a test.

Tests as Investments

Tests have value and they have costs. We want their value to outweigh their costs over a certain time horizon, so tests are a kind of investment in the codebase. What’s the value of a test? People like to talk about how tests are good for documentation, design insight, and defect detection. This is a good start, but it’s too simple.

Let’s start with tests as documentation. The value of the documentation aspect depends on how likely it is that the tested code will change. If it doesn’t change, there won’t be a need to understand how it works. With this in mind, we might represent the value of the documentation aspect of a test like so:
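(The term names below are just shorthand for this post, not anything official.)

$$\text{DocumentationValue} \approx P(C) \times \text{UnderstandingValue}$$

where P(C) is the probability that the tested code will change, and UnderstandingValue is what it’s worth to be able to quickly understand that code when it does.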

Design insight is the next benefit. Tests force us to write more flexible code, which is great. But what if the code doesn’t need to be flexible?5 Again, we’ve got to introduce the probability that the tested code will change to understand the value of the design insight:
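(Same shape as before, with FlexibilityValue standing in for whatever the extra flexibility actually buys us when a change does come.)

$$\text{DesignValue} \approx P(C) \times \text{FlexibilityValue}$$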

Defect detection is the last benefit of automated tests. As a first pass, we can say that the value of a test in this regard is the probability that a defect will be introduced times the cost to the business if the bug isn’t caught by the automated test:
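(In the same shorthand.)

$$\text{DefectDetectionValue} \approx P(D) \times \text{Cost}$$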

The probability that a defect will be introduced (P(D)) is probably related to how good the documentation is and how often the code will change. It’s probably also related to the inherent complexity of the problem the code is trying to solve. (It’s much more likely there will be bugs in a bubble sort implementation than in a getter method.) The Cost of a bug being introduced depends on a lot of things, like:

  • If an automated test doesn’t catch the issue, what will? Blocked developers? QA? Beta users? Actual users?
  • How prominent is the (now) buggy feature? What’s the relationship between the buggy feature and revenue?
  • What’s the relationship between the time to discovery and the time it’ll take to fix the bug?

So, here’s a rough formula for the value of a test:
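(Again, this is back-of-the-envelope shorthand, not anything rigorous.)

$$\text{TestValue} \approx P(C) \times (\text{UnderstandingValue} + \text{FlexibilityValue}) + P(D) \times \text{Cost}$$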

The cost of a test is easier to track down. It’s just the time it takes to write the test, plus the time it takes to maintain it, plus the opportunity cost of doing both of these things:
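(In the same shorthand.)

$$\text{TestCost} = \text{TimeToWrite} + \text{TimeToMaintain} + \text{OpportunityCost}$$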

An Example Bad Investment

Let’s apply the formulas to an example.

Just the other day, I was doing some refactoring and I saw that I had broken a test I wrote about 2 years ago. I was testing CacheFirstRepository.make, a method that could take a DataSource and any number of DataCaches and return an Observable that emits the data from the caches first and the data from the data source (server) last. The motivation for this is that I didn’t want users to have to wait around for data we already had cached, but I also wanted to update the UI if there was a newer version of the data. Here’s the test:

class CacheFirstDataTests {
  @Test void emitsCacheAndThenSourceData() {
    // The cache and the data source each have a value to emit.
    when(dataSource.get()).thenReturn(dataMaybe());
    when(dataCache.get()).thenReturn(cachedDataSingle());

    // The cached data should be emitted first, the source's data last.
    cacheFirstRepository
        .make(dataSource, dataCache)
        .test()
        .assertValues(
          cachedData(),
          data()
        )
        .assertComplete();
  }
}
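For context, here’s a minimal sketch of the idea behind CacheFirstRepository.make. This is a reconstruction, not the actual production code: it assumes RxJava 2, and it infers from the mocks above that DataCache.get() returns a Single while DataSource.get() returns a Maybe. Data, DataCache, and DataSource are stand-ins for the real types.

import io.reactivex.Maybe;
import io.reactivex.Observable;
import io.reactivex.Single;

// Stand-in types for the sketch.
class Data {}
interface DataCache { Single<Data> get(); }
interface DataSource { Maybe<Data> get(); }

class CacheFirstRepository {
  // Emits whatever the caches have first (so the UI can render immediately),
  // then emits the data source's (server's) data last.
  Observable<Data> make(DataSource dataSource, DataCache... dataCaches) {
    Observable<Data> cached = Observable.fromArray(dataCaches)
        .concatMap(cache -> cache.get().toObservable());
    return Observable.concat(cached, dataSource.get().toObservable());
  }
}

The test above is just pinning down that ordering: the cached value shows up first, the fresh value from the server shows up last.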

So here’s the thing: the target code hasn’t changed since I wrote this. No one cares about the documentation value of this test because no one has decided it’s a good idea to make users wait unnecessarily for their data, so no one has needed to understand how this code works. The flexibility gained from test-driving the development of this feature also hasn’t mattered, because the feature hasn’t changed.

Because this code hasn’t changed, the probability of introducing a bug is extremely low, but let’s imagine a bug is introduced that would have been caught by this test, and that it’s the worst-case scenario: the bug makes it all the way to production and isn’t discovered for a long time. What’s the impact on revenue? The impact to revenue in our case would have been almost negligible because the most common path for this app is that it stays closed most of the time. In other words, P(C), P(D), and Cost in our TestValue formula are all low:
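(Plugging those estimates back into the rough TestValue shorthand from above.)

$$\text{TestValue} \approx \underbrace{P(C)}_{\text{low}} \times (\ldots) + \underbrace{P(D)}_{\text{low}} \times \underbrace{\text{Cost}}_{\text{low}} \approx \text{not much}$$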

The cost of the test, moreover, was not negligible. Although the meat of the target code hasn’t changed since I wrote it, what has changed is the shape of the value objects I use for this test, so my team and I have had to revisit this test a couple of times to update the value object construction code. Sure, refactoring tools help, but they aren’t perfect.

I’d go so far as to say this test was downright harmful in this case: not everyone on the team is convinced automated testing is worth their time. This test helps confirm their suspicions, which means we have fewer people on the team willing to write the kinds of tests that are valuable.

Conclusion

I don’t think this kind of test is uncommon. I suspect many usages of the @Ignore annotation point to the existence of these kinds of tests: “We wrote this test and it broke and we don’t know why but we don’t want to delete it or fix it so let’s just ignore it.”

To be clear: I think automated tests are great. I’ve studied how they should be written and I’ve written extensively on the topic on this blog, but sometimes it may be better to not write a test. Sometimes it’s better to write a test to test-drive your code and then delete the test after you’ve developed the code. Sometimes you might get more confidence in your codebase by skipping the test for the feature you happen to be working on and instead writing a test for a more frequently changing, higher-impact part of the codebase.


Notes


  1. Turns out I’m not the first person to explicitly argue this (shocker). Brian Marick, an OG Agile Manifesto signatory and testing/agile consultant, pointed out on Twitter that he argued for this 20+ years ago here. ↩︎

  2. Gerard Meszaros, xUnit Test Patterns, 26. ↩︎

  3. Kent Beck, Test-Driven Development: By Example, 201. ↩︎

  4. I was shocked when I first read this. I felt silly. I was so serious about testing, and Kent Beck is over here making jokes and just screwing around. It was good for me. Got me to lighten up a little. Why so serious? ↩︎

  5. Or what if you’re like Dave Thomas and you don’t need test pressure to make good design decisions? ↩︎
