When should we create abstractions instead of duplication?

2020-11-14

Because abstractions aren’t free, sometimes we’re better off duplicating code instead of creating them.

If that claim doesn’t make sense to you, read Martin Fowler’s “YAGNI” or Sandi Metz’s “The Wrong Abstraction” or watch Dan Abramov’s “WET Code” talk or Kent C. Dodds’s “AHA Programming” talk.

Each of these programmers gives advice on when to duplicate code vs. create an abstraction, advice that broadly falls into two camps: either we’re advised to follow some rule of thumb, or we’re told to ignore rules of thumb, trust our feelings, and only introduce abstractions when it “feels right.” Fowler, Metz, and Abramov are in the first camp. Dodds is in the second.

While all of these prescriptions are an improvement over categorical adherence to DRY, none of them adequately capture the complexity of the trade-off between duplication and abstraction. We can do better.

In the first section of this essay, I’ll argue — like Dodds already has — that the aforementioned rules of thumb can lead to bad decisions about when to abstract vs. duplicate. I’ll also argue — contrary to Dodds — that our feelings and intuitions are not up to the task of guiding our decisions about when to abstract vs. when to duplicate code.

Next, I’ll suggest a more systematic approach to thinking about the abstraction-duplication trade-off. The gist of the approach is to recast duplicate vs. abstract decisions as predictions, to use a Brier score to track the quality of those predictions, and to actively build a mental model of how to make them for the particular codebase we’re working in.

Heuristics and intuition aren’t enough

Fowler and Abramov advocate for duplicating code twice before creating a common abstraction.1 Metz tells us, “if you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect.”2 Dodds eschews these types of heuristics in his recent “AHA Programming” talk.

I won’t rehearse his argument here, but I do want to side with him in rejecting simple heuristics. Programming is complicated. Codebases and the organizations that support them are diverse. The rules that describe optimal choices for an engineer at Google are not the same rules that apply to someone working at a 40-person startup.3 Given this, why should we think that good advice on when to abstract vs. duplicate can be easily stated as a cute acronym or a pithy sentence?

If you’re skeptical partially because you’re just keeping score of well-known programmers who disagree with Dodds, note that he isn’t the first to suggest something like this. In Domain Driven Design, Eric Evans recognizes that rules of thumb for design often don’t work. He says:

Sometimes people chop functionality fine to allow flexible combination. Sometimes they lump it large to encapsulate complexity. Sometimes they seek consistent granularity, making all classes and operations similar in scale. These are oversimplifications that don’t work well as general rules… Cookbook rules don’t work.4

Enough on the shortcomings of heuristic-based advice. Let’s turn to Dodds’s feelings/intuition-based advice.

Dodds tells us to “be mindful of the fact that we don’t really know what requirements will be placed on our code in the future” and that we should only create abstractions when it “feels right.”

The problem here is that our feelings can mislead us, which probably partially explains the appeal of heuristics: we want something firmer than a feeling to guide us.

Some readers — and presumably Dodds — would reply, “But more experienced programmers have more trustworthy feelings!”

Maybe. But if we’re to believe the science on expert judgment and intuition, experience shapes our intuitions less helpfully than we think. Psychologist and behavioral economist Daniel Kahneman won a Nobel Prize partially because he taught us that expert judgment doesn’t form simply because we’ve been doing something for a long time. For that judgment to form, we need specific feedback loops,5 loops that are often absent for many programmers: programmers who have an average job tenure of 18 months, who use tools and languages that change quickly enough to inspire fatigue, or who work for companies that undergo radical changes as they grow from tiny startups into large, proper businesses.

So, heuristics are too cookie-cutter and feelings/intuitions are too wishy-washy. Thankfully, there are other ways of guiding our decisions about abstraction vs. duplication.

A Brier-score-based method

When we make a design decision in our code, we are basing that decision on a prediction about how the code will change in the future. When we duplicate code instead of abstracting it, we’re predicting that any abstraction we could have created would somehow be inadequate for the future. When we abstract code, on the other hand, we’re predicting that the abstraction will earn its keep in our codebase by having many clients.

Since design decisions are informed by predictions, we can use the same frameworks we use for assessing predictions to assess architectural decisions, including decisions about when to abstract code and when to duplicate it. I’ve used Brier scores for this a few times at my job, and I expect that consistent use over time will lead me to develop a set of design principles that are both specific to my situation at Heap — which avoids the cookie-cutter problem heuristics have — and not susceptible to the vagaries of my feelings.

Since I’ve already given an overview of how Brier scores work elsewhere, I’ll skip the explanation of the simple math behind them here. Using this method to assess abstraction vs. duplication decisions is as simple as creating something like the following table:

| Prediction summary | Date | Confidence | Came true? | Squared error | Brier score |
| --- | --- | --- | --- | --- | --- |
| Abstracted code for profile detail view will be re-used within 6 months | 10/31/20 | .85 | TRUE | 0.0225 | 0.0225 |
| Duplicated code in TodoList.jsx and TodoDetail.jsx will diverge within 3 months | 1/31/20 | .75 | FALSE | 0.5625 | 0.2925 |
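The simple math I’m skipping is just a mean of squared errors between stated confidence and outcome. Here’s a minimal sketch in Python; the track record in the example is hypothetical, not taken from the table above:

```python
def brier_score(predictions):
    """Mean squared error between stated confidence (0-1) and the
    actual outcome (1.0 if the prediction came true, else 0.0).
    Lower is better; 0.0 is perfect calibration."""
    return sum(
        (confidence - (1.0 if came_true else 0.0)) ** 2
        for confidence, came_true in predictions
    ) / len(predictions)

# Hypothetical track record: (confidence, came true?)
track_record = [(0.9, True), (0.6, False)]
print(round(brier_score(track_record), 4))  # 0.185
```

A perfectly calibrated forecaster scores 0; always guessing 0.5 scores 0.25, which is a useful baseline to beat.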

Once enough time has passed to evaluate the accuracy of a prediction, I add some notes explaining why the prediction was off. For example, for the first prediction in the table — a prediction that is similar to a real one I made when I first started my new gig — I might write:

Turns out the profile detail view doesn’t change very often. In fact, I was told after I refactored it that the subsequent changes I made were the most significant ones done in several years. Should have examined git history of the files involved before committing to creating an abstraction.

Tracking this sort of thing is obviously more work than applying simple rules of thumb or relying on our feelings, but it’s not that bad. We don’t have to track every decision this way, and updating this spreadsheet is something that I spend less than five minutes on each day.
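The spreadsheet doesn’t have to be a spreadsheet, either. Here’s a minimal sketch of the same bookkeeping in Python — the predictions.csv file name, the column names, and both helper functions are my own invention, not part of any tool I actually use:

```python
import csv
from pathlib import Path

# Hypothetical log file; columns mirror the tracking table above.
LOG = Path("predictions.csv")

def log_prediction(summary, date, confidence):
    """Append a new prediction; the outcome column stays blank until resolved."""
    is_new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["summary", "date", "confidence", "came_true"])
        writer.writerow([summary, date, confidence, ""])

def running_brier():
    """Brier score over every resolved prediction in the log."""
    with LOG.open(newline="") as f:
        resolved = [
            row for row in csv.DictReader(f)
            if row["came_true"] in ("TRUE", "FALSE")
        ]
    return sum(
        (float(row["confidence"]) - (1.0 if row["came_true"] == "TRUE" else 0.0)) ** 2
        for row in resolved
    ) / len(resolved)
```

Once a prediction resolves, you fill in its came_true cell and the running score updates automatically — the same five-minutes-a-day ritual, just in a file.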

If spending 5 minutes a day6 gets me a context-specific, data-driven model about where to spend my refactoring time, that’s totally worth the trouble. I’ll take it over simple heuristics or my flawed intuition any day.

Notes


  1. I’m thinking of Martin Fowler, Refactoring, 49 and Abramov’s WET code talk here. ↩︎

  2. Sandi Metz, “The Wrong Abstraction”. ↩︎

  3. I’m waving my hands a bit here, but I think this is plausible enough without further explanation. Feel free to call me on this and I’ll say more here. ↩︎

  4. Eric Evans, Domain Driven Design, 292-294. ↩︎

  5. Daniel Kahneman, Thinking, Fast and Slow. Too lazy to find the specific pages. Just read all of it. ↩︎

  6. A part of that 5 minutes is actually spent updating Brier scores on time estimates, so the ROI for this is actually better than I’ve suggested here. ↩︎
