Why and How We Should Calculate Expected Utility of Refactorings


In my experience, most applications are a mess…Changes are commonly made under urgent time pressure, which drives applications towards disorder…Velocity gradually slows, and everyone comes to hate the application, their job, and their life.

—Sandi Metz, “The Half-Life of Code”


Many of us work in codebases that are hard to work with, codebases that we want to make better. The way we typically choose which parts of the codebase to improve, however, is suboptimal. The two dominant methods I’ve seen are:

  1. Fix code in areas of the codebase we happen to be working in right now.[1] (“I’m here. Might as well fix it.”)
  2. Fix code in areas of the codebase we touched in the recent past. (“Doing X last sprint sucked because the code is bad. Let’s fix it.”)

Neither of these methods is ideal.

The first problem with them is that the link between the code we’re working on now (or worked on recently) and the code that will need to change in the future is tenuous. Just because code changed recently doesn’t mean it’ll change again in the near future. The second problem: even when the link between current and future changes is strong, these methods give us no point of reference for comparing how frequently the code we’ve recently touched changes with how frequently other code changes.

Why does this matter? If we refactor code that won’t be read or changed much in the future, we aren’t helping ourselves much. What good is readable code that won’t be read? What good is changeable code that won’t be changed? Actually, it’s worse than that: while we were making a particular piece of bad code better, we missed an opportunity to refactor code that will change more frequently in the future.

What should we do about this? We should try to estimate the relative expected utility of a refactor. This will be hard — just like estimating expected utility always is — but it’s better than not doing it at all.

We don’t just throw up our hands when evaluating two different economic policies, for example, simply because it’s hard to estimate the likely impact of those policies. We accept that our evaluation will be imperfect and we do the best we can.


How can we do this? Broadly, we have to estimate how likely it is that a file will change, relative to how likely it is that other files will change.

Leaning on past changes

Insofar as we can take past changes as an indicator of future ones, our job is relatively easy, and git gives us most of the information we need.[2] We can sum up the diffs of all files in the codebase and assign each file a score: the percentage of all diffed lines in the codebase that fall in that file. If we’re considering refactoring a file, we can look up its score before taking the plunge.

This little script computes the score for a single file (thanks, Joe):[3]

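#!/usr/bin/env bash
# Usage: pass the path of the file to score as the only argument.
set -euo pipefail

file="${1:?usage: $0 FILE}"

# The oldest (root) commit reachable from HEAD.
last_sha="$(git rev-list HEAD | tail -n 1)"

# Add up the insertion and deletion counts in `git diff --shortstat` output,
# skipping the "N files changed" count. Prints 0 when there is no diff.
function sum_numbers() {
  perl -ne '$t += $1 while /(\d+) (?:insertion|deletion)/g; END { print $t + 0 }'
}

# Lines inserted + deleted between the root commit and the working tree.
file_diffs_sum="$(git diff --shortstat "${last_sha}" -- "${file}" | sum_numbers)"
all_diffs_sum="$(git diff --shortstat "${last_sha}" | sum_numbers)"

echo "file_diffs: ${file_diffs_sum}"
echo "all_diffs: ${all_diffs_sum}"

if [ "${all_diffs_sum}" -eq 0 ]; then
  echo "No changes since ${last_sha}; cannot compute a score." >&2
  exit 1
fi

# Multiply before dividing so bc's fixed scale doesn't truncate the result.
score="$(echo "scale=5 ; ${file_diffs_sum} * 100 / ${all_diffs_sum}" | bc)"
echo "Score: ${score} percent"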
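One caveat about the script above: `git diff <root-commit>` compares the root commit with the working tree, so it measures net change. A file that was edited heavily but ended up close to where it started scores low. If cumulative churn is closer to what you want, here is a minimal sketch that sums every commit's additions and deletions per file with `git log --numstat` and ranks all files at once. It assumes the history reachable from `HEAD` is the history that matters, skips binary files, and disables rename detection, so a renamed file's history is split between its old and new paths. `churn_scores` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "score<TAB>path" for every file in the history
# of HEAD, where score is the file's share (in percent) of all lines added
# plus deleted across every commit. Highest score first.
churn_scores() {
  # --numstat prints "added<TAB>deleted<TAB>path" for each file in each commit;
  # --format= drops the commit headers; --no-renames keeps paths literal.
  git log --numstat --no-renames --format= HEAD |
    awk -F '\t' '
      # Binary files report "-" instead of line counts; skip them.
      NF == 3 && $1 != "-" {
        churn[$3] += $1 + $2
        total += $1 + $2
      }
      END {
        if (total == 0) exit
        for (f in churn) printf "%.3f\t%s\n", 100 * churn[f] / total, f
      }' |
    sort -rn
}
```

Call it from a script inside the repository, for example `churn_scores | head -n 20` to list the twenty highest-churn files. Counting per commit rather than net change rewards files that keep getting reworked, which is closer to "how often will I have to touch this?"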

The more people who touch a file, the higher its score should be. Readability matters more when many people edit a file: a sole author can lean on memory as a crutch when their code isn’t readable, but everyone else has only the code. git blame will get us that info.
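A minimal sketch of counting the people behind a file. `git blame` only sees the authors of lines that survive in the current version, so the sketch also includes a `git log` variant that counts everyone who ever committed to the file. Both count distinct author email addresses, so someone who committed under two addresses counts twice (the `git log` variant uses `%aE`, which respects `.mailmap`). The function names are hypothetical, and how to fold the count into the score is left open, as in the text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: distinct authors of the lines currently in a file,
# as seen by git blame.
count_blame_authors() {
  local file="$1"
  # --line-porcelain emits an "author-mail <email>" header for every line.
  git blame --line-porcelain -- "$file" |
    awk '$1 == "author-mail" { print $2 }' |
    sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: distinct authors of every commit that touched the file,
# including people whose lines have since been rewritten. History from before
# a rename is not followed.
count_log_authors() {
  local file="$1"
  git log --format='%aE' -- "$file" | sort -u | wc -l | tr -d ' '
}
```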

Looking to the future

Insofar as past changes don’t indicate future ones, things get more challenging. What we need here is a mapping between a product roadmap and the files that will likely be touched to execute on that roadmap. This mapping doesn’t have to be perfect, of course. Tagging files and directories with their corresponding product/roadmap areas seems like a good start. This could be done via comments like this:

/** changes-for: accounts */

function login() {
  /* ... */
}

function isLoggedIn() {
  /* ... */
}

Or we could simply use directories: placing files in a directory would mark all of them as belonging to that directory’s product area.
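A minimal sketch of building the file-to-area mapping from `changes-for:` comments. It assumes tags look exactly like the example above, with a single word (letters, digits, `_` or `-`) after `changes-for: `, and prints one `area<TAB>file` line per tag, so a file tagged with two areas appears twice. The directory-based alternative would need a different lookup, and `list_tagged_files` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "area<TAB>file" for every `changes-for: <area>`
# tag found under a path (default: the current directory).
list_tagged_files() {
  local root="${1:-.}"
  # -r: recurse, -H: prefix each match with "path:", -E: extended regex,
  # -o: print only the matched tag. grep exits 1 when nothing matches;
  # `|| [ $? -eq 1 ]` accepts that while still failing on real errors (2).
  { grep -rHEo --exclude-dir=.git 'changes-for: [[:alnum:]_-]+' "$root" || [ $? -eq 1 ]; } |
    awk -F ':changes-for: ' '{ print $2 "\t" $1 }'
}
```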

Once we have this mapping, we need a calculation similar to the one we did with git diffs. We want to score each file by the proportion of upcoming roadmap items that we project will require changing it. Suppose we have two files:

File        Changes for
login.js    accounts
project.js  projects

And suppose we have three roadmap items:

Task                         Area
Implement rename project     projects
Implement Google login       accounts
Implement archive project    projects

project.js’s score would be twice login.js’s, because project.js is tagged with an area (projects) that appears twice as often on the roadmap as accounts does: two of the three tasks versus one. To automate this part of the score, we’d need tooling that reads the tags in the codebase and pulls the matching tasks from whatever project management tool we’re using.
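The calculation itself is small. Here is a minimal sketch, assuming two hypothetical tab-separated inputs: a mapping file with `area<TAB>file` lines (for example, produced by scanning the `changes-for:` tags) and a roadmap export with `task<TAB>area` lines. Each file's score is the percentage of roadmap tasks in its area, summed over its areas if it has more than one. `roadmap_scores` is a hypothetical name; a real version would read tasks from your project management tool's export or API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "score<TAB>file", highest first, where score is
# the percentage of roadmap tasks that fall in the file's tagged area(s).
#   $1: mapping file, one "area<TAB>file" pair per line
#   $2: roadmap file, one "task<TAB>area" pair per line
roadmap_scores() {
  awk -F '\t' '
    # Roadmap (read first): count tasks per area and overall.
    FILENAME == ARGV[1] { tasks[$2]++; total++; next }
    # Mapping: add the area share to the file; an area with no tasks adds 0.
    total > 0 { score[$2] += 100 * tasks[$1] / total }
    END { for (f in score) printf "%.3f\t%s\n", score[f], f }
  ' "$2" "$1" | sort -rn
}
```

With the two files and three tasks above, `roadmap_scores mapping.tsv roadmap.tsv` prints `66.667` for project.js and `33.333` for login.js: the two-to-one ratio described in the text.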

Combining both types of scores

How should we combine this information? We should strongly weight the roadmap information, so a formula for the relative expected utility for refactoring a file would be something like:

Past-changes-based score * .35 + roadmap score * .65
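Applied per file, the formula is a weighted sum. A minimal sketch, assuming both scores are already available as `score<TAB>file` lines on the same percentage scale (the weights only make sense if the two scores are comparable). A file missing from one input gets 0 for that component, and `combined_scores` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: combine two "score<TAB>file" tables with the weights
# from the formula above and print "score<TAB>file", highest first.
#   $1: past-changes scores   $2: roadmap scores
combined_scores() {
  awk -F '\t' '
    FILENAME == ARGV[1] { past[$2] = $1; seen[$2] = 1; next }
    { road[$2] = $1; seen[$2] = 1 }
    # A file missing from one table contributes 0 for that component.
    END { for (f in seen) printf "%.3f\t%s\n", 0.35 * past[f] + 0.65 * road[f], f }
  ' "$1" "$2" | sort -rn
}
```

For example, a file with a past-changes score of 80 and a roadmap score of 40 combines to 0.35 × 80 + 0.65 × 40 = 54.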


We all want to work in great codebases. Unfortunately, we don’t seem to have the practices and the tooling to make optimal choices about which parts of our codebases to improve. Estimating the expected utility of the refactors we might make is a step in the right direction. To do this, we can use information about past changes sourced from git, and information about expected future changes drawn from a mapping between files and the types of upcoming roadmap tasks that will require changing them.


  1. This method comes in two flavors. We might think that making the refactor will facilitate the change we currently want to make; Martin Fowler talks about this in Refactoring. Or we might simply think that the refactor will make future changes easier.
  2. git diff --dirstat would be perfect if it worked with individual files rather than directories.
  3. I owe Joe Schafer for rewriting this into something readable. The original code I wrote was so hideous that I couldn’t even get a formatter to pretty-print it into something semi-readable.
