Refactoring as an optimization problem

Words by Gustav Larsson

I've been thinking about machine learning, software development, and refactoring, and I've started to view refactoring as an optimization problem. I also think we can control how hard that problem is.

When performing code reviews, I think it's important to consider what kind of feedback to give, and what to omit. Besides discovering pure bugs, a big part of code review is identifying issues that affect maintainability. I think certain flaws should never be allowed to enter the code base, while others are OK if accepting them helps development velocity.

With an existing code base, we can improve maintainability by "making code changes until the maintenance cost is lower than before". If you know where you're going, and the cost landscape along the way is convex, it's a relatively effortless activity that is usually worth the time investment. We can use cheap gradient descent and take steps as small as we like, because every small improvement moves us closer to the best reachable state.

If you only have local problems, such as poorly named local variables or unnecessarily complex logic within a single function, they are easy to fix. They are also isolated: if the abstraction that the function provides is still reasonable, the fact that the implementation is messy shouldn't affect the rest of the application. Preventing these types of problems from entering the code base is of course preferable, but it shouldn't be the main goal of a code review.

However, if there's a messy relationship and coupling between multiple methods or multiple classes (or multiple systems!), this should probably be fixed before the change enters the code base.

Seen as an optimization problem, we are stuck in a local minimum: we might still be able to make small improvements, but to make significant improvements, we need to make things worse before they get better. This requires more commitment, since we cannot take small steps. Also, the hills around this minimum tend to grow by themselves if left unattended. Avoiding getting into this state should be a priority.
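
To make the metaphor concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `x` stands in for "the state of the code base", and `convex_cost` and `bumpy_cost` are hypothetical maintenance-cost curves, not measurements of anything. The `descend` function only ever takes a small step when that step lowers the cost, which is roughly what incremental refactoring does.

```python
def descend(cost, start, step=0.01, max_steps=10_000):
    """Greedy small-step descent: move left or right only while cost drops."""
    x = start
    for _ in range(max_steps):
        # Look one small step in each direction and pick the cheaper one.
        best = min((x - step, x + step), key=cost)
        if cost(best) >= cost(x):
            break  # no small step helps any more: we are in a minimum
        x = best
    return x


def convex_cost(x):
    # Hypothetical convex landscape: a single minimum at x = 2.
    return (x - 2.0) ** 2


def bumpy_cost(x):
    # Hypothetical non-convex landscape: two valleys separated by a hill.
    # The valley on the left (x < 0) is deeper than the one on the right.
    return (x * x - 1.0) ** 2 + 0.3 * x


if __name__ == "__main__":
    # Convex case: small steps are enough to reach the only minimum.
    print("convex:", round(descend(convex_cost, -3.0), 2))

    # Non-convex case: the result depends on where we start.
    stuck = descend(bumpy_cost, 2.0)    # settles in the shallow right valley
    better = descend(bumpy_cost, -2.0)  # settles in the deeper left valley
    print("stuck: ", round(stuck, 2), "cost", round(bumpy_cost(stuck), 3))
    print("better:", round(better, 2), "cost", round(bumpy_cost(better), 3))
```

Starting on the right side of the bumpy curve, the descent stops in the shallower valley even though a cheaper state exists on the other side of the hill. Getting there would require a run of steps that each increase the cost, which is exactly what small, always-improving changes never do.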

Put a different way, the code review comments that are easiest to make are the least likely to provide value. A truly valuable review doesn't point towards the nearest local minimum; it peeks over the hill and finds a better one. Keeping the checked-in code convex helps manage technical debt.

We also have to keep in mind that we all have different estimates of how low the minima go. Everyone reading the code might agree that it's possible to make things better, but not necessarily on which direction is most likely to lead to the lowest minimum.