This week I’m paying off technical debt. If you’re not familiar with the term, it comes from the world of software development, and Martin Fowler describes it better than I would:
Technical Debt is a wonderful metaphor developed by Ward Cunningham to help us think about this problem. In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.
System administrators and operations folks know this phenomenon very well, since we’re often called to make “temporary” fixes to things. As we all know, nothing is ever temporary, because if it works we move on. And it isn’t like we stopped to consider a proper design for a temporary fix, so we end up with something that helps us greatly in the short term but really stinks in the long term. Steve McConnell, whose “Code Complete” book was the first to introduce this term to me, has a great post where he also outlines the problems with technical debt:
One of the important implications of technical debt is that it must be serviced, i.e., once you incur a debt there will be interest charges. If the debt grows large enough, eventually the company will spend more on servicing its debt than it invests in increasing the value of its other assets.
Soon you find yourself unable to move forward because you’re spending all your time servicing your debt. Sometimes it gets so bad that patching and updating stop and changes are frozen, all because something might tear loose the duct tape, gum, and string holding the infrastructure together. As Jeff Atwood puts it, “accumulated technical debt becomes a major disincentive to work on a project.” He’s totally right. As a result, over time, we end up with systems that are seriously decrepit because nobody wants to touch them. Eventually something comes along to knock the whole thing over.
That’s what I’m doing this week: knocking things over on my own terms. I’ve been keeping a list of all the things I hate about my environment. Most are little things, with just a few big ones. I’m writing new monitoring scripts for my VMware environment so that we don’t get paged when we reboot a host that was in maintenance mode, and so that we do get paged when DRS has been out of fully automated mode for more than a certain time. I’m de-kludging the VMware Tools installs on Linux because VMware fixed most of the problems we’ve had with them. I’m going through all the hosts and fixing the firmware levels, because they’re a mess, and converting the last of my oddball hosts to our standard configurations.
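For the curious, the two paging rules boil down to something like the sketch below. This is only an illustration of the decision logic, not the actual scripts: the class names, field names, and the four-hour grace period are all made up for the example, and it assumes some other piece of the monitoring system has already collected each host's last known maintenance-mode state and the time a cluster was first seen out of fully automated DRS. The automation-level strings are treated as plain labels here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HostState:
    # Hypothetical record built by whatever polls the hosts.
    name: str
    reachable: bool
    # Last maintenance-mode state seen while the host was still answering.
    last_known_maintenance: bool


@dataclass
class ClusterState:
    # Hypothetical record for a DRS cluster.
    name: str
    drs_behavior: str  # label such as "fullyAutomated" or "manual"
    # When the cluster was first seen out of fully automated mode, if ever.
    not_automated_since: Optional[datetime]


def should_page_for_host(host: HostState) -> bool:
    """Page only for hosts that vanish without having been in maintenance mode."""
    return (not host.reachable) and (not host.last_known_maintenance)


def should_page_for_drs(
    cluster: ClusterState,
    now: datetime,
    grace: timedelta = timedelta(hours=4),  # assumed threshold, pick your own
) -> bool:
    """Page once DRS has been out of fully automated mode longer than the grace period."""
    if cluster.drs_behavior == "fullyAutomated":
        return False
    if cluster.not_automated_since is None:
        # No start time recorded yet, so the grace period cannot have elapsed.
        return False
    return now - cluster.not_automated_since > grace
```

Keeping the rules as small pure functions means they can be tested without touching vCenter at all; the part that actually talks to VMware only has to fill in the two records.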
This work is at the short-term expense of other projects, but it’ll be worth it because I won’t have to dedicate time, energy, brain cells, and documentation to all the exceptions. I’ll be able to focus on moving forward, not just running frantically to catch up. And it’s very easy to explain what I’m doing. As Steve McConnell puts it, the metaphor has an “incredibly rich ability to explain a critical technical concept to non-technical project stakeholders.” Turns out that most managers understand debt.
What technical debt do you have in your environment? Do you keep a list of the things you’d like to see fixed? If not, why not? How much happier would you and your coworkers be if you could block off a few days to fix those things?