i work on long-lived projects. These are projects that tend to run for years, and might even be considered tech debt magnets. That’s pretty natural, but it’s interesting considering that the rest of the company tends not to think that way. They’re focused instead on “the next version” of a product.
Since i’ve got services that need to last for years, i’ve got to manage the tech debt that accumulates. Some of it comes from internal causes, but a lot of it comes from external ones. That’s harder than it sounds, because we’re generally understaffed and have lots of priorities, so i need to balance maintaining those services against the new services we’re rolling out.
To that end, i’ve started classifying concerns into “Asteroids and Papercuts”, and i’m wondering if others might find the framework useful.
An Asteroid is a big, potentially civilization-ending event that will arrive at a known time. It’s something you need to pay attention to now, even though the date is still (hopefully) a way off. This can include things like:
- The language your service is built on is no longer actively supported or receiving security updates.
- You need to move your data from one cloud-based, proprietary data store to another cloud-based, proprietary store, for reasons.
- A key individual who has deep domain expertise is leaving the company.
(All of those things happened at least once with projects i worked on, so yay!)
When dealing with an incoming asteroid, your priorities are:
- Knowing the date
- Knowing the damage
- Knowing the mitigations: short term and long term
- Putting together a workload estimate based on worst-case scenarios
That last one is the trickiest, and kinda requires your most pessimistic attitude. Basically, try to factor in the other things that can and will go wrong. When writing each of these up, it’s important to avoid jargon, undefined TLAs (Three Letter Acronyms), and presumed understanding.
For instance, you’d want to write up a summary that clearly specifies the problem, includes dates, and provides a reasonably terrifying explanation of why this is really important. So, for something like a language reaching EOL (end of life):
The Foo Service has been providing valuable Foo for 80% of our customer base for over ten years. It was announced on <date> that the programming language it was built in (Logo 1.2) will reach its end of life on <date>. This means that the Foo Service will no longer be able to receive security or library updates and will increasingly become a security concern, which could compromise private user data and company resources.
The following mitigations are possible:
<List of clearly and plainly specified potential courses of action>
Putting this together will probably require at least a day or two of focused effort.
If you can’t get the time to do that from your management, be sure to get that decision in writing, in an email or document, so that when the service fails, you have proof that you were told what you work on isn’t critical. You don’t have to be snippy; just send a quick email saying something like:
On <prior date> i informed you that <pending disaster> will occur on <asteroid impact date>. i believe this will significantly impact our company and customers by <brief, but terrifying damage description>. i’ve asked for time to further understand and propose solutions to this problem, but as i understand it, you’ve requested that i not do that. i want to make sure i clearly understand your reasons, so i would appreciate it if you would include them in a reply here. Thanks!
If the manager insists on an in-person meeting, take notes and send a follow-up email that summarizes the points discussed and the resolution, and ask them to confirm that it’s correct.
Also, start looking around for a different team/org/company, because this is bullshit politics and your manager is setting you up to be the sacrificial offering when it all goes to hell.
Once you have a plan, treat it as a high-priority task and make your various product people aware of it and working toward it. Make it a banner line on your weekly reports. Time is both your asset and your enemy: it will go faster than you expect, particularly if you have other priorities, and you will have other priorities.
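If it helps to put numbers on the worst-case estimate, here’s a minimal Python sketch that compares padded effort against the working days left before impact. Everything in it is a hypothetical assumption to swap for your own: the task list, the 50% availability (standing in for those other priorities), the 1.5x risk buffer, and the impact date.

```python
from datetime import date, timedelta

# Hypothetical worst-case estimates, in working days, for an asteroid
# such as a language end of life. Replace with your own breakdown.
TASKS = [
    ("Audit dependencies and usage", 5),
    ("Port the service to a supported runtime", 30),
    ("Regression and load testing", 10),
]

# Hypothetical impact date (e.g. the announced end-of-life date).
IMPACT_DATE = date(2026, 6, 30)


def worst_case_days(tasks, availability, risk_buffer):
    """Effort padded for things going wrong, then stretched by the
    fraction of each working day you can actually spend on this work."""
    effort = sum(days for _, days in tasks)
    return effort * risk_buffer / availability


def working_days_until(today, impact):
    """Count Monday-to-Friday days from today up to (not including) impact."""
    count = 0
    current = today
    while current < impact:
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
        current += timedelta(days=1)
    return count


if __name__ == "__main__":
    # Assumptions: half your time goes to this, plus a 50% buffer for surprises.
    needed = worst_case_days(TASKS, availability=0.5, risk_buffer=1.5)
    remaining = working_days_until(date.today(), IMPACT_DATE)
    print(f"Worst-case effort: {needed:.0f} working days")
    print(f"Working days until impact: {remaining}")
    if needed > remaining:
        print(f"Short by {needed - remaining:.0f} working days: escalate now.")
```

The availability divisor is the point: a task list that looks comfortable on paper can blow past the impact date once you account for only getting part of your week.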
Papercuts are smaller annoyances: things like blocked library updates or significant bits of tech debt. The thing about papercuts is that while they’re small, if you get enough of them, they will kill you. (Hence, death by 1,000 papercuts.)
While these tend to sit in the backlog forever, it’s important to track them, because they can also fester and turn into significant events. Each papercut is its own thing, so it can be hard to come up with a strategy as clear as the one for an asteroid, but you should have one in any case. Fill out the bug/issue/ticket with details for future you. Note the relationships a given papercut has, both within the project and across the org. Show not only that it’s important, but how a delay in fixing it impacts the bottom line. Basically, present it so that someone with no understanding of the tech can still see why it matters.
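As one way to capture that, here’s a minimal Python sketch of a papercut record that notes what it touches and a rough cost of leaving it unfixed, then sorts the backlog by that cost. The field names, the hours-per-week estimate, and the sample tickets (other than the Foo Service) are hypothetical; your tracker’s fields will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Papercut:
    """A hypothetical record for one papercut ticket."""
    title: str
    # Other projects, teams, or services this papercut touches.
    related: list[str] = field(default_factory=list)
    # Rough estimate (not a measurement) of hours lost per week while unfixed.
    weekly_hours_lost: float = 0.0

    def cost_of_delay(self, weeks: int) -> float:
        """Estimated hours lost if the fix is postponed for `weeks` weeks."""
        return self.weekly_hours_lost * weeks


def rank_by_cost(papercuts, weeks):
    """Order papercuts so the most expensive ones to keep ignoring come first."""
    return sorted(papercuts, key=lambda p: p.cost_of_delay(weeks), reverse=True)


# Hypothetical sample backlog.
backlog = [
    Papercut("Flaky integration test", ["Foo Service"], 1.5),
    Papercut("Pinned HTTP library blocks security updates", ["Foo Service", "Billing"], 3.0),
]

for cut in rank_by_cost(backlog, weeks=12):
    print(f"{cut.title}: ~{cut.cost_of_delay(12):.0f} hours over 12 weeks, "
          f"touches {', '.join(cut.related)}")
```

Hours lost per week is only a stand-in for “the bottom line”; if you can translate it into money or customer impact, that will land even better with non-technical readers.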
Of course, there are lots of other issues that you can address, and lots of ways to categorize things. Not everything is or should be an Asteroid or Papercut. There will be some things in your backlog that are there to die, neglected and alone, but there will always be things that are more critical that you need to pay attention to. Your team and mileage will vary, but hopefully you now have a framework to help present critical issues up the chain if you didn’t already.