thoughts and learnings in software engineering by Rotem Tamir

Chronic early-adopter tech-debt syndrome

Here’s a common scenario if you’re a cutting-edge shop: you early-adopt new infrastructure technology. It is awesome and gives you new capabilities, but there’s no obvious way to integrate it with the rest of your workflows, so you home-brew a tool (read: shell script) to solve whatever it is that you need. As time goes by, you deal with edge cases and squash bugs, more workflows are built that depend on the tool, and you become technically invested in it.

Some more time goes by. The tool is really useful, and everyone at your company loves you for building it, as it gives them flexibility and velocity that are the envy of their friends; maybe you blog about it, maybe you present it at a local meetup. You become emotionally invested in the tool you built.

Even more time goes by, and the community realizes that there’s a common set of problems here that everyone is re-solving individually, so it would make sense to collaborate and create a community project to do what your home-brewed tool is doing. Some good people from strong teams join hands and build something that is designed much better than your tool, as it must be generic and flexible enough to support the use cases of many very different teams. The code is well written and tested, as no one wants to put out crappy code in open-source under their name. Better yet, some people take the initiative and launch a product that solves this problem as-a-service. Your once-shiny, state-of-the-art source of pride begins to look like what it is: a pile of tech-debt which you don’t have the resources to attend to.

Of birds and engineers

Birds fly in V-shaped flocks to conserve energy. Each bird (except the leader) enjoys the uplift created by the wings of the bird in front of it. As the leader grows tired, another bird replaces it at the helm. When a bird begins to leave the flock, it immediately feels the resistance in the air and responds by re-aligning itself with the flock.

As engineers, our feedback loops and motivations are a bit different from those of birds in a flock. As Larry Wall (creator of Perl) famously said, the first virtue of a great programmer is:

“Laziness: the quality that makes you go to great effort to reduce overall energy expenditure”.

When you take the risk of leaving the flock to early-adopt new technology, you do so for the possibility of a massive reduction in overall energy expenditure. You don’t necessarily feel the resistance of the air immediately. On the contrary, as we’ve shown above, under the right circumstances, breaking from the flock is exhilarating!

This exhilaration is short-lived. Flying for long periods of time against the wind without the benefit of flocking together requires so much energy! You start yearning for the comfort of the flock. But where is it? How do you navigate your way back?

The hardships of rejoining the flock

As the community builds solutions that address the problems you’ve solved on your own, your hacky tool becomes technical debt you must address. But this is a special kind of tech-debt, which is very hard to tackle. I dub it “chronic early-adopter tech-debt” (it even has a neat acronym: C.R.E.A.T.E.D).

Deciding to unwind a project you are so invested in, technically and emotionally, is not an easy endeavor. So much so that it seems to me that even cutting-edge, technically innovative organizations can get stuck in this mud for many years. Why is it so hard to rejoin the flock?

  1. It means redoing a system that is already working. It may not be perfect, but that home-grown solution for doing whatever gets the job done, most of the time. Who has the time and energy to go deep into the new solution, figure out how to deploy and configure it correctly, integrate it with your existing CI/CD system, test for backward compatibility, and design (and execute) a migration plan?
  2. It means letting go. Sunsetting a system you’ve taken all the way from idea to production can be a painful experience. You’ve created all this value and got all of the recognition and love from your peers for solving these issues. Is it really necessary to unwind this project and start over with someone else’s work?
  3. The devil is in the details. You built this project and evolved it, tailor-made to your team’s specific circumstances. Moving to a new system carries the risk that certain edge cases will be unsupported. The worst thing is that these edge cases are often discovered deep into the integration process.
  4. You might pick a loser. To solve any important problem, many competing projects will be started; only a few of them will gain traction, and most will die within a while. Picking a loser is even worse than staying with your in-house solution: you will be stuck with stale code that you didn’t even write, with no community or commercial support.

Adopting infrastructure technology is a very risky business. Changing an existing solution only to reduce the overhead of maintaining something in-house is even riskier, as we have shown. Why should we do it anyway?

  1. You are not an infrastructure company. You should focus your own and your team’s energy on solving hard problems in your specific problem space, in your unique business domain. Stay away from all of the undifferentiated heavy lifting that everyone else in this industry is doing and concentrate on creating value for your users instead.
  2. You owe it to your teammates and employees. When they move on to their next endeavor, deep expertise in the quirks of your in-house partial implementation of whatever will not advance their careers much. The industry is consolidating around certain solutions; make sure your people leave your shop equipped with relevant knowledge for the coming years.
  3. Learn from other people’s mistakes. A tool that’s developed to cater to the requirements, quirks, and edge cases of multiple companies is 100x more versatile, generic and tested than yours can ever be. Wouldn’t you rather have your system be tested in production by the entire industry, with bugs filed and pull requests submitted by the world’s best infra engineers?

Finding your way back

There is risk in making the shift back into a common solution, for sure. As with any scenario of risk management, despite the possible rewards, it is not always advisable to go forward with it. What can help us decide whether we should embark on this journey at a given time?

  1. Quantify costs. Before you trash your working solution and grab whatever is available in open-source or SaaS-land, make an honest estimate of what your existing solution costs. These costs break down into two classes: hidden maintenance costs and missed opportunity costs.
    1. Maintenance costs: How many engineer-days are spent on bug fixes, version updates, and on-boarding and continuously training engineers to use your solution? How many production incidents can be traced to lack of maintenance of its code or its low quality?
    2. Missed opportunity costs: What technical features are you missing out on? What would the flexibility and peace of mind of a well-managed solution enable you to do? What projects are the maintainers of the current solution blocked from taking on? Once you have a good grasp of the cost of not changing anything, you can weigh it against the risks of making a change.
  2. De-risk by experimentation. This might be obvious, but before taking a huge bet on a piece of infrastructure technology, it is wise to run an end-to-end experiment to validate that it can deliver the results you need under your technical (and budgetary) constraints. Prove to yourself and your stakeholders that, at the very least, the change you are making is on par with your existing solution.
  3. De-risk by using open-source. The beauty of open-source software is that if you find a bug or need some feature added to a project, you are at liberty to fork the project, make the changes you need, and use the modified version. Even better, contribute the code back to the project and benefit from peer reviewers and maintainers. Just make sure that you have the necessary knowledge and skill set in-house to actually make these modifications.
  4. De-risk by picking a likely winner. This is easier said than done, but the risk of picking a losing solution must be addressed properly. Do your research. Look at both quantitative data (how many people are using it, e.g. GitHub stars and Docker Hub pulls) and qualitative data (expert opinions). Monitor community discourse at conferences, in blogs and on social media. Learn to separate the hype from actual, in-production adoption by key players in the industry. Read and evaluate the opinions of the disappointed and of the competitors. Review the codebase and architecture as you would for an in-house project. Be as sure as you can that you are betting on a winner.
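
To make the "quantify costs" step a bit more concrete, here is a rough sketch of the arithmetic in Python. Everything in it is an assumption made for illustration: the class name, the field names, the idea of measuring everything in engineer-days per year, and the simple break-even formula are mine and not a standard method. Real estimates will be fuzzier, and the missed opportunity figure in particular is a judgment call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InHouseToolCosts:
    """Hypothetical yearly cost inputs for an in-house tool, in engineer-days."""
    maintenance_days: float          # bug fixes and version updates
    onboarding_days: float           # on-boarding and training engineers
    incidents_per_year: int          # production incidents traced to the tool
    days_per_incident: float         # average effort to resolve one incident
    missed_opportunity_days: float   # rough value of projects the maintainers are blocked from

    def maintenance_cost(self) -> float:
        # Hidden maintenance costs: routine upkeep plus incident handling.
        return (self.maintenance_days
                + self.onboarding_days
                + self.incidents_per_year * self.days_per_incident)

    def total_cost(self) -> float:
        # Cost of not changing anything: maintenance plus missed opportunities.
        return self.maintenance_cost() + self.missed_opportunity_days


def break_even_years(costs: InHouseToolCosts,
                     migration_days: float,
                     new_tool_yearly_days: float) -> Optional[float]:
    """Years until a one-off migration pays for itself, or None if it never does."""
    yearly_saving = costs.total_cost() - new_tool_yearly_days
    if yearly_saving <= 0:
        return None  # the community tool would cost at least as much to run
    return migration_days / yearly_saving


if __name__ == "__main__":
    # Made-up numbers, purely to show how the pieces combine.
    costs = InHouseToolCosts(
        maintenance_days=40,
        onboarding_days=10,
        incidents_per_year=5,
        days_per_incident=2,
        missed_opportunity_days=30,
    )
    print("yearly cost of staying put:", costs.total_cost())
    print("break-even (years):", break_even_years(costs, 120, 30))
```

A long break-even period, or no break-even at all, is a hint that waiting may be the better option for now. This number only covers the cost side; the risks listed earlier (unsupported edge cases, picking a loser) still have to be weighed separately.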

Conclusion

I can personally attest that chronic early-adopter tech-debt syndrome is very real, and very hard to deal with. The hard thing about it is that because it is both very expensive and risky to address, in some cases the best option is to wait. You wait until the time is right, which may be, ironically, only when a brand new technology that you will want to be an early adopter of comes around.