One year of math-heavy legacy code


Published: 20.07.2022

I spent a year refactoring math-heavy legacy code and learned something.

The story is familiar. A scientist develops a new method and implements it in her favourite language. It seems to work, so a single developer implements it “properly” and it gets released. Being able to compare algorithms is nice, so the program is expanded to allow users to hook their own algorithms into it. However, the hooking procedure is complicated and poorly documented, so it remains largely unused. Years pass.

Ok, why is this bad? Is it bad? I think that this is a perfectly reasonable way to develop, but that it can be tweaked. Here’s a list of things that make life better.

For the scientist:

  • Please write code that matches the math in the paper as closely as possible. If you write x = A^-1 b in the paper, don’t implement z = R * y_temp.
  • Write the minimum amount of code required. Don’t roll your own linear algebra, sorting etc.
  • Reference the paper in the code; there is no need for excessive comments.
  • Go easy on OOP and especially inheritance. Stick to simple functions whenever possible.
  • Test properties, not specific cases, i.e. test that x^2 is non-negative for random inputs and not that 4^2 is 16.
  • Have someone honest read your code.
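
To make the property-testing point concrete, here is a minimal sketch in Python (my choice of language for illustration only). The names square and check_square_is_nonnegative are made up, and a real project would probably reach for a property-testing library rather than a hand-written loop.

```python
import random


def square(x):
    """The function under test: x^2."""
    return x * x


def check_square_is_nonnegative(trials=1000, seed=0):
    """Property: x^2 >= 0 for every real x, checked on random inputs."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        x = rng.uniform(-1e6, 1e6)
        assert square(x) >= 0.0, f"property violated for x = {x}"
    return True


check_square_is_nonnegative()
```

The specific case, square(4) == 16, checks one point. The property check would also catch a sign error that happens to give the right answer at that point.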

For the developer:

  • Don’t change notation if it already matches the paper. Don’t listen to your linter. i, j, k, x, y, z are perfectly good variable names.
  • Write the minimum amount of code required. Don’t roll your own linear algebra, sorting etc.
  • If you need OOP, use a language that properly supports it. Sounds obvious but it’s not.
  • Say no. Excessive flexibility is a form of premature optimisation. If you really want to allow users to hook their own stuff into your program, you’ll need to write documentation - and lots of it. With great power comes a lot of documentation.
  • Have someone honest read your code.
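
As a rough illustration of keeping the paper's notation and leaning on a library, here is a sketch in Python, assuming NumPy is available. The function name solve_system and the toy values of A and b are hypothetical. The point is only that x = A^-1 b in the paper stays x, A and b in the code.

```python
import numpy as np


def solve_system(A, b):
    """Return x such that A x = b, i.e. x = A^-1 b in the paper's notation."""
    # np.linalg.solve solves the linear system directly,
    # without forming the inverse of A explicitly.
    x = np.linalg.solve(A, b)
    return x


# Toy inputs, chosen only for illustration.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = solve_system(A, b)

# Property worth testing: plugging x back in reproduces b.
assert np.allclose(A @ x, b)
```

A reader with the paper open can match every symbol at a glance, and the linear algebra itself is someone else's well-tested problem.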

Life is a struggle, but mine will be less so if you follow these steps.

Thanks to Jonas for feedback.