Thursday, February 27, 2025

Team learning session: surviving legacy code by J.B. Rainsberger

This is the description, and experience report, of an exercise I picked up many years ago and then used in a team I lead. I'll describe it in the present tense, imagining it applied in similar contexts. Credit to J.B. Rainsberger for coming up with this diabolical codebase and making us work on it!

Context

A team has been formed and has been working on a single product, for example a TypeScript monolith comprising a frontend and APIs.

The team is now tasked with taking over a set of services and frontends written in PHP (or some other programming language and ecosystem). It turns out this set of services is the organization's flagship product, and it needs to support innovative business change. Yet normal operations and incremental improvements have mostly been outsourced over the last few years, as part of the technical strategy.

The focus of innovation was elsewhere. Test coverage is redundant, confusing, or missing, depending on the area you are working on. Very few people in-house know the traps of these codebases: we are now in the realm of legacy code.

Learning goal

The original exercise by J.B. Rainsberger focuses on learning patterns for taking over legacy code: inherited codebases that carry a lot of business value. If they had no business value, why would we work on them?

Specifically, the patterns concern:

  • characterizing and understanding code
  • testing it, at various levels of scope, from the whole application to small units
  • refactoring, with a safety net in place so that changes don't introduce regressions.

There isn't an emphasis on shipping new features, and there is no product owner.

To the original goals, we add a separate one with its own difficulty: learning a programming language that most of the team has never used professionally, or has not picked up in a long time. A new language also brings a new set of tools for running, testing, and linting the code. In this PHP example, the toolset includes Composer, PHPUnit, PHPStan, and PHP-CS-Fixer.

Learning at all these levels translates into being ready to use these patterns and tools on real projects, making it much easier and safer to deliver business value there.

Activity

The original problem statement gives this guidance:

  • Refactor this code to understand it
  • [Decide] when to refactor and when to rewrite, and how to do that safely.
  • Resolve the central conflict of legacy code: I need tests to refactor safely, but I need to refactor to write tests effectively.
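One common way to break this conflict is a tiny, mechanical refactoring that opens a seam, a place where a test can substitute behaviour without editing the logic. A minimal sketch in TypeScript (the language the team in this example already knows), assuming a hypothetical `movePlayer` function that used to print straight to the console: the console call becomes an optional `print` parameter whose default preserves the old behaviour, so existing callers are unaffected while a test can pass a recording function. The function and its board rules are invented for illustration and are not taken from the exercise's codebase.

```typescript
// Output sink: in production it prints, in a test it records.
type Print = (line: string) => void;

// Hypothetical legacy function after the refactoring: the console call has
// become a parameter with a default value, so existing callers compile and
// behave exactly as before.
function movePlayer(
  name: string,
  place: number,
  roll: number,
  print: Print = (line) => console.log(line), // seam: injectable output
): number {
  // Board of 12 places that wraps around (invented rule for the sketch).
  const newPlace = (place + roll) % 12;
  print(`${name} rolled ${roll}`);
  print(`${name}'s new location is ${newPlace}`);
  return newPlace;
}

// A test passes a recording function instead of the console.
const lines: string[] = [];
const result = movePlayer("Ada", 10, 4, (line) => lines.push(line));
// result is 2; lines holds the two messages in order.
```

Adding a defaulted parameter is a change small enough to review by eye before any test exists; once the output can be captured, larger refactorings can happen under test.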

I used to run these workshops back in the days of the Legacy Code Retreats, which is where the name came from. This setting differs from the whole-day retreat, where each iteration is tackled by different pairs. Here, the repository is forked at the beginning of the activity and the work spans several sessions, for example with one or more hours reserved each week. The participants work together as an ensemble, but they can split into multiple rooms working independently if there are too many of them.

11? 12? What?!

A loose plan to follow can be:

  1. run the application. Does everyone understand the problem domain and what problem the code solves?
  2. introduce golden master testing for characterization. This step involves figuring out seeding and reproducibility, and how to achieve the isolation of automated tests from the outside world and its changing conditions.
  3. introduce further testing at lower levels. This step involves introducing tooling for running code easily, testing it, or performing static analysis; manipulating the folders and file structure safely; and refactoring the code to isolate the units under test.
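Step 2 can be sketched in TypeScript as follows, assuming the application's only source of nondeterminism is a random number generator that can be passed in. `playGame` is a hypothetical stand-in for the whole application, and `seededRandom` is a Park–Miller generator, used here because JavaScript's `Math.random()` cannot be seeded (in PHP, calling `mt_srand()` with a fixed seed plays that role for `mt_rand()` and, since PHP 7.1, `rand()`). The output for a fixed range of seeds is recorded once as the master; after every change, the same run must reproduce it exactly.

```typescript
// Seedable PRNG (Park–Miller "minimal standard"). Math.random() cannot be
// seeded, so the code under test must receive its randomness from outside.
function seededRandom(seed: number): () => number {
  // The state must stay in 1..2^31-2; map any seed into that range.
  let state = (Math.abs(Math.floor(seed)) % 2147483646) + 1;
  return () => {
    state = (state * 48271) % 2147483647; // exact: product < 2^53
    return (state - 1) / 2147483646; // in [0, 1)
  };
}

// Hypothetical stand-in for the whole legacy application: it logs what
// happens on each turn. The rules are invented for this sketch.
function playGame(random: () => number, print: (line: string) => void): void {
  let place = 0;
  for (let turn = 1; turn <= 5; turn++) {
    const roll = Math.floor(random() * 6) + 1; // die roll, 1..6
    place = (place + roll) % 12;
    print(`turn ${turn}: rolled ${roll}, now at ${place}`);
  }
}

// Run the application once per seed and collect everything it prints.
function captureOutput(seeds: number[]): string {
  const lines: string[] = [];
  for (const seed of seeds) {
    lines.push(`--- seed ${seed} ---`);
    playGame(seededRandom(seed), (line) => lines.push(line));
  }
  return lines.join("\n");
}

const seeds: number[] = [];
for (let i = 0; i < 100; i++) seeds.push(i);

// First run: record the golden master. In practice it would be written
// to a file and committed next to the test.
const master = captureOutput(seeds);

// Every later run, e.g. after each refactoring step: any difference
// means observable behaviour has changed.
const matches = captureOutput(seeds) === master;
```

A golden master only characterizes current behaviour, bugs included; it tells us when behaviour changes, not whether the old behaviour was right.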

I act as a language expert here, but without an agenda. What to learn is decided by the moves the participants want to make, and by what they are missing to be able to make them. It also pays off to prompt the team to research rather than trust anyone's memory or word: tests and experiments are the source of truth.

The fact that this is a completely new, fictional, and ugly codebase should make it feel safer to admit a lack of understanding. The code is supposed to be hard to understand and fragile. Once that is established, we can work on our own learning, directed at improving our situation. It's a very different framing from delivering features on real legacy code with a deadline in mind.

Resources

The original code repository used in legacy code retreats.

The canonical explanation of a golden master approach to characterization testing.

Specifically for PHP, it helps to understand how seeding random number generation works to achieve reproducibility.

Retrospective

Add two prompts to a board so that participants can write their reflections concurrently. For example:

  • what did we learn today? Anything from a language construct that did not map easily to something I already knew, to a tool's use cases.
  • any ideas for next time? There are lots of potential directions to explore, and we want to crowdsource the gaps the participants are starting to see so we can fill them. This also helps us get back into context quickly when a new one-hour session starts on another date.
