Friday, February 28, 2025

A map metaphor for architectural diagrams

It is a (two-dimensional) representation of a pipe.

The map is not the territory; in software engineering terms, maps are models of it. A frequent remark about models is that they are all wrong, as they eliminate many aspects of what they refer to. Big efforts can certainly go very wrong if they fail to consider something that turns out to be very relevant, but models in general try to focus on the few aspects that are relevant for a particular job to be done. That's what makes them useful despite their limitations.

If this philosophy were more established, we would spare software engineers many an effort to keep a visual model completely up to date with code that is changing at a rapid pace. The continuous invalidation of a diagram should lead to the conclusion that it's not abstract enough, or that it's a symptom of other, larger problems.

Coming back to the map metaphor, geographical maps emphasize one primary aspect, generally mapped across two dimensions: for example, elevation or depth with respect to sea level, or political borders.

Like Wardley maps, architectural maps can use meaningful spatial dimensions, even if they don't correspond to a physical location like the squares on a chessboard or a projection of the world. In a Wardley map, one key dimension is visibility (to the user), as the map captures a value chain where products use components which use other components or utilities.

Unlike the modeling happening in Domain-Driven Design, I'm referring mostly to models of the software itself here rather than models of the domain; domain models in diagram and code format are just one specific example of the activity.

Which maps does a team need?

To answer this question, consider the goals for a team to:
  • identify and capture what "good" looks like for them in this project
  • easily see divergence from that to correct it
  • avoid taking the same decisions over and over: reuse patterns that have emerged and have already been tried and tested in similar changes

Some would call this documentation, or diagrams, and would try to fit it into some formal notation to make it completely unambiguous, as if it were being published in a funding application. These maps are lightweight: they are only meant to be used internally by a team, and there is no formal notation acting as a barrier to entry (much like EventStorming involves everyone in a room with only a set of orange stickies).

I've seen a basic Miro palette emerge, depending on what the team is comfortable with:

  • boxes and arrows, with a couple of arrow styles if necessary
  • stickies to capture particular decisions (text)
  • color coding to track status (e.g. adopted as best known practice, fully endorsed by the team, deprecated/suspicious)

Substitute your favourite digital whiteboard product. I have not attempted to do this in an office setting. I suspect Miro lowers the cost of change enough to make this feasible, both in the sense of not churning through paper and markers, and in the sense of having a UX simple enough to be picked up in a day so that everyone can contribute.

The first of these maps replaced an attempt at maintaining Architectural Decision Records. Software engineering involves continuously taking hundreds of decisions, in different places, at different scales. I suspect ADRs cover the very high-level perspective, or the most consequential decisions; but they don't scale to a high number of decisions that delve into the inner workings of a smaller module without reaching line-of-code level. Different abstractions for different purposes.

ADRs are also immutable and go through a deprecation and superseding process. These maps are meant to be mutated all the time, for refinement. The continuous stream of new user stories applies the pressure to revisit decisions to better suit what we now know, one or two years after a product was created.

Some real world maps covering the same project

To keep scaling to a large number of decisions, I started classifying them into various maps, each segregated by a specific aspect. There's no single right choice on how to organize information into a hierarchy, but there is always a cognitive load limit on how many decisions can be quickly grasped or considered when looking at a particular component.

In more formal approaches these separate aspects are called views or viewpoints. In any case, they tend to arise to support a specific discussion rather than just because there wasn't enough space in another map. The hard step is possibly moving from one model to multiple models while maintaining the same attention and ownership from the team.

Here are some examples. The main dimension has usually been user visibility, but from left to right rather than from top to bottom like in Wardley maps.

The context around $productName

Includes no details about the internals of the system being worked on, only the other projects or organizations that the current system integrates with. Many things in this map are out of the team's control. They are also often not visible in code at all: for example, the list of clients of our API.

I took this name from the C4 model's system context diagram.

Bounded Contexts and their languages

Where does a vocabulary apply consistently? For example, do we have a UX language to cater to users and a separate, symmetrical and consistent language for the underlying domain model? And how do they differ from the languages used or imposed upon us by third parties? Are we conforming to another team's language, or introducing anti-corruption layers?

I took this name from strategic Domain-Driven Design.
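When the answer is an anti-corruption layer, it often amounts to a small translation function at a module boundary. Here is a minimal TypeScript sketch, with entirely hypothetical names and payload shapes:

```typescript
// src/third-parties/reviewsProviderAdapter.ts (hypothetical anti-corruption layer)
// The provider's payload speaks its own language; the adapter translates it into ours
// at the module boundary, so the rest of the codebase only ever sees domain terms.
type ProviderReviewDto = { doc_id: string; review_body: string; created: string };

export type Evaluation = { articleId: string; fullText: string; publishedAt: Date };

export const toEvaluation = (dto: ProviderReviewDto): Evaluation => ({
  articleId: dto.doc_id,
  fullText: dto.review_body,
  publishedAt: new Date(dto.created),
});
```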

Static architecture

The dependencies between the different modules of the codebase. Imports, requires, or use statements, depending on your language of choice, will ultimately define this. It's a static map because this information can be detected and distilled into the map without running any code. There are also additional decisions that are not represented in code: which committed folders are modules with a strong interface, and which are just folders?

A key map to foster cohesion (inside a component) and keep coupling under control (as it makes high-level dependencies visible).
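To make this concrete, here is a minimal TypeScript sketch (the file and module names are hypothetical): the import statements at the top of a file are exactly what the static map distills into arrows between modules.

```typescript
// src/http/submitEvaluationHandler.ts (hypothetical)
// Each import below becomes an arrow on the static map: http -> evaluations, http -> groups.
// Importing from a module's index.ts (its public interface) keeps the arrow high-level;
// reaching into a deep path such as '../evaluations/internal/db' would be a decision worth mapping.
import { recordEvaluation } from '../evaluations';
import { findGroup } from '../groups';

export const submitEvaluationHandler = async (groupId: string, body: unknown) => {
  const group = await findGroup(groupId);
  return recordEvaluation(group, body);
};
```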

Testing map (or testability map)

What testing strategies are used consistently in each application layer? Unit testing of classes or functions? Screenshot reference testing for the UI? What integrated tests are we using, and what choices have been made for their setup or assertions?

Observability map

Which modules produce useful logs, and how can I access them? What is the difference between an error-level and a warning-level log? Which dashboards should we link to? This map is an entry point more than something that can visualize data directly, often linking to disparate tools from Grafana to Kubernetes dashboards.

Technology stack

Answers questions on the specifics of the various programming languages and tools in use. It does not need to capture what can be enforced via linting rules instead, but the set of decisions includes more than just conventions:

  • What safe subset of JavaScript or TypeScript are we endorsing?
  • What will we use to represent URLs or dates?
  • How do we mark deprecated code intended to be replaced?

The choices made here often have security, performance or other non-functional implications.
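As a sketch of what some of these decisions can look like once captured in code (the names and choices below are hypothetical examples, not recommendations):

```typescript
// Hypothetical conventions a technology-stack map might record for a TypeScript codebase.

// Decision: represent URLs with the WHATWG URL class rather than plain strings.
export const parseArticleUrl = (raw: string): URL => new URL(raw);

// Decision: pass instants around as Date, serializing to ISO-8601 strings at the boundaries.
export const toIsoString = (instant: Date): string => instant.toISOString();

// Decision: mark code intended to be replaced with the JSDoc @deprecated tag,
// so that editors and linters can surface it.
/** @deprecated Use parseArticleUrl instead; scheduled for removal. */
export const legacyParseUrl = (raw: string): string => raw.trim();
```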

Process and ownership

That was not a complete list! The set of maps should be owned by the whole team that uses them, not only by its tech lead. Decisions on the retirement of maps can then be taken together.

Often, though, the tech lead has their senses oriented to detecting when a new map could be helpful; proposing its adoption at the right moment; and intentionally not filling it in completely, so as to co-create it with the rest of the team.

As part of the development process, the team self-organizes to pick up user stories, and often has a refactoring checklist that they want to complete before closing off their unit of work and moving on to something else. Conceptual breakthroughs might also have happened as part of delivering value, like a new data structure having been identified and tested successfully. While most of the notes will be deleted as the team moves on, some can be captured by refactoring the code to follow our understanding. And some are hard to fit at that level and can be captured by maps.

Often items are marked with a specific color, indicating that a pair or a subset of the team has recorded these decisions but there is some catching up to do with the other team members so that they are all aware of the direction.

Speculations

A gap I would have liked to see filled is explicitly referring to the maps at the beginning of a new unit of work, at some granularity: for example, consulting them when a new epic or user story is prioritized.

The results might take various forms:

  • speeding up, as fewer decisions have to be taken and the maps constitute an enabling constraint. I've seen this happen empirically more by referencing existing code. There is a trap waiting here, as developers might pick an outdated item to copy from: they rely on memory to come up with a recent reference, rather than the old button no one has touched in the last couple of years.
  • invalidation of decisions. New business requirements may require a change in architecture to support them, hopefully infrequently.
  • refinement. We might find some aspects out of date or obsolete, and maps could be simplified as a result.

One way to look at this process could be evidence-based: can we work within the existing architecture to deliver, or will we fail to do so? A spike can help us understand which is the case. However, failure is not binary: we might never encounter an absolute blocker to delivery, and yet spend sweat and tears before deciding a new approach is needed.

Thursday, February 27, 2025

Team learning session: surviving legacy code by J.B. Rainsberger

This is the description, and experience report, of an exercise I picked up many years ago and then used in a team I lead. I'll describe this in the present tense, as I imagine applying the exercise in similar contexts. Credit to J.B. Rainsberger for coming up with this diabolical codebase and making us work on it!

Context

A team has been formed and has been working on a single product, for example a TypeScript monolith comprising a frontend and APIs.

The team is now tasked with taking over a set of services and frontends written in PHP (or some other programming language and ecosystem). It turns out this set of services is the flagship product of the organization, and it needs to support innovative business change. Yet, normal operations and incremental improvements had been mostly outsourced as part of the technical strategy for the last few years. 

The focus of innovation was elsewhere. Testing coverage is redundant, confusing or missing, depending on the area you are working on. Very few people in house know the traps of these codebases: we are now in the realm of legacy code.

Learning goal

The original exercise by J.B. Rainsberger is oriented to learning patterns for taking over legacy code: inherited codebases that still carry a lot of business value. If they don't have business value, then why work on them?

Specifically, the patterns regard:

  • characterizing and understanding code
  • testing it, at various levels of scope from the whole application down to small units
  • refactoring, with a safety net in place to support those changes without introducing regressions.

There isn't an emphasis on shipping new features, and there is no product owner.

To the original goals, we add a separate one here with its own new difficulty level: learning a new programming language that most of the team has not worked in professionally before, or has not picked up for a long time. With a new programming language comes a new set of tools for running, testing, or linting the code. For PHP alone, in this example, the toolset ranges from Composer to PHPUnit, PHPStan, or PHP-CS-Fixer.

The learning at all these levels translates into being ready to use these patterns and tools on real projects, making it much easier and safer to deliver business value on them.

Activity

The original problem statement gives this guidance:

  • Refactor this code to understand it
  • [Decide] when to refactor and when to rewrite, and how to do that safely.
  • Resolve the central conflict of legacy code: I need tests to refactor safely, but I need to refactor to write tests effectively.

I first ran these workshops back in the days of the Legacy Code Retreats, which is where the name comes from. The setting here diverges from a whole-day session where iterations are handled by various pairs. In this case, the repository is forked at the beginning of the activity and several sessions can take place, for example with one or more hours reserved per week. The participants work together as an ensemble, but they could split into multiple rooms working independently if there are too many of them.

11? 12? What?!

A loose plan to follow can be:

  1. run the application. Does everyone understand the problem domain and what problem the code solves?
  2. introduce golden master testing for characterization. This step involves figuring out seeding and reproducibility, and how to isolate automated tests from the outside world and its changing conditions (see the sketch after this list).
  3. introduce further testing at lower levels. This step involves introducing tooling for running code easily, testing it, or performing static analysis; manipulating the folders and file structure safely; and refactoring the code to isolate the units under test.
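The exercise itself uses PHP, but the shape of a golden master test is the same in any language. Here is a minimal sketch in TypeScript, assuming a Jest-like test runner and a hypothetical, seedable entry point into the legacy code:

```typescript
// goldenMaster.test.ts (hypothetical): characterize the legacy behaviour as-is by running it
// with a fixed seed, capturing its textual output, and comparing it against a recorded file.
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import { runLegacyGame } from './legacyGame'; // hypothetical entry point, made deterministic via a seed

const GOLDEN_FILE = 'golden-master.txt';

test('legacy game output matches the golden master', () => {
  const output = runLegacyGame({ seed: 12345, rounds: 20 });

  if (!existsSync(GOLDEN_FILE)) {
    // First run: record the current behaviour, right or wrong, as the reference.
    writeFileSync(GOLDEN_FILE, output);
    return;
  }

  expect(output).toEqual(readFileSync(GOLDEN_FILE, 'utf8'));
});
```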

I act as a language expert here, but not with an agenda in mind. What to learn is decided by what moves the participants want to take, and what they are missing to be able to do so. It also pays off to prompt the team to research rather than trusting anyone's memory or word: tests and experiments are the source of truth.

The fact that this is a completely new, fictional, and ugly codebase should help with the feeling of safety when raising a lack of understanding. The code is supposed to be hard to understand and fragile. Once that is established, we can work on our own learning, directed at improving our situation. It's a very different framing than delivering features on real legacy code with a deadline in mind.

Resources

The original code repository used in legacy code retreats.

The canonical explanation of a golden master approach to characterization testing.

Specifically for PHP, it helps to understand how seeding random number generation works to achieve reproducibility.

Retrospective

Add to a board two prompts to allow reflections to appear concurrently. For example:

  • what did we learn today? Anything from a language construct that did not map easily to something I already knew, to a tool's use cases.
  • any ideas for next time? There are lots of potential directions for exploration, and we want to crowdsource the gaps the participants are starting to see so we can fill them. This helps us get into context quickly when we start a new hourly session on another date.

Team learning session: the (in)famous types folder

This is the description, and experience report, of an exercise I used in a team I lead. I'll describe this in the present tense as I imagine applying the exercise in similar contexts.

Context

The team has been working on a TypeScript monolith application; TypeScript is used exclusively on the backend and compiled with Node.js as the target.

Encapsulation and modularization have been promoted as important software engineering concepts, and some high-level architecture and modules are emerging. For example:

  • a clear write and a read side in a CQRS application
  • a third-parties module to isolate from external dependencies
  • a top-level evolving http layer to isolate the Domain Model and the rest of the code from HTTP and other delivery mechanisms.

The team is oriented towards functional programming, modeling some domain types but also keeping related functions close to those types for cohesion. Or they could be working within the object-oriented paradigm, similarly organizing the domain model and other code with types and classes.

For the rest of the discussion, the word module applies to units of encapsulation of varying sizes. In a TypeScript monolithic codebase, this could be a folder with an index.ts file; depending on its size, it could contain other sub-modules following the same structure. This exercise also focuses on compile-time dependencies, mostly import statements.

The types folder refers to an (anti-)pattern where shared types are collected in a top-level, or very high-level, folder. The types can then too easily be imported anywhere, with a thick tree of dependencies towards the types folder being established, inadvertently coupling together disparate parts of the codebase.
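A minimal sketch of the smell, with hypothetical paths and names (the real codebase will differ):

```typescript
// src/types/index.ts (hypothetical): the catch-all barrel everyone can reach into
export type GroupId = string & { readonly __brand: 'GroupId' };
export type RecordedEvaluation = { groupId: GroupId; articleId: string; recordedAt: Date };

// Because these live at the top level, nothing discourages imports like the ones below,
// quietly coupling the HTTP layer to details only the read and write modules should care about:
//
//   src/http/groupPage.ts:        import { RecordedEvaluation } from '../types';
//   src/write-side/commands.ts:   import { RecordedEvaluation } from '../types';
//   src/third-parties/client.ts:  import { GroupId } from '../types';
```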

Learning goal

The team is picking up encapsulation and information hiding at the small scale, encapsulating functions in a folder behind an index.ts and its export statements.

We want the team to extend this process to larger-scale modules, sized at 1K to 10K lines of code: how much code can be hidden within these modules, or added to their public interfaces to elevate it into a contract between modules?

We also want the team to think inside a monolith, as we are not sure there is enough capacity to absorb the overhead of separating repositories and services.

Activity

The original text of the exercise follows, edited for clarity:

Our src/types/ folder acts as a catch-all for types and related functions when we do not find an encapsulated place to put them. While our architecture is often changing for the better, over time these global types remain visible everywhere in the monolith's codebase, even though hiding them could help when working on code that does not need to know about them.

For each of the .ts files linked here, work as an ensemble to ask the following questions before deciding any refactoring move:
  • Where are the contents of this file currently used?
  • Are all the usages legitimate?
  • Are these file contents addressing too many responsibilities?
  • Do the names reflect the intended responsibilities or usage?

Once you have decided on a new location for the file (or for part of it), ask:
  • Do the resulting import rules agree with our architectural maps?
  • If we change a decision within the newly encapsulated file, how far does the change propagate?
This is not a backlog to clear: do not rush to cover them all (there are 32 in total anyway). The goal is to identify patterns for architectural refinement.
Some examples of problematic types we found when running this exercise, with their real names and a general explanation:
  • RecordedEvaluation, Group: used only in module A and module B that depends on A. Yet globally visible.
  • GroupId: used in multiple modules that relate to writing and reading to a database, yet visible to the HTTP layer.
  • EvaluationType, ArticleServer: union types of a few strings, global for convenience but also only used in a few modules.
  • CommandResult: a return type and part of the contract between module C and D. Module C depends on module D. Should this be defined by C, D, or a third module to break the dependency completely?
  • DescriptionPath:  a branded and validated string. The same problem could be solved by ensuring validation happens in the right modules. Should this type be simplified to a string?
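One possible refactoring move for the first example above, sketched with hypothetical paths: move the type next to the module that owns it, and let other modules see it only through that module's public interface.

```typescript
// src/evaluations/RecordedEvaluation.ts (hypothetical): the type now lives with its module
export type RecordedEvaluation = {
  groupId: string;
  articleId: string;
  recordedAt: Date;
};

// src/evaluations/index.ts: only what the module chooses to expose crosses its boundary
export type { RecordedEvaluation } from './RecordedEvaluation';
export { recordEvaluation } from './recordEvaluation';

// A module that legitimately depends on evaluations imports the contract from that interface:
//   import type { RecordedEvaluation } from '../evaluations';
// while the HTTP layer no longer sees the type at all.
```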


A map of (compile-time) dependencies between high-level modules, intentionally zoomed out. Dependencies are generated by import statements originating in one module and reaching out for a type or a function contained in another module at any level of nesting.
Resources

A small guide to barrel files (index.ts files containing only export statements) can be useful. They are often mentioned as having performance implications, but any measurable drawbacks should be traded off against maintainability.

dependency-cruiser can be used to help map out the current dependencies of a codebase, generally at a much finer-grained level than top-level modules.
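If the team wants some of the static-architecture decisions to be checked automatically, a rule along these lines can be added to the tool's configuration (a sketch based on dependency-cruiser's forbidden-rules format; the module paths are hypothetical and should mirror the map):

```javascript
// .dependency-cruiser.cjs (sketch): encode one decision from the static architecture map
module.exports = {
  forbidden: [
    {
      name: 'http-must-not-reach-into-persistence',
      comment: 'The http layer should depend on domain modules, not on persistence details.',
      severity: 'error',
      from: { path: '^src/http' },
      to: { path: '^src/persistence' },
    },
  ],
  options: {
    tsConfig: { fileName: 'tsconfig.json' },
  },
};
```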

Retrospective

A couple of questions can be asked here in a round robin or, if time is short, written as prompts on a digital board for participants to brainstorm on for a few minutes.

As often in quick retros, one question is looking backwards into what has happened and one is looking forwards to improve our practices:

  • did you see a type which looked reasonable before today, but is now a target for refactoring?
  • what would you do differently tomorrow when you create a new type?

The reflections that emerge from all the participants help set a shared direction for the architecture that you want to see. Enjoy!

Monday, February 24, 2025

Ensemble programming roles and cues

In the last few years I've been leading a team that has adopted ensemble programming and pair programming as the main practice to deliver production code. We have evolved cues that have a precise meaning for team members and that make communication more effective, especially when some of these phrases are used every few minutes.

I considered various terms to indicate this set of phrases, and discarded most of them:

  • mantra (a slogan repeated frequently, but often to yourself, and with spiritual implications),
  • shibboleth (implies a closed group with secret handshakes, hiding from an external force)
  • catchphrase (a signature phrase associated with a specific character)
  • formula (works in other Romance languages but has many other meanings in an English context)

Formula could be considered a loanword from Italian, but I think its original etymology fits:

1630s, "words used in a ceremony or ritual" (earlier as a Latin word in English), from Latin formula "form, draft, contract, regulation;"

To keep with the metaphor of an ensemble of actors or musicians performing, I found that cue fits too:

the trigger for an action to be carried out at a specific time

A few roles in ensemble programming

The rotation of roles visualized on a Miro board

This experience is in the context of remote ensemble programming, where there is a single audio medium shared across a virtual room, and limited surface space for visual cues such as body language or facial expressions. One of two monitors, or part of a monitor, would usually be dedicated to 2 or 3 other camera feeds, with a digital whiteboard or an IDE on a shared screen being the place of operation. This isn't to say that paying attention to your colleagues isn't prioritized; rather, making use of continuously improved verbal language is one way to achieve that.

Within this setting, a few roles are assumed by the team members participating in a particular ensemble:

  • the Driver inputs all changes into a working copy of the codebase.
  • the Navigator has the responsibility to coordinate the group into the next decision it needs to take, and verbalize all changes for the Driver to enact.
  • other ensemble members assume roles as needed, and without even noticing, actively contributing to the discussion. They might help with diagrams or note-taking, or notice code smells or people dynamics, since they can dedicate their attention to something other than the next code change.

In this team, a rotation based on committing and pushing the last change emerged. A strict rotation, where a Navigator remains in that role for the whole duration of a cycle, also emerged, mostly to keep a level playing field across all levels of seniority and to give everyone the space to lead the next change.

The cycle described here consists of starting a new screen-share session; pulling the latest version of the code; and making a change and pushing it to make it available to the rest of the group. This micro-iteration has an associated cycle time, which is where I believe the term came from. After each cycle, the previous Navigator rotates in to become the Driver; someone else becomes the Navigator.

Many of these cues are commonly uttered by the Navigator or the Driver, but they are not limited to them. Consider this list harvested in a design-patterns sense: extracted from repeated real-world experience, with no claim to completeness. It is however limited to a single team, operating in various contexts and changing its membership over time. Credit to Kevin Rutherford for introducing ensemble programming when this team was newly formed at the beginning of 2020, and for helping the team hone in on particular solutions over time.

At the beginning of a cycle

"What do you want to see?"

The Driver may start a cycle by asking this of the Navigator, to relieve the pressure of deciding a direction quickly. To ease into the role of Navigator, there is little downside in spending time observing or understanding some area of the code, rather than attempting to change it immediately.

In the context of new ensemble members, this cue also relinquishes control explicitly to the Navigator. If you are used to the person with control of the keyboard showing what they mean, you will instead see a Driver waiting for instructions.

"I'm just a pair of hands" or "Waiting for instructions"

Indeed, the Driver uses these or other cues to remark that they are not taking decisions just because they happen to hold the keyboard or the mouse that can perform changes. The Navigator has the responsibility to coordinate. The Driver can focus on efficiently executing a move.

"Let's go to the IDE/"the code"/Github/maps/Miro"

At the beginning of a new cycle, or to trigger a new cycle, an ensemble member suggests changing the visual support to fit the discussion. This might involve not sharing a screen anymore, if that medium supports collaboration natively (e.g. Miro as a digital whiteboard).

For example, GitHub could be useful to navigate unfamiliar repositories; a whiteboard is vital for remote EventStorming or for diagrams; maps refer to the set of long-lived architectural diagrams for context, dependencies or separation of concerns.

During a cycle

I think this is where the phrase comes from. But lack of verbalization of intention should be the exception, not the rule.

"Make it so"

The group has a shared understanding of a change that needs to be applied. For example: we have sliced a change into many small steps and there is low risk because we have already performed a few of these slices; the uncertainty lies more in what can be discovered from the compiler or a test suite reacting to the change; or a refactoring move needs to be applied to a different part of the codebase, in what we suspect will not require new design decisions.

After having stated the change they'd like to see, the Navigator hands over control to the Driver with this cue, and lets them loose in achieving it with their preferred tooling: VSCode, Vim, grep, even an LLM. The separation of concerns is between the destination and how to get there.

"Rename the ... class/function/method/variable and all references"

The Navigator directs the Driver at a high level of abstraction. The Driver acts as an intelligent IDE and applies the change predictably, as they both understand what the outcome should look like. If the Driver runs into trouble, such as an unforeseen situation, they stop and ask for more precise direction.

"Can we scribe on the board something about ...?"

The Driver or the Navigator asks the other members of the ensemble to take a note about something that we should look at later. They could just ask to take a note, but we picked the word scribe from EventStorming practices to identify whoever is currently responsible for adding artifacts to a shared digital board.

Assigning this role, for example, to the third ensemble member in the rotation helps avoid concurrency clashes, with multiple people trying to take the same note.

"From the back of the room, ..."

Ensemble members other than the Driver or the Navigator hand out suggestions or considerations, without wanting to interrupt or override the Navigator's intention. The Navigator still coordinates making a decision, but the other ensemble members can contribute specific knowledge or cues that help the group make progress without taking away the opportunity for the Navigator to exercise their skills.

The expression comes loosely from the phrase "leading from the back of the room"; I've seen this role referred to as the Rear Admiral in some ensemble programming literature, turning the Driver and Navigator rallying metaphor into a naval one where everyone helps run the ship.

"... can be a rabbit hole"

A problem we face is recognized as something that might be taking away from the momentum of the group, or whose overall importance is controversial. It might be more productive to mitigate it now rather than dig into its ultimate causes. The group can agree to postpone the discussion of this problem, to focus on the next test to write or to make green. If the problem makes it very difficult to make progress, it will keep emerging. Rabbit hole is just internet slang for an engrossing and time-consuming topic, such as programming language trade-offs; a troublesome library upgrade; a non-deterministic test; or large-scale architectural changes that require a lot of evidence collection and consideration.

At the end of a cycle

Group peer pressure helps us do better than that

"Anything else for this cycle?"

The Driver asks the Navigator whether the intended scope has been achieved. Confirmation will start the commit process, if no other checks or changes are necessary.

Sometimes the Driver acts as a preemptive timekeeper, reminding everyone that uncommitted code is work in progress, and that the bigger it gets, the higher the risk of progress being lost if it triggers inadvertent behavior changes in the codebase.

"End of cycle" or "Let's rotate"

The Navigator decides it's time to rotate the roles, judging that what has been learned in this cycle is enough to perform a switch without loss of context. The cycle might have been focused on investigation or reading code, in which case it ends without a commit. Even if there is an attempted change at play, the knowledge gathered can lead the next cycle to regenerate the same git diff quickly, or to find a safer or easier path through that change.

"make check and commit" 

The Navigator communicates that it's time to commit what we have done. Substitute make check with your local testing command of choice; it's just a team convention to capture the set of compilation, static analysis and tests that we deem acceptable to run before every commit. Other, more specific or intensive testing can be left to a Continuous Integration build, as a safety net with a longer feedback loop. Normally, the commit is followed by a push and, by default, the end of the current cycle.

"What [commit] message would you like?"

A Driver understands we are ready to commit a change, and invites the Navigator to summarize and describe it. This is an opportunity, if the team so wishes, to try other cues that repeatedly nudge towards a certain behavior and build a new habit. For example, you could experiment with having Drivers ask "Why are we doing this change?" for a whole day.
