A monorepo misconception - atomic cross-project commits

Posted on 2021-07-21 in General

In articles and discussions about monorepos, there's one frequently alleged key benefit: atomic commits across the whole tree let you make changes to both a library's implementation and the clients in a single commit. Many authors even go as far as to claim that this is the only benefit of monorepos.

I like monorepos, but that particular claim makes no sense! It's not how you'd actually make backwards incompatible changes, such as interface refactorings, in a large monorepo. Instead the process would be highly incremental, and more like the following:

1. Push one commit to change the library, such that it supports both the old and new behavior with different interfaces.
2. Once you're sure the commit from stage 1 won't be reverted, push N commits to switch each of the N clients to use the new interface.
3. Once you're sure the commits from stage 2 won't be reverted, push one commit to remove the old implementation and interface from the library.
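As a minimal sketch of what the tree might look like partway through this process: the names `open_host_port`, `open_endpoint` and `Endpoint` are made up for illustration, and a real library would of course live in its own module rather than next to its client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical argument type introduced together with the new interface."""
    host: str
    port: int


# --- library, after the stage 1 commit ---

def open_host_port(host: str, port: int) -> str:
    """Old interface: untouched in stage 1, deleted in stage 3."""
    return f"{host}:{port}"


def open_endpoint(endpoint: Endpoint) -> str:
    """New interface: added in stage 1, with no callers yet."""
    return f"{endpoint.host}:{endpoint.port}"


# --- one client, before and after its own stage 2 commit ---

def client_before() -> str:
    return open_host_port("db.internal", 5432)


def client_after() -> str:
    return open_endpoint(Endpoint("db.internal", 5432))
```

Each client's stage 2 commit is just the `client_before` to `client_after` change, and it can be reverted without touching the library or any other client.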

There's a bunch of reasons why this is a nicer sequencing than a single atomic commit, but they're mostly variations on the theme: mitigating risks. If something breaks, you want as few things as possible to break at once, and for the rollback to a known-good state to be simple. Here's how the risks are mitigated at the various stages in the process:

1. There is nothing risky at all about the first commit. It is just adding new code that's not yet used by anyone.
2. The commits for changing the clients can be done gradually, starting with the ones that the library owners are themselves working on, the projects that are most likely to detect bugs, or the clients that are most forgiving of errors. Depending on the risk profile of the change, you might even use these commits as a form of staged rollout, where you'll wait to see if the previous clients report any problems in production before sending the next batch of commits for code review.
3. The final commit to remove the old implementation can only break a minimal number of clients: the ones that just started using the library between the removal commit being reviewed and pushed, and did so using the old interface. The ideal environment would have tooling in place to prevent that kind of backsliding from happening in the first place (e.g. lint warnings on new uses of deprecated interfaces).
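The lint warning mentioned for the final stage could be something as small as the following sketch. It assumes the clients are Python code and that the deprecated names are kept in a hand-maintained set; `DEPRECATED` and `find_deprecated_calls` are hypothetical names, not part of any real linter.

```python
import ast
from typing import List, Tuple

# Hypothetical list of interfaces that are scheduled for removal.
DEPRECATED = {"open_host_port"}


def find_deprecated_calls(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return (line number, name) for every call to a deprecated function."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            func = node.func
            # Cover both `name(...)` and `module.name(...)` call shapes.
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name in DEPRECATED:
                hits.append((node.lineno, name))
    return sorted(hits)
```

This matches on the bare function name only, so it would also flag an unrelated function that happens to share the name. A real check would resolve imports, but even a crude version run at presubmit narrows the window in which new uses of the old interface can sneak in.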

If anything goes wrong in stage 2, it's trivial to revert a commit that's only touching a couple of files. By contrast, reverting a commit that spans hundreds of projects would be quite painful, especially if the repo has any kind of per-directory ACLs (which I think is mandatory for a big monorepo). It gets worse if the breakage isn't detected immediately, since the more code the single change affects, the less likely it is that the revert applies cleanly.

If anything goes wrong in stage 3, it would also have gone wrong when using atomic commits. But with atomic commits the breakage in stage 3 is far more likely, since the new users will naturally use the old interface (the new one doesn't exist yet in their view of the world), and since the window between start of code review and committing will be wider. And again, the rollback will be far easier with the commit that's only touching the library and not the clients.

There are some additional reasons why the huge commit will be annoying. For example, getting a clean presubmit CI run becomes progressively harder the more projects a single commit changes.

Sure, the atomic commit will save a little bit of work in not needing to have the implementation support both interfaces at once. But that tiny saving is just not a worthwhile tradeoff when compared to how much work wrangling the huge commit would be.

It's particularly easy to see that the "atomic changes across the whole repo" story is rubbish when you move away from libraries, and also consider code that has any kind of more complicated deployment lifecycle, for example the interactions between services and client binaries that communicate over an RPC interface. Obviously you can't do an atomic change in that case, since you need to continue supporting the old server implementation until all client binaries have been upgraded (and are rollback-safe). The same goes for changes to database schemas, command line tools, synchronized client-side JavaScript + backend changes, etc.
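To make the RPC case concrete, here's a rough sketch of a server-side handler during such a migration. Plain dicts stand in for whatever the real RPC message type would be, and the field names `user_id` and `user_ref` are invented for the example.

```python
from typing import Any, Dict


def handle_get_user(request: Dict[str, Any]) -> Dict[str, Any]:
    """Serve both request shapes until every deployed client sends the new one."""
    if "user_ref" in request:
        # New request shape (hypothetical): a structured reference.
        user_id = request["user_ref"]["id"]
    elif "user_id" in request:
        # Old request shape: still sent by client binaries that haven't been upgraded.
        user_id = request["user_id"]
    else:
        raise ValueError("request has neither user_ref nor user_id")
    return {"id": user_id, "name": f"user-{user_id}"}
```

The `elif` branch can only be deleted once no old client binary is running anywhere and none can be rolled back to, which is a fact about deployments, not about what's in the repository at HEAD.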

I think it's true that monorepos make refactoring easier. So that's not the problem. It's also true that they have atomic commits across projects. But the two facts have nothing to do with each other. The reasons monorepos make refactoring simpler all boil down to everyone in the organization having a shared view of what the current state is:

A monorepo will, in practice, mean trunk-based development. You'll know that everybody really is on HEAD rather than actually doing their development on some year-old branch.
And conversely, you'll know that every user of the library is using your library from HEAD rather than pinning it to some year-old version.
It's trivial to find all the current callers, so that you know which clients need to be updated. (Once you've solved the highly non-trivial problem of having any kind of monorepo tooling at scale, of course.)

In theory you could do the exact same thing with multirepos assuming sufficient tool support, discipline about code organization, enforced trunk-based development in all repositories, a master list of all repositories in the org, and defaulting to all repositories being readable by every engineer with no hidden silos. That's all technically doable, but I suspect not culturally compatible with using multirepos in the first place.

Where does this misconception come from? It's certainly present in the Google monorepo paper, which somewhat contradicts itself on this. On one hand, they describe exactly this form of atomic refactoring as a benefit of monorepos:

The ability to make atomic changes is also a very powerful feature of the monolithic model. A developer can make a major change touching hundreds or thousands of files across the repository in a single consistent operation. For instance, a developer can rename a class or function in a single commit and yet not break any builds or tests.

But when it comes to the actual refactoring workflow, the process that's described is quite different:

A team of Google developers will occasionally undertake a set of wide-reaching code-cleanup changes to further maintain the health of the codebase. The developers who perform these changes commonly separate them into two phases. With this approach, a large backward-compatible change is made first. Once it is complete, a second smaller change can be made to remove the original pattern that is no longer referenced.

I suspect what happened here was that the atomic commits were identified as a benefit in the abstract, with refactoring being used as an illustration of a use case. This was then quite understandably read as a practical example of how you'd work with a monorepo.

There might be a few cases where atomic commits across the whole repository are the right solution, but they have to be exceedingly rare. The example of renaming a function with thousands of callers, for example, is probably better handled by just temporarily aliasing the function, or by temporarily defining the new function in terms of the old. (But this does suggest that languages, both programming languages and IDLs, should make aliasing and indirection easy for as many constructs as possible.)
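In a language where functions are ordinary values, the aliasing approach is about as cheap as it gets. A sketch in Python, with made-up names (`calc_cksum` as the old name, `compute_checksum` as the new one):

```python
def compute_checksum(data: bytes) -> int:
    """New name; the body is whatever the old function already did."""
    return sum(data) % 256


# Temporary alias so the thousands of existing callers keep working.
# Callers are switched to compute_checksum in small batches, and this
# line is deleted in a final commit once none are left.
calc_cksum = compute_checksum
```

The rename then becomes the same three-stage process as any other interface change, with the alias playing the role of the old interface.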

Are there organizations with a large monorepo where atomic cross-project commits are routinely used to change both the implementation and the clients?
