Atomic feature flow, a flexible and safe branching strategy

tl;dr

Merging back to the main branch can create virtual dependencies¹ between teams working on the same software component, and stall your delivery pipeline. If you keep stepping on each other’s toes, atomic feature flow can help you.

The basic idea

Instead of delivering from the main branch, atomic feature flow advocates delivering from a disposable release branch, following these rules:

The master branch is always in sync with code in production
Features are implemented in feature branches created from master. Keeping them small and short-lived is encouraged, but not required.
A new release branch is created from master. This is the branch your CI pipeline is hooked up to.
You assemble a release by merging feature branches which are ready into release.
No work (including fixing bugs or merge conflicts) is ever done directly on a release branch, since we want to be able to delete it without looking back. It is only done in feature branches.
Once release has passed all the way through the CI pipeline and has been deployed to production, it is merged back into master, and all the feature branches it contains are deleted.
If the release branch fails somewhere in the CI pipeline, we investigate which feature branch is causing the issue. Then we unclog the delivery pipeline:
- either by insisting: if it is a minor fix and nothing is blocked, we fix the issue in the defective feature branch, then update release accordingly,
- or by abandoning the release branch: we delete it, re-create a new one from master, then merge only the non-faulty feature branches.
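
As a rough illustration of the rules above, here is a minimal Python sketch that only builds the list of git commands for each step, without running them. The function names, the default branch names and the example feature names are mine, not part of any tool. Using `--no-ff` when assembling and `--ff-only` when merging back are assumptions about how you might want to do the merges; the rules themselves only say "merge".

```python
from typing import List


def assemble_release(features: List[str], release: str = "release",
                     base: str = "master") -> List[str]:
    """Return the git commands that (re)build a disposable release branch.

    `git checkout -B` creates the branch, or resets it to `base` if it
    already exists, so the same sequence covers both "start a release"
    and "abandon the release and restart from master".
    """
    commands = [f"git checkout -B {release} {base}"]
    # Only merges land on the release branch; no work is committed there.
    commands += [f"git merge --no-ff {feature}" for feature in features]
    return commands


def finish_release(features: List[str], release: str = "release",
                   base: str = "master") -> List[str]:
    """Return the git commands run once the release is live."""
    # Assumption: master has not moved since the release was cut,
    # so a fast-forward merge is enough to bring it in sync.
    commands = [f"git checkout {base}", f"git merge --ff-only {release}"]
    # The delivered feature branches are now part of master and can go.
    commands += [f"git branch -d {feature}" for feature in features]
    return commands


if __name__ == "__main__":
    # Hypothetical feature branch names, for illustration only.
    for command in assemble_release(["feature-a", "feature-b"]):
        print(command)
```

Abandoning a release is then just calling `assemble_release` again with the faulty feature left out of the list.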

Flow diagram

The main difference from git-flow is that we do not deliver from master but from the release branch. The point is to keep changes atomic in their feature branches until they’re deployed, which makes rollbacks easier.

The full stuff

There are lots of solutions when it comes to source control and branching strategy, but I’ve found few that are really flexible and suited to daily releases. Our needs were the following:

Support a daily deployment process
A bug on a feature being developed shouldn’t block the deployment process
Be able to recompose a release depending on what works and what does not
If development of feature B started after feature A, I still want to be able to deploy feature B first
We have a limited number of test environments: our current architecture limits our ability to spin up new ones, and they cost quite a lot (we’re working on both)

The closest match to our needs would be git-flow, but not quite: it is much more complex and, at the same time, does not give you as much flexibility.

One could also argue for trunk-based development plus feature toggles. Although it’s a fairly flexible solution, and I genuinely recognize its value for specific release management needs, I don’t think feature toggles are a one-size-fits-all kind of solution. My main concerns with them are that:

They decrease testing efficiency by creating combinations that cannot all be tested: while the code is the same between environments, you have configuration that directly changes the behaviour of the application.
You end up doing source control operations directly in the code, which is kind of hacky.
They increase overhead while trying to reduce it, by a) requiring that you clean up the code after some time, which can be a very tedious and risky business, and b) making it less obvious what a change is about.
They’re very impractical for some changes like cosmetics or bug fixes.

So we started from there and came up with the simplest, most flexible thing we could actually do.

In short, the whole concept is to keep our change requests atomic for as long as we can in source control. It’s a bit like ingredients in a cake. As long as they’re separate, you can do whatever you want to each of them individually (whip, heat, shake, etc.). But once they’re mixed, you’re forced to apply every operation to the whole, which may prove more complex. And if one of the ingredients is actually stale, then the whole is stale.

Applied to a branching workflow, we basically reverse the usual dependencies between master, release and feature branches:

The main branch (hereafter called master) contains a stable version of your application: something that has already been QAed and that you would be proud to push to live or, in our case, that you have already pushed to live. You must have a super-high level of confidence that what’s in there is sane, so that you never want to roll it back. This is crucial to the whole approach.
Any new feature or bug correction goes into a separate “change request branch”. We try to keep these small, although too small makes them impossible to manage. In our case these correspond to an epic, and an epic never has more than 4 user stories or 8 to 10 bugs.
From master we pull a release branch. We assemble a release by merging change request branches into that release branch. Our infrastructure automation is set to trigger a new build and deploy it to a test environment every time there’s a commit to the release branch, so effectively, when we assemble the release and wait a few minutes, the test environment is cooked with a new release corresponding to what we picked. We have two test environments, so two release branches (called whistler and blackcomb respectively). But you could equally pull release branches and deploy them manually, or direct the infrastructure automation to select the release you want to deploy.
Developers code on their own sub-branches, created either from master or from the change request branch, then merge to the change request branch (using pull requests or whatever process you want).
If the release is stable and QA passes, we push the package that was tested to the live environments, then merge the release branch to master and wipe out the feature branches it contained. Any other feature branch may update its source from master (though we’ve found that more often than not it’s unnecessary).
If the release fails QA, any correction is made on the change request branch that failed QA or caused a regression, then merged down to the release branch. If correcting that branch is going to take some time, we wipe the release branch (by actually deleting it and re-branching from master) and re-assemble a release without that change. The trick is to never change the release branch directly, so that you keep the ability to nuke it if necessary.

Release containing multiple features

Now to make that work there’s a very limited set of rules to follow:

You build and test from the release branch. Ideally, infrastructure automation is set to trigger a new build and deploy it to a test environment every time there’s a commit to it. So effectively, when we assemble the release and wait a few minutes, the test environment is cooked with what we picked. In our case, we have two test environments, so two release branches.
Developers code on their own sub-branches created from the change request branch, then merge to the change request branch (using pull requests or whatever process you want).
If the release is stable and QA passes, we push the package that was tested to the live environments, then merge the release branch to master and wipe out the feature branches it contained. Any other feature branch may update its source from master (though we’ve found that more often than not it’s unnecessary).
If the release fails QA, any correction is made on the change request branch that failed QA or caused a regression (NOT on the release branch), then merged down to the release branch. If correcting that branch is going to take some time, we wipe the release branch (by actually deleting it and re-branching from master) and re-assemble a release without that change. The trick is to never change the release branch directly, so that you feel comfortable nuking it if necessary.
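
To make the "wipe and re-assemble" rule concrete, here is a toy Python model, not a real tool. A release is modelled as the set of change request branches merged on top of master, and `passes_qa` stands in for your whole build, deploy and QA cycle. The investigation step (testing each change request alone on top of master) is my assumption about one way to find the culprit; the CR names in the usage example are made up.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# A release is modelled as the set of change requests merged on top of master.
Release = FrozenSet[str]


def assemble_passing_release(
    ready: Iterable[str],
    passes_qa: Callable[[Release], bool],
) -> Tuple[Release, List[str]]:
    """Re-assemble the release from master until QA passes.

    Returns the release that passed and the change requests left out.
    The release is never patched in place: every attempt is rebuilt
    from scratch out of the remaining candidates.
    """
    candidates = list(ready)
    left_out: List[str] = []
    while candidates:
        release = frozenset(candidates)
        if passes_qa(release):
            return release, left_out
        # Assumed investigation step: look for a change request that
        # fails QA when it is the only thing merged on top of master.
        faulty = next(
            (cr for cr in candidates if not passes_qa(frozenset([cr]))), None
        )
        if faulty is None:
            # The failure only shows up in combination; a human has to look.
            raise LookupError("no single change request fails on its own")
        candidates.remove(faulty)
        left_out.append(faulty)
    return frozenset(), left_out


if __name__ == "__main__":
    broken = {"CR-2"}  # hypothetical: CR-2 has a bug

    def qa(release: Release) -> bool:
        return not (release & broken)

    print(assemble_passing_release(["CR-1", "CR-2", "CR-3"], qa))
```

The faulty change request stays in its own branch, gets fixed there, and simply joins a later release.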

Abandoning a release and restarting from master

For people familiar with git-flow, this looks almost the same, but with an intermediate release branch, which is the one that gets deployed to the test environment(s).

The beauty of that model is that it’s very natural, and straightforward to understand. Its advantages are several:

It’s simple, really simple. You merge only one way, and you can never get blocked. For developers it’s easier than the usual model (once they’re used to it): you branch out of master, you push to the change request branch you work on, case closed. For release managers it’s easier too: you pick from branches and assemble your releases functionally rather than just betting on people pushing their code where you asked. Simple is good.
It’s straight to the point: you pick your features and keep rollback capabilities without any complex and risky cherry-picking or revert/reset operations. Because you only ever merge, you never have to roll back.
Patching and releasing are the same process. There is no exception. You just do the same thing over and over again. A patch may be a very, very small release, whereas a release might be a gargantuan assembly of features. Both are the same thing. No maintenance branch, no develop, etc.
It’s easily adaptable to any context, and gives you a way to bring the magic of reduced WIP where it’s missing. Whether you do several releases a day or one every six months, you can still use it to assemble releases and focus your development on very small batches.
It can be tuned to the infrastructure automation and the infrastructure itself: whether the build is manual or fully automated, whether you have one or several test environments, you can use it.
If you want to create several releases (say, 1.2.3 and 1.2.4) at the same time, you can, and it’s entirely natural to re-scope a feature from one to the other. You could for example create release 1.2.4 from 1.2.3 instead of master. In that case the same rules apply: nuking 1.2.3 means you’ll also nuke and re-generate 1.2.4 once you’re done, nuking 1.2.4 just means you wipe it and re-branch from 1.2.3, and updating 1.2.4 means merging 1.2.3 into 1.2.4.
Whether you like having hundreds of individual commits, or far fewer commits corresponding to features or to releases, you can do either, depending on your own preference. It works with merge --no-ff if you want. You might even do both, using a History branch in addition to master to keep all commits. Or not. Your choice.
The migration path to this model is simple as well. Pick what’s live and make that master. Pick what is on your current sub-branches and make these change-request branches. Start assembling release branches depending on how your infrastructure automation is built. Any subsequent feature, bug fix or group of bugs may go to its own change-request branch.
it’s so simple it even deserves to be mentioned a second time.
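
For the stacked releases case mentioned above (1.2.4 created from 1.2.3), the knock-on effect of nuking a release can be sketched in a few lines of Python. This is only a model under my own assumptions: `bases` is a hypothetical mapping from each release branch to the branch it was created from, and it is assumed to contain no cycles.

```python
from typing import Dict, List


def rebuild_order(bases: Dict[str, str], nuked: str) -> List[str]:
    """List the release branches to re-create, parents first, when `nuked` is wiped.

    `bases` maps each release branch to the branch it was created from.
    """
    order = [nuked]
    position = 0
    # Treat `order` as a queue: every release created from a rebuilt
    # release has to be rebuilt after it.
    while position < len(order):
        current = order[position]
        order.extend(rel for rel, base in bases.items() if base == current)
        position += 1
    return order


if __name__ == "__main__":
    bases = {"1.2.3": "master", "1.2.4": "1.2.3"}
    print(rebuild_order(bases, "1.2.3"))  # 1.2.3 first, then 1.2.4
    print(rebuild_order(bases, "1.2.4"))  # only 1.2.4
```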

There are obvious objections that first made us cautious about such a transition, but when you really follow the train of thought, they are all quite easily overcome:

Dependency between different source controls: if you have product A and product B stored in different repositories, or even different source control solutions, the multiplied number of branches and the need for synchronization may be a hassle to manage. That is true. But thinking in terms of features instead of releases actually shifts that problem to the level of infrastructure automation. Automating the creation of branches or merge operations is fairly basic. (Bonus: I’ve created a Hubot module that creates change-request branches from webhooks, so that moving an epic in Jira triggers that, for example, and lets you do the merge in all GitHub repositories from Slack. You tell it: hubot merge CR-1221 into Release-1 and it’s gonna do it everywhere. I’ll make it public as an NPM module soon.)
Dependency between change requests: there are really three cases:
- a) change A depends on change B, so in any case you will need B to be ready before A can be pushed. Still, decoupling the two changes gives you better control. On your side, you can simply merge change B into change A’s branch.
- b) change A and change B both expect the same code change, which creates merge conflicts. In that case you didn’t decompose enough in the first place: put that shared change into a change C, then fall back on case a) (change A and change B depend on change C, so you’ve got to deploy these linearly).
- c) change A and change B touch the same code independently, but that creates a conflict (e.g. in a resource file). In that case you may either group both features, deal with it using the change C strategy, or just merge whichever feature is ready first into the other one and resolve the conflicts there. Try to keep this situation to a minimum. You don’t want to be too linear, and ideally you should finish what you started (i.e. deploy one bit = merge) before you start something new (i.e. start the new feature).

Either way these conflicts are, from experience, very straightforward to resolve if you understand what you’re doing.
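
When changes depend on each other like this, the order in which they have to be deployed is just a topological sort of the dependency graph. A minimal sketch using Python’s standard `graphlib` module (Python 3.9+) follows; the function name and the A/B/C example graph are illustrative assumptions, echoing case b) where A and B both depend on a shared change C.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deployment_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order change requests so each one comes after everything it depends on.

    `depends_on` maps a change request to the change requests it needs first.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Case b): changes A and B both rely on the shared change C.
    print(deployment_order({"A": {"C"}, "B": {"C"}, "C": set()}))
```

Here C always comes out first, and A and B can follow in either order, which is exactly the "deploy these linearly" constraint.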

Dependency between two changes

Large features, or a large number of features, will result in a significant merge effort (alternatively: “it’s gonna be a merge horror story”). That all comes down to your ability to structure work into small batches, which, as lots of people explain much better than I can, are almost magical. Said otherwise: if you keep features small and few, it’s not going to happen. Of course, if your release cadence is 4 months, it’s going to be a bit of a problem. In that case, you can consider having an intermediary branch between master and your release branches that would be treated as an internal master. In that scenario, your feature branches are created from that intermediary, and the intermediary is merged to master on each actual release.

What strikes me the most about that model is its overall universality. There must be cases where it does not work, but off the top of my head I can’t think of any.

Notes

¹ That is, one team can be blocked by another because the other team’s code is broken and prevents pushing to production, rather than because their code is interdependent.
