The ultimate git merge vs rebase tutorial

Back in September 2017, another post titled “Why you should stop using Git rebase” made it quite high on Hacker News. As always, it only tells half of the story. So I started writing this post, and four months later it’s finally finished.

First, the basics.

Keeping branches up to date
- Merge
- Rebase
Getting your work into master
Squashing
- Squash on merge
- Squash on rebase
Discussion
Recommendations
Conclusion

Keeping branches up to date

Say you have a feature branch based on a master branch, and commits have been added to master to fix a bug. Now your feature branch is trailing master, and you need those bug-fixes ASAP.

Merge

When you merge one branch into another, you generally combine both histories, and fix any conflicts in a merge commit. There are a few different merge behaviours, which we will explore in the next section; they’re easier to explain with examples.

Two branches have diverged, and so merging combines the histories with a merge commit.

Rebase

git rebase <branch> applies the commits from the current branch onto the specified branch.

If the specified branch hasn’t changed since the current branch diverged from it, rebasing is a no-op: it does nothing. But if the specified branch has changed, then rebasing will apply the commits from the current branch onto the head of the specified branch (“replaying”).

Two branches have diverged, and so rebase replays the current branch’s commits on the head of the specified branch, creating new commits.

For this to work, rebase actually creates brand-new commits with the same changes, but with new commit hashes and new committer timestamps (the original author date is kept). If there’s a merge conflict, it must be fixed when the commit is applied, and the resolution is absorbed into the new commit instead of creating a merge commit. Rebasing is very powerful, and there are almost no limits to what can be done during a replay; we will explore this later in Squash on rebase. The benefit of rebasing is that the branch ends up cleanly ahead of the other.
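You can check this yourself with a quick sketch. It isn’t from the original post: the directory rebase-demo, the file names and the placeholder identity are all made up. The script records the tip of a feature branch before and after rebasing. While master hasn’t moved, the rebase is a no-op and the hash stays the same. Once master gains a commit, the same change comes back as a new commit with a new hash.

```shell
#!/bin/sh
# Sketch: rebasing keeps a commit's change but replaces the commit itself.
# (Hypothetical repo name, files and identity.)
set -e
rm -rf rebase-demo && git init -q rebase-demo && cd rebase-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > a.txt && git add -- a.txt && git commit -q -m "A"
git branch -M master                  # make sure the main branch is called master
git checkout -q -b feature
echo "B" > b.txt && git add -- b.txt && git commit -q -m "B"
BEFORE=$(git rev-parse HEAD)

# master hasn't moved: nothing to replay, so the commit is left alone
git rebase -q master
NOOP=$(git rev-parse HEAD)

# give master a new commit (different file, so no conflict) and rebase again
git checkout -q master
echo "M" > m.txt && git add -- m.txt && git commit -q -m "M"
git checkout -q feature
git rebase -q master                  # replays B on top of M as a new commit
AFTER=$(git rev-parse HEAD)

echo "before: $BEFORE"
echo "no-op:  $NOOP"
echo "after:  $AFTER"
git log --pretty=format:'%s (%h)'
```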

Don’t mix merging and rebasing on the same branch. A rebase replays all of your branch’s own commits since it first diverged, not just those since the last common commit, and by default it drops merge commits. So if you merge master in, and then later rebase onto master, your merge commits (and the conflict resolutions in them) are thrown away, and you’ll have to fix the conflicts again. It gets worse with more complicated histories. Pick one strategy, and stick with it. (This only applies to the same branch: rebasing onto master when master contains merges is fine, as long as those merges don’t involve the current branch.)
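Here is a minimal sketch of that trap. It assumes a throwaway repo called mix-demo with a single file, plus a conflict between one feature commit and one master commit (all names are hypothetical). The conflict is resolved once in a merge commit. A rebase onto master then runs into the same conflict again, because it drops the merge commit and its resolution.

```shell
#!/bin/sh
# Sketch: merge master into a branch, then rebase it, and meet the conflict twice.
# (Hypothetical repo name, file contents and identity.)
set -e
rm -rf mix-demo && git init -q mix-demo && cd mix-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature && echo "B" > src.txt && git commit -q -am "B"
git checkout -q master && echo "M" > src.txt && git commit -q -am "M"

# first resolution: merge master into feature and fix the conflict by hand
git checkout -q feature
git merge master >/dev/null 2>&1 || true          # stops with a conflict
echo "BM" > src.txt && git add -- src.txt && git commit -q --no-edit

# rebasing drops the merge commit (and its resolution) and replays B onto M,
# so the same conflict comes back
if git rebase master >/dev/null 2>&1; then
  REBASE_RESULT=clean
else
  REBASE_RESULT=conflict
  git rebase --abort                              # back to the merged state
fi
echo "rebase onto master: $REBASE_RESULT"
```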

Getting your work into master

Talking about git can get confusing, so I have more pretty pictures and a small script. The script initialises a new repo called test, and creates two feature branches the way it might happen when developers work on different features at the same time. The initial commit A is the basis. Time-wise, commit B is created first on the feature1 branch, then D on the feature2 branch, then C on the feature1 branch, and finally E on the feature2 branch (A < B < D < C < E).

Two branches, the first called feature1 with commits B and C, the second called feature2 with commits D and E, both based on master with commit A. Commits B and D have a conflict.

This will be the initial state for all examples, which you can follow along with if you wish. Let me also introduce two variations on the git log command that will help; you can run them at any time to get a good overview of the repository’s history. The first, git log --pretty=format:'%s', prints only the subject (first line) of each commit message. This is compact and allows you to easily compare your results to mine, ignoring commit author names and commit hashes. The second, git log --pretty=format:'%s' --graph, has git display the history as a graph, which is useful for visualising where branches diverged and joined.
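The original setup script isn’t reproduced here, so this is a reconstruction under a few assumptions:

- every commit overwrites src.txt with its own letter, which matches the conflict fixes used later;
- fixed, made-up timestamps stand in for the developers working over time;
- the user name and e-mail are placeholders.

The script deletes and recreates test, so rerunning it is also a way to reset between examples. It ends on master.

```shell
#!/bin/sh
# Reconstruction (not the original script): set up the repo used in the examples.
# The identity, timestamps and file contents are assumptions.
set -e
rm -rf test && git init -q test && cd test
git config user.name "Example" && git config user.email "example@example.com"

# commit_letter <letter> <unix time>: hypothetical helper that writes the letter
# to src.txt and commits it with a fixed timestamp, so the log order is reproducible
commit_letter() {
  echo "$1" > src.txt
  git add -- src.txt
  GIT_AUTHOR_DATE="$2 +0000" GIT_COMMITTER_DATE="$2 +0000" git commit -q -m "$1"
}

commit_letter A 1500000000
git branch -M master
git checkout -q -b feature1 master && commit_letter B 1500000100
git checkout -q -b feature2 master && commit_letter D 1500000200
git checkout -q feature1           && commit_letter C 1500000300
git checkout -q feature2           && commit_letter E 1500000400
git checkout -q master

git log --all --pretty=format:'%s' --graph
```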

Merge, default/fast-forward

The default merge behaviour of git is to fast-forward whenever possible. If master hasn’t moved since the feature branch was created, git simply moves master forward to the tip of the feature branch, so the feature commits are absorbed as they are and no merge commit is created. This is the case with commits B and C (diagonally filled).

$ git merge feature1
Updating xxxxxx..xxxxxx
Fast-forward
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git log --pretty=format:'%s' --graph
* C
* B
* A

feature1 was merged into master without a merge commit.
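Whether git can fast-forward depends on ancestry, not on conflicts: it’s possible exactly when the tip of master is an ancestor of the branch being merged. A small sketch using git merge-base --is-ancestor shows this. The repo ff-demo and the can_ff helper are hypothetical. After feature1 is fast-forwarded, master is no longer an ancestor of feature2, so that merge needs a merge commit.

```shell
#!/bin/sh
# Sketch: a fast-forward is possible exactly when master is an ancestor of the branch.
# (Hypothetical repo name, helper, files and identity.)
set -e
rm -rf ff-demo && git init -q ff-demo && cd ff-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature1 master && echo "B" > src.txt && git commit -q -am "B"
git checkout -q -b feature2 master && echo "D" > src.txt && git commit -q -am "D"
git checkout -q master

# can_ff <branch>: hypothetical helper, reports whether merging <branch> into
# master could fast-forward (i.e. whether master is an ancestor of <branch>)
can_ff() {
  if git merge-base --is-ancestor master "$1"; then echo "fast-forward"; else echo "merge commit"; fi
}

FEATURE1=$(can_ff feature1)
git merge -q feature1        # fast-forward: master now points at B
FEATURE2=$(can_ff feature2)  # master (B) is not an ancestor of D any more
echo "feature1: $FEATURE1"
echo "feature2: $FEATURE2"
```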

Once the branches have diverged, a fast-forward is impossible and a merge commit is necessary. Here, commits D and E were based on A, but changes to the same lines were made in B and C, so we also have to resolve a conflict.

$ git merge feature2
Auto-merging src.txt
CONFLICT (content): Merge conflict in src.txt
Automatic merge failed; fix conflicts and then commit the result.
$ # fake fixing the conflict, we know it should end up as "E"
$ echo "E" > src.txt && git add -- src.txt && git commit --no-edit
[master xxxxxx] Merge branch 'feature2'

feature2 had diverged from master, so a merge commit was necessary, and commit D conflicted with commits B and C from feature1.

This seems good, until we look at git log.

$ git log --pretty=format:'%s'
Merge branch 'feature2'
E
C
D
B
A

The linear log is ordered by commit time, so (reading from the bottom) we get A, B, D, C, E, and then the merge commit M1. That’s correct, but confusing. When using a graph view, things are clearer. But because of the fast-forward behaviour, it looks like commits B and C were always part of master (indicated by the diagonally filled commits). That’s super confusing, or it will be once the feature branch is deleted, never checked out, or otherwise missing.

$ git log --pretty=format:'%s' --graph
*   Merge branch 'feature2'
|\
| * E
| * D
* | C
* | B
|/
* A

Thanks to the merge commit, we know where commits D and E come from. This is a huge benefit.

Merge, no fast-forward

So the solution to the shortcomings of the default merge behaviour is obvious: force every merge to produce a merge commit, and never fast-forward.

(Remember to reset the repo to the initial state! You can do this by deleting the directory and re-running the script.)

Two branches, the first called feature1 with commits B and C, the second called feature2 with commits D and E, both based on master with commit A. Commits B and D have a conflict.

$ git merge feature1 --no-ff
Merge made by the 'recursive' strategy.
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

feature1, although fast-forward-able, was merged into master with a merge commit.

$ git merge feature2 --no-ff
Auto-merging src.txt
CONFLICT (content): Merge conflict in src.txt
Automatic merge failed; fix conflicts and then commit the result.
$ # fake fixing the conflict, we know it should end up as "E"
$ echo "E" > src.txt && git add -- src.txt && git commit --no-edit
[master xxxxxx] Merge branch 'feature2'

feature2 was merged into master with a merge commit.

When viewed as a graph, the merge commits indicate clearly when the feature branches were merged in.

$ git log --graph --pretty=format:'%s'
*   Merge branch 'feature2'
|\
| * E
| * D
* |   Merge branch 'feature1'
|\ \
| |/
|/|
| * C
| * B
|/
* A

The linear git log is still ordered by timestamp, and so commits B, C, D, and E are still interleaved.

$ git log --pretty=format:'%s'
Merge branch 'feature2'
Merge branch 'feature1'
E
C
D
B
A

Using --no-ff is highly recommended. It’s such an improvement, and so prevalent, that Github does exactly this by default.
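You can also make this the default instead of typing --no-ff every time, using git’s merge.ff setting. Here is a minimal sketch, assuming a throwaway repo called noff-demo. With merge.ff set to false, merging a fast-forwardable branch still creates a merge commit whose second parent is the feature branch.

```shell
#!/bin/sh
# Sketch: make --no-ff the default for one repository via merge.ff.
# (Hypothetical repo name, files and identity.)
set -e
rm -rf noff-demo && git init -q noff-demo && cd noff-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature1 && echo "B" > src.txt && git commit -q -am "B"
git checkout -q master

# merge.ff=false: every "git merge" behaves as if --no-ff had been given
git config merge.ff false
git merge -q --no-edit feature1   # fast-forwardable, but still gets a merge commit

git log --pretty=format:'%s' --graph
```

Setting it with git config --global merge.ff false would apply it to all your repositories instead.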

Rebase

There is a third option. Since git rebase master applies the commits from the current branch on top of master, the resulting branch can always be fast-forward merged into master (as long as master hasn’t moved on in the meantime). This is the opposite of the no-fast-forward approach above.
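To see why the rebase matters, here is a sketch (the repo ffonly-demo and its files are made up). git merge --ff-only refuses to merge a branch that has diverged from master. It succeeds once the branch has been rebased onto master.

```shell
#!/bin/sh
# Sketch: --ff-only refuses a diverged branch until it has been rebased.
# (Hypothetical repo name, files and identity.)
set -e
rm -rf ffonly-demo && git init -q ffonly-demo && cd ffonly-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > a.txt && git add -- a.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature && echo "F" > f.txt && git add -- f.txt && git commit -q -m "F"
git checkout -q master && echo "M" > m.txt && git add -- m.txt && git commit -q -m "M"

# master and feature have diverged, so a fast-forward is impossible
if git merge -q --ff-only feature 2>/dev/null; then BEFORE=merged; else BEFORE=refused; fi

# replay feature on top of master; now master is an ancestor of feature
git checkout -q feature && git rebase -q master
git checkout -q master && git merge -q --ff-only feature

echo "ff-only before rebase: $BEFORE"
git log --pretty=format:'%s'
```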

(Remember to reset the repo to the initial state! You can do this by deleting the directory and re-running the script.)

Two branches, the first called feature1 with commits B and C, the second called feature2 with commits D and E, both based on master with commit A. Commits B and D have a conflict.

Since commits B and C are based on commit A, which is still the tip of master, there is nothing to be done, and the rebase is a no-op. (Rebasing never merges the commits into the other branch, so merging is still required.)

$ git checkout feature1
Switched to branch 'feature1'
$ # print hashes (yours will be different throughout)
$ git log --pretty=format:'%s (%h)'
C (61dcdc9)
B (32acae7)
A (08229ff)
$ # make sure feature1 is up to date with master
$ git rebase master
Current branch feature1 is up to date.
$ git log --pretty=format:'%s (%h)'
C (61dcdc9)
B (32acae7)
A (08229ff)
$ git checkout master
Switched to branch 'master'
$ git merge feature1 --ff-only
Updating xxxxxx..xxxxxx
Fast-forward
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

feature1 was fast-forward merged into master without a merge commit.

However, feature2 is now behind master, and commit D conflicts with commits B and C, so commits D and E will have to be replayed (and updated).

$ git checkout feature2
Switched to branch 'feature2'
$ git log --pretty=format:'%s (%h)'
E (1376c86)
D (5e84a89)
A (08229ff)
$ # make sure feature2 is up to date with master
$ git rebase master
[...]
$ # fake fixing the conflict, we know the first commit of feature2 changed it to "D"
$ echo "D" > src.txt && git add -- src.txt && git rebase --continue
Applying: D
Applying: E
$ git log --pretty=format:'%s (%h)'
E (cc58c69)
D (4c886de)
C (61dcdc9)
B (32acae7)
A (08229ff)
$ git checkout master
Switched to branch 'master'
$ git merge feature2 --ff-only
Updating xxxxxx..xxxxxx
Fast-forward
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git log --pretty=format:'%s (%h)'
E (cc58c69)
D (4c886de)
C (61dcdc9)
B (32acae7)
A (08229ff)

feature2 was fast-forward merged into master without a merge commit.

Technically, the second feature branch is no longer the one it was. Unlike feature1, whose rebase was a no-op, feature2 had to be replayed on top of the new master (including a conflict resolution), so commits D and E are actually newly rewritten commits (indicated by an asterisk). You can see above that their commit hashes have changed.

$ git log --pretty=format:'%s' --graph
* E
* D
* C
* B
* A

With rebasing, because of fast-forward merging, the information about which commit was in which branch is lost. Basically, you’d better hope your commits make sense on their own.

Squashing

Squashing is a way to mash several commits into one. Say you have a feature branch, and you’ve made some changes (after a code review, or fixing bugs), but it doesn’t make sense to commit these small bug-fix commits to master. Squashing is for you.

Squash on merge

(Remember to reset the repo to the initial state! You can do this by deleting the directory and re-running the script.)

Two branches, the first called feature1 with commits B and C, the second called feature2 with commits D and E, both based on master with commit A. Commits B and D have a conflict.

$ git merge --squash feature1
Updating xxxxxx..xxxxxx
Fast-forward
Squash commit -- not updating HEAD
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git commit
[master xxxxxx] Squashed commit of the following:
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git merge --squash feature2
Auto-merging src.txt
CONFLICT (content): Merge conflict in src.txt
Squash commit -- not updating HEAD
Automatic merge failed; fix conflicts and then commit the result.
$ # fake fixing the conflict, we know it should end up as "E"
$ echo "E" > src.txt && git add -- src.txt && git commit
[master xxxxxx] Squashed commit of the following:
 1 file changed, 1 insertion(+), 1 deletion(-)

feature1 and feature2 were squashed into master as a new commit.

The default commit message’s first line (“Squashed commit of the following:”) is useless when only the first line is printed, so I recommend replacing it with a summary of the feature. Apart from that, squashing results in a clean graph.

$ git log --pretty=format:'%s' --graph
* Squashed commit of the following:
* Squashed commit of the following:
* A
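Here is a minimal sketch that follows that recommendation. It assumes a throwaway repo squash-demo, and “Add feature1” is a made-up summary. After git merge --squash, passing your own message to git commit -m replaces the default “Squashed commit of the following:” first line. The result is still a single commit with one parent.

```shell
#!/bin/sh
# Sketch: squash-merge a branch with a descriptive first line.
# (Hypothetical repo name, commit messages and identity.)
set -e
rm -rf squash-demo && git init -q squash-demo && cd squash-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature1
echo "B" > src.txt && git commit -q -am "B"
echo "C" > src.txt && git commit -q -am "C"
git checkout -q master

# stage the combined changes of B and C without committing...
git merge -q --squash feature1
# ...then write the first line yourself instead of "Squashed commit of the following:"
git commit -q -m "Add feature1" -m "Squashes commits B and C."

git log --pretty=format:'%s' --graph
```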

Note that squashing a branch with only one commit will still produce a new squash commit (as opposed to simply fast-forwarding the commit, so --no-ff is not needed with --squash):

$ git checkout -b feature3
Switched to a new branch 'feature3'
$ echo "F" > src.txt && git add -- src.txt && git commit -m "F"
[feature3 xxxxxx] F
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout master
Switched to branch 'master'
$ git merge --squash feature3
Updating xxxxxx..xxxxxx
Fast-forward
Squash commit -- not updating HEAD
 src.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git commit
[master xxxxxx] Squashed commit of the following:
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git log --pretty=format:'%s'
Squashed commit of the following:
Squashed commit of the following:
Squashed commit of the following:
A

Squash on rebase

(Again, I recommend not mixing rebasing and merging on the same branch, so don’t merge changes into the feature branch and then use this method to squash.)

This is just rebasing. Follow the rebase instructions, but use git rebase master -i to start an interactive rebase. Because rebase rewrites the commits, all sorts of alterations can be made, such as changing commit messages, squashing commits together, or even splitting commits (!).

feature1 and feature2 were squashed, and then fast-forwarded onto master.

So rebasing gives you the opportunity to clean up messy feature branches.
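An interactive rebase normally opens an editor for the todo list. As a non-interactive sketch, this swaps in git’s fixup commits plus --autosquash for the hand-edited todo list. --autosquash marks the “fixup!” commits for squashing, and GIT_SEQUENCE_EDITOR=: accepts the generated list unchanged. The repo autosquash-demo and the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: squash a review fix into its feature commit during a rebase,
# without opening an editor. (Hypothetical repo name, messages and identity.)
set -e
rm -rf autosquash-demo && git init -q autosquash-demo && cd autosquash-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature
echo "feature" > feature.txt && git add -- feature.txt && git commit -q -m "Add feature"
# a review fix, recorded as a "fixup! Add feature" commit aimed at the previous commit
echo "feature, fixed" > feature.txt && git commit -q -a --fixup=HEAD

# --autosquash turns "fixup!" commits into fixup lines in the todo list;
# GIT_SEQUENCE_EDITOR=: accepts that list as-is instead of opening an editor
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash master

git log --pretty=format:'%s'
```

In a real interactive rebase you would change pick to squash or fixup in the editor yourself. The end result is the same single commit.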

Discussion

I’d also argue that merge and rebase represent fundamental differences in what commits mean. The former being commits are history, and the latter being commits are features.

keeperofdakeys

To recap that subtle quote, there are two situations. If your features consist of several commits and their history is important information, non-fast-forward merging preserves that. Conversely, if you prefer a simple, linear history with single features being (roughly) one commit, rebasing or squash on merge gets you there. This only holds for pure git.

When using a repository manager, e.g. Github/Gitlab, if you squash a pull request, that pull request is still there with all its commits, for future reference. So you can have your cake and eat it.

A linear history is also a killer feature when finding bugs or regressions. Using git bisect on a linear history is much easier, and reverting commits is straightforward, whereas reverting non-fast-forward merges is painful. You can easily imagine a feature branch where master was merged in several times, and then finally the feature branch was merged into master. Do this with several branches at about the same time, and it becomes a jungle. Then again, after a while you learn how to navigate this jungle.
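Here is a sketch of the revert point (the repo revert-demo and the branch contents are made up). A plain git revert of a merge commit is refused, because git doesn’t know which parent to go back to. You have to pick the mainline with -m 1, which is the first parent, i.e. master before the merge.

```shell
#!/bin/sh
# Sketch: reverting a merge commit needs an explicit mainline.
# (Hypothetical repo name, files and identity.)
set -e
rm -rf revert-demo && git init -q revert-demo && cd revert-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > a.txt && git add -- a.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature && echo "F" > f.txt && git add -- f.txt && git commit -q -m "F"
git checkout -q master
git merge -q --no-ff --no-edit feature

# a plain revert is refused: the merge has two parents to choose from
if git revert --no-edit HEAD 2>/dev/null; then PLAIN=ok; else PLAIN=refused; fi
# -m 1: undo the merge relative to its first parent (master before the merge)
git revert --no-edit -m 1 HEAD >/dev/null
echo "plain revert: $PLAIN"
ls
```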

My main issue with merging/preserving history at all costs is that low quality commits (e.g. “fixed linting”) just add noise. And no matter how careful you are, mistakes happen. We’ve all been there. (Although some people are worse than others at abusing the CI system as a replacement for local linting; you know who you are.)

I have learned the hard way that Github doesn’t work great with rebases.

In general, rebasing alters commits and therefore history*. This is fine on local branches, but doing it to branches that have been made available to other people (pushed to a remote) is not cool, since they might be using them. To avoid this, git won’t alter history on a remote branch without a force push. To create a pull request on Github, you have to push the branch, so rebasing after the initial push is out, which severely limits its usefulness.

(* Sidenote: while rebasing does “alter history” of the current branch, this doesn’t mean the history of the whole repository changes. Just pointing this out, since it’s a bit of FUD that is sometimes used against rebasing. Additionally, most repository managers allow you to protect the master branch from being altered by forbidding force pushing. This is a good idea.)
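Here is a sketch of the force-push situation, using a local bare repository as a stand-in for Github (the directory names are made up). After rebasing a branch that was already pushed, a normal push is rejected, and only a forced push goes through. The sketch uses --force-with-lease, which only overwrites the remote branch if it still points where you last saw it.

```shell
#!/bin/sh
# Sketch: after rebasing a pushed branch, a normal push is rejected.
# (Hypothetical directory names, files and identity; remote.git stands in for Github.)
set -e
rm -rf push-demo && mkdir push-demo && cd push-demo
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null && cd work
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > a.txt && git add -- a.txt && git commit -q -m "A"
git branch -M master && git push -q origin master
git checkout -q -b feature && echo "F" > f.txt && git add -- f.txt && git commit -q -m "F"
git push -q origin feature                       # the branch is now public

git checkout -q master && echo "M" > m.txt && git add -- m.txt && git commit -q -m "M"
git checkout -q feature && git rebase -q master  # rewrites F with a new hash

# the remote's feature is no longer an ancestor of ours: rejected
if git push -q origin feature 2>/dev/null; then PLAIN=accepted; else PLAIN=rejected; fi
# --force-with-lease overwrites only if the remote still has what we last fetched/pushed
git push -q --force-with-lease origin feature
echo "plain push: $PLAIN"
```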

Even if everyone agrees to check before force pushing, links in Github code reviews break. Github does have a “Rebase and merge” option, but if there are conflicts you have to rebase locally, and all the caveats still apply.

Some tools, like Gerrit, are completely rebase-focussed. A feature, called a “change” in Gerrit, is given an ID (the Change-Id), which is appended to every commit message in that feature. So when the feature is rebased, Gerrit can keep track of it (new commit hashes, but the same ID in the commit message). Each rebased upload becomes a new “patch set”, and the GUI can show diffs between the patch sets of a change. This makes Gerrit one of the best code review tools I have used, and I’ve come to love it, even if the UI is ugly as sin.

A quirk of git is that when resolving conflicts during a rebase, the sides are swapped compared to merging:

Note that a rebase merge works by replaying each commit from the working branch on top of the <upstream> branch. Because of this, when a merge conflict happens, the side reported as ours is the so-far rebased series, starting with <upstream>, and theirs is the working branch. In other words, the sides are swapped.

Git documentation on rebase

This can cause some confusion for people unfamiliar with rebasing.
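A small sketch makes the swap concrete (the repo sides-demo and the file contents are hypothetical). It creates the same conflict twice: once by merging master into feature, and once by rebasing feature onto master. Each time, it records which side git checkout --ours picks.

```shell
#!/bin/sh
# Sketch: "ours" is your branch during a merge, but upstream during a rebase.
# (Hypothetical repo name, file contents and identity.)
set -e
rm -rf sides-demo && git init -q sides-demo && cd sides-demo
git config user.name "Example" && git config user.email "example@example.com"

echo "A" > src.txt && git add -- src.txt && git commit -q -m "A"
git branch -M master
git checkout -q -b feature && echo "feature" > src.txt && git commit -q -am "F"
git checkout -q master && echo "master" > src.txt && git commit -q -am "M"
git checkout -q feature

# merging master into feature: --ours is the branch you are on (feature)
git merge -q master >/dev/null 2>&1 || true
git checkout --ours -- src.txt && MERGE_OURS=$(cat src.txt)
git merge --abort

# rebasing feature onto master: --ours is the rebased series so far (master)
git rebase -q master >/dev/null 2>&1 || true
git checkout --ours -- src.txt && REBASE_OURS=$(cat src.txt)
git rebase --abort

echo "merge  --ours: $MERGE_OURS"
echo "rebase --ours: $REBASE_OURS"
```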

Recommendations

Pure Git

Merge > Rebase > Squash. Merging preserves feature history, rebasing gives a linear history and is more flexible than squashing.

Github/Gitlab

Squash > Merge > Rebase. Squashing gives a linear history while preserving pull request history. Avoid rebasing.

Gerrit

Gerrit has its own workflow; tracking commit metadata results in a linear history while preserving full feature history. It is excellent, but unusual.

Conclusion

There is no right answer, only bad blog posts and experience.