Rewriting history - Git history that is

 Remove unwanted commits from your Git history

I recently co-created a PhD course on reproducibility for quantitative data science with Dr Melanie Ganz. We also wanted to make the content open, not just for students but also for other researchers and teachers. One thing we do is get students familiar with Git and GitHub. Since this was our first attempt at the course, I let people push and test stuff in the main repository. Because I also had help from guest teachers, I wanted to keep their commits while deleting the students' commits (next time I'll use a branch). Time to clean.

Note: all the documentation exists online, but there are always extra steps one needs to do that are not part of a given command's documentation. The idea behind this blog post is to chain all the commands you need.

clean-up illustration from https://images.app.goo.gl/S72QU9M2Er7hcYyY7

Use the doc and tools

A quick search led me to (i) the Git documentation 'Rewriting History', (ii) a Git history editor, which makes it straightforward to edit the history, but I wanted to delete stuff, and (iii) various discussions on how to use rebase.

Edit your own commits with git rebase

This is a command where the magic happens.

step 1: git branch backup

well, let's make sure not to screw up everything and make a copy, just in case (we'll delete it later once we know the job is done - sure, there is the git reflog command, but I don't like it)

step 2: check how far back you want to go (which commit)

git log --pretty=format:"%h - %an, %ar : %s"

it can also be useful to look for specific users and then their commits

git shortlog -sn --all

git log --author="name"
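
To make steps 1 and 2 concrete, here is a minimal sketch on a throwaway repository. Everything in it is made up for illustration (the temporary folder, the add_commit helper, the teacher/guest/student authors); only the git commands themselves come from the steps above.

```shell
# Throwaway repository (hypothetical content, nothing real is touched)
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the branch is called main
git config user.name "teacher"
git config user.email "teacher@example.org"

# Hypothetical helper: add_commit <file> <author> <message>
add_commit() {
  echo "$3" > "$1"
  git add "$1"
  git commit -q --author="$2 <$2@example.org>" -m "$3"
}

add_commit README.md  teacher "course outline"
add_commit test1.txt  student "student test 1"
add_commit lecture.md guest   "guest lecture"
add_commit test2.txt  student "student test 2"

# step 1: safety copy of the current state
git branch backup

# step 2: who committed how much, then the commits of one author
git shortlog -sn --all
git log --author="student" --pretty=format:"%h - %an, %ar : %s"
```

The short hashes printed by the last command are the ones to look for in the rebase editor later on.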

step 3: git rebase -i <commit ID>

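using the -i argument is useful to do things interactively with your editor. The commit ID you give is the last one left untouched: only the commits after it are listed (use <commit ID>^ if you want to include it). Ah yes, you do need to tell git which editor to use (git config --global core.editor "'some command or path to an exe, depending on your OS' -w"; for GUI editors such as VS Code, -w makes git wait until you close the file)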

step 4: delete

In the editor, each commit is listed on its own line starting with pick. Simply replace pick with drop in front of the commits you want to remove, then save and close. As you will see in the comments of that file, there are many other commands you can use.
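
Here is a small sketch of steps 3 and 4 on a throwaway repository. The repository content, the add_commit helper and the author names are hypothetical. To keep the example runnable without typing anything, I let sed play the role of the editor through git's GIT_SEQUENCE_EDITOR environment variable; in real life you would simply change pick to drop by hand.

```shell
# Throwaway repository (hypothetical content)
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "teacher"
git config user.email "teacher@example.org"

# Hypothetical helper: add_commit <file> <author> <message>
add_commit() {
  echo "$3" > "$1"
  git add "$1"
  git commit -q --author="$2 <$2@example.org>" -m "$3"
}

add_commit README.md  teacher "course outline"
add_commit test1.txt  student "student test 1"
add_commit lecture.md guest   "guest lecture"
add_commit test2.txt  student "student test 2"
git branch backup

# Oldest commit of the toy history: everything after it shows up in the todo list
base="$(git rev-list --max-parents=0 HEAD)"

# sed stands in for the editor: pick -> drop on the student lines only
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/student test/s/^pick/drop/'" git rebase -i "$base"

# The student commits are gone from main, the backup branch still has them
git log --pretty=format:"%h - %an : %s"
```

This only goes through cleanly because the guest commit does not depend on the files touched by the dropped commits; otherwise git stops and asks you to resolve a conflict.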

step 5: anonymize (change author names)

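use the edit command in the editor when calling rebase (replace pick with edit in front of each commit to change)

git then stops at each of those commits; now, interactively in the command window, do the editing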

git commit --amend --author="John Doe <john@doe.org>" --no-edit

git rebase --continue
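
Put together, step 5 could look like the sketch below, again on a throwaway repository with hypothetical content, helper and author names. As an assumption for the sake of a runnable example, sed replaces the manual editing of the todo list (via GIT_SEQUENCE_EDITOR), and GIT_EDITOR=true just prevents any editor from popping up.

```shell
# Throwaway repository (hypothetical content)
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "teacher"
git config user.email "teacher@example.org"

# Hypothetical helper: add_commit <file> <author> <message>
add_commit() {
  echo "$3" > "$1"
  git add "$1"
  git commit -q --author="$2 <$2@example.org>" -m "$3"
}

add_commit README.md   teacher "course outline"
add_commit homework.md student "student homework"
add_commit lecture.md  guest   "guest lecture"

base="$(git rev-list --max-parents=0 HEAD)"

# Mark the student commit with edit; the rebase pauses on it
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/student homework/s/^pick/edit/'" git rebase -i "$base"

# Rewrite the author of the paused commit, keep its message
git commit -q --amend --author="John Doe <john@doe.org>" --no-edit

# Replay the remaining commits
GIT_EDITOR=true git rebase --continue

git log --pretty=format:"%an : %s"
```

The commit content and message stay the same, only the author field changes (and, like any rewritten commit, its hash).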

pro tip: better to do a few small rebases than one big one, to avoid conflicts

Delete PR: squash and merge


You cannot fully delete stuff, that's kind of the point of version control. In my case, we merged code from students, but I/they wanted to remove it (even though I had removed the commits, the history of files and authors was still there via the pull requests). The solution: create branches from given commits and merge them.

git branch beforePR commitID1
git switch beforePR
git rebase -i commitID2

use the squash command all the way back to the commit before we started merging stuff: that is, I branch at the last PR (commitID1) and squash all the PRs down to commitID2.
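
A minimal sketch of that squash, assuming a toy repository where three student commits stand in for the merged PRs (content, helper and author names are hypothetical). As an assumption to make it runnable, sed edits the todo list in place of a human (GIT_SEQUENCE_EDITOR), and GIT_EDITOR=true accepts the combined commit message as is.

```shell
# Throwaway repository (hypothetical content)
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "teacher"
git config user.email "teacher@example.org"

# Hypothetical helper: add_commit <file> <author> <message>
add_commit() {
  echo "$3" > "$1"
  git add "$1"
  git commit -q --author="$2 <$2@example.org>" -m "$3"
}

add_commit README.md teacher "course outline"
commitID2="$(git rev-parse HEAD)"          # last commit before the PRs
add_commit pr1.txt student "student PR part 1"
add_commit pr2.txt student "student PR part 2"
add_commit pr3.txt student "student PR part 3"
commitID1="$(git rev-parse HEAD)"          # last PR commit

git branch beforePR "$commitID1"
git switch -q beforePR

# Keep the first pick, turn every later pick into squash
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -i "$commitID2"

# beforePR now has one commit in place of three, with the same files
git log --pretty=format:"%h - %an : %s"
```

The files on beforePR are identical to those on main; only the number of commits differs, and the three original messages end up inside the single squashed commit.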

now I make another branch starting right after commitID1 (commitID3), and I merge it into the previous branch - having 'deleted' the PRs, although details can be seen in the squashed commit.

git switch main
git branch afterPR commitID3
git switch beforePR
git merge afterPR

then rename branches as needed

Branch and cherry-pick


There is one option that allows 'removing' commits by cherry-picking only the ones you want. The idea is that you branch from wherever you want, and then bring over only the commits you want, knowing they do not depend on the omitted commits (otherwise it can't work).

git branch newbranch commitID1
git switch newbranch

restart from commitID1

git cherry-pick commit2^..commit_latest

select stuff: this omits the bunch of commits between commitID1 and commit2, for instance, but takes everything from commit2 to the latest
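
For illustration, here is the branch and cherry-pick route on a throwaway repository. The content, the add_commit helper and the author names are hypothetical; the variable names commitID1, commit2 and commit_latest follow the commands above.

```shell
# Throwaway repository (hypothetical content)
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "teacher"
git config user.email "teacher@example.org"

# Hypothetical helper: add_commit <file> <author> <message>
add_commit() {
  echo "$3" > "$1"
  git add "$1"
  git commit -q --author="$2 <$2@example.org>" -m "$3"
}

add_commit README.md  teacher "course outline"
commitID1="$(git rev-parse HEAD)"        # where the new branch restarts
add_commit test1.txt  student "student test 1"
add_commit test2.txt  student "student test 2"
add_commit lecture.md guest   "guest lecture"
commit2="$(git rev-parse HEAD)"          # first commit to keep
add_commit slides.md  teacher "slides"
commit_latest="$(git rev-parse HEAD)"

# Restart from commitID1 on a new branch
git branch newbranch "$commitID1"
git switch -q newbranch

# commit2^..commit_latest includes commit2 itself up to the latest commit
git cherry-pick "${commit2}^..${commit_latest}"

git log --pretty=format:"%h - %an : %s"
```

The two student commits never make it onto newbranch, which works here because the kept commits touch different files.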

git push origin newbranch 

Done


git log

checking my clean commit history

git diff whatever_branchname backup

checking against my backup that I'm happy with the actual repo

git branch -D backup

et voila :-)

pic from https://images.app.goo.gl/QsqcsMZRBkbYCgAB7




