Jason Roundtree

The Three Trees of Git (Part 1)

Interacting With Git’s Working Directory, Index and HEAD Mechanisms to Track, Stage, Commit and Inspect Files

June 17, 2020

Image of Only Lines by Raul Petri on Unsplash

When you first start using Git, it's easy to build the habit of using commands like git add and git commit to track and save updates to your projects. It's just as easy to gloss over the mechanisms that Git uses to execute these commands, and to conflate the different options that modify how these and other common commands work. Without a better understanding of how Git goes about its business, it's difficult to build confidence and take advantage of the more advanced and nitty-gritty things you can do to manage your Git-tracked projects (and boy howdy is there a lot you can do!).

The main mechanisms that form the foundation of managing Git version control are the Working Directory (aka Working Tree, Worktree), the Index (aka Staging Area, Staging Index, Directory Cache, Cache) and the HEAD (aka Commit Tree). These mechanisms, which can also be thought of as collections of project files and the state of those files, are often referred to as “The Three Trees” of Git. In this article I’ll be giving brief overviews of these mechanisms and discussing the difference between some of the more frequently used commands that you can use to track, add, commit and inspect your project files.

Even very inexperienced users of Git will probably be familiar with the word “commit”, since it's the mechanism used to save snapshots of a project at different points along the project's evolution. But the other two trees, while important, are easier to overlook since they're just intermediary steps that you take along the path to making the commits that ultimately represent the different states of your project that you end up showing to the world.

The first of these mechanisms that you typically interact with, whether you know it or not, is the Working Directory, which represents the current state of the project you're working on and all of the files that the project comprises (i.e. in Git parlance, the files you have “checked out” and can open in your code editor). The files in the Working Directory are either “untracked” or “tracked”, and tracked files are in turn either “modified” or “unmodified”. If you initialize Git for a project that already contains some files, or add new files to a project that has already been Git-initialized, those files are considered untracked, which means that Git can see that they exist but isn't keeping track of the updates you make to them.

In order to take the first step in properly managing a project with Git you must tell Git which files to keep track of, which you can do with the git add command (an exception is when you clone an existing Git project, in which case all of the files from the cloned repository are tracked for you automatically). Not only does git add tell Git to start tracking new files, it also adds these new files, along with any newly modified files (i.e. tracked files that have been updated since the prior commit), to the Index, which is the area where you prepare the files that will be saved in your next commit by “staging” (i.e. adding) and “unstaging” (i.e. removing) them.
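
To make the file states described above concrete, here is a minimal sketch that walks one file from untracked to staged to unmodified to modified. It assumes a throwaway repository created with `mktemp`; the file name `notes.txt` and the user name and email are placeholders I made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

# A brand new file starts out untracked ("??" in short status).
echo "first draft" > notes.txt
untracked=$(git status --short)

# git add starts tracking the file and stages it ("A " = added to the Index).
git add notes.txt
staged=$(git status --short)

# After committing, the file is tracked and unmodified (status prints nothing).
git commit -q -m "Add notes"
clean=$(git status --short)

# Editing the tracked file makes it modified but not yet staged (" M").
echo "second draft" >> notes.txt
modified=$(git status --short)

printf '%s\n' "untracked: $untracked" "staged: $staged" "modified: $modified"
```

The two-character codes come from `git status --short`: the first column describes the Index and the second column describes the Working Directory.
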
Here are some of the different ways that you can stage files to the Index:

I use angle brackets (< >) to indicate that the content inside of them is just a placeholder value where you would insert the actual value.

Stage a specific file by name

git add <file_to_add>

Stage multiple files by name

git add <file_to_add_1> <file_to_add_2>

Stage all files with a specific file extension

git add *.<xyz>

Stage a whole folder along with all of its files and subfolders

git add <folder_to_add>

Stage all new files and file modifications, including the removal of any files that have been deleted from your Working Directory

git add -A (-A is shorthand for --all)

Staging deleted files may seem counterintuitive, but it simply records the deletion so that the files are removed from the next commit. Also note that in Git 1.x, `git add .` didn't stage deleted files; that changed starting with Git 2.0. I mention this because when Googling `git add .` you'll sometimes see version 1.x references. In modern Git, to stage only new and modified files while ignoring deleted files, pass a path along with `--no-all` or its synonym `--ignore-removal`, for example `git add --ignore-removal .` (without a path these options have no effect).

This command stages files similarly to `git add -A`, except that it only stages files in the current directory and its subdirectories, while `git add -A` stages files throughout the whole project (e.g. files in parent directories if you're running the command from a subdirectory).

git add .

Stage modified files and deleted files but ignore any untracked files

git add -u (-u is shorthand for --update)
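
The differences in scope between the staging commands above are easier to see side by side. The following is a minimal sketch, assuming a throwaway repository with made-up file names (`root.txt`, `sub/inner.txt`, `gone.txt`, `new.txt`) and Git 2.x behavior. The `staged` helper is my own addition for printing what is currently in the Index.

```shell
set -e  # stop at the first failing command

# Throwaway repository (all file names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

mkdir sub
echo "root" > root.txt
echo "inner" > sub/inner.txt
echo "doomed" > gone.txt
git add -A
git commit -q -m "Initial commit"

# One change of each kind: two edits, one deletion, one new untracked file.
echo "edit" >> root.txt
echo "edit" >> sub/inner.txt
rm gone.txt
echo "new" > new.txt

# Helper: list the staged paths on a single line.
staged() { git diff --cached --name-only | xargs; }

cd sub
git add .
dot_from_sub=$(staged)   # only paths under sub/
git reset -q             # unstage everything again

git add -u
update=$(staged)         # edits and the deletion, but not new.txt
git reset -q

git add -A
all=$(staged)            # everything, including new.txt
git reset -q

cd ..
git add --ignore-removal .
no_removal=$(staged)     # new and modified files, but not the deletion

printf '%s\n' \
  "git add . (run in sub/):     $dot_from_sub" \
  "git add -u:                  $update" \
  "git add -A:                  $all" \
  "git add --ignore-removal . : $no_removal"
```

Here `git reset -q` is used only to empty the Index between experiments; it leaves the files in the Working Directory untouched.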

If you're interested in learning about more advanced and detail-oriented ways of staging files you should research the git add --interactive and git add --patch options. I should also mention that there are various commands that allow you to unstage files from the Index (e.g. git rm --cached, git reset, git checkout --) but I will save discussion of these and other potentially destructive commands for another post.

One important thing to keep in mind is that if you stage a file to the Index but then make updates to that file before making the next commit, you will need to re-stage the file for those updates to be included in the commit. That being said, any unstaged changes that exist when you make a commit will remain on your local disk, so you can always stage those changes later.
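
Here is a minimal sketch of that situation, assuming a throwaway repository and a made-up file named `app.txt`. It shows that the commit captures the version that was staged, not the later edit, and that the later edit is still waiting in the Working Directory afterwards.

```shell
set -e  # stop at the first failing command

# Throwaway repository (file name is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

echo "version 1" > app.txt
git add app.txt                 # the Index now holds "version 1"
echo "version 2" > app.txt      # edit made after staging

# "AM": added in the Index, modified again in the Working Directory.
before_commit=$(git status --short)

git commit -q -m "Commit the staged snapshot"
committed=$(git show HEAD:app.txt)      # what the commit actually contains
after_commit=$(git status --short)      # the later edit is still unstaged

# Re-stage and commit to record the later edit as well.
git add app.txt
git commit -q -m "Commit the later edit"
latest=$(git show HEAD:app.txt)

printf '%s\n' "first commit: $committed" "second commit: $latest"
```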

Once you’ve staged all of your files and you’re ready to commit a snapshot of your project you can use one of these commonly used options:

Make a commit and add the corresponding commit message all in one line

git commit -m "<this is your commit message>" (-m is shorthand for --message)

Make a commit and open a text editor (typically a mouse-less editor like Vim) for you to add your commit message

git commit

Vim is a very powerful text editor, but it can be challenging to learn and get comfortable with. Unless you're familiar with Vim you should probably just use `git commit -m` instead. If you accidentally find yourself in Vim with Git, you can usually press `i` to enter Insert mode, which lets you type your commit message, then press the escape key to exit Insert mode, and finally type `:wq` (write and quit) followed by Enter to save your commit message and quit Vim.

This command works similarly to `git commit`, but it first stages any tracked files that have been modified or deleted. Keep in mind that it doesn't stage any untracked files.

git commit -a (-a is shorthand for --all)

This command works similarly to `git commit -a`, but adding the `-m` flag allows you to enter your commit message inline in quotes. Again, keep in mind that it doesn't stage any untracked files.

git commit -am "<your commit message>"

This command allows you to change the commit message for the prior commit. If you also staged any new changes before running the amend command then those changes will also be included in the amended commit.

git commit --amend -m "<your revised commit message>"

The `--amend` option replaces the most recent commit with a new one (the original commit is dropped from the branch's history), so you generally shouldn't amend commits that have been pushed to a shared repo, since somebody else could be adding work on top of the original commit. If you've already pushed the original commit to a personal repo that nobody else has added work to, then after you make the amended commit you can run `git push --force` to force the remote to discard the original commit in favor of the amended one. Otherwise, a plain `git push` will be rejected and you'll need to first run `git pull` to get your local repo back in sync with the remote repo (which typically merges the original commit back in alongside the amended one) before you can push.
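
The following is a minimal sketch of two of the behaviors described above: `git commit -am` leaving untracked files alone, and `--amend` replacing the latest commit rather than adding another one. It assumes a throwaway local repository with made-up file names and commit messages, and it doesn't involve a remote.

```shell
set -e  # stop at the first failing command

# Throwaway repository (file names and messages are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "two" >> tracked.txt        # modify a tracked file
echo "draft" > scratch.txt       # create an untracked file
git commit -q -am "Update trackd file"   # message contains a typo on purpose

# The tracked edit was committed; the untracked file was left alone.
after_am=$(git status --short)

# Amend the message and compare the commit IDs before and after.
old_sha=$(git rev-parse HEAD)
git commit -q --amend -m "Update tracked file"
new_sha=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)       # still two commits on the branch
subject=$(git log -1 --format=%s)        # the corrected message
old_type=$(git cat-file -t "$old_sha")   # the original object still exists for now

printf '%s\n' "old: $old_sha" "new: $new_sha" "commits on branch: $count"
```

Because the amended commit gets a new ID, any remote that already has the original commit sees the two histories as diverged, which is why a forced push or a pull comes into play.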

It's obvious that commits are important because they allow us to travel through time to view, edit and revert our project to any particular commit, but the often-overlooked mechanism that allows you to actually move around and interact with different commits is called HEAD. HEAD is usually not a commit itself but rather a reference to the currently checked out branch, which in turn references the most recent commit of that branch. (Historically the default branch has been called the “master” branch, but it sounds like GitHub, which is owned by Microsoft, will soon be renaming master to “main” or another neutral word that isn't potentially associated with slaves and racism.)

If you move HEAD to a prior commit (see below for some basic examples of how to move HEAD) then the Working Directory is updated to match the state of the files at the moment that commit was made, and HEAD now points directly to that commit, as opposed to the branch and the most recent commit of that branch. When HEAD points directly to a commit instead of to a branch it is said to be a “detached HEAD”, which means that it's not currently associated with any branch and it exists out in what I like to call “No Branch's Land”. If you go on to make a new commit while in detached HEAD then HEAD will be moved to that new commit and still won't be associated with any branch by default.

If, after entering detached HEAD mode, you travel back to a branch without creating a reference to the branchless commit you just left, then that commit and any of its ancestor commits that were also made while in detached HEAD will eventually be removed by Git (making commits while in detached HEAD is often frowned upon). So if you want to keep branchless code around you'll want to create a reference to that commit by doing something like creating a new branch.
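
As a rough illustration of the attached and detached states described above, here is a minimal sketch in a throwaway repository. The branch name `experiment` and the file names are made up, and the default branch name is read from Git rather than assumed, since it may be `master` or `main` depending on your configuration.

```shell
set -e  # stop at the first failing command

# Throwaway repository (file and branch names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

echo "a" > file.txt
git add file.txt
git commit -q -m "First"
echo "b" >> file.txt
git commit -q -am "Second"

# Attached: HEAD is a symbolic reference to the current branch.
branch=$(git symbolic-ref --short HEAD)
attached_ref=$(git symbolic-ref HEAD)     # e.g. refs/heads/<branch>

# Checking out a commit directly detaches HEAD.
first_sha=$(git rev-parse HEAD~1)
git checkout -q "$first_sha"
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi

# A commit made now belongs to no branch.
echo "c" > experiment.txt
git add experiment.txt
git commit -q -m "Experiment"
branchless_sha=$(git rev-parse HEAD)

# Creating a branch here gives the commit a reference so it is kept.
git branch experiment
git checkout -q "$branch"
kept_sha=$(git rev-parse experiment)

printf '%s\n' "HEAD was attached to: $attached_ref" "detached after checkout: $detached"
```

`git symbolic-ref -q HEAD` exits with a non-zero status when HEAD is detached, which is what the `if` statement relies on.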

So, as you may have noticed, HEAD basically follows you around to whatever commit you travel to within your project. Here are some basic commands you can use to move HEAD around and inspect your commits:

Similar to what I mentioned earlier in regards to the Index, there are other commands you can use to interact with the HEAD that I haven’t listed here because those commands are generally used for more potentially destructive purposes, so I will discuss those in a future post.

Move HEAD to the most recent commit in <branchname>

git checkout <branchname>

Detach HEAD and move it to the commit with <commit_SHA>

git checkout <commit_SHA>

<commit_SHA> is a placeholder for the unique ID that Git assigns to each commit. You can view a history of your commits and the corresponding SHAs by running the `git log` command. Note that by default when you run `git log` it will only show commits on the current branch but you can also run `git log <branch_name>` to see the commits for a given branch or run `git log --branches=*` to view a log that includes all branches.

Detach HEAD and move it back <n> commits from where it currently is. For instance, when HEAD is at the most recent commit of a branch, `HEAD~1` (equivalent to `HEAD~`) is the 2nd most recent commit and `HEAD~2` (equivalent to `HEAD~~`) is the 3rd most recent.

git checkout HEAD~<n>
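
To see how `HEAD~<n>` counts, here is a minimal sketch that makes three commits in a throwaway repository and then steps back from the most recent one. The file name and commit messages are made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway repository with three commits (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

for n in 1 2 3; do
  echo "change $n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

branch=$(git symbolic-ref --short HEAD)   # remember the branch to return to
git --no-pager log --oneline              # newest commit is listed first

# One step back from the most recent commit.
git checkout -q HEAD~1
at_one=$(git log -1 --format=%s)

# Return to the branch, then go two steps back.
git checkout -q "$branch"
git checkout -q HEAD~2
at_two=$(git log -1 --format=%s)

# Checking out the branch again re-attaches HEAD to its most recent commit.
git checkout -q "$branch"
back=$(git log -1 --format=%s)

printf '%s\n' "HEAD~1 -> $at_one" "HEAD~2 -> $at_two" "branch -> $back"
```

Note that `HEAD~<n>` is always relative to wherever HEAD currently is, which is why the sketch returns to the branch before stepping back a second time.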