Demystifying Git references aka refs

The real workhorse enabling core Git operations

Deepak Tunuguntla
Towards Data Science
6 min read · May 17, 2021

Image by author, made using diagrams.

As a data scientist or a data engineer, working with Git is a breeze once you know the core elements that a Git operation like git-clone or git-push relies on under the hood. Although the core Git operations have received plenty of hands-on attention from fellow enthusiasts, this article focuses on a core entity called Git references, which enables several of those operations.

To understand Git references and their importance, let us consider the remote repository below, which has several branches

Illustrates a remote repository structure with five branches including the main branch. Image by author, made using diagrams.

and clone the above illustrated remote repository as below

$ git clone https://github.com/<git_username>/my_repo.git

As expected, the cloning operation results in a local repository with a default local main branch linked to the remote main branch, a remote connection origin and copies of all the remote branches, see below

Illustrates the cloning operation of a remote repository with five branches including the main branch. Image by author, made using diagrams.

However, on closer observation on your local machine, the local directory my_repo only contains a copy of the files that are present in your remote main branch. Although the contents of your remote branches like branch_1 and branch_2 are cloned and present in your local repository, their files are not yet visible on your local machine. This leads to the question

How can you access the contents of the remote branches?

Well well well, do not worry, this is where Git references come into play. To enable access to the contents of the other remote branches besides the remote main branch, git-clone does more than just create a local copy of the remote main branch and a remote connection origin. According to Atlassian’s git-clone documentation,

As a convenience, cloning automatically creates a remote connection called “origin” pointing back to the original repository. This makes it very easy to interact with a central repository. This automatic connection is established by creating Git refs to the remote branch heads under refs/remotes/origin and by initializing remote.origin.url and remote.origin.fetch configuration variables.

In other words, besides simply creating a remote connection origin, see the illustration above, the git-clone operation also creates Git references aka refs to the remote branches branch_1, branch_2, etc. These refs are located in your local my_repo directory under the folder .git/refs/remotes/origin. Simplifying further, by default, the git-clone operation creates one local branch main whose file contents are in your local directory my_repo. For the rest of the remote branches, it fetches their contents, stores them in your local .git directory and creates Git references to them, instead of creating multiple local branches, each with its own branch-specific local my_repo directory. All the Git references to these remote (origin) branches can be listed as below

my_repo $ cd .git/refs/remotes/origin
origin $ ls
HEAD branch_1 branch_2 branch_3 branch_4

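Note the .git in front of the refs directory. Generally, all the Git metadata is stored under the .git directory in your local my_repo directory. Files like .gitignore are the exception: they live next to your own files in my_repo. To list hidden files and directories such as .git, use ls -a. Depending on your Git version, some refs may be stored in the single file .git/packed-refs rather than as individual files, in which case ls shows fewer entries than the listing above.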
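If you would like to try this without touching a real remote, here is a minimal sketch that builds a throwaway stand-in for the remote repository, clones it and lists the resulting remote refs. The sandbox directory, the remote_repo name and the demo identity are assumptions made for this illustration only. It uses git for-each-ref instead of ls so that the listing does not depend on how Git stores the refs on disk.

```shell
set -e
# Hypothetical identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a throwaway "remote" with a main branch and two extra branches.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q remote_repo
cd remote_repo
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"
git branch branch_1
git branch branch_2
cd ..

# Clone it; the local path stands in for the GitHub URL.
git clone -q remote_repo my_repo
cd my_repo

# List every ref that git-clone created under refs/remotes/origin.
remote_refs=$(git for-each-ref --format='%(refname)' refs/remotes/origin)
echo "$remote_refs"
```

Each printed line is the full name of one remote ref, e.g. refs/remotes/origin/branch_1.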

Another way to list the above remote Git references (refs) is by utilising the git branch command together with the flag -a (--all) or -r (--remotes). We use the --remotes flag, which lists all the refs the above git-clone operation created for the remote branches present in your remote repository, see below

my_repo $ git branch --remotes
  origin/HEAD -> origin/main
  origin/branch_1
  origin/branch_2
  origin/branch_3
  origin/branch_4
  origin/main

The Git reference origin/HEAD (in full, refs/remotes/origin/HEAD) points to the default branch of the remote (origin), here main. Note that origin/main is just a shorter way to refer to the Git reference refs/remotes/origin/main. Alternatively, the -a flag lists both the local and the remote Git references, showing the latter with a remotes/ prefix, see below

my_repo $ git branch -a
* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/branch_1
  remotes/origin/branch_2
  remotes/origin/branch_3
  remotes/origin/branch_4
  remotes/origin/main

Note that the local Git reference main is, by default, created during the git-clone operation and is available under the .git/refs/heads folder, in contrast to the remote Git references located in the .git/refs/remotes/origin folder.
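To see the two namespaces side by side, the following sketch asks Git to expand the short names main and origin/main into their full ref names, and to show where origin/HEAD points. As before, it runs against a throwaway stand-in for the remote; the sandbox, the remote_repo name and the demo identity are assumptions for illustration.

```shell
set -e
# Hypothetical identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "remote" with a single commit on main, then a clone of it.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q remote_repo
git -C remote_repo symbolic-ref HEAD refs/heads/main
git -C remote_repo commit -q --allow-empty -m "initial commit"
git clone -q remote_repo my_repo
cd my_repo

# Expand the short names into the full ref names Git uses internally.
local_full=$(git rev-parse --symbolic-full-name main)
remote_full=$(git rev-parse --symbolic-full-name origin/main)

# origin/HEAD is a symbolic ref: it stores the name of another ref.
head_target=$(git symbolic-ref refs/remotes/origin/HEAD)

echo "main        is $local_full"
echo "origin/main is $remote_full"
echo "origin/HEAD points to $head_target"
```

Right after a clone, both main and origin/main resolve to the same commit, which you can confirm by comparing git rev-parse main with git rev-parse origin/main.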

Given all the remote refs are set, we rephrase our earlier question as

How do we use the remote Git references to access the contents of the remote branches like branch_1 on our local machine?

Let’s say we would like to access the content of the remote branch_1 on our local machine, i.e. in our local repository. All we need to do is utilise the remote Git reference set for branch_1 in refs/remotes/origin and perform the below operation

my_repo $ git checkout branch_1

The above command does a lot of things under the hood. Firstly, it looks for a local branch called branch_1; finding none, it looks for a remote Git reference with the same name, here branch_1 in the refs/remotes/origin folder. If that is not found either, it throws an error. Secondly, it creates a local branch in your local repository called branch_1, which starts at the commit the remote branch_1 reference points to. This also means that a local Git reference branch_1 now exists under the folder refs/heads. Thirdly, it links the local branch_1 to the remote branch_1, i.e. for performing the pull or push operation. And, finally, it updates the files in your local directory my_repo to match the contents of branch_1, i.e. the files in your local my_repo directory are no longer a copy of the files present in the remote main branch. As a result, we now have two local branches, i.e. main and branch_1, linked to corresponding remote branches and still only one local my_repo directory, see below

my_repo $ git branch -a
* branch_1
  main
  remotes/origin/HEAD -> origin/main
  remotes/origin/branch_1
  remotes/origin/branch_2
  remotes/origin/branch_3
  remotes/origin/branch_4
  remotes/origin/main

The asterisk symbol ‘*’ in front of the local Git reference branch_1 denotes the currently active branch in your local repository. Similarly, we can also check out (access) the contents of other remote branches like branch_2 and branch_3 in our local repository. The cool thing about doing it this way is that you create a local branch based on the remote branch only when required, which avoids an unnecessary clutter of local branches in your local repository.
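The checkout of a remote branch described above can also be spelled out explicitly. The sketch below is one way to do that, assuming a throwaway stand-in for the remote in which branch_1 changes a hypothetical file notes.txt; the -b and --track options state by hand what the short form git checkout branch_1 works out for you.

```shell
set -e
# Hypothetical identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "remote": notes.txt differs between main and branch_1.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q remote_repo
cd remote_repo
git symbolic-ref HEAD refs/heads/main
echo "main content" > notes.txt
git add notes.txt
git commit -q -m "add notes on main"
git checkout -q -b branch_1
echo "branch_1 content" > notes.txt
git commit -q -am "change notes on branch_1"
git checkout -q main
cd ..

git clone -q remote_repo my_repo
cd my_repo

# Create local branch_1 from origin/branch_1 and link the two.
git checkout -q -b branch_1 --track origin/branch_1

local_ref=$(git for-each-ref --format='%(refname)' refs/heads/branch_1)
upstream=$(git rev-parse --abbrev-ref 'branch_1@{upstream}')
working_copy=$(cat notes.txt)

echo "local ref:    $local_ref"
echo "linked to:    $upstream"
echo "notes.txt is: $working_copy"
```

After the checkout, the local ref exists under refs/heads, it is linked to origin/branch_1, and notes.txt in the working directory holds the branch_1 version of the file.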

To switch back to the local main branch from the local branch_1, all you need to do is

my_repo $ git checkout main

This will place the asterisk symbol ‘*’ next to the local main branch. To make sure that the contents of the files in your local main branch are the same as that of the files in the remote main branch, perform a git-pull.
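Before pulling, you can check whether your local main has fallen behind the remote main by comparing the two refs. The following sketch is a rough illustration on a throwaway stand-in for the remote (sandbox, remote_repo and the demo identity are assumptions): it lets the remote move one commit ahead, fetches, counts the commits that main is missing and then pulls.

```shell
set -e
# Hypothetical identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "remote" with one commit, then a clone of it.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q remote_repo
git -C remote_repo symbolic-ref HEAD refs/heads/main
git -C remote_repo commit -q --allow-empty -m "first commit"
git clone -q remote_repo my_repo

# The remote moves ahead after we cloned.
git -C remote_repo commit -q --allow-empty -m "second commit"

cd my_repo
# Update the remote ref origin/main without touching local main.
git fetch -q origin

# Commits reachable from origin/main but not from main.
behind_before=$(git rev-list --count main..origin/main)

# Bring local main up to date (fast-forward only).
git pull -q --ff-only origin main
behind_after=$(git rev-list --count main..origin/main)

echo "behind before pull: $behind_before"
echo "behind after pull:  $behind_after"
```

Here the fetch only moves the remote ref origin/main; the local ref main moves when the pull fast-forwards it.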

And, that brings us to the end. Hope the above insights have strengthened 🏋️‍♀️ your understanding of Git references and their benefits. Enjoy Gitting! 🙏

For the curious ones: to see how a local Git reference like main or branch_1 is linked to its corresponding remote branch, we list out the local repository’s configuration variables

my_repo $ git config --list
... skipped other variables ...

remote.origin.url=https://github.com/<git_username>/my_repo.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.main.remote=origin
branch.main.merge=refs/heads/main
branch.branch_1.remote=origin
branch.branch_1.merge=refs/heads/branch_1

As you can see, the local configuration variables branch.main.remote and branch.branch_1.remote are initialised with the remote connection origin, while branch.main.merge and branch.branch_1.merge name the branch on that remote to pull from. Together they imply that the local branches main and branch_1 are linked to the remote branches main and branch_1. Note that whenever a new remote branch is checked out on your local machine, the corresponding configuration variables are added to the local repository’s configuration file.
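Instead of scanning the full list, you can also query these variables one at a time. The sketch below does so on a throwaway stand-in for the remote (the sandbox, the remote_repo name and the demo identity are assumptions for illustration), after checking out branch_1 the same way as in the article.

```shell
set -e
# Hypothetical identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "remote" with main and branch_1, then a clone of it.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q remote_repo
git -C remote_repo symbolic-ref HEAD refs/heads/main
git -C remote_repo commit -q --allow-empty -m "initial commit"
git -C remote_repo branch branch_1
git clone -q remote_repo my_repo
cd my_repo

# Checking out branch_1 creates the local branch and its config entries.
git checkout -q branch_1

remote_name=$(git config --get branch.branch_1.remote)
merge_ref=$(git config --get branch.branch_1.merge)
fetch_spec=$(git config --get remote.origin.fetch)

echo "branch.branch_1.remote = $remote_name"
echo "branch.branch_1.merge  = $merge_ref"
echo "remote.origin.fetch    = $fetch_spec"
```

The fetch specification is what maps each branch under refs/heads on the remote to a ref under refs/remotes/origin in your local repository.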


Written by Deepak Tunuguntla

Applied Mathematician | Machine Learning | NLP
