
Git 101 – From Terminologies to Architecture and Workflows

Git behind-the-scenes and how to use Git efficiently

Photo by Roman Synkevych on Unsplash

So you have learned git add, git commit, and git push, but what does each step achieve, and what is happening in your local and remote repositories? Which branching and merging strategies can you use to harness the power of Git and do much more?

After poring over numerous articles that simply list Git commands, I feel that it is important to know the architecture of Git to truly understand what is happening. This article will touch on the basic terminology of Git, build up to Git architecture, and finally cover common Git workflows that you can consider adopting for your next coding project!


What is Git

In short, Git is a distributed version control system that enables teams to manage their code in an agile manner, from tracking the history of code changes to handling conflicting changes and much more.

Taking it one step further, Git can also be used with Continuous Integration / Continuous Deployment (CI/CD) tools, since the codebase is well-maintained and versioned (if used properly). Deployment can be triggered automatically upon code changes using tools such as GitHub Actions, but this is outside the scope of this article.

Note: GitHub, GitLab, and Bitbucket are hosting services for Git repositories, not to be confused with Git itself!


Git Terminologies

Before diving into Git architecture, it is good to define a few frequently used terms.

  • Repository: A data structure that stores the project’s files and their history, made up of a series of commits over time
  • Branch: An independent line of project development
  • Commit: One version of the project; a snapshot of the project
  • Pull Request: Request to merge a branch into another branch
  • Merge: Combines the work of separate branches

To put it together, while working on a project, you will usually create a branch, make your changes, and commit them once they are complete. You will then raise a pull request for your branch to be merged into the main branch of the repository.

Independent branches, together with pull requests to approve code changes, allow teams to scale their work while maintaining a stable repository.

There are also terms for the Git actions that you can perform. Note that there is a concept of local and remote, which will be explained in the next section.

  • Clone: Creates a local copy of a remote repository
  • Fork: Copies a remote repository to your own account; both repositories are remote
  • Fetch: Retrieves new objects and references from the remote repository
  • Pull: Fetches new commits and merges them into the current local branch
  • Push: Adds new objects and references to the remote repository
  • Stash: Saves uncommitted changes in the working tree for later access
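
The actions above can be tried out safely in a throwaway sandbox. The following is a minimal sketch, not a prescribed procedure: a local bare repository stands in for a hosted remote, and the names alice, bob, and the demo identity are made up for illustration. Forking is not shown because it is a feature of hosting services rather than a Git command.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Seed a stand-in "remote": a bare repository with one commit on main.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed remote.git

# Clone: alice and bob each get a local copy of the remote.
git clone -q remote.git alice
git clone -q remote.git bob

# Push: alice commits locally, then adds the new commit to the remote.
cd alice
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q origin main

# Fetch: bob downloads the new objects; only origin/main moves, not his main.
cd ../bob
git fetch -q origin
behind=$(git rev-list --count main..origin/main)

# Stash: bob sets his uncommitted edit aside, leaving a clean working tree.
echo "work in progress" >> README.md
git stash push -q

# Pull: fetch + merge, bringing alice's commit into bob's main.
git pull -q --ff-only origin main

# Restore the stashed edit on top of the updated branch.
git stash pop -q
echo "bob was $behind commit(s) behind before pulling"
```

Note how fetch alone leaves bob’s own main branch untouched; it is the pull (fetch followed by a merge) that updates it.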

Git Architecture

As mentioned in the previous section, there is a concept of local and remote. Remote repositories refer to the location of all commits of the project, usually hosted in a data center or cloud (e.g., GitHub). Local repositories, on the other hand, refer to the location of commits of the project on your local machine (e.g., a laptop). Locally, Git works with 3 separate ‘areas’:

  • Working tree: Location of directories and files of a single commit
  • Staging area: Location of files planned for next commit
  • Local repository: Location of all commits of the project

To put it into action, when you perform git checkout, Git places the directories and files of a commit into the working tree. After making changes, performing git add transfers the modified files to the staging area. You can then perform git commit, which records the staged changes, together with a message, as a new commit in the local repository. Finally, performing git push pushes the new commits in the local repository to the remote repository.
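
To see a file move through these areas, here is a small sketch that checks git status after each step. It assumes a throwaway sandbox, a local bare repository standing in for the remote, and a made-up file name (notes.txt) and identity.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"      # stand-in for a hosted remote
git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"

# Working tree: the new file exists on disk but Git is not tracking it yet.
echo "hello" > notes.txt
in_working_tree=$(git status --short)

# Staging area: git add marks the file for inclusion in the next commit.
git add notes.txt
in_staging=$(git status --short)

# Local repository: git commit stores the staged snapshot with a message.
git commit -q -m "Add notes"
after_commit=$(git status --short)

# Remote repository: git push copies the new commit to the remote.
git push -q origin main
local_tip=$(git rev-parse HEAD)
remote_tip=$(git ls-remote origin refs/heads/main | cut -f1)

printf '%s\n' "working tree: $in_working_tree" "staged: $in_staging" "after commit: '$after_commit'"
```

In the short status format, an untracked file is marked with ??, a newly staged file with A, and a clean working tree produces no output at all.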


Branching, Merging, and Merge Conflicts

Branching Strategies

A branch is a set of commits that trace back to the project’s first commit and is typically used to isolate work amongst team members as they can work on separate independent branches.

There are two types of branches. Short-lived branches contain small changes to the project, such as a feature, bug fix, or hotfix, whereas long-lived branches, such as the master, develop, or release branch, can last for the life of the project.

Ideally, branches should be deleted after merging to prevent a continuous increase in the number of branches. A branch that is deleted by accident can be restored, but only while its commits still exist somewhere, for example in someone’s local repository.
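
As a rough illustration of the branch lifecycle, the sketch below creates a short-lived branch, merges it, deletes it, and then restores it from the hash of its last commit. The branch and file names are hypothetical, and the sandbox and identity are assumptions of the example.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Short-lived branch for one small change.
git checkout -q -b feature/login
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login"
tip=$(git rev-parse HEAD)          # hash of the branch tip, noted before deletion

# Merge into the long-lived branch, then delete the short-lived one.
git checkout -q main
git merge -q --ff-only feature/login
git branch -q -d feature/login
remaining=$(git branch --list feature/login)

# Restore: a branch is only a label, so it can be recreated at the same commit.
git branch feature/login "$tip"
restored=$(git rev-parse feature/login)
```

If the hash was not noted down beforehand, git reflog in the repository where the branch was last checked out is one place to look for it.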


Merging Strategies

Merging combines the work of multiple independent branches. It is good practice to merge small changes frequently to reduce the chance of merge conflicts. There are 4 types of merge, and the one to follow usually depends on the team’s policy on commit history.

Fast-forward (FF) Merge

Fig 1: Fast-Forward Merge Example - Image by author
  • Moves the base branch label to the tip of the merged branch
  • Only possible if no other commits have been made to the base branch since the merged branch was created; otherwise, bring your branch up to date with the base branch first (for example, perform a git pull before any git push)
  • It is considered the strictest merge policy, as the latest changes always have to be integrated into your branch first, which keeps code modifications compatible
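
A fast-forward merge can be sketched as follows. The commit names A, B, and C and the commit_file helper are made up for this example; --ff-only is used so that Git refuses to do anything other than a fast-forward.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: write a file and commit it in one step.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file a.txt "A"              # main:    A
git checkout -q -b feature
commit_file b.txt "B"              # feature: A - B
commit_file c.txt "C"              # feature: A - B - C

# main has no new commits, so its label can simply move forward to C.
git checkout -q main
git merge -q --ff-only feature

main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)
merge_commits=$(git rev-list --merges --count main)
echo "merge commits on main: $merge_commits"
```

After the merge, main and feature point at the same commit and no merge commit has been created.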

Merge Commit

Fig 2: Merge Commit Example - Image by author
  • Combines the commits at the tips of the merged branches, resulting in a non-linear commit graph since the merge commit has multiple parents
  • This may result in a merge conflict when both branches change the same thing in different ways
  • This is what git merge does by default when the branches have diverged; when the base branch has no new commits, Git fast-forwards instead
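
To illustrate, the sketch below lets two branches diverge and then merges them, which produces a commit with two parents. The commit names and the commit_file helper are hypothetical, and the branches touch different files so that no conflict occurs.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: write a file and commit it in one step.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file a.txt "A"
git checkout -q -b feature
commit_file b.txt "B"
git checkout -q main
commit_file d.txt "D"              # main has moved on, so the branches have diverged

# A fast-forward is impossible here, so Git creates a merge commit.
git merge -q -m "Merge branch 'feature'" feature

# Count the parents of the new commit at the tip of main.
parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
echo "parents of the merge commit: $parents"
```

Running git log --graph --oneline afterwards is a convenient way to view the resulting non-linear graph.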

Squash Merge

Fig 3: Squash Merge Example - Image by author
  • Combines all the changes in the merged branch into a single new commit on the tip of the base branch
  • The original commits are not part of the base branch; commits B and C will eventually be garbage collected once the merged branch is deleted
  • To be done with caution! The base branch keeps only the single squashed commit, and Git does not record that the branch was merged, so the detailed history is lost once the branch is gone
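
One way to perform a squash merge on the command line is git merge --squash followed by a regular commit, sketched below. The commit names and the commit_file helper are made up for the example; hosting services offer their own squash option, which may behave slightly differently.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: write a file and commit it in one step.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file a.txt "A"
git checkout -q -b feature
commit_file b.txt "B"
commit_file c.txt "C"
git checkout -q main

# --squash stages the combined changes of B and C without committing them.
git merge -q --squash feature > /dev/null
git commit -q -m "Add feature (B and C squashed)"

main_commits=$(git rev-list --count main)            # A plus the squashed commit
parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')

# B and C themselves are not reachable from main, so Git sees feature as unmerged.
if git merge-base --is-ancestor feature main; then merged=yes; else merged=no; fi
echo "commits on main: $main_commits, feature merged: $merged"
```

Because Git still considers the feature branch unmerged, deleting it afterwards requires git branch -D rather than git branch -d.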

Rebase

Fig 4: Rebase Example - Image by author
  • Moves the commits in the merged branch onto a new parent; the commits get different IDs because their ancestor chain has changed
  • The ‘new’ commits in the merged branch can be fast-forwarded and do not require a merge commit
  • When performing rebasing, this may result in merge conflict when both branches change the same thing in different ways
  • To be done with caution! This rewrites commit history and we should not rewrite history that has been shared
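
A rebase followed by a fast-forward can be sketched like this. The commit names and the commit_file helper are hypothetical; the example only rebases a branch that has never been shared, in line with the caution above.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: write a file and commit it in one step.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file a.txt "A"
git checkout -q -b feature
commit_file b.txt "B"
old_tip=$(git rev-parse feature)
git checkout -q main
commit_file d.txt "D"              # main moves on while feature is in progress

# Replay the commits of feature on top of the current main.
git checkout -q feature
git rebase -q main
new_tip=$(git rev-parse feature)   # B now has D as its parent, hence a new ID

# The rebased branch can be fast-forwarded; no merge commit is needed.
git checkout -q main
git merge -q --ff-only feature
merge_commits=$(git rev-list --merges --count main)
echo "old tip: $old_tip"
echo "new tip: $new_tip"
```

The history of main ends up linear (A, D, then the rebased B), at the cost of B receiving a new commit ID.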

Merge Conflicts

From the earlier section, we noticed that merge commit and rebasing may result in a merge conflict, a situation when two branches change the same thing in different ways.

Resolving a merge conflict involves 3 commits: the tip of the base branch, the tip of the merge branch, and a common ancestor of the two branches. The conflicting parts are surrounded by conflict markers as follows:

<<<<<<< HEAD
The version in the base branch
=======
The version in the merge branch
>>>>>>> feature/merge-branch

When attempting a merge, the files with conflicts are modified by Git and placed in the working tree. The conflicts must be resolved manually by editing these files, after which the files are staged and committed to complete the merge.
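
The sketch below provokes a conflict on purpose and then resolves it. The file name greeting.txt and its contents are made up for illustration; in practice the resolution step is a manual edit rather than an echo.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Initial commit"

# Both branches change the same line in different ways.
git checkout -q -b feature/merge-branch
echo "hello from the merge branch" > greeting.txt
git commit -q -am "Change greeting on the merge branch"
git checkout -q main
echo "hello from the base branch" > greeting.txt
git commit -q -am "Change greeting on the base branch"

# The merge stops with a non-zero exit code and leaves markers in the file.
if git merge -q --no-edit feature/merge-branch > /dev/null 2>&1; then
  outcome=clean
else
  outcome=conflict
fi
conflicted=$(git diff --name-only --diff-filter=U)
markers=$(grep -c '^<<<<<<< HEAD' greeting.txt)

# Resolve by writing the final content, then stage and commit to finish the merge.
echo "hello from both branches" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature/merge-branch, resolving greeting.txt"
parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
echo "outcome: $outcome, conflicted file: $conflicted"
```

If the resolution goes wrong before the final commit, git merge --abort returns the repository to its state before the merge was attempted.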


Git Workflows

Besides deciding on the merging strategy for your projects, there are also 4 common Git workflows to choose from, depending on the team’s preference, the size of the team, the size of the project, etc.

Basic Workflow / Centralized Workflow

  • Type: Single central repository (no branches)
  • How: Each member clones the repository, works on code locally, commits, and pushes their changes to the central repository
  • Suitable for small projects and small teams since there is no code review or pull request involved
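
A rough sketch of the centralized workflow with two members is shown below. The names alice and bob are hypothetical and a local bare repository stands in for the central one. It also shows what happens when two members push to the same branch: the second push is rejected until the newer commits have been integrated (here with git pull --rebase, which is only one possible choice).

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Central repository with one initial commit on main.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed central.git

# Each member clones the central repository.
git clone -q central.git alice
git clone -q central.git bob

# Alice works locally, commits, and pushes first.
cd alice
echo "a" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q origin main

# Bob commits too; his push is rejected because the central main has moved on.
cd ../bob
echo "b" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
if git push -q origin main 2> /dev/null; then first_push=accepted; else first_push=rejected; fi

# Bob integrates Alice's commit first, then pushes again.
git pull -q --rebase origin main
git push -q origin main
central_commits=$(git -C ../central.git rev-list --count main)
echo "first push: $first_push, commits in central: $central_commits"
```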

Feature Branch Workflow

  • Type: Single central repository (with feature branches)
  • How: Each member creates a branch for every new functionality, and merges the completed changes with the main branch
  • A commonly recommended workflow, as it involves code review, pull requests, and discussion

Forking Workflow

  • Type: Multiple remote repositories (by forking)
  • How: Team members fork the repository, making a duplicate copy of the repository. Members do not need write access to the original repository and are free to modify the forked repository in any manner to add features or tailor the repository to another use case or project
  • This workflow allows the maintainer of the repository to accept commits from any developer without giving them access to the official codebase, useful for open-source projects
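
The forking workflow can be approximated locally with two bare repositories, as in the sketch below. This is an illustration under assumptions: on a hosting service the fork is created through the web interface and the contribution is accepted through a pull request, whereas here the maintainer pulls the branch directly. The remote name upstream and the branch name fix/typo are made up.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# The official repository, with one commit on main.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git

# Fork: a second remote copy owned by the contributor.
git clone -q --bare upstream.git fork.git

# The contributor clones the fork and only ever pushes there.
git clone -q fork.git contributor
cd contributor
git remote add upstream "$root/upstream.git"   # used to read official changes
git checkout -q -b fix/typo
echo "typo fixed" >> README.md
git commit -q -am "Fix typo"
git push -q origin fix/typo
in_upstream=$(git ls-remote upstream refs/heads/fix/typo)   # empty: upstream is untouched

# The maintainer, who has write access, pulls the branch from the fork.
cd "$root"
git clone -q upstream.git maintainer
cd maintainer
git pull -q --ff-only "$root/fork.git" fix/typo
git push -q origin main
upstream_commits=$(git -C "$root/upstream.git" rev-list --count main)
echo "commits in upstream after acceptance: $upstream_commits"
```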

Gitflow Workflow

  • Type: Single central repository (with short-lived and long-running branches)
  • How: Short-lived branches are used for features or hotfixes, and long-running branches for development and releases. Feature branches are created from the develop branch and merged back into the develop branch when they are completed. For a release, the features in the develop branch are merged into the main branch, and only the main branch is used for releases
  • This workflow allows for the safe and continuous release of the project through releases and hotfixes
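
The branch movements described above can be sketched as follows. This is a simplified illustration: the branch name feature/search, the tag v1.0.0, and the commit_file helper are made up, merge commits are used via --no-ff as one possible policy, and hotfix branches are left out.

```shell
set -euo pipefail

# Demo identity and throwaway sandbox: assumptions made for this sketch only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: write a file and commit it in one step.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file README.md "Initial commit"

git checkout -q -b develop                 # long-running integration branch
git checkout -q -b feature/search          # short-lived, created from develop
commit_file search.txt "Add search"

# The completed feature is merged back into develop and the branch is deleted.
git checkout -q develop
git merge -q --no-ff -m "Merge feature/search into develop" feature/search
git branch -q -d feature/search

# Release: develop is merged into main, and the release is tagged on main.
git checkout -q main
git merge -q --no-ff -m "Release v1.0.0" develop
git tag v1.0.0

tags=$(git tag --list)
feature_branches=$(git branch --list 'feature/*')
echo "released tag: $tags"
```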

Git is essential whenever you are working on a codebase. With or without a project team, version control is still important to manage your code! I hope this article shed light on how Git works under the hood, from Git architecture to the different branching and merging strategies, and finally the different workflows you can use for your projects.

