Getting started

Git is a free and open-source version control system. Most programmers and data scientists interact with Git on a daily basis. So what is version control? Version control is how we as programmers track changes to our code and collaborate with other programmers. It lets us look back at all the changes we've made over time, see what we changed and when, and revert to a previous version of the code if needed.

You may have heard of GitHub and wondered how it differs from Git. They are not the same. GitHub is a cloud-based service that hosts and manages Git repositories and builds on Git's basic functionality. Besides GitHub, there are other hosting services such as Bitbucket, GitLab, Codebase, and Launchpad. In this article, I'll share some common Git commands along with some comparisons and their use cases.
Basic Overview of how Git works:
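- Create a "repository" (project) with a Git hosting tool (like GitHub)
▶️ git init
Make sure you are in the root folder of your project when you run git init; otherwise, Git will track every file under the folder you ran it in (your whole home directory, for example) and slow everything down. If you accidentally run git init in the wrong place, you can undo it by running rm -rf .git in that same folder. Be careful with this: it deletes the repository's entire history, so only use it on a repository you created by mistake.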
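Here is a minimal sketch of that init-and-undo flow. The folder name my-project is just a placeholder I made up, and everything happens in a temporary directory so nothing on your machine is touched.

```shell
# Hypothetical project folder, created inside a temp directory for this demo.
project="$(mktemp -d)/my-project"
mkdir -p "$project"
cd "$project"

git init -q                                   # creates the hidden .git folder here
repo_root="$(git rev-parse --show-toplevel)"  # the folder Git is now tracking
echo "Tracking: $repo_root"

# Undo an accidental init: deleting .git removes all tracking (and all history).
rm -rf .git
if [ -d .git ]; then initialized=yes; else initialized=no; fi
echo "Still a repository? $initialized"
```

Running git rev-parse --show-toplevel before you start adding files is a quick way to confirm you initialized the folder you meant to.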
- Add the remote repository, or copy (clone) the repository to your local machine
//Add the remote repository to an existing local repo
▶️ git remote add origin <HTTPS/SSH>
//Clone the repository
▶️ git clone <HTTPS/SSH>
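To try both options without a hosting account, the sketch below uses a local bare repository as a stand-in for the remote. The paths (remote.git, existing, cloned) are placeholders of mine; with a real host you would pass the HTTPS or SSH URL instead.

```shell
# A local bare repository stands in for the hosted remote (hypothetical paths).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Option 1: connect an existing local repository to the remote.
git init -q "$work/existing"
cd "$work/existing"
git remote add origin "$work/remote.git"
origin_url="$(git config --get remote.origin.url)"
echo "origin -> $origin_url"

# Option 2: clone, which creates the folder and configures origin for you.
git clone -q "$work/remote.git" "$work/cloned" 2>/dev/null
cloned_url="$(git -C "$work/cloned" config --get remote.origin.url)"
echo "cloned origin -> $cloned_url"
```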

- Create a "branch" (optional but recommended)
//create a new branch and switch to it at the same time
▶️ git checkout -b <branch-name>
▶️ git switch -c <branch-name>
//simply switch to an existing branch
▶️ git checkout <branch-name>
▶️ git switch <branch-name>
git switch is not a new feature but an additional command that takes over part of the overloaded git checkout command. git checkout can both switch branches and restore working tree files, which can be confusing. To separate these two jobs, the Git community introduced git switch in Git 2.23.
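The sketch below shows the two commands side by side in a throwaway repository. It assumes Git 2.23 or later (needed for git switch and its companion git restore); the branch name feature-a and the file notes.txt are made up for the demo.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" for the demo
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -qm "Initial commit"

git switch -q -c feature-a                 # same effect as: git checkout -b feature-a
on_new="$(git symbolic-ref --short HEAD)"

git switch -q main                         # same effect as: git checkout main
on_main="$(git symbolic-ref --short HEAD)"

# Restoring a file is the other job git checkout does; git restore covers it.
echo "scratch" >> notes.txt
git restore notes.txt                      # same effect as: git checkout -- notes.txt
restored="$(cat notes.txt)"
echo "$on_new / $on_main / $restored"
```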
- To keep your feature branch fresh and up to date with the latest changes in the master branch, use rebase
▶️ git pull
▶️ git pull --rebase origin master
We often see conflicts happen in this step. Resolving them here keeps your feature branch history clean and makes the final merge easier.
While git pull and git rebase are closely connected, they are not interchangeable. git pull fetches the latest changes for the current branch from a remote and applies them to your local copy of the branch. By default this is done with git merge, i.e. the remote changes are merged into your local branch. So git pull is roughly git fetch followed by git merge.
git rebase lets us replay our changes on top of the remote master branch, which gives us a cleaner history. It is an alternative to merging. With this command, your local commits are re-applied on top of the remote changes instead of being merged with them.
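To make the difference in history concrete, here is a small local sketch. There is no real remote involved: a local main branch plays the role of the remote master branch, and git merge / git rebase stand in for the second half of git pull and git pull --rebase. All branch and file names are placeholders.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"; git config user.email "demo@example.com"

echo base > base.txt;    git add base.txt;    git commit -qm "Base commit"
git checkout -q -b feature
echo work > feature.txt; git add feature.txt; git commit -qm "Feature work"
git checkout -q main
echo more > main.txt;    git add main.txt;    git commit -qm "New work on main"

# Two copies of the feature branch so both strategies can be compared.
git branch feature-merged feature
git branch feature-rebased feature

git checkout -q feature-merged
git merge -q --no-edit main      # merge strategy: adds a merge commit
git checkout -q feature-rebased
git rebase -q main               # rebase strategy: replays "Feature work" on top of main

merge_commits_merged="$(git rev-list --merges --count feature-merged)"
merge_commits_rebased="$(git rev-list --merges --count feature-rebased)"
echo "merge commits: merged=$merge_commits_merged rebased=$merge_commits_rebased"
```

The rebased branch ends up with a straight line of commits, while the merged one carries an extra merge commit.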
- Add a file or change an existing file in your local repo, then put it into the staging area when you are ready to save the changes
//Add one file
▶️ git add <file-name>
//Stage all new, modified, and deleted files in the entire repository
▶️ git add -A (note: -A is shorthand for --all)
//Stage new, modified, and deleted files in the current directory and its subdirectories. Run from the repository root, this is equivalent to git add -A (Git 2.0 and later).
▶️ git add .
//Stage new and modified files but not deletions. The previous two commands also stage the removal of files that no longer exist in the project.
▶️ git add --ignore-removal .
//Stage modified and deleted files, but not new (untracked) files
▶️ git add -u (note: -u is shorthand for --update)
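If the differences between these variants are hard to keep straight, the sketch below stages the same set of changes four ways and records what ends up in the staging area each time. It assumes Git 2.0 or later, and all file names are invented for the demo.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
mkdir sub
echo a > tracked.txt; echo b > old.txt; echo c > sub/inner.txt
git add -A; git commit -qm "Initial commit"

echo change >> tracked.txt      # modified
rm old.txt                      # deleted
echo new > untracked.txt        # new, top level
echo new > sub/new_inner.txt    # new, inside sub/

staged() { git diff --cached --name-only | paste -sd, -; }

git add -u;                  staged_u="$(staged)";       git reset -q
(cd sub && git add .);       staged_dot_sub="$(staged)"; git reset -q
git add --ignore-removal .;  staged_no_rm="$(staged)";   git reset -q
git add -A;                  staged_all="$(staged)"

echo "-u:               $staged_u"
echo ". (run in sub/):  $staged_dot_sub"
echo "--ignore-removal: $staged_no_rm"
echo "-A:               $staged_all"
```

Note that git add . only looks at the directory you run it from (and below), which is why running it inside sub/ picks up just one file.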
- "Commit" (save) the changes
▶️ git commit -m "message about the changes you've made"
- "Push" your changes to your branch
▶️ git push origin <branch-name>
The --set-upstream option (-u for short) sets the default remote branch for your current local branch, for example git push -u origin <branch-name>. This records origin/<branch-name> as the upstream of your local branch, so you no longer have to specify where you are pushing. After setting an upstream, the next time you push changes to the remote server you can simply type git push.
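Here is a rough sketch of what setting an upstream changes in practice. A local bare repository plays the role of the remote server, and feature-a is a placeholder branch name.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"      # stand-in for the hosted remote
git init -q "$work/local"; cd "$work/local"
git config user.name "Demo User"; git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

git checkout -q -b feature-a
echo one > file.txt; git add file.txt; git commit -qm "First change"

git push -q -u origin feature-a            # first push: also records the upstream
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')"
echo "upstream is $upstream"

echo two >> file.txt; git commit -qam "Second change"
git push -q                                # no remote or branch name needed now

remote_tip="$(git ls-remote origin refs/heads/feature-a | cut -f1)"
local_tip="$(git rev-parse HEAD)"
```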
- Open a "pull request" (aka "PR") and request to merge the changes to the main branch
A pull request is a feature that makes it easier for developers to collaborate. Once a developer creates a pull request, the rest of the team can review the code and then merge the new changes into the master branch.

- "Merge" your branch to the main branch

Useful Git Commands
git status – show which files are staged, unstaged, and untracked.
git log – display the entire commit history. One thing to note here is that the long hexadecimal string highlighted in yellow in the output is the commit ID. The commit ID is a SHA-1 hash of all the data in the commit. It's extremely unlikely for two different commits to have the same commit ID, but it's possible in principle.

git diff – compare changes. git diff can be used to compare commits, branches, files, and more. You can copy the first few characters (at least 4) of a commit ID, and Git will figure out which commit you are referring to as long as that prefix is unambiguous. In the examples below, 9b0867 and 51a7a are abbreviated commit IDs.
//Show difference between working directory and last commit.
▶️ git diff HEAD
//Show difference between staged changes and last commit
▶️ git diff --cached
//Show difference between 2 commits
//To see what new changes I've made after the first 51a7a commit:
▶️ git diff 51a7a 9b0867
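The sketch below sets up one staged change and one unstaged change so you can see which form of git diff reports which. The file names are placeholders, and the abbreviated commit IDs are captured into variables because they will differ on every machine.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "line 1" > app.txt;  git add app.txt; git commit -qm "First commit"
first="$(git rev-parse --short HEAD)"
echo "line 2" >> app.txt; git commit -qam "Second commit"
second="$(git rev-parse --short HEAD)"

echo "staged" > staged.txt; git add staged.txt   # staged, not committed
echo "line 3" >> app.txt                         # modified, not staged

names() { paste -sd, -; }
unstaged_files="$(git diff --name-only | names)"                 # working dir vs staging area
staged_files="$(git diff --cached --name-only | names)"          # staging area vs last commit
head_files="$(git diff --name-only HEAD | names)"                # working dir vs last commit
between="$(git diff --name-only "$first" "$second" | names)"     # first commit vs second commit

echo "unstaged: $unstaged_files"
echo "staged:   $staged_files"
echo "vs HEAD:  $head_files"
echo "$first..$second: $between"
```

Drop --name-only to see the actual line-by-line changes instead of just the file names.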
git branch – list all of the branches in your repo. Remember to check this before you push your code, so you don't accidentally push to the master branch or another unintended branch.
git branch -m <new-branch-name> – rename a branch
//Check out the branch you need to rename
▶️ git checkout <old-branch-name>
//Rename the branch locally
▶️ git branch -m <new-branch-name>
//Delete the old branch from the remote and push the renamed branch
▶️ git push origin :<old-branch-name> <new-branch-name>
//Reset the upstream branch for the new branch name (optional)
▶️ git push origin -u <new-branch-name>
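Here is the whole rename sequence run end to end against a local bare repository acting as the remote. The names old-name and new-name are placeholders; this is a sketch of the flow, not something to paste into a shared repository without checking who else uses the branch.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"      # stand-in for the hosted remote
git init -q "$work/local"; cd "$work/local"
git config user.name "Demo User"; git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

git checkout -q -b old-name
echo x > file.txt; git add file.txt; git commit -qm "Some work"
git push -q -u origin old-name

git branch -m new-name                     # rename the current branch locally
git push -q origin :old-name new-name      # delete old-name on the remote, push new-name
git push -q -u origin new-name             # point the upstream at the renamed branch

remote_branches="$(git ls-remote --heads origin | cut -f2)"
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')"
echo "remote now has: $remote_branches (upstream: $upstream)"
```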
git revert <commit> – create a new commit that undoes all of the changes made in <commit>, then apply it to the current branch. This can only be done at the "commit level".
git reset – can be done at either the "commit" or the "file" level. At the commit level, git reset discards commits in a private branch or throws away uncommitted changes. At the file level, git reset removes a file from the staging area.
//Reset staging area to match most recent commit, but leave the working directory unchanged.
▶️ git reset
//Move the current branch tip backward to <commit>, reset the staging area to match, but leave the working directory alone.
▶️ git reset <commit>
//Same as previous, but resets both the staging area & working directory to match. Deletes uncommitted changes, and all commits after <commit>.
▶️ git reset --hard <commit>
//Reset staging area and working directory to match most recent commit and overwrites all changes in the working directory.
▶️ git reset --hard
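To see how git revert and git reset --hard treat history differently, the sketch below applies both to a tiny throwaway repository. The file name and commit messages are made up for the demo.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"

echo v1 > file.txt; git add file.txt; git commit -qm "Version 1"
echo v2 > file.txt; git commit -qam "Version 2"

# revert: history grows by one commit whose content undoes "Version 2".
git revert --no-edit HEAD >/dev/null
count_after_revert="$(git rev-list --count HEAD)"
content_after_revert="$(cat file.txt)"

# reset --hard: move the branch tip back one commit, discarding the revert commit.
git reset -q --hard HEAD~1
count_after_reset="$(git rev-list --count HEAD)"
content_after_reset="$(cat file.txt)"

echo "after revert: $count_after_revert commits, file says $content_after_revert"
echo "after reset:  $count_after_reset commits, file says $content_after_reset"
```

Because revert only adds commits, it is the safer choice on branches other people have already pulled; reset --hard rewrites the branch and is best kept to private branches.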
git stash – takes your uncommitted changes (both staged and unstaged), saves them away for later use, and then reverts them from your working copy. By default, Git won't stash untracked or ignored files. This means Git will not stash new files that have never been added with git add, or files that are ignored. Use git stash -u to include untracked files.
//Stash your work: once you've stashed your work, you're free to make changes, create new commits, switch branches, and perform any other Git operations; then come back and re-apply your stash when you're ready.
▶️ git stash
// re-apply stashed changes
▶️ git stash pop
// list stack-order of stashed file changes
▶️ git stash list
//discard the changes from top of stash stack
▶️ git stash drop
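The sketch below checks the default behavior for yourself: a plain git stash leaves a never-added file in place, while git stash -u takes it along. File names are placeholders.

```shell
repo="$(mktemp -d)"; cd "$repo"
git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
echo v1 > tracked.txt; git add tracked.txt; git commit -qm "Initial commit"

echo edit >> tracked.txt       # unstaged change to a tracked file
echo new > untracked.txt       # brand-new file, never added

git stash -q                   # stashes the edit, ignores the untracked file
tracked_after_stash="$(cat tracked.txt)"
if [ -f untracked.txt ]; then untracked_left=yes; else untracked_left=no; fi

git stash pop -q               # bring the edit back
lines_after_pop="$(wc -l < tracked.txt | tr -d ' ')"

git stash -q -u                # -u (--include-untracked) also stashes new files
if [ -f untracked.txt ]; then untracked_left_u=yes; else untracked_left_u=no; fi
stash_entries="$(git stash list | wc -l | tr -d ' ')"

echo "plain stash left untracked file: $untracked_left"
echo "stash -u left untracked file:    $untracked_left_u"
```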
git fetch <remote> <branch> – fetches a specific <branch> from the remote repository. Leave off <branch> to fetch all remote refs.
git rm <file> – remove the file from the working directory and stage the removal. Removing a file with git rm doesn't remove it from history. The file keeps "living" in the repository history unless that history is rewritten.
Summary
Now that you understand the basic Git commands, it's time to put them to use and start building your data science portfolio on GitHub!
Resources:
- https://www.youtube.com/watch?v=RGOj5yH7evk
- https://git-scm.com/
- https://www.atlassian.com/git/tutorials
- https://learngitbranching.js.org/
- https://bluecast.tech/blog/git-stash/
- https://www.atlassian.com/git/tutorials/cherry-pick