GitHub Primer for Dummies

A simple guide to using GitHub to host your complex code

Sam Liebman
Towards Data Science


Me vs. GitHub — Source: http://devhumor.com/media/git-push

Introduction

GitHub is an essential tool for programmers around the globe, allowing users to host and share code, manage projects, and build software alongside a growing base of almost 30 million developers. GitHub makes collaborating on code much easier by tracking revisions and modifications and by allowing anyone to contribute to a repository. As someone who only recently started programming, I have found GitHub to be a lifesaver on countless occasions, helping me learn new skills, techniques, and libraries. Yet sometimes a simple task on GitHub, such as creating a new repository or pushing new changes, is more daunting than training a multi-layer neural network. So, I decided to create a guide to help users (read: myself) fully harness the power of GitHub.

Creating a Repository

A GitHub repository, often referred to as a “repo,” is a virtual location on GitHub where a user can store code, datasets, and related files for a project. Clicking on the new repository button on the homepage will bring you to a page where you can create a repo and add a name and brief description of the project. There is an option to make your repository public or private; private repositories were once limited to paying users and companies, but GitHub now offers them on free accounts as well. You can also initialize the repository with a README, which provides an overview and description of the project. Adding a README to your repository is highly recommended, as it is often the first thing someone sees when looking at your repository, and it allows you to craft a story about your project and display what you deem most important to viewers. A strong README should provide a clear description of the project and its goals, display the results and outcome of the project, and demonstrate how someone else can replicate the process.

Creating a repository on GitHub

Unfortunately, clicking create repository is just the first step in this process (spoiler: it only creates an empty repo on GitHub, and your code is not in it yet). The next step involves using your terminal to initialize Git in your project and push your first commit. Git is not the same thing as GitHub, although they are related. Git is a revision control system that helps manage source code history and edits, while GitHub is a website that hosts Git repositories. In layman’s terms, Git takes a picture of your project at the time of each commit and stores a reference to that exact state. To initialize Git for your project, use the terminal to enter the directory on your computer where it is stored and enter git init into the command line. Type git add FILENAME to stage your first file, which tells Git to include it in the next commit. The next step is making your first commit, or revision. Enter git commit -m "your comment here" into the command line. The comment should describe, in short detail, what changes were made so that you can more easily track your revisions. The commit adds changes to the local repository, but does not push the edits to the remote server. The next step is to type git remote add origin https://project_repo_link.git into the command line to link your local repository to the remote repository on GitHub that will host your work. Finally, enter git push -u origin master to push the revisions to the remote server and save your work (newer repositories may name the default branch main instead of master).
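
The sequence above can be sketched end to end as follows. This is a minimal sketch rather than a definitive recipe: so that it runs without a GitHub account, it uses a local bare repository as a stand-in for the GitHub link, and the folder names, file name, and identity values are all made up for illustration. On a real project, the git remote add line would take the https://... link shown on your new repository's page.

```shell
set -e  # stop at the first failing command

# Hypothetical scratch locations; a real project already has its own folder.
work_dir=$(mktemp -d)
remote_dir="$work_dir/fake-github.git"   # stand-in for https://project_repo_link.git
project_dir="$work_dir/my_project"

git init -q --bare "$remote_dir"         # pretend this is the empty repo GitHub created
mkdir "$project_dir"
cd "$project_dir"

git init -q                              # start tracking this folder with Git
git config user.name "Example User"      # placeholder identity, needed to commit
git config user.email "user@example.com"

echo "# My Project" > README.md
git add README.md                        # stage the file for the next commit
git commit -q -m "Add README"            # record the snapshot locally

# The default branch is "master" on older setups and may be "main" on newer ones.
branch=$(git symbolic-ref --short HEAD)

git remote add origin "$remote_dir"      # tell Git where the remote lives
git push -q -u origin "$branch"          # upload the commit and remember the upstream

git log --oneline                        # shows the single "Add README" commit
```

Reading the branch name instead of hard-coding master keeps the sketch working whichever default your Git installation uses.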

Git data stored as snapshots of the project over time — Source: https://git-scm.com/book/en/v2/Getting-Started-Git-Basics

Adding Files to Repository

The process for adding changes to your GitHub repo is similar to the initialization process. You can choose to add all the files in your project directory in one fell swoop, or add each file individually as edits are made. For a multitude of reasons, discovered through trial and error, I highly recommend adding and committing each file individually. First, it will keep your repository clean and organized, which is useful when providing links to your GitHub profile/repo on LinkedIn, resumes, or job applications. Second, this will allow you to track changes to each file separately, rather than pushing up a vague commit description. Third, it will prevent you from accidentally pushing files that were not meant to be added to your repo. These can be files containing personal information, such as API keys, that can be harmful if posted to a public domain. It will also prevent you from uploading datasets that exceed 100 MB, which is GitHub’s size limit for a single file. A file that has only been staged with git add is easy to take back, but once it has been committed and pushed it is extremely difficult to remove, because it remains in the repository’s history. Speaking from experience, I have had to delete a repository on numerous occasions after accidentally uploading a file that I didn’t want, so I stress the importance of carefully selecting which files to upload.
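
To make the file-by-file habit concrete, here is a small sketch in a throwaway folder. The file names and the fake key are hypothetical. It also shows one way to back out a file that was staged by mistake before it was ever committed, using git rm --cached, which removes the file from the staging area and leaves it on disk.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "print('analysis')" > analysis.py
echo "API_KEY=not-a-real-key" > secrets.txt

git add analysis.py                # stage only the file you mean to share
git add secrets.txt                # oops: staged by mistake
git rm -q --cached secrets.txt     # unstage it; the file stays on disk

git status --short                 # analysis.py is staged, secrets.txt is untracked
git commit -q -m "Add analysis script"
git ls-files                       # lists only analysis.py
```

Running git status before every commit is a cheap way to catch a stray file while it is still easy to undo.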

Vim interface

To add a new file, enter your project directory via terminal and type git add FILENAME into the command line. To make a commit, there are two options: you can follow the same process as creating a repo and type git commit -m "commit description", or use Vim, a Unix-based text editor, to write the commit message. Vim is a counterintuitive text editor that only responds to the keyboard (no mouse), but provides multiple keyboard shortcuts that can be reconfigured, and the option to create new, personalized shortcuts. To enter the Vim text editor, type git commit into the command line and press enter. This brings you to the Vim editor, provided Vim is the editor Git is set up to use, which is a common default. To proceed to writing your commit, type i to enter --INSERT-- mode, and then type in your commit message. Once finished, press esc to exit --INSERT-- mode, and then save and exit Vim by entering :wq to write and quit the text editor. From there, all you need to do is enter git push into the command line to push your changes to GitHub.
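
Which editor opens on a bare git commit depends on how Git is configured on your machine. As a rough sketch, assuming you want Vim for a single (hypothetical) project folder, you can set and check the editor like this. The key presses are listed as comments because an interactive editor cannot be scripted here.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q

git config core.editor "vim"       # use Vim for commit messages in this repo only
git config core.editor             # prints the configured editor

# With the editor set, a bare "git commit" opens it. Inside Vim the keys are:
#   i      enter --INSERT-- mode and type the message
#   Esc    leave --INSERT-- mode
#   :wq    write the message and quit, which completes the commit
```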

Git Ignore

To ignore certain files when pushing to a repo, you can create a .gitignore file that specifies intentionally untracked files to ignore. To create the file, click on the new file button on your repository homepage and name the file .gitignore, or use one of the sample templates provided. There are multiple ways to specify a file or folder to ignore. The first way is to simply write the name of the file in the .gitignore file. For example, if you have a file called AWS-API-KEY-DO-NOT-STEAL.py, you can write the name of that file, with the extension, in the .gitignore file.

Creating a .gitignore file on GitHub

To ignore all filenames with a certain extension, say .txt files, type *.txt into the .gitignore file. Lastly, you can ignore an entire folder by typing folder_name/ in the file. Once you have added all of the files you want to be ignored to the .gitignore file, save it and put it in the root folder of your project. Now, if you try to add and push those files to the repository, they will be ignored and not included in the repository. However, if the files were already added to the repo before being listed in the .gitignore file, Git will keep tracking them and they will still be visible in the repo.
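
The three kinds of pattern described above can be tried out in a scratch folder. This is only a sketch: the project files are invented for the example, and the .gitignore is written from the terminal rather than through the GitHub page.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q

# Made-up project contents.
mkdir data
touch model.py notes.txt AWS-API-KEY-DO-NOT-STEAL.py data/big_dataset.csv

# One pattern per line: an exact file name, an extension wildcard, and a folder.
cat > .gitignore <<'EOF'
AWS-API-KEY-DO-NOT-STEAL.py
*.txt
data/
EOF

git add .                          # ignored paths are skipped automatically
git ls-files                       # lists .gitignore and model.py only
git check-ignore -v notes.txt      # reports which pattern matched the file
```

For a file that was committed before it was listed, git rm --cached FILENAME followed by a new commit stops tracking it while leaving your local copy in place, although earlier commits still contain it.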

Forks and Branches

If you have used GitHub before, or are familiar with the lingo, you have probably seen the terms Fork, Branch and Merge tossed around. A fork is essentially a copy of the repository. Forking someone else’s repository will create a new copy under your profile that is completely independent of the original repository. This is useful in the case where the original repository is deleted: your fork will remain, along with all of its contents. To fork a repository, simply visit the repo page and click the Fork button on the top right of the page. To bring an existing fork up to date with the original repository, a user can set aside any uncommitted local work with the git stash command in the forked directory, then fetch and merge the latest changes from the original repo.
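
One common way to refresh a fork from the terminal looks roughly like the sketch below. Everything here is a stand-in so that it runs offline: the folders original, fork.git, and my_copy play the roles of the other person's repository, your fork on GitHub, and your local clone, and the remote name upstream is a widely used convention rather than anything GitHub requires.

```shell
set -e  # stop at the first failing command

work_dir=$(mktemp -d)
cd "$work_dir"

# "original" stands in for someone else's repository (all names are hypothetical).
git init -q original
git -C original config user.name "Original Author"
git -C original config user.email "author@example.com"
echo "version 1" > original/tool.txt
git -C original add tool.txt
git -C original commit -q -m "First version"
branch=$(git -C original symbolic-ref --short HEAD)   # "master" or "main"

git clone -q --bare original fork.git    # stand-in for clicking Fork on GitHub
git clone -q fork.git my_copy            # your local clone of the fork

# The original moves on after you forked it.
echo "version 2" > original/tool.txt
git -C original commit -q -a -m "Second version"

cd my_copy
git config user.name "Example User"
git config user.email "user@example.com"
echo "my notes" > notes.txt
git add notes.txt                        # local work that is not committed yet

git stash -q                             # shelve the uncommitted work
git remote add upstream "$work_dir/original"   # point at the original repository
git fetch -q upstream
git merge -q --ff-only "upstream/$branch"      # bring in the original's new commits
git stash pop -q                         # restore the shelved work
git push -q origin "$branch"             # update the fork itself
```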

A branch provides another way of diverging from the main code line of a repository. Branching a repository adds another level to the repo that remains part of the original repository. Branches are useful for long-term projects or projects with multiple collaborators whose pieces of the workflow are at different stages. For example, if you are building an app, you might have the skateboard (the minimal working version) and one key feature ready but are still working on two additional features that are not ready to launch. You can create an additional branch, leaving only the finished product in the master branch, while the two work-in-progress features can remain undeployed in a separate branch. A branch is also useful when working with a team: each member can work on a different branch, so when they push changes, they do not overwrite files that another team member is working on. This provides an easy way to keep each individual’s work separate until it is ready to be merged and deployed.

Microsoft rolls out automated GitHub support chatbot

Branches can be created locally from your terminal as long as you have a cloned version of the repository saved locally. To see all of the branches in your repo, type git branch into the command line from within your project directory. If no branches have been created, the output should be * master, with the asterisk indicating the branch that is currently active. To create a new branch, type git branch <new_branch_name>, and then enter git checkout <new_branch_name> to switch to the new branch so you can work from it. The git checkout command lets the user navigate between different branches of a repository. Committing changes to a branch follows the same process as committing to master; just be sure to stay aware of which branch you are working in.
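
Here is a short sketch of those branch commands in a scratch repository. The branch name new_feature and the files are invented for the example.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "core app" > app.txt
git add app.txt
git commit -q -m "Add core app"

git branch                         # one branch so far, marked with an asterisk
git branch new_feature             # create the branch
git checkout -q new_feature        # switch to it

echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "Start new feature"

git branch                         # the asterisk now sits next to new_feature
```

The two steps can also be combined: git checkout -b new_feature creates the branch and switches to it in one go.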

Merging two branches — Source: Atlassian GitHub Docs

To combine multiple branches into one unified history, you can use the git merge <branch_name> command. One type of merge is called a 3-way merge, which involves two diverging branches being merged into one. The 3-way merge gets its name from the number of commits required to generate the merge: the two branch tips and their common ancestor. Invoking the merge command will combine the current branch with the specified branch by finding a common base commit, and then creating a new merge commit that combines the two commit histories into one. If the same part of a file was changed differently in each branch, git merge cannot combine the changes automatically; it will stop with a merge conflict and require user intervention.
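
A 3-way merge can be reproduced in miniature as follows. This is a sketch with made-up file and branch names: both branches gain a commit after they split, so Git has to create a merge commit with two parents.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "core app" > app.txt
git add app.txt
git commit -q -m "Add core app"               # the common ancestor
main_branch=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on setup

git checkout -q -b new_feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"                # tip of new_feature

git checkout -q "$main_branch"
echo "docs" > README.md
git add README.md
git commit -q -m "Add README"                 # tip of the main line; histories diverge

git merge -q --no-edit new_feature            # creates a merge commit with two parents
git log --oneline --graph                     # draws the two lines joining
```

The two branches touch different files here, so the merge goes through without a conflict.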

3-way Merge — Source: Atlassian GitHub Docs

Another type of merge is the fast-forward merge, which is used in an instance where there is a linear path between the target branch and the current branch. In this scenario, the merge shifts the current branch tip forward until it reaches the target branch tip, effectively combining both histories into one. In general, developers prefer to use fast-forward merges for bug fixes or small feature additions, saving the 3-way merge for integration of longer running features.
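
For contrast, here is a sketch of a fast-forward merge, again with invented names. Only the bug_fix branch gains a commit, so there is a straight line from the main branch to it. The --ff-only flag is my addition: it asks Git to refuse the merge unless a fast-forward is possible, which makes the behaviour explicit.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "core app" > app.txt
git add app.txt
git commit -q -m "Add core app"
main_branch=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on setup

git checkout -q -b bug_fix
echo "fixed" >> app.txt
git commit -q -a -m "Fix bug"                 # only bug_fix moves ahead

git checkout -q "$main_branch"
git merge -q --ff-only bug_fix                # the branch tip simply moves forward
git log --oneline                             # two commits, no merge commit
```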

Fast Forward Merge — Source: Atlassian GitHub Docs

Tips and Tricks

Those are pretty much the basics for being able to successfully use GitHub; however, I would like to share a few more tips I found to be helpful:

  • Pinned repositories: a free account can pin up to six repositories that will always appear on the top of the user profile. This is a neat way to show the most important projects you have worked on without someone having to sift through the clutter of older repositories.
  • Add collaborator: when working with a team of people, it is nice to have each user as a repository collaborator so that they can receive credit for their work. To add a collaborator, click on the settings tab on the repository home page and select collaborator from the left-hand menu. From there, enter in the other user’s GitHub name or email to add them.
  • Remove sensitive data from history after a push: git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch <path-to-your-file>' --prune-empty --tag-name-filter cat -- --all && git push origin --force --all
  • Show list of all saved stashes: git stash list
  • See author and time of last edit so you can blame them for messing up: git blame <file-name>
  • Shorten GitHub url for sharing: https://git.io/ (the service has since stopped accepting new URLs)
  • GitHub Pages: You can easily turn your repo into a website hosted by GitHub through a few simple terminal commands https://pages.github.com/
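
Two of the commands in the list above, git stash list and git blame, are easy to try in a scratch repository. The sketch below uses a made-up file and a placeholder identity.

```shell
set -e  # stop at the first failing command

demo_dir=$(mktemp -d)              # hypothetical scratch project
cd "$demo_dir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "line one" > report.txt
git add report.txt
git commit -q -m "Add report"

echo "unfinished edit" >> report.txt
git stash -q                       # shelve the uncommitted edit
git stash list                     # one entry, labelled stash@{0}

git blame report.txt               # each line with its commit, author, and date
```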
