
GitHub I: Getting Started with GitHub

Now that you have a nice little environment set up on your local machine inside the folder for your project, it's time to create your…

How to Create a GitHub Repository for Python Data Science Projects

Learn how to use GitHub repositories to share and collaborate on data science projects

Photo by Christina Morillo on Pexels

I. What is Git?

Git is a version control system – a software tool that allows developers to track and manage changes to the code in a special kind of database. Since it was developed in 2005 by Linus Torvalds, Git has grown to be one of the most widely used version control systems. GitHub is a code repository hosting service that you can use for free to share and collaborate on code.

Some advantages of using GitHub for collaborative projects are:

  1. When the code breaks, developers can return to an earlier version of the code to identify problems in an isolated branch, minimizing disruption to other aspects of the project.
  2. Separate branches allow team members working on different aspects of the same project to make changes without affecting one another.
  3. When changes are complete, they can be merged into the main branch. Any conflicting changes are flagged so they can be addressed.
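
The branch-and-merge workflow described above can be sketched in a throwaway folder. This is a minimal illustration, not part of the tutorial's own setup: the file name analysis.py, the branch name new-feature, and the demo identity are all hypothetical, and everything happens in a temporary directory so your real project is untouched.

```shell
# Work in a throwaway folder so nothing in your real project is affected.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init -q
# Hypothetical identity, stored only in this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First version of a (hypothetical) file, committed and named as the main branch.
echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"
git branch -M main

# Make a change on a separate branch; main is left alone.
git checkout -q -b new-feature
echo "print('goodbye')" >> analysis.py
git commit -q -am "Add goodbye line"

# Back on main, the file still has its original single line.
git checkout -q main
main_lines_before="$(wc -l < analysis.py | tr -d ' ')"

# Merge the finished work into main.
git merge -q new-feature
main_lines_after="$(wc -l < analysis.py | tr -d ' ')"

echo "lines on main before merge: $main_lines_before"
echo "lines on main after merge: $main_lines_after"
```

Until the merge, the extra line exists only on the new-feature branch, which is what lets teammates work in parallel without disturbing each other.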

GitHub keeps track of how often you contribute (or upload) files to your repositories. This is a great way to show potential employers that you are serious about coding.

II. Creating a Project Repository on GitHub

  1. If you are starting from scratch, take a look at General Setup for Data Science Projects with Python, Virtual Environments I: Installing Pyenv with Homebrew, and Jupyter Notebooks I: Getting Started with Jupyter Notebooks. The proper data science project setup is important for this tutorial.
  2. Make sure that you followed the steps outlined in the first four parts of this tutorial and that you have:
  • Created a project folder
  • Activated a new virtual environment in the project folder
  3. Navigate to GitHub.com. Create an account if you don’t have one. Head over to the Your repositories tab in the drop-down menu in the top right corner, near your profile picture.
GitHub Profile Screen with Your repositories highlighted
  4. Once there, click the green button in the upper right corner that reads New.
The New button is in the top right corner.
  5. Enter a unique repository name and a short description. You can change both later if you want. Naming conventions for repositories suggest lowercase names separated by dashes (e.g. "my-project"). Set the repository to public or private.

Note: If you’re following the Data in a Day tutorials, you can name this project "metal-project" with the description "Data in a Day tutorial repository" for now.

Create a new repository with a name, description, and privacy setting.
  6. When you get to the bottom of the page, DO NOT select any of the items in the "Initialize this repository with" section. Click Create repository.
Finish creating your repository.
  7. A new screen will appear with some instructions. Below, I will provide the same instructions, with some extra details for beginners.
Screenshot of Instructions on GitHub.com

III. Initialize Your Repository Locally

Now that you have created a repository on GitHub.com, it is time to connect your project folder to it by initializing a Git repository in that folder and pointing it at the one on GitHub.com.

  1. Open up Terminal and move into your project folder (adjust the path to wherever the folder lives):
cd my-project
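  2. Once you are in there, initialize a Git repository and connect it to GitHub by entering the following (make sure to substitute your username and repository name). The last command uploads your work, so it only succeeds once the repository contains at least one commit:
git init
git remote add origin https://github.com/username/my-project.git
git branch -M main
git push -u origin main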

Now, the repository is initialized inside of your local project folder. This is how we will connect our local files to the repository that is online. Once connected, we can use commands to push (or upload) new versions of our project as we complete them. GitHub will keep track of the changes and the different versions.
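
To see the whole connect-and-push cycle without touching your real GitHub account, here is a rough sketch that runs entirely on your own machine. It assumes a local bare repository (remote.git, a hypothetical name) as a stand-in for the GitHub repository; in real use the remote would be your https://github.com/username/my-project.git URL. The README.md file and the demo identity are also placeholders for illustration.

```shell
# Throwaway folders: one stands in for GitHub, one for your project.
work_dir="$(mktemp -d)"
git init -q --bare "$work_dir/remote.git"   # stand-in for the GitHub repository
mkdir "$work_dir/my-project"
cd "$work_dir/my-project"

# Initialize the local repository and point it at the remote.
git init -q
git config user.name "Demo User"            # hypothetical identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work_dir/remote.git"

# Check the connection: this lists the remote name and its URL.
git remote -v

# A push needs at least one commit, so create one first.
echo "# my-project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Rename the current branch to main, then upload it and set it as the upstream.
git branch -M main
git push -q -u origin main

# Ask the remote which branches it now has.
remote_branches="$(git ls-remote --heads origin)"
echo "$remote_branches"
```

Running git remote -v right after git remote add is a quick way to confirm the URL is the one you intended before you push anything.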

IV. What’s Next?

In the next section, Python Beginner Essentials for Data Science, you will create a Jupyter notebook documenting your project, and I will show you how to make your first commit and push this notebook to your GitHub repository.

Thanks for reading!

👩🏻‍💻 Christine Egan ◇ christine-egan.xyz

