The world’s leading publication for data science, AI, and ML professionals.

Fall in Love with Your Environment Setup

A Python developer's guide to the optimal setup for collaboration, production readiness, and time savings

Image by author

Note: The setup instructions and useful commands for the recommended tools can be found in the python-dev-setup repo, which is referenced throughout the article.

Coding classes betrayed me. I was starting my first professional Python project and nothing was set up right on my machine:

  • My Windows Command Prompt was not compatible with the Bash shell scripts intended to make project setup easy.
  • I didn’t have a good Integrated Development Environment (IDE) for working with .py instead of .ipynb files.
  • When I tried running the code, it quickly errored because my Python package versions were different from the versions my teammates had installed.

I was used to running code in Jupyter notebooks launched from Anaconda and pip installing whatever packages I needed ad hoc. Setting up an environment suited to real development was never part of my lessons.

I was scrambling, desperate not to seem incompetent, because a key piece of my education had been ignored.

Photo by JESHOOTS.COM on Unsplash

The truth is, you might know a lot about Python coding in an educational setting, but if you don't know anything about environment setup, you are in for a rude awakening when you start your first production project.

What exactly do I mean by "environment setup"?

I mean having all the tools you need to efficiently perform the following 5 tasks:

  1. Run scripts from command line
  2. Version control your code
  3. Edit and debug your code
  4. Manage the Python version you use to run your code
  5. Manage the Python package versions you use to run your code

Why is environment setup important?

Having a good setup is crucial for a number of reasons:

1. Team Collaboration

Photo by Brooke Cagle on Unsplash

When multiple people are working on the same code base, it is hard to collaborate if environment setup is not uniform across the team. Code that works on one team member's machine may break on another's if they are using different versions of Python or Python packages.
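To check whether two machines actually match, a quick first step is to print the Python version and the versions of the packages your code relies on, then compare the output with a teammate's. The sketch below uses only the standard library (`platform` and `importlib.metadata`, Python 3.8+); the package names passed in are placeholders for whatever your project imports.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_snapshot(packages):
    """Return the running Python version and the installed version of each package.

    Packages that are not installed map to None so the gap is visible in the report.
    """
    snapshot = {"python": platform.python_version()}
    for name in packages:
        try:
            snapshot[name] = version(name)
        except PackageNotFoundError:
            snapshot[name] = None
    return snapshot


if __name__ == "__main__":
    # Example package list; replace it with the packages your project depends on.
    for name, ver in environment_snapshot(["pandas", "numpy"]).items():
        print(f"{name:10} {ver}")
```

If two teammates get different output from the same script, you have found the likely source of "works on my machine" bugs.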

2. Production Readiness

Photo by Braden Collum on Unsplash

The problem extends beyond team collaboration to production. If the production environment differs from the machine you've been developing on, code may break once deployed. In programming, it is crucial to control as many external variables as possible so that production behavior is as predictable as possible.

3. Time Saving

Photo by Aron Visuals on Unsplash

When you are first getting started on a new project or testing someone else’s code, you can waste a lot of time preparing your machine to run their code if environment setup is not straightforward.

Nor is wasted time limited to the beginning of a project: if setup is not standardized, developers will keep losing time troubleshooting version incompatibility errors that could easily have been avoided.

I hear you, but what do I need to do?

Environment setup can be confusing and complicated. I have struggled through the process many times myself.

The variety of options available for accomplishing the 5 key tasks of environment setup (see the What exactly do I mean by "environment setup"? section above) can be overwhelming.

To save you time, effort, and frustration, I have listed the tools I have found most helpful (as well as their key features) below. I have tested these tools across operating systems to ensure optimal setup no matter your OS. Read on for everything you need to create a great environment setup on your machine, and reference the python-dev-setup repo for install instructions and tips!

1. Run Scripts from Command Line

Recommend: Terminal for Mac Users, WSL for Windows Users

When developing within an integrated system, you will have to ditch notebooks and get used to running scripts and commands from the command line. Mac's Terminal app is well suited to this because macOS is Unix-based, so its shell behaves much like the Linux machines most apps are deployed to in production (Linux servers are popular largely because they are cost-effective).

Since Windows is not a Unix-based OS, you can set up a Linux environment with WSL (Windows Subsystem for Linux). See the Set Up a WSL Environment section of the python-dev-setup repo for instructions.
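If you are ever unsure which environment a script is actually running in, Python can tell you. The sketch below relies on a common heuristic rather than an official API: under WSL, the Linux kernel release string reported by `platform.release()` contains "microsoft" (for example `5.15.90.1-microsoft-standard-WSL2`). The function name `describe_os` is hypothetical.

```python
import platform


def describe_os(system=None, release=None):
    """Classify the current OS as macOS, WSL, Linux, Windows, or other.

    The arguments default to the values reported by the platform module;
    they can be passed explicitly to test the classification logic.
    """
    system = system if system is not None else platform.system()
    release = release if release is not None else platform.release()
    if system == "Darwin":
        return "macOS"
    if system == "Linux":
        # Heuristic: WSL kernels include "microsoft" in their release string.
        return "WSL" if "microsoft" in release.lower() else "Linux"
    if system == "Windows":
        return "Windows"
    return "other"


if __name__ == "__main__":
    print(describe_os())
```

Running this from Windows Command Prompt prints "Windows", while running it inside your WSL shell should print "WSL".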

Format Your Command Line Interface

No matter the command line interface (CLI) you use, it helps to format your CLI to work well with Git so that you know what branch you are working on and don’t accidentally commit code to the wrong branch.

Image by author

See the Format Your Terminal section of the python-dev-setup repo for instructions.
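A branch-aware prompt needs to know the current branch. As an illustration of where Git keeps that information, the sketch below reads it directly from the repository's `.git/HEAD` file, which contains a line such as `ref: refs/heads/main` when a branch is checked out, or a bare commit hash in "detached HEAD" state. This is not the prompt configuration from the python-dev-setup repo, and it deliberately ignores cases such as worktrees and submodules, where `.git` is a file rather than a directory.

```python
from pathlib import Path


def branch_from_head(head_text):
    """Parse the contents of .git/HEAD into a branch name.

    Returns None when HEAD is detached (it then holds a commit hash, not a ref).
    """
    head_text = head_text.strip()
    prefix = "ref: refs/heads/"
    if head_text.startswith(prefix):
        return head_text[len(prefix):]
    return None


def current_branch(repo_root="."):
    """Return the checked-out branch of the repository at repo_root, if any."""
    head_file = Path(repo_root) / ".git" / "HEAD"
    if not head_file.is_file():
        return None  # not a repo root, or .git is a file (worktree/submodule)
    return branch_from_head(head_file.read_text())


if __name__ == "__main__":
    print(current_branch() or "(no branch / detached HEAD)")
```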

2. Version Control Your Code

Recommend: Git

Git is the powerful software behind GitHub and Bitbucket. It is useful for more than just basic version control; it is also key to:

  • Team collaboration – You can make and test changes to the code on your local machine before pushing them to the team-shared remote code base. Additionally, you can create branches off the master branch so that your development work doesn't impact the rest of the team or the code in production. Lastly, you can easily merge other team members' branches into your own branch, or into the master branch when development is complete.
  • Release Management – The branching system is also key for release management. Depending on project requirements, different enhancements may have different timelines for when they should be deployed to production. By isolating new features in separate branches, you can easily deploy each feature independently.

For more information on Git, see: What is Git and Why Should You Use it?

To get started using Git:

  1. Install Git – Mac, Windows WSL (Ubuntu Linux), Red Hat Linux
  2. Set your git config username and email by following these steps
  3. Configure SSH auth by following these steps so that you don't need to enter your username and password every time you pull from or push to your GitHub/Bitbucket repo
  4. Reference this list of useful git commands
  5. Download the archive_branch.sh script and add it to the root of your project to automate the tedious process of archiving inactive git branches
  6. Download the sync_git_branches.sh script and add it to the root of your project to automate pulling/updating all remote branches and deleting local branches that are no longer connected to a remote ref.
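The core idea behind the clean-up in step 6 can be sketched in Python. After `git fetch --prune`, `git branch -vv` marks local branches whose upstream has been deleted with `: gone]`, and the hypothetical helpers below parse that output to pick out those branches. This is an illustration, not the contents of `sync_git_branches.sh`, and it only returns candidates rather than deleting anything.

```python
import re
import subprocess

# Matches an upstream annotation such as "[origin/feature/old: gone]".
GONE = re.compile(r"\[[^\]]+: gone\]")


def gone_branches(branch_vv_output):
    """Return local branch names whose upstream is marked ': gone]'."""
    names = []
    for line in branch_vv_output.splitlines():
        # Skip the 2-character marker column ("* " = current branch),
        # then split into: branch name, commit hash, rest of the line.
        parts = line[2:].split(maxsplit=2)
        if len(parts) == 3 and GONE.match(parts[2]):
            names.append(parts[0])
    return names


def local_gone_branches(repo="."):
    """Fetch with --prune, then return branches whose upstream no longer exists."""
    subprocess.run(["git", "fetch", "--prune"], cwd=repo, check=True)
    output = subprocess.run(
        ["git", "branch", "-vv"], cwd=repo, check=True, capture_output=True, text=True
    ).stdout
    return gone_branches(output)
```

Calling `local_gone_branches()` from a repo root gives you a list to review; each branch can then be removed with `git branch -d <name>`, which refuses to delete unmerged work.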

3. Edit and Debug Your Code

Recommend: VS Code

While there are many IDEs out there, Visual Studio Code (VS Code) is my favorite editor for local code editing (I have also tried Atom, PyCharm, and Spyder). I like VS Code for Python development because of its superior ability to trace function and class definitions across nested files. In my experience, PyCharm's equivalent feature stops working as the trace becomes more complex.

Gif by author using https://www.onlineconverter.com/

VS Code also has great source control features:

  1. You can easily see what branch of the repo you are viewing in the bottom left of your screen
Image by author
  2. If you are working across multiple repos (e.g. using one repo as a standard tools library that you import as a locally editable package), you can see the changes across repos without having to run "cd" and "git status" multiple times. For example, if you add a script in both the app and tools repos below, you can view these changes simultaneously in VS Code's Source Control tab.
Image by author

Additionally, VS Code’s Debugging tool is super helpful, as long as you know the following tricks:

  • Set your Python interpreter to be the venv in the root of your project or your conda environment (NOTE: we will talk more about venvs later)
Image by author: venv in root of project
Image by author: conda venv
  • Select a Python file to run
  • Set breakpoints (if desired) by clicking in the left margin
  • Navigate to the Debug tab and choose "Python File – Debug the currently active Python file" as the debug configuration. NOTE: For this to work, your VS Code workspace must be open to the directory the script is meant to be run from, so it is easiest to build your scripts to run from the root of your project.

Image by author
  • Step through the code and view the variables/ data created along the way.
Image by author
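Before stepping through code, it can help to confirm that the debugger really is using the interpreter you selected. The sketch below is a hypothetical `check_env.py`: it reports `sys.executable` and whether that interpreter belongs to a standard venv (detected by `sys.prefix != sys.base_prefix`) or a conda environment (detected by a `conda-meta` directory in the environment prefix). It also includes a small helper that builds paths relative to a script's own location, one way to soften the "run from the project root" requirement noted above.

```python
import sys
from pathlib import Path


def interpreter_kind(prefix=None, base_prefix=None):
    """Classify the active interpreter as 'venv', 'conda', or 'system'."""
    prefix = Path(sys.prefix if prefix is None else prefix)
    base_prefix = Path(sys.base_prefix if base_prefix is None else base_prefix)
    if prefix != base_prefix:
        return "venv"  # created by `python -m venv`, virtualenv, or Poetry
    if (prefix / "conda-meta").is_dir():
        return "conda"  # conda/mamba environments keep package metadata in conda-meta/
    return "system"


def from_project_root(*parts, anchor):
    """Build a path relative to the directory that contains `anchor` (pass __file__)."""
    return Path(anchor).resolve().parent.joinpath(*parts)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment type:", interpreter_kind())
```

Inside a script that lives in the project root, `from_project_root("data", "input.csv", anchor=__file__)` (the data path is a placeholder) points at the same file whether the script is launched from the terminal or the VS Code debugger.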

Lastly, when it is appropriate to use Jupyter notebook files (e.g. for testing code snippets), VS Code has a Jupyter extension to support this. All you need to do is set the kernel to the venv in the root of your project or to your conda environment, and install the ipykernel package when prompted.

Image by author
Image by author

To get started with VS Code:

  1. Install VS Code and Key Extensions
  2. Use VS Code with WSL (Windows users only)
  3. Configure Remote-SSH Editing (Only if needed for working on a remote Linux machine)

4. Manage the Python Version You Use to Run Your Code

Recommend: Mambaforge

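Mambaforge is an open-source, conda-compatible installer that lets you quickly create a virtual environment on a specific version of Python. It ships with mamba, a faster reimplementation of the conda package manager, and uses the community-maintained conda-forge channel by default. It supports most conda commands, but unlike Anaconda, it is free for commercial use, and it is lighter weight than Miniconda.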

To get started using Mambaforge:

  1. Install Mambaforge – Mac, Windows WSL/ Linux
  2. Reference this list of useful mambaforge commands
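Once a Mambaforge environment pins a Python version (for example, one created with `mamba create -n myproject python=3.11`), a script can fail fast with a clear message if someone runs it with a different interpreter. A minimal sketch, assuming the project targets Python 3.11; the `REQUIRED` value and the `check_python_version` name are illustrative.

```python
import sys

# Assumed target version for this example; set it to your project's Python version.
REQUIRED = (3, 11)


def check_python_version(required=REQUIRED, current=None):
    """Raise RuntimeError if the running major.minor version differs from `required`."""
    current = tuple(current if current is not None else sys.version_info[:2])
    if current != tuple(required):
        raise RuntimeError(
            f"Expected Python {required[0]}.{required[1]}, "
            f"but this is Python {current[0]}.{current[1]} ({sys.executable}). "
            "Activate the project environment and try again."
        )
```

Calling `check_python_version()` at the top of your entry-point script turns a confusing mid-run failure into an immediate, explicit one.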

.bash_profile vs .bashrc

In Linux, you can use crontab to schedule jobs. Cron runs jobs in a non-interactive, non-login shell, which does not load your .bash_profile (and, by default, does not load .bashrc either). The setup instructions update .bash_profile, so if you will be running scripts from non-interactive shells, also add the relevant lines to your .bashrc and have the job source it explicitly, or call the environment's Python interpreter by its full path. See this article for more details on .bashrc vs .bash_profile.
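Because a cron job doesn't pick up the environment your interactive shell sets up, a common workaround is to call the environment's Python interpreter by its absolute path instead of relying on an activated environment. The hypothetical helper below builds such a crontab line from the interpreter that runs it (`sys.executable`). The schedule, script path, and log path in the example are placeholders.

```python
import shlex
import sys
from pathlib import Path


def crontab_line(schedule, script, log_file, python=None):
    """Build a crontab entry that runs `script` with an explicit interpreter path.

    `schedule` is the usual five-field cron expression, e.g. "0 6 * * *".
    Output (stdout and stderr) is appended to `log_file` for later inspection.
    """
    python = python or sys.executable
    script = str(Path(script).resolve())
    return (
        f"{schedule} {shlex.quote(python)} {shlex.quote(script)} "
        f">> {shlex.quote(str(log_file))} 2>&1"
    )


if __name__ == "__main__":
    # Run this with the project environment active, then paste the result into `crontab -e`.
    print(crontab_line("0 6 * * *", "scripts/daily_job.py", "/tmp/daily_job.log"))
```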

5. Manage the Python Package Versions You Use to Run Your Code

Recommend: Poetry

Creating a Python virtual environment is crucial for dependency management. For more information on what a virtual environment is and why you should always use one, check out this article.

While pip installing from an up-to-date requirements.txt file that specifies hard-coded package versions is better than nothing, this method doesn't pin your dependencies' dependencies unless you list them all yourself. For example, you may think you only installed pandas to run your code, but the pandas library actually depends on 5 other packages (in the version shown below).

Image by author
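You can see this for yourself from Python. The sketch below uses the standard library's `importlib.metadata.requires()`, which returns the requirement strings a package declares (or `None` if it declares none), and keeps only the non-optional ones by skipping requirements tied to an extra such as `[test]`. The exact list depends on the package version you have installed, so it may not match the figure above.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
EXTRA_MARKER = re.compile(r"\bextra\s*==")


def required_names(requirement_strings):
    """Extract package names from requirement strings, skipping optional extras.

    Other environment markers (e.g. python_version < "3.11") are not evaluated,
    so conditional dependencies are kept in the list.
    """
    names = []
    for req in requirement_strings or []:
        marker = req.split(";", 1)[1] if ";" in req else ""
        if EXTRA_MARKER.search(marker):
            continue  # only installed when an extra such as [test] is requested
        match = NAME.match(req)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def direct_dependencies(package):
    """Return the non-optional dependencies the installed `package` declares."""
    try:
        return required_names(requires(package))
    except PackageNotFoundError:
        return []


if __name__ == "__main__":
    print(direct_dependencies("pandas"))
```

Each of those packages may have requirements of its own, which is exactly the tree a hand-written requirements.txt tends to miss.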

You can easily cause errors when creating a virtual environment from a requirements.txt if, for example:

  • You specify a numpy version that is incompatible with the pandas version you specified
  • You specify a version of another package that requires a numpy version incompatible with the numpy version your pandas version needs

Even if there are no errors when creating the virtual environment from the requirements.txt file, team members may wind up with slightly different sub-dependency versions which may cause problems down the line.
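One way to catch this kind of drift is to compare what is actually installed against the pins in requirements.txt. A minimal sketch, assuming a requirements file that uses exact `name==version` pins (lines in any other form, such as `-e` paths or `>=` ranges, are skipped); versions are compared as plain strings, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(requirements_text):
    """Map package name -> pinned version for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line and ";" not in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_drift(pins, installed_version=None):
    """Return {name: (pinned, installed)} for every package that doesn't match.

    `installed_version` defaults to importlib.metadata.version and can be
    swapped out for testing; a missing package is reported as None.
    """
    lookup = installed_version or version
    drift = {}
    for name, pinned in pins.items():
        try:
            actual = lookup(name)
        except PackageNotFoundError:
            actual = None
        if actual != pinned:
            drift[name] = (pinned, actual)
    return drift


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for name, (pinned, actual) in find_drift(parse_pins(req.read_text())).items():
            print(f"{name}: pinned {pinned}, installed {actual}")
```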

Thinking about sub-dependencies can easily make your head spin. Thankfully, poetry accounts for all of these inter-related dependencies and creates a "poetry.lock" file that you can push to your repo. All your teammates need to do to mirror your setup is run the "poetry install" command.

Is there a case where the lock file may actually cause issues among team members?

Yes, but this case is the exception. For example, if your code loads other repos as locally editable packages, you won't want your team members locked into the absolute paths on your machine, where you may be working on different git branches of the sub-repos.

If you encounter a situation like this, you can always revert to pip installing dependencies from a requirements.txt with hard-coded versions after controlling your Python version with pyenv.
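If you do fall back to a requirements.txt, pinning every installed distribution (sub-dependencies included) narrows the gap with a lock file. `pip freeze` does this from the command line; the sketch below shows the same idea with `importlib.metadata.distributions()`, producing `name==version` lines sorted by name. Unlike `pip freeze`, it does not special-case editable installs, so review lines for your locally editable sub-repos by hand. The function name is hypothetical.

```python
from importlib.metadata import distributions


def pinned_requirements(dists=None):
    """Return sorted 'name==version' lines for the given (or all installed) distributions."""
    dists = distributions() if dists is None else dists
    lines = {f"{d.metadata['Name']}=={d.version}" for d in dists if d.metadata["Name"]}
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```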

To get started with poetry:

  1. Install poetry
  2. Use poetry to install project dependencies into your conda environment
  3. Reference these useful poetry commands

If you followed along with the instructions in this article…

Photo by Agnieszka Boeske on Unsplash

Congratulations, you now have an awesome environment setup!

You have all the tools you need to effectively:

  1. Run scripts from command line
  2. Version control your code
  3. Edit and debug your code
  4. Manage the Python version you use to run your code
  5. Manage the Python package versions you use to run your code

I hope this helps, and you find yourself ready to hit the ground running the next time you start a new project!

