Things They Don’t (but Should) Cover in Intro Data Science with Python MOOCs
If you’re reading this, you’ve probably already taken an online course (sometimes known as a MOOC) with a title that sounds something like “Intro to Data Science with Python.” There are a few things these sorts of courses don’t cover that I think they should.
In most of these courses you’re either told to download Anaconda or shown how to do it. They then go through the basics of using a Jupyter Notebook and show you how to use some combination of pandas/numpy/sklearn/matplotlib. While that’s all well and good (these are essential skills for data scientists), there are a few things missing from the intro courses that would make your data science projects cleaner, more mature, and more professional.
Managing Your Environments and Installing Packages
If you want to work on multiple projects, you have to think about environments. Let’s say you have two projects: both require pandas and sklearn, but only one requires keras. The first project requires pandas (version 0.24.2) and the second one requires pandas (version 0.23.1). The project that uses keras has to use Python 3.6, but the other project has to use Python 3.5. An environment (short for “virtual environment”) will help you manage these dependencies (dependencies are packages that other packages depend upon) by giving each project its own isolated Python version and set of packages.
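To make the idea concrete, here is a minimal sketch of a script you could run inside each project’s environment to see which Python and which package versions that environment actually provides. The function name describe_environment is made up for this illustration, and the sketch assumes Python 3.8 or newer, where importlib.metadata is part of the standard library. Note that sklearn is installed under the distribution name scikit-learn.

```python
import sys
from importlib import metadata  # standard library in Python 3.8+


def describe_environment(packages):
    """Report the active environment's location, Python version, and package versions.

    `describe_environment` is a hypothetical helper written for this example.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "prefix": sys.prefix,  # directory of the active environment
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": versions,
    }


if __name__ == "__main__":
    # sklearn's distribution name is "scikit-learn"
    info = describe_environment(["pandas", "scikit-learn", "keras"])
    print("Environment:", info["prefix"])
    print("Python:", info["python"])
    for name, version in info["packages"].items():
        print(name, "->", version or "not installed")
```

Run from two different environments, the same script should report different locations and, if the projects are set up as described above, different pandas and Python versions.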
First, let’s create an environment. We will do this in the Anaconda Navigator by…