The world’s leading publication for data science, AI, and ML professionals.

Meet Datascienv – A Fail-Proof Method for Setting up Data Science Environments

How to configure a data science environment with a single Pip install

Photo by Anthony Riera on Unsplash

Managing data science workflows isn’t fun. There are usually dozens of virtual environments involved, and the whole thing quickly turns into a mess. Seriously, how many times has a run failed just because your Python environment was missing a dependency?

Picture this – you’ve created a new virtual environment for your project. Library-wise, you need almost everything imaginable, from data preprocessing and visualization to machine learning and deployment libraries. Installing them by hand takes time. There’s a better way.

Today, I’ll introduce you to datascienv – an open-source Python package that simplifies data science environment setup. We’ll start fresh with a new environment and configure everything from there.

Don’t feel like reading? Watch my video instead:


What is Datascienv?

Put simply, datascienv is a Python package that sets up a data science environment with a single pip install. Here’s the list of libraries it installs for you, according to the official PyPI page:

Image 1 – Python packages installed by datascienv

It’s a lot – from everyday data analysis, preprocessing, and visualization libraries to machine learning packages. After the installation, I found that datascienv had installed even more packages than listed, but more on that later.

Let’s see how to get started with datascienv next.

How to install Datascienv?

We’ll start with a clean slate. I assume you have Python installed through Anaconda. If that’s not the case, simply translate the Terminal commands to match your virtual environment manager.
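If you’re not on Anaconda, the same three steps translate to Python’s built-in venv module. A rough sketch, assuming a python3.9 binary is on your PATH (paths and the activation command differ on Windows):

```shell
# Create a virtual environment with the built-in venv module
python3.9 -m venv datascienv

# Activate it (on Windows: datascienv\Scripts\activate)
source datascienv/bin/activate

# Install datascienv inside the environment
pip install -U datascienv
```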

1. Create a new virtual environment

I’ve created a virtual environment named datascienv based on Python 3.9:

conda create --name datascienv python=3.9 -y
Image 2 – Creating a new virtual environment (image by author)

You should see a message like the one below if everything went according to plan:

Image 3 – Creating a new virtual environment (2) (image by author)

2. Activate a virtual environment

Once the virtual environment is configured, you can activate it:

conda activate datascienv

You should see (datascienv) displayed in front of the path:

Image 4 – Activating a virtual environment (image by author)

And that’s it – you’re ready to install datascienv.

3. Install datascienv

If you’re on Windows, make sure to have Microsoft Visual C++ 14.0+ installed first. I’m not aware of any prerequisites for macOS or Linux. I haven’t managed to install datascienv on my M1 MacBook Pro, as Apple silicon still doesn’t support all data science libraries – not without workarounds, at least.

Use the following command to install datascienv inside your virtual environment:

pip install -U datascienv
Image 5 – Installing datascienv (image by author)

The installation will take a while, as it has to pull a ton of Python packages. Once done, you should see something like this printed out:

Image 6 – Installing datascienv (2) (image by author)

You now have datascienv installed. But what does that mean exactly? What’s included with the library? Let’s explore that next.

What’s included with Datascienv?

A lot, as it turns out – more than listed on the official PyPI page. There are countless ways to check, and probably the simplest is to export the Anaconda virtual environment to a .yaml file:

conda env export > \Users\Dario\Desktop\env.yaml
Image 7 – Exporting an Anaconda environment (image by author)

Open the env.yaml file with any text editor. Brace yourself, as there are close to 300 lines inside the file:

Image 8 – env.yaml file (image by author)
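If you don’t feel like eyeballing 300 lines, you can count the dependency entries programmatically. A minimal sketch, run on a shortened, made-up sample rather than the real export:

```python
import re

def count_dependencies(yaml_text: str) -> int:
    """Count conda and pip dependency entries in a `conda env export` file.

    Matches lines of the form '  - package', skipping the nested 'pip:' key.
    """
    return len(re.findall(r"^\s+- (?!pip:)\S", yaml_text, flags=re.MULTILINE))

# A shortened, made-up sample of what `conda env export` produces
sample = """name: datascienv
dependencies:
  - python=3.9
  - pip
  - pip:
      - datascienv
      - numpy
"""

print(count_dependencies(sample))  # 4
```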

That’s a lot for a single pip command. Still, let’s verify everything we need is there by opening a Python shell and importing the usual suspects:

Image 9 – Checking installed Python packages (image by author)
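To make the check reproducible beyond a screenshot, here’s a small sketch that tests whether a set of packages is importable without actually importing them. The package list is my own pick of usual suspects, not an official datascienv manifest:

```python
import importlib.util

# Packages I'd expect in a datascienv environment (my selection, not an official list)
CORE_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "plotly", "flask", "tensorflow"]

def check_environment(packages):
    """Map each package name to True if it can be imported in this environment."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

if __name__ == "__main__":
    for pkg, available in check_environment(CORE_PACKAGES).items():
        print(f"{pkg}: {'OK' if available else 'MISSING'}")
```

Using find_spec instead of a bare import keeps the check fast – heavyweight packages like TensorFlow aren’t actually loaded.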

Yes – even Plotly, Flask, and TensorFlow were installed, none of which are listed on the official PyPI page. I use them in most data science projects, so it’s nice to verify they’re included.

Only one question remains – who should use datascienv, and who shouldn’t?


The verdict

You’ve seen what datascienv can do. It’s nothing groundbreaking, but it does what it advertises and saves you time. I recommend it to anyone exploring data science and machine learning, as it installs everything needed. There’s no need to kill the Python kernel, install a missing dependency, and launch Python again.

I wouldn’t recommend it if you plan to deploy your data science solution and want to keep the requirements file neat and tight. It’s likely you don’t need every package datascienv offers.

What are your thoughts on Datascienv? Have you tried it, and if not, do you plan to? Let me know in the comment section below.


Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Join Medium with my referral link – Dario Radečić

