
How to Easily Set Up M1 MacBooks for Data Science and Machine Learning

You'll only need 10 minutes. And an M1 Mac.

Photo by Andrea De Santis on Unsplash

Configuring M1 Macs for data science can be a pain in the bottom. You can either take the simpler route and run everything under Rosetta, or install dependencies manually like a madman and face a never-ending log of error messages.

The first option is fine, but Python won’t run natively, so some performance is lost. The second is, well, tedious and nerve-racking.

But there’s a third option.

Today you’ll learn how to set up Python to run natively through Miniforge on any M1 chip. We’ll also go through some examples to check whether Python really is running natively.

The article is structured as follows:

  • Installing and Configuring Miniforge
  • Performance Testing
  • Final Thoughts

Installing and Configuring Miniforge

I’ve spent a lot of time configuring the M1 Mac for data science, and it never worked flawlessly until I found this option. It will take you between 5 and 10 minutes to set up completely, depending on your Internet speed.

To start, you’ll need to install Homebrew. It’s a package manager for Mac, and you can install it by executing the following line from the Terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Keep in mind – if you’re setting up a new M1 Mac, it’s likely you won’t have the Xcode Command Line Tools installed, which are required for Homebrew. The Terminal will inform you if these are missing and will ask if you want to install them.

Once both the Xcode Command Line Tools and Homebrew are installed, you can restart the Terminal and install Miniforge:

brew install miniforge

It’s a couple hundred MB download, so it will take some time to complete. Once done, restart the Terminal again.

That’s it! Miniforge is now installed and you’re ready to create virtual environments and initialize conda. The following Terminal line will create a virtual environment called base_env based on Python 3.8:

conda create --name base_env python=3.8

Finally, initialize conda for the Z shell (zsh):

conda init zsh

Restart the Terminal once more so the changes made by conda init take effect. From now on, conda’s default base environment is activated in every new shell. You can switch to base_env by executing the following line:

conda activate base_env

You should see something like this:

Image 1 – Activating a conda environment (image by author)
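
If you’d rather confirm the active environment from Python itself, here’s a minimal sketch. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper name active_conda_env is made up for this example and isn’t part of conda.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if none is active."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    # The interpreter path should point inside the environment's folder
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])
```

Run it with python from the activated shell. If the environment name isn’t base_env, the activation step likely didn’t take effect in that Terminal session.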

As the last step, let’s install a couple of Python libraries through conda:

conda install numpy pandas matplotlib plotly scikit-learn jupyter jupyterlab

That’s all. Let’s run a couple of tests next.


Performance Testing

Open up JupyterLab from the virtual environment if you’re following along. To start, let’s import the usual data science suspects (NumPy, pandas, and SciPy) just to verify everything works correctly:

Image 2 – Library imports and version checking (image by author)
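
The code in the image isn’t available as text, so here’s a rough stand-in rather than the original cell. It looks up installed versions through the standard library’s importlib.metadata instead of importing each package, and the function name installed_versions is my own. SciPy isn’t in the install command above, but it gets pulled in as a scikit-learn dependency.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["numpy", "pandas", "scipy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing package shows up as "not installed" instead of raising an error, which makes it easy to spot what still needs a conda install.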

Next, let’s run a simple loop in pure Python, without any libraries. Here’s the code:

Image 3 – Pure Python test (image by author)
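
For readers who can’t see the image, a pure Python test along these lines will do. This is an assumed stand-in, not the exact cell from the screenshot: the function sum_of_squares and the iteration count are my own picks, so don’t expect the timing to match the one in the image.

```python
import time


def sum_of_squares(n):
    """Add up i * i for every i in range(n) with a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    start = time.perf_counter()
    sum_of_squares(10_000_000)
    elapsed = time.perf_counter() - start
    print(f"Pure Python loop took {elapsed:.2f} seconds")
```

Raise the iteration count if the loop finishes too quickly to catch the process in Activity Monitor.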

As you can see, the cell took 7.5 seconds to complete. To verify the native Python version was used, and not the Intel version under Rosetta, we can check the Architecture value for the Python 3.8 process in Activity Monitor:

Image 4 – Activity monitor for pure Python test (image by author)
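
You can also ask the interpreter directly. The sketch below uses platform.machine() from the standard library, which reports arm64 for a native build on an M1 Mac and x86_64 for an Intel build running under Rosetta. The two helper names are hypothetical.

```python
import platform


def interpreter_architecture():
    """Return the CPU architecture reported by the running Python interpreter."""
    return platform.machine()


def is_native_arm64():
    """True when the interpreter reports arm64, as native builds do on M1 Macs."""
    return interpreter_architecture() == "arm64"


if __name__ == "__main__":
    print("Architecture:", interpreter_architecture())
    print("Native arm64:", is_native_arm64())
```

It’s a quick check to drop into a notebook cell when Activity Monitor isn’t handy.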

Let’s do the next test with NumPy. The code in the following image generates a large array of random integers, then calculates its logarithm and square root:

Image 5 – Numpy test (image by author)
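
Here’s a sketch of what such a test might look like. The array size, value range, seed, and the function name log_and_sqrt are all assumptions on my part, not values taken from the image.

```python
import time

import numpy as np


def log_and_sqrt(size, seed=42):
    """Generate `size` random integers, then take their logarithm and square root."""
    rng = np.random.default_rng(seed)
    # Start at 1 so the logarithm never sees a zero
    values = rng.integers(1, 100, size=size)
    return np.log(values), np.sqrt(values)


if __name__ == "__main__":
    start = time.perf_counter()
    log_and_sqrt(10_000_000)
    elapsed = time.perf_counter() - start
    print(f"NumPy test took {elapsed:.2f} seconds")
```

Drawing integers from 1 upward keeps the logarithm well defined, so the run produces no warnings.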

And here’s the activity monitor:

Image 6 – Activity monitor for Numpy test (image by author)

As you can see, NumPy works like a charm. Finally, let’s run the same test with pandas. The operations are the same as with NumPy, so no further explanation is needed:

Image 7 – Pandas test (image by author)

Let’s take a look at the activity monitor once again:

Image 8 – Activity monitor for Pandas test (image by author)

And there you have it – proof that both Python and its data science libraries can be configured without breaking a sweat. Let’s wrap things up next.


Final Thoughts

To conclude – there’s no need to bang your head against a wall when configuring a new M1 Mac for data science. Sure, the process isn’t the same as on Intel Macs (unless you’re using Miniforge there too), but it’s still simple.

Stay tuned for more M1 tests and detailed comparisons with its bigger brother, the 16" Intel i9 MacBook Pro from 2019.

Thanks for reading.


Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Join Medium with my referral link – Dario Radečić

