You want to program in Windows!?
When a budding data scientist decides to buy a new laptop for college, they inevitably come to their friends, classmates, or Stack Overflow to ask the question, "What operating system should I use for Data Science?" This question is as contentious among programmers as "Should I use tabs or spaces?" If you ask me, there is not a single best answer to this question. The answer for each person is based on individual preferences, experience, and external limitations.
For some people, the answer may be Windows. For example, are you limited to Windows because of workplace requirements? Do you not have the budget to switch to a new Apple machine? Do you not have the time or knowledge to install a Linux OS on your PC? Are Windows and Microsoft Office suite what you are most comfortable with? Or perhaps you don’t want to have to replace all the Windows software for which you have spent a small fortune purchasing licenses.
Windows can handle data science, especially if you aren’t going much further than installing Python, Anaconda, and R with some common packages. If you are in your first year of college, this setup will work beautifully. But the more complex your work becomes, the more drawbacks you will encounter with Windows. You may run into speed issues, compatibility is woefully lacking, debugging is much harder than it should be, and almost every Stack Overflow post gives you instructions that only work in Unix/Linux.
Why should I use Windows Subsystem for Linux?
To get past the limitations of programming in Windows, you could dual-boot Linux and Windows on the same machine and switch between the partitions as your tasks require. But you may not be comfortable with this process, and oh what a pain it is to log in and out every time you switch tasks. So next you may consider spinning up virtual machine environments in Windows. These are often difficult to set up, difficult to reproduce, and full of compatibility issues despite their supposed isolation from the Windows OS.
If you want to vastly improve your development experience on Windows, I suggest you forgo the aforementioned options and instead consider Windows Subsystem for Linux (WSL). Not only will you get better performance with fewer issues, but you will also be training yourself to program in an environment that is very close to what you would use on a full Linux or macOS system. Switching to one of these systems later, or transferring your code to one of them, should be easy.
However, this option might be too much for you if you are just starting out and are completely new to using the command line. WSL 2 currently does not support GUI apps in Linux. (Though Microsoft plans to support them in the future.) So while you can use Anaconda 100% via the command line, you will miss out on helpful GUIs like Anaconda Navigator. Eventually you will want to graduate to working without the GUI – if not for the speed, then for the incredible feeling of having family and friends believe you are a super hacker. But that transition will come with time, and there is nothing wrong with learning to walk before you run.
So what exactly is WSL 2?
The original release of WSL was a layer on top of Windows that let you run Linux executables on Windows 10. That was a good start, but still nowhere close to offering full GNU/Linux support. The real magic came with Microsoft’s 2019 release of WSL 2. This new architecture offers its own Linux kernel instead of a compatibility layer. So you get faster performance, full system call compatibility, the ability to run many more apps (like Docker) on the Linux kernel, and kernel updates that arrive without waiting for Microsoft to "translate" the changes for WSL.
Basically, with WSL 2 you will be able to work smoothly in a fully functional Linux environment while on Windows! There are also features that blend the two systems quite elegantly. You can have the Linux environment open Windows programs like your browser or IDE. And you can browse files from both the Windows and WSL filesystems using the normal file explorer or the command line. On some IDEs, you can use the Linux kernel to run and debug your code, even though the IDE software is installed in Windows.
Wow! That sounds AWESOME. How do I try it?
Follow Microsoft’s instructions on how to install WSL 2. You must be on Windows 10, and you must be updated to version 2004, build 19041 or higher. Getting this update can be a little tricky. For some users (myself included), checking for updates and installing the latest one available still does not reach the required version. If this happens, you can manually install the required build by using the Windows Update Assistant as directed.
You will need to choose a Linux distribution to install. This can be changed later, and you can even install multiple distributions if you like. If you are not familiar with the options or have no preference, choose Ubuntu. Be sure to note the username and password you set up for your distribution. Your Linux "home" folder will be located at "\\wsl$\Ubuntu\home\{username}", and your password will be used for commands requiring admin privileges.
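If you want to double-check that a terminal really is running inside WSL, one rough way is to look at the kernel version string. The sketch below assumes WSL kernels mention "microsoft" in /proc/version, which is a common convention rather than a documented guarantee, and the is_wsl function name is made up for this example. (From PowerShell, wsl --list --verbose shows which WSL version each installed distribution uses.)

```shell
#!/bin/sh
# Sketch: guess whether this shell is running inside WSL.
# Assumption: WSL kernels include "microsoft" (in any case) in /proc/version.

# is_wsl [FILE]: succeed if FILE (default /proc/version) mentions Microsoft.
is_wsl() {
    grep -qi 'microsoft' "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
    echo "Running inside WSL"
else
    echo "Not running inside WSL"
fi
```

The optional file argument is only there so the check can be tried against a saved version string instead of the live system.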
Uhh, but how do I use Linux?
Currently, you pretty much have to do everything through the command line terminal, file explorer, and apps that run in the browser (like Jupyter Notebook), though Microsoft has announced plans to add GUI app support as well as GPU hardware acceleration.
There are many flavors of terminal applications and terminal shells. I recommend starting with Windows Terminal. This recently released Microsoft application is snappy to use and allows you to customize things like theming, key bindings, and default behaviors. The best part, though, is that you can open tabbed terminals for different environments inside a single window. So you can have separate profiles for easily launching PowerShell, Ubuntu, Bash, Anaconda, or a host of other supported environments.

Use the instructions on Microsoft’s GitHub to install Windows Terminal, then click the down caret and Settings to edit your settings.json file. You can read the documentation to get familiar with the settings options, or check out a gist of my settings file for example.
Profiles for your WSL distributions and PowerShell are included by default. To customize these profiles and to add more, edit the "profiles" object in settings.json. The "defaults" object applies default settings to all profiles.
"profiles": {
    "defaults": {
        "acrylicOpacity": 0.85000002384185791,
        "background": "#012456",
        "closeOnExit": false,
        "colorScheme": "Solarized Dark",
        "cursorColor": "#FFFFFF",
        "cursorShape": "bar",
        "fontFace": "Fira Code",
        "fontSize": 12,
        "historySize": 9001,
        "padding": "0, 0, 0, 0",
        "snapOnInput": true,
        "useAcrylic": true,
        "startingDirectory": "%USERPROFILE%"
    }
}
Next you will want to add profile settings for specific terminal applications in the "list" array. Each item in the list needs a unique guid surrounded by braces. Generate a new guid by typing new-guid in PowerShell or uuidgen in Ubuntu. You can then customize the name and icon of the tab, which executable is used to open the command line, and what directory to start in. Be sure not to leave any trailing commas at the end of lists, as this is invalid JSON syntax and will throw an error.
"profiles": {
    "defaults": {
        ...
    },
    "list": [
        {
            "guid": "{2c4de342-38b7-51cf-b940-2309a097f518}",
            "name": "Ubuntu",
            "source": "Windows.Terminal.Wsl",
            "startingDirectory": "//wsl$/Ubuntu/home/nadev"
        },
        {
            "guid": "{61c54bbd-c2c6-5271-96e7-009a87ff44bf}",
            "name": "PowerShell",
            "tabTitle": "PowerShell",
            "commandline": "powershell.exe"
        }
    ]
}
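The guid step can be sketched as a small shell helper that prints a fresh identifier already wrapped in braces, ready to paste into a new profile. The new_guid name is hypothetical, and the fallback to /proc/sys/kernel/random/uuid is an assumption that only holds on Linux systems where uuidgen is not installed.

```shell
#!/bin/sh
# Sketch: print a new GUID in the {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
# form used by the "guid" field of a Windows Terminal profile.

new_guid() {
    if command -v uuidgen >/dev/null 2>&1; then
        printf '{%s}\n' "$(uuidgen)"
    else
        # Fallback: the Linux kernel exposes a random UUID source here.
        printf '{%s}\n' "$(cat /proc/sys/kernel/random/uuid)"
    fi
}

new_guid
```

Each call returns a different value, so generate one per profile and keep it once it is in settings.json.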
Choose the profile that will open by default in new tabs by setting its guid and braces as "defaultProfile" at the top level of settings.json. My default is Ubuntu.
"defaultProfile": "{2c4de342-38b7-51cf-b940-2309a097f518}"
Now you are all set up and ready to start using Windows Subsystem for Linux!
Check out some common terminal commands to learn how to navigate directories from the command line.
Now, onto the environment setup
I will give you a rundown on the applications I have installed in WSL for Data Science.
A code editor, or IDE
Just like terminal applications, there are a lot of code editors to choose from. Everyone has their own preferences, and whatever your preference is after trying a few is just dandy! Visual Studio Code is very popular and features an extension specifically designed to work with WSL 2. I had an easy time getting Jupyter Notebooks, Python code, and the debugger working with WSL 2.
If you want to try something a little fancier for Python coding, check out PyCharm from the JetBrains suite of products. Unlike VS Code, PyCharm Professional (the edition that supports WSL interpreters) requires a paid license, although students can apply for a free one. PyCharm allows you to use WSL Python as your interpreter, and now supports Git in your WSL 2 filesystem. Its WSL 2 integration may still need some work, though. For example, I could not get the debugger or the Jupyter tool window working.
Git and Git Bash
Git is the standard for keeping track of code changes, versioning, and backing up our files on repositories. Windows and each Linux distribution you install have different filesystems, and you will need to install Git on each one.
For your Windows filesystem, install Git for Windows with the recommended options selected. This install also includes the Git Bash terminal application. I like to use Git Bash instead of PowerShell because the command syntax is just like Bash in Linux, so you are using the same kinds of commands everywhere. PowerShell is better suited for users who write automation scripts and work with Windows servers. Since we will want to use this new terminal application regularly, add it as a new profile in your Windows Terminal settings.
"profiles": {
    "defaults": {
        ...
    },
    "list": [
        ...,
        {
            "guid": "{8d04ce37-c00f-43ac-ba47-992cb1393215}",
            "name": "Git Bash",
            "tabTitle": "Bash",
            // the -i and -l options below make Bash load .bashrc
            "commandline": "\"%PROGRAMFILES%\\git\\usr\\bin\\bash.exe\" -i -l",
            "icon": "%PROGRAMFILES%\\Git\\mingw64\\share\\git\\git-for-windows.ico"
        }
    ]
}
On Linux distributions, Git is usually already included, but you can run sudo apt-get install git in the Ubuntu profile of Windows Terminal to either install it or update to the latest version.
If you do not have a GitHub account, go create one now. You might want to go into your GitHub account Settings > Emails and hide your personal email address. In this case, you will be given a generated email address to use when configuring Git. This allows you to commit to GitHub without revealing your personal email address to the public.
Once you have a GitHub account, run the commands below in Ubuntu and Git Bash to configure your name and email address in Git. The config will be stored in a file called .gitconfig located in your home directory. With the credential helper set to store, the next time you push to GitHub you will be asked for your username and password, and these credentials will be saved (in plain text, in ~/.git-credentials) for future use.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global credential.helper store
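To confirm the settings took effect, you can read them back out of the global config. This is a minimal sketch: git config --global --get is a standard Git command, while the show_git_identity wrapper is a hypothetical helper named for this example.

```shell
#!/bin/sh
# Sketch: print the identity Git will attach to your commits.

# show_git_identity: print "Name <email>" from the global Git config,
# or fail if either value is missing.
show_git_identity() {
    name=$(git config --global --get user.name) || return 1
    email=$(git config --global --get user.email) || return 1
    printf '%s <%s>\n' "$name" "$email"
}

show_git_identity || echo "user.name or user.email is not set yet"
```

Run it once in Ubuntu and once in Git Bash, since each environment keeps its own .gitconfig.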
I will also mention a little more here about Bash. In Windows and in your distributions, there will be a few files that are executed when Bash opens. These are .bashrc, .bash_profile, and optionally .profile. If you are not seeing them in File Explorer, you may need to set hidden files as visible. To view them in the command line, enter ls -a (the -a option shows all files, including hidden ones). I will leave it to you to research the purpose of these files, but in general the first two will be auto-generated and you will put personal aliases and environment variables inside of .profile. After changing any of these files, you will need to reload the one you edited with source (for example, source ~/.bashrc) or open a new terminal to load the changes.
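A quick way to see which of these startup files actually exist in a given home directory is a small loop like the one below. It is only a sketch, and list_startup_files is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report which Bash startup files are present in a directory.

# list_startup_files [DIR]: print the startup files found in DIR
# (defaults to your home directory), one per line.
list_startup_files() {
    dir=${1:-$HOME}
    for f in .bashrc .bash_profile .profile; do
        if [ -f "$dir/$f" ]; then
            echo "$f"
        fi
    done
}

list_startup_files
```

Running it in both Ubuntu and Git Bash shows that the two environments do not necessarily ship the same set of files.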
Conda
If you plan to work in Python and R for Data Science, conda makes managing environments and packages much easier. You can either install Miniconda to get just what you need to start, or Anaconda which is fully preloaded with tons of packages. Again though, the Anaconda Navigator GUI is not going to work if installed in WSL, only in Windows. I installed Miniconda for Python 3 in Ubuntu with the following commands, but some people install it in Windows as well. (If you know why people are installing in both environments instead of just WSL, please let me know in the comments.)
# To install Anaconda for Python 3
wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
sh ./Anaconda3-2020.02-Linux-x86_64.sh
# To install Miniconda for Python 3 (use the Linux installer, not the Windows .exe)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
After you complete the installation, close and re-open your Ubuntu terminal and type which python and which conda. You should get one of the following sets of paths depending on which package you installed.
# Path to Python executable in Anaconda
/home/{username}/anaconda3/bin/python
/home/{username}/anaconda3/bin/conda
# Path to Python executable in Miniconda
/home/{username}/miniconda3/bin/python
/home/{username}/miniconda3/bin/conda
# Then update all installed packages
conda update --all
If not, open .profile and add one of the following export lines as appropriate, enter source ~/.profile, and try again.
# Add Anaconda Python to PATH
export PATH=/home/{username}/anaconda3/bin:$PATH
# Add Miniconda Python to PATH
export PATH=/home/{username}/miniconda3/bin:$PATH
Now you are ready to use conda to create environments. It is best to keep packages out of your base environment and create a new environment for different types of projects. This prevents compatibility issues. See the documentation for more on managing environments.
# To create an environment called 'pandasenv' with the pandas package installed.
conda create --name pandasenv pandas
Jupyter Notebook
To present your work to others, you will likely start with Jupyter Notebook. This tool lets you run Python code inside your conda environments, annotate the code with markdown, and display graphs and other visuals. Anaconda comes with Jupyter Notebook included. On Miniconda, open your Ubuntu terminal (your base conda environment will be automatically activated), and type the following to install it.
conda install jupyter
Similar to the way we checked that we are using the correct python executable, we also want to check jupyter using which jupyter.
# Path to Jupyter executable in Anaconda
/home/{username}/anaconda3/bin/jupyter
# Path to Jupyter executable in Miniconda
/home/{username}/miniconda3/bin/jupyter
For me, this path was incorrect and it took some sleuthing to find the issue. During installation of Miniconda, its bin directory was added to $PATH inside of .bashrc. This is great, as it tells the OS to search that directory for executables when a command is run. But my local bin directory was also being added in .profile. This was leading to $HOME/.local/bin being checked for executables before $HOME/miniconda3/bin. To remedy this, I moved the Miniconda line so that it is executed last. Because each line prepends its directory to $PATH, the last one to run ends up first in the search order. Part of my file at /home/{username}/.profile now looks like the following.
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
# set PATH so it includes miniconda's bin
if [ -d "$HOME/miniconda3/bin" ] ; then
    PATH="$HOME/miniconda3/bin:$PATH"
fi
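If the effect of PATH order is hard to picture, the throwaway experiment below may help. It is a sketch that uses two temporary directories as stand-ins for ~/.local/bin and ~/miniconda3/bin, each holding a fake jupyter script; none of it touches your real PATH, and first_jupyter is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: the directory listed first in PATH wins the command lookup.
demo=$(mktemp -d)
mkdir -p "$demo/local/bin" "$demo/miniconda3/bin"
for d in local miniconda3; do
    # Create a fake executable named jupyter in each directory.
    printf '#!/bin/sh\necho "jupyter from %s"\n' "$d" > "$demo/$d/bin/jupyter"
    chmod +x "$demo/$d/bin/jupyter"
done

# first_jupyter PATHLIST: print the jupyter that PATHLIST resolves to.
# The subshell keeps the PATH change from leaking into your session.
first_jupyter() {
    (
        PATH="$1"
        command -v jupyter
    )
}

# Same two directories, opposite order, different winner.
first_jupyter "$demo/local/bin:$demo/miniconda3/bin"
first_jupyter "$demo/miniconda3/bin:$demo/local/bin"
```

The first call prints the copy under local/bin and the second prints the copy under miniconda3/bin, which is the same effect as reordering the lines in .profile.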
When launching Jupyter Notebook using the command jupyter notebook, you will notice that a warning message pops up and either your browser does not launch or it tries to open a file that does not render Jupyter Notebook. To fix this we need to do a few things. First, define a BROWSER environment variable by adding the following to .profile.
# Path to your browser executable
export BROWSER='/mnt/c/Program Files (x86)/Google/Chrome/Application/chrome.exe'
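The browser may not live at exactly that path on your machine, so it can be worth checking before you commit it to .profile. The sketch below tries a short list of candidate locations and exports the first one that exists. Both candidate paths are assumptions about where Chrome might be installed, and first_existing is a made-up helper.

```shell
#!/bin/sh
# Sketch: pick the first browser executable that actually exists.

# first_existing PATH...: print the first argument that is an existing file.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; adjust them to your own install.
if found=$(first_existing \
    '/mnt/c/Program Files/Google/Chrome/Application/chrome.exe' \
    '/mnt/c/Program Files (x86)/Google/Chrome/Application/chrome.exe'); then
    export BROWSER="$found"
    echo "BROWSER set to: $BROWSER"
else
    echo "No browser found at the candidate paths; set BROWSER by hand" >&2
fi
```

Quoting matters here because the Windows paths contain spaces and parentheses.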
Next, tell Jupyter Notebook not to launch using the redirect file, which does not work in WSL 2. Generate a Jupyter config file, then uncomment the line listed below and set it to False.
# Generate a config file at /home/{username}/.jupyter/jupyter_notebook_config.py
jupyter notebook --generate-config
# Uncomment the line and set it to False
c.NotebookApp.use_redirect_file = False
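If you would rather not edit the config file by hand, the same change can be scripted with sed. This is a sketch under the assumption that the generated file contains a commented-out c.NotebookApp.use_redirect_file line; the set_no_redirect name is hypothetical, and the demo runs against a temporary sample file rather than your real config.

```shell
#!/bin/sh
# Sketch: uncomment the use_redirect_file setting and force it to False.

# set_no_redirect FILE: rewrite the setting in place, keeping a .bak copy.
set_no_redirect() {
    sed -i.bak -E \
        's/^#?[[:space:]]*c\.NotebookApp\.use_redirect_file[[:space:]]*=.*/c.NotebookApp.use_redirect_file = False/' \
        "$1"
}

# Demo on a throwaway file that mimics two lines of a generated config.
cfg=$(mktemp)
printf '%s\n' \
    '# c.NotebookApp.port = 8888' \
    '# c.NotebookApp.use_redirect_file = True' > "$cfg"
set_no_redirect "$cfg"
cat "$cfg"
```

To apply it for real, point the function at ~/.jupyter/jupyter_notebook_config.py; the original is kept alongside it with a .bak suffix.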
And that’s it! Your Jupyter notebook should now launch appropriately in the browser at http://localhost:8888/.
I hope this setup serves you as well as it has me. If you have any issues or tips for improving this environment, leave a comment below.