
Why the CLI is Essential for Data Scientists

CLI examples and use cases to create more efficient Machine Learning workflows

Image from Unsplash

When I first started learning Data Science, I did not place much emphasis on understanding Unix/Linux and Bash. Coming from a non-Computer Science background, they seemed alien and hard to understand, but I quickly came to realize how essential the Command Line Interface (CLI) is for managing Data Science workloads. To become a strong Data Scientist/MLE, or to work with software in general, you need to be able to navigate and work with the CLI on your machine with ease.

There are many use cases for the CLI in Data Science outside of the comfortable Jupyter Notebook setting. For example, Computer Vision projects often use argparse, Python's command-line parsing module, to pass parameters to ML scripts. If you’re migrating to AWS, Azure, or another cloud provider for ML, you will typically use the CLI to provision and move your resources. Even in the familiar Data Science hunting grounds of a Jupyter Notebook, it’s possible to write cells containing Bash scripts, which we will dive into. In this article I will cover how to get started with the CLI, along with common use cases and tips for Bash in Data Science and programming as a whole.

NOTE: I use a Mac, so all of my commands will be Unix/Linux based. For those on a Windows machine, I’ve also attached some extra resources that you can follow along with.

Table of Contents

  1. Basic Bash Commands
  2. Writing Your First Bash Script
  3. Shell Scripting in Jupyter Notebooks
  4. Argparse
  5. Conclusion

1. Basic Bash Commands

Before we can dive into creating actual Bash/shell scripts, we’ll cover a few of the more basic commands you will use on a daily basis.

pwd (print working directory)

Think of your machine as a territory to explore. The CLI is your map: you give it commands that say where you would like to go. The pwd command tells you where you currently are on that map, that is, which directory of your machine your shell is working in.
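As a minimal illustration, pwd needs no arguments in everyday use and prints the absolute path of the directory your shell is currently in. The path shown in the comment is a hypothetical example, not output from the author's machine.

```shell
# Print the absolute path of the current working directory.
pwd
# Hypothetical example of what this might print on a Mac:
#   /Users/yourname/Desktop
```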

Screenshot by Author

As you can see we get the location that we are currently at.

mkdir (make directory)

The words folder and directory should become synonymous in your vocabulary as a developer. While setting up a project you may need to create various directories to organize your data or move files to appropriate locations. To create a directory through the CLI we use the mkdir command.
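A minimal sketch of this step, assuming the directory is called MediumExample (the name used later in this article) and that you run the command from wherever the new directory should live, for instance your Desktop. The -p flag is an optional addition here: it keeps the command from failing if the directory already exists.

```shell
# Create a directory called MediumExample inside the current directory.
# -p: do not report an error if the directory already exists.
mkdir -p MediumExample
```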

After executing this command I now see a directory on my Desktop.

Screenshot by Author

cd (change directory)

Now that you’ve made a directory, you will probably want to work inside it. To change directory we use the cd command, which makes the directory you name your shell's current working directory.
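A short sketch of changing into the MediumExample directory. The first command is only setup so the example runs on its own; the final pwd is there to confirm the move.

```shell
# Setup so the example runs on its own: make sure the directory exists.
mkdir -p MediumExample

# Move into the directory, then confirm the new location.
cd MediumExample
pwd
```

To go back up one level, cd .. moves to the parent directory.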

Screenshot by Author

As you can tell, I’m now in the MediumExample directory that we had created.

touch

Now say that we are in our working directory and want to create a Python file to start working on. The touch command creates a new, empty file with whatever name and extension we give it. If we run touch test.py we then see a Python file in our directory.
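The command from the paragraph above, written out. Note that touch does not care about file types: the .py extension is simply part of the name you give it.

```shell
# Create an empty file named test.py in the current directory.
# If the file already exists, touch only updates its timestamps.
touch test.py
```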

Screenshot by Author

ls (list files)

To see the files and directories we’re working with, we use the ls command, which lists the contents of a directory. If we run ls in the directory that we created, we will see the Python file that we just made.
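A small sketch of listing a directory. The first three commands are setup so the example runs on its own; the two flags shown at the end are common options rather than something the article relies on.

```shell
# Setup so the example runs on its own.
mkdir -p MediumExample
cd MediumExample
touch test.py

# List the contents of the current directory.
ls

# -l adds details such as size and modification time; -a includes hidden files.
ls -la
```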

Screenshot by Author

These are just some of the basic commands to get you started with the CLI. For further basic commands to navigate your CLI follow this link.

2. Writing Your First Bash Script

Now that we’ve covered some basic shell commands, the next step is to tie several of them together to orchestrate more complicated actions and manage your workflow. A Bash script is a file containing the commands and logic you want to execute. Bash has its own syntax that we won’t go too deep into, but if you want a quick cheatsheet to get started, click the following link. A Bash script conventionally uses the .sh extension, and we will create one in the directory we have been working in by using the touch command to create an example.sh file. By convention, a Bash script starts with a shebang line, #!/bin/bash, which tells the system which interpreter should run the file.

Now we can add whatever commands we need to manipulate our files for the workflow. We can first add a command to create a new subdirectory and another to print a "Hello World" message to our terminal.
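A minimal sketch of what example.sh might contain at this point. The subdirectory name new_subdirectory is a hypothetical choice made for illustration, and the first line is the conventional Bash shebang.

```shell
#!/bin/bash
# example.sh: a first Bash script.

# Create a subdirectory (hypothetical name); -p avoids an error on re-runs.
mkdir -p new_subdirectory

# Print a message to the terminal.
echo "Hello World"
```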

echo is essentially the equivalent of a print statement in Python. To run the script, run bash example.sh, replacing example with whatever you have named your file. You should see a "Hello World" message, and a new subdirectory if you run the ls command afterwards.

Screenshot by Author

This example is pretty trivial, but in practice Bash scripts are often used to orchestrate workflows in which tools such as Docker or cloud providers such as AWS interact with your local machine and resources.

3. Shell Scripting in Jupyter Notebooks

For all Jupyter Notebook stans there’s also a neat feature to make Bash cells that you can execute.

By adding %%sh at the top of a cell, you turn it into a code block that executes shell commands instead of Python. This is very handy when you are working in an environment that, unlike JupyterLab, does not let you fire up a terminal to work with your resources.
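As a rough sketch, a notebook cell using this feature has %%sh on its very first line, followed by ordinary shell commands. The block below shows only the shell portion so that it can also be run in a terminal; in a notebook, replace the two comment lines with the %%sh line itself. The commands are illustrative choices.

```shell
# In a Jupyter cell, the first line would be the cell magic: %%sh
# Everything after it is run by the shell rather than by Python.
pwd
ls
echo "Hello from a notebook cell"
```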

Screenshot by Author

4. Argparse

A Python module you will see used across ML, cloud, and programming in general is argparse, which is part of the standard library. argparse lets you pass command-line arguments to your scripts. You will see it used frequently in Computer Vision projects as well as in AWS when you’re providing ML scripts to services in the cloud. To understand argparse we can make a simple calculator application that we can work with from the CLI.

First we import argparse and create a parser. The next step is to add the arguments we can pass in from the CLI.

Now that we’ve added arguments for the numbers and an operator, we need to process these arguments and apply the logic necessary for a basic calculator application.

Now the arguments have been parsed and passed into our basic calculator logic that we’ve created. To see our argparse powered calculator in action we can go to the CLI.
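The author's original script is not reproduced here, so the following is only a minimal sketch of a calculator along the lines described. The file name calc.py and the argument names num1, num2, and --operation are assumptions made for illustration. The Python source is written to a file with a shell heredoc so that the whole example can be pasted into a terminal, and it assumes python3 is on your PATH.

```shell
# Write a small calculator script to calc.py (hypothetical file name).
cat > calc.py <<'EOF'
import argparse

# Create the parser and declare the arguments accepted from the CLI.
parser = argparse.ArgumentParser(description="A basic CLI calculator")
parser.add_argument("num1", type=float, help="first number")
parser.add_argument("num2", type=float, help="second number")
parser.add_argument(
    "--operation",
    choices=["add", "subtract", "multiply", "divide"],
    default="add",
    help="operation to apply (default: add)",
)

# Parse the values passed on the command line.
args = parser.parse_args()

# Apply the requested operation.
if args.operation == "add":
    result = args.num1 + args.num2
elif args.operation == "subtract":
    result = args.num1 - args.num2
elif args.operation == "multiply":
    result = args.num1 * args.num2
elif args.num2 == 0:
    # parser.error prints a usage message and exits with a non-zero status.
    parser.error("cannot divide by zero")
else:
    result = args.num1 / args.num2

print(result)
EOF

# Run it: python3 <file> <positional arguments> [options]
python3 calc.py 2 3 --operation multiply
```

Because both numbers are parsed as floats, this run prints 6.0. Running python3 calc.py --help shows the usage text that argparse generates from the add_argument calls.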

Screenshot by Author

We run python filename.py followed by the arguments we defined, and see our application working. argparse is easy to use, has great documentation, and can help you create powerful applications as well as stitch together various services in your project through the CLI.

Conclusion

I hope this article served as a good introduction to working with the CLI. In the journey of learning Data Science, steps such as mastering the CLI are often skipped by people coming from non-CS or non-technical backgrounds. I cannot stress enough how essential the CLI is for managing your resources and interacting with other services and providers as you level up your applications beyond a Jupyter Notebook.

Feel free to connect with me on LinkedIn or follow me on Medium for more of my writing. Share any thoughts or feedback. Thank you for reading!

