
Machine learning is one of the trending topics these days, and Python is the number one programming language among many of its users. However, Python is a general-purpose programming language, meaning that it is used in many different fields. To use Python for machine learning, you need to learn some additional Python libraries on top of general Python.
In this post, I’ll give an overview of the most fundamental Python libraries that will help you begin your machine learning journey. I highly recommend that you become as familiar with them as possible because they are the fundamentals. Without learning them properly, you may give up in the middle of the learning journey!
1. NumPy
NumPy is the mother library on which many other libraries are built. NumPy stands for Numerical Python. The ndarray (N-dimensional array) object is the main data structure in NumPy. In machine learning, we often work with vectors (1D arrays) and matrices (2D arrays), and NumPy provides easy methods to create them. When working with image data, we often deal with 3D NumPy arrays. NumPy also provides a large collection of mathematical functions, especially for linear algebra.
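To make the idea of 1D, 2D and 3D arrays concrete, here is a minimal sketch. The values and the 4 x 4 x 3 "image" shape are made up for illustration; a real image would be loaded from a file.

```python
import numpy as np

# A vector (1D array) and a matrix (2D array)
vector = np.array([1.0, 2.0, 3.0])
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# A stand-in for image data: height x width x color channels (made-up shape)
image = np.zeros((4, 4, 3), dtype=np.uint8)

print(vector.ndim, matrix.ndim, image.ndim)  # 1 2 3
print(matrix.shape)                          # (2, 3)

# Linear algebra: matrix-vector product
product = matrix @ vector
print(product)                               # [14. 32.]
```

The `@` operator is just one of the linear algebra tools; more specialized routines live in the `np.linalg` module.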
Resources
- Official website
- Documentation
- My own resources: I’ve also published a series of articles for NumPy:

Installation
NumPy comes with the Anaconda installer by default. If you’ve installed Python through Anaconda, you do not need to install NumPy again. Otherwise, there are two ways to install NumPy.
conda installation
conda install -c anaconda numpy
#OR
conda install -c conda-forge numpy
pip installation
pip install numpy
Import convention
The community-accepted import convention for NumPy is:
import numpy as np
2. Pandas
Pandas is the Python data manipulation and analysis library. It is built on top of NumPy, meaning that it works with NumPy N-dimensional arrays. Pandas is so popular that its total number of downloads could stand in for the size of the entire data science community! Pandas provides methods for data loading, data cleaning, variable encoding, data transforming, and much more. Pandas also provides plotting functions since various plotting libraries have been integrated with it. Series and DataFrame are the two main data structures in Pandas. A Pandas Series can be created from a 1-dimensional NumPy array, while a Pandas DataFrame can be created from a 2-dimensional NumPy array.
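As a small illustration of how Series and DataFrames relate to NumPy arrays, consider the sketch below. The values and the names `score`, `a` and `b` are made up for this example.

```python
import numpy as np
import pandas as pd

# Series from a 1D NumPy array
s = pd.Series(np.array([10, 20, 30]), name="score")

# DataFrame from a 2D NumPy array (column names are made up)
df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]), columns=["a", "b"])

print(s.shape)   # (3,)
print(df.shape)  # (3, 2)

# Convert back to NumPy when another library expects plain arrays
arr = df.to_numpy()
print(type(arr).__name__)  # ndarray
```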
Resources
- Official website
- Documentation
- My own resources: I’ve also published a series of articles for Pandas:

Installation
Pandas comes with the Anaconda installer by default. If you’ve installed Python through Anaconda, you do not need to install Pandas again. Otherwise, there are two ways to install Pandas.
conda installation
conda install -c anaconda pandas
#OR
conda install -c conda-forge pandas
pip installation
pip install pandas
Import convention
The community-accepted import convention for Pandas is:
import pandas as pd
3. Matplotlib
Matplotlib is the basic plotting library in Python, yet it provides tons of customization options for your plots. It is also the mother library for other, more advanced plotting libraries. The library has two different application programming interfaces (APIs): the Pyplot interface and the object-oriented interface.
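The difference between the two interfaces is easiest to see side by side. The sketch below draws the same line twice; the data and the output file names are made up, and the `Agg` backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Pyplot interface: functions act on an implicit "current" figure
plt.figure()
plt.plot(x, y)
plt.title("Pyplot interface")
plt.savefig("pyplot_example.png")
plt.close()

# Object-oriented interface: work with explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented interface")
fig.savefig("oo_example.png")
plt.close(fig)
```

The Pyplot style is shorter for quick plots, while the object-oriented style tends to be easier to manage once a figure has several subplots.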
Installation
Matplotlib comes with the Anaconda installer by default. If you’ve installed Python through Anaconda, you do not need to install Matplotlib again. Otherwise, there are two ways to install Matplotlib.
conda installation
conda install -c conda-forge matplotlib
pip installation
pip install matplotlib
Import convention
The community-accepted import convention for Matplotlib is:
import matplotlib.pyplot as plt
4. Seaborn
Seaborn is a high-level data visualization library, meaning that it automatically does many things for us! It also provides many aesthetic options for your plots. You can customize Seaborn plots by using Matplotlib.
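Here is a minimal sketch of what customizing Seaborn with Matplotlib can look like. The small dataset, its column names and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 69, 80, 92],
    "group": ["a", "a", "b", "b", "b"],
})

sns.set_theme()  # apply Seaborn's default styling

# One call handles colors and the legend for the "group" column
ax = sns.scatterplot(data=df, x="height", y="weight", hue="group")

# Seaborn returns a Matplotlib Axes, so Matplotlib methods customize it
ax.set_title("Height vs. weight")
ax.set_xlabel("Height (cm)")
ax.figure.savefig("seaborn_example.png")
plt.close(ax.figure)
```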
Installation
Seaborn comes with the Anaconda installer by default. If you’ve installed Python through Anaconda, you do not need to install Seaborn again. Otherwise, there are two ways to install Seaborn.
conda installation
conda install -c anaconda seaborn
pip installation
pip install seaborn
Import convention
The community-accepted import convention for Seaborn is:
import seaborn as sns
5. Scikit-learn
Scikit-learn is a Python machine learning library. Its syntax is so consistent that even beginners can get familiar with the entire library by creating one or two models. Its official documentation provides all the support you need for using this library. It includes algorithms for classification, regression, clustering and dimensionality reduction. It also provides advanced methods for data preprocessing.
Resources
- Official website
- Documentation
- My own resources: I’ve also published a series of articles for Scikit-learn:

Installation
Scikit-learn comes with the Anaconda installer by default. If you’ve installed Python through Anaconda, you do not need to install Scikit-learn again. Otherwise, there are two ways to install Scikit-learn.
conda installation
conda install -c anaconda scikit-learn
#OR
conda install -c conda-forge scikit-learn
pip installation
pip install scikit-learn
Import convention
We do not import the entire library at once. Instead, we import the classes and functions as we need them.
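For example, a typical script imports only the pieces it needs. The sketch below is one possible workflow, not the only one: the choice of the iris dataset, `StandardScaler`, `LogisticRegression` and the parameter values are my own assumptions for illustration.

```python
# Import only the classes and functions that are needed
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing: fit the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Modeling: the same fit/score pattern applies to most estimators
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)
accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another estimator usually means changing only the import and the line that creates `model`, which is the consistency mentioned above.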
6. Yellowbrick
Yellowbrick is a machine learning visualization library, so it is well suited to machine learning-related visualizations. Its syntax is very similar to that of the Scikit-learn library. With Yellowbrick, you can create advanced plots with just one or two lines of code!
Resources
- Official website and documentation
- My own resources: I’ve also published a series of articles for Yellowbrick:

Installation
Yellowbrick doesn’t come with the Anaconda installer by default. Therefore, you need to install it separately. There are two methods.
conda installation
conda install -c districtdatalabs yellowbrick
pip installation
pip install yellowbrick
Import convention
Like Scikit-learn, we do not import the entire library at once. Instead, we import the classes and functions as we need them.
7. XGBoost
When we consider the performance of machine learning models, XGBoost (Extreme Gradient Boosting) is one of the most preferred machine learning algorithms among data scientists and machine learning engineers. XGBoost (this time, the library) is available for many programming languages including Python. XGBoost provides a Scikit-learn wrapper (Scikit-learn compatible API) so that we can use XGBoost like Scikit-learn. There is also a non-Scikit-learn compatible API for XGBoost. However, it is more difficult to use than the Scikit-learn wrapper. Therefore, I recommend that you first use XGBoost’s Scikit-learn wrapper and then move to the non-Scikit-learn version (if you want).
Resources
- Official website and documentation
- My own resources: I’ve also published a series of articles for XGBoost:

Installation
XGBoost doesn’t come with the Anaconda installer by default. Therefore, you need to install it separately using one of the following commands, depending on your operating system.
conda install -c anaconda py-xgboost #Windows
conda install -c conda-forge xgboost #MacOS or Linux
Import convention
The community-accepted import convention for XGBoost is:
import xgboost as xgb
8. TensorFlow
TensorFlow is a library created for deep learning tasks. Deep learning is a subset of machine learning, and TensorFlow can also be used for general machine learning. It has two APIs: a high-level API and a low-level API. Its main data structure is the tensor.
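The sketch below gives a rough feel for the two levels. I am assuming here that the high-level API means Keras (`tf.keras`) and that the low-level API means working with tensors and operations directly; the layer sizes and values are made up.

```python
import tensorflow as tf

# Low-level API: create tensors and apply operations directly
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
product = tf.matmul(a, b)  # each row of `a` is summed: 3.0 and 7.0
print(product.shape)       # (2, 1)

# High-level API (Keras): stack layers into a model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])

# The untrained model maps each 2-feature row to one output value
output = model(a)
print(output.shape)        # (2, 1)
```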
Resources
- Official website and documentation
- My own resources: I’ve also published a series of articles for TensorFlow:

Installation
TensorFlow doesn’t come with the Anaconda installer by default. Therefore, you need to install it separately. There are two methods.
conda installation
conda install -c conda-forge tensorflow #CPU-only
pip installation
pip install tensorflow #Both CPU and GPU support
Import convention
The community-accepted import convention for TensorFlow is:
import tensorflow as tf
In what order should we learn these libraries?
In the order given above! Begin with NumPy basics and array creation methods. Then practice NumPy array indexing and slicing. After you are familiar with them, move to the Pandas basics: creating Series and DataFrames. Now, you can learn NumPy and Pandas in parallel. In this stage, get familiar with NumPy arithmetic and linear algebra operations and also with more advanced Pandas topics such as subsetting, data cleaning, variable encoding and data transforming. Then, move on to Matplotlib and Seaborn, beginning with Matplotlib. My recommendation is that you use Pandas plotting functions for basic visualizations. If you need more customization, go for Matplotlib. For advanced visualizations, use Seaborn. Now, it is time to learn Scikit-learn. This will be easy as you are already familiar with NumPy and Pandas. When you’re familiar with Scikit-learn, you can use Yellowbrick for machine learning visualizations. Then go for TensorFlow and the deep learning part. By following this order, you’ll be far less likely to give up in the middle of the learning journey!
Have these methods worked for you? Let me know in the comment section.
Thanks for reading!
Until next time, happy learning to everyone!
Special credit goes to Chetan Kolte on Unsplash, who provided the nice cover image for this post.
Rukshan Pramoditha 2021–08–04