Pandas Basics Cheat Sheet (2021), Python for Data Science

The absolute basics for beginners learning Pandas in 2021

Zita
6 min read · Apr 6, 2021
Photo from Unsplash by Johannes Groll

The Pandas library is one of the most powerful libraries in Python. It is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Check out the sections below to learn the various functions and tools Pandas offers.

Sections:
1. Pandas Data Structures
2. Dropping
3. Sort & Rank
4. Retrieving Series/DataFrame Information
5. DataFrame Summary
6. Selection
7. Applying Functions
8. Data Alignment
9. In/Out

Pandas Data Structures

There are two main types of data structures that the Pandas library is centered around. The first is a one-dimensional array called a Series, and the second is a two-dimensional table called a Data Frame.

  • Series — One dimensional labeled array
>>> import pandas as pd
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
>>> s
a    3
b   -5
c    7
d    4
  • Data Frame — A two dimensional labeled data structure
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
...         'Population': ['111907', '1303021', '208476']}
>>> df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
>>> df
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
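
The two structures above can be sketched together in a short script. This is a minimal example that reuses the article's data (Population is kept as strings, as in the original); the variable name `capitals` is introduced here purely for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# A DataFrame is a two-dimensional labeled table; each dict key becomes a column.
data = {
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': ['111907', '1303021', '208476'],
}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Each column of a DataFrame is itself a Series that shares the row index.
capitals = df['Capital']
print(type(capitals).__name__)   # Series
print(list(s.index))             # ['a', 'b', 'c', 'd']
print(df.shape)                  # (3, 3)
```

Selecting a column with `df['Capital']` returns a Series, which is why most Series methods shown later also work on individual DataFrame columns.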

Dropping

In this section, you’ll learn how to remove specific values from a Series, and how to remove columns or rows from a Data Frame.

s and df in the code below are used as examples of a Series and Data Frame throughout this section.

>>> s
a    6
b   -5
c    7
d    4
>>> df
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
  • Drop values from rows (axis = 0)
>>> s.drop(['a', 'c'])
b   -5
d    4
  • Drop values from columns (axis = 1)
>>> df.drop('Country', axis=1)
     Capital Population
0   Brussels     111907
1  New Delhi    1303021
2   Brasilia     208476
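
One detail worth checking for yourself is that `drop` returns a new object and leaves the original untouched. The sketch below assumes the same `s` as above and a trimmed two-column version of `df`; the names `s_dropped`, `rows_dropped`, and `cols_dropped` are made up for this example.

```python
import pandas as pd

s = pd.Series([6, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
})

# drop() returns a new object; the original Series is left unchanged.
s_dropped = s.drop(['a', 'c'])
print(list(s_dropped.index))        # ['b', 'd']
print(list(s.index))                # ['a', 'b', 'c', 'd']

# Drop a row by its index label (axis=0 is the default).
rows_dropped = df.drop(0)
# Drop a column by name; columns= is equivalent to passing axis=1.
cols_dropped = df.drop(columns='Country')
print(list(rows_dropped.index))     # [1, 2]
print(list(cols_dropped.columns))   # ['Capital']
```

To keep the result, assign it back to a variable, for example `df = df.drop(columns='Country')`.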

Sort & Rank

In this section, you’ll learn how to sort a Data Frame by its index or by a column, and how to rank column values.

df in the code below is used as an example Data Frame throughout this section.

>>> df
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
  • Sort by labels along an axis
>>> df.sort_index()
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
  • Sort by values along an axis
>>> df.sort_values(by='Country')
   Country    Capital Population
0  Belgium   Brussels     111907
2   Brazil   Brasilia     208476
1    India  New Delhi    1303021
  • Assign ranks to entries
>>> df.rank()
   Country  Capital  Population
0      1.0      2.0         1.0
1      3.0      3.0         2.0
2      2.0      1.0         3.0
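
Because Population is stored as strings in the example Data Frame, it is compared character by character, so '1303021' ranks below '208476'. The sketch below is one way to sort and rank numerically instead, assuming you first convert the column with `astype(int)`; the names `by_pop` and `ranks` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Population': ['111907', '1303021', '208476'],
})

# Convert the string column to integers so it sorts and ranks numerically.
df['Population'] = df['Population'].astype(int)

# ascending=False puts the largest value first.
by_pop = df.sort_values(by='Population', ascending=False)
print(by_pop['Country'].tolist())   # ['India', 'Brazil', 'Belgium']

# rank() assigns 1.0 to the smallest value by default.
ranks = df['Population'].rank()
print(ranks.tolist())               # [1.0, 3.0, 2.0]
```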

Retrieving Series/DataFrame Information

In this section, you’ll learn how to retrieve information about a Data Frame, including its dimensions, column names, column types, and index range.

df in the code below is used as an example Data Frame throughout this section.

>>> df
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
  • (rows, columns)
>>> df.shape
(3, 3)
  • Describe index
>>> df.index
RangeIndex(start=0, stop=3, step=1)
  • Describe DataFrame columns
>>> df.columns
Index(['Country', 'Capital', 'Population'], dtype='object')
  • Info on DataFrame
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
Country 3 non-null object
Capital 3 non-null object
Population 3 non-null object
dtypes: object(3)
memory usage: 152.0+ bytes
  • Number of non-NA values
>>> df.count()
Country       3
Capital       3
Population    3
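
The difference between `shape` and `count()` only shows up when data is missing. As a small sketch, the Data Frame below has one capital deliberately set to `None`; that missing value is an assumption added for illustration and is not part of the article's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', None, 'Brasilia'],   # one missing value, added for illustration
})

print(df.shape)             # (3, 2)
print(list(df.columns))     # ['Country', 'Capital']

# count() reports non-NA values per column, so it can be lower than the row count.
counts = df.count()
print(int(counts['Country']))   # 3
print(int(counts['Capital']))   # 2
```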

DataFrame Summary

In this section, you’ll learn how to retrieve summary statistics of a Data Frame, including the sum, min/max, and mean of each column, among others.

df in the code below is used as an example of a Data Frame throughout this section.

>>> df
   Even  Odd
0     2    1
1     4    3
2     6    5
  • Sum of values
>>> df.sum()
Even 12
Odd 9
  • Cumulative sum of values
>>> df.cumsum()
   Even  Odd
0     2    1
1     6    4
2    12    9
  • Minimum value
>>> df.min()
Even 2
Odd 1
  • Maximum value
>>> df.max()
Even 6
Odd 5
  • Summary statistics
>>> df.describe()
       Even  Odd
count   3.0  3.0
mean    4.0  3.0
std     2.0  2.0
min     2.0  1.0
25%     3.0  2.0
50%     4.0  3.0
75%     5.0  4.0
max     6.0  5.0
  • Mean of values
>>> df.mean()
Even 4.0
Odd 3.0
  • Median of values
>>> df.median()
Even 4.0
Odd 3.0
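
All of the methods above work column by column by default. The sketch below repeats a few of them on the same Even/Odd data and, as an extra not covered above, passes `axis=1` to total each row instead; `row_totals` and `stats` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# Column-wise aggregation is the default (axis=0).
print(int(df['Even'].sum()))          # 12
print(df['Even'].cumsum().tolist())   # [2, 6, 12]

# axis=1 aggregates across the columns of each row instead.
row_totals = df.sum(axis=1)
print(row_totals.tolist())            # [3, 7, 11]

# describe() returns a DataFrame, so single statistics can be read with .loc.
stats = df.describe()
print(float(stats.loc['mean', 'Even']))   # 4.0
print(float(stats.loc['std', 'Odd']))     # 2.0
```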

Selection

In this section, you’ll learn how to retrieve specific values from a Series and Data Frame.

s and df in the code below are used as examples of a Series and Data Frame throughout this section.

>>> s
a    6
b   -5
c    7
d    4
>>> df
   Country    Capital Population
0  Belgium   Brussels     111907
1    India  New Delhi    1303021
2   Brazil   Brasilia     208476
  • Get one element
>>> s['b']
-5
  • Get subset of a DataFrame
>>> df[1:]
  Country    Capital Population
1   India  New Delhi    1303021
2  Brazil   Brasilia     208476
  • Select a single value by row and column position
>>> df.iloc[0, 0]
'Belgium'
  • Select a single value by row and column labels
>>> df.loc[0, 'Country']
'Belgium'
  • Select a single row or a subset of rows
>>> df.iloc[2]
Country         Brazil
Capital       Brasilia
Population      208476
  • Select a single column or a subset of columns
>>> df.loc[:, 'Capital']
0     Brussels
1    New Delhi
2     Brasilia
  • Select rows and columns
>>> df.loc[1, 'Capital']
'New Delhi'
  • Use a Boolean filter to subset the DataFrame
>>> df[df['Population'].astype(int) > 120000]
  Country    Capital Population
1   India  New Delhi    1303021
2  Brazil   Brasilia     208476
  • Set index a of Series s to 6
>>> s['a'] = 6
>>> s
a    6
b   -5
c    7
d    4
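
Here is a minimal sketch that puts position-based and label-based selection side by side, using only `.iloc` and `.loc`. Older tutorials often use `.ix` for these cases, but that indexer was removed in pandas 1.0. The names `row`, `col`, `mask`, and `large` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': ['111907', '1303021', '208476'],
})

# iloc selects by integer position, loc selects by label.
print(df.iloc[0, 0])              # Belgium
print(df.loc[1, 'Capital'])       # New Delhi

# A whole row by position and a whole column by label.
row = df.iloc[2]
col = df.loc[:, 'Capital']
print(row['Country'])             # Brazil
print(col.tolist())               # ['Brussels', 'New Delhi', 'Brasilia']

# Population holds strings here, so convert before comparing with a number.
mask = df['Population'].astype(int) > 120000
large = df[mask]
print(large['Country'].tolist())  # ['India', 'Brazil']
```

With the default integer index, label 1 and position 1 happen to point at the same row; the distinction matters once the index is something else, such as country names.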

Applying Functions

In this section, you’ll learn how to apply a function to all values of a Data Frame or a specific column.

df in the code below is used as an example of a Data Frame throughout this section.

>>> df
   Even  Odd
0     2    1
1     4    3
2     6    5
  • Apply function
>>> df.apply(lambda x: x*2)
   Even  Odd
0     4    2
1     8    6
2    12   10
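
`DataFrame.apply` calls the function once per column by default, while `Series.apply` calls it once per element. The sketch below shows both, plus `axis=1` for row-wise application; the lambdas and the names `doubled`, `squared`, and `row_sum` are examples only.

```python
import pandas as pd

df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# DataFrame.apply: the lambda receives each column as a Series.
doubled = df.apply(lambda x: x * 2)
print(doubled['Even'].tolist())   # [4, 8, 12]

# Series.apply on a single column: the lambda receives one value at a time.
squared = df['Odd'].apply(lambda v: v ** 2)
print(squared.tolist())           # [1, 9, 25]

# axis=1: the lambda receives each row as a Series.
row_sum = df.apply(lambda row: row['Even'] + row['Odd'], axis=1)
print(row_sum.tolist())           # [3, 7, 11]
```

For simple arithmetic like this, writing `df * 2` directly gives the same result as the first call and is usually faster.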

Data Alignment

In this section, you’ll learn how to add, subtract, and divide two series that have different indexes from one another.

s and s3 in the code below are used as examples of Series throughout this section.

>>> s
a    6
b   -5
c    7
d    4
>>> s3
a    7
c   -2
d    3
  • Internal Data Alignment
>>> s + s3
a    13.0
b     NaN
c     5.0
d     7.0
# NA values are introduced at the index labels that don't overlap
  • Arithmetic Operations with Fill Methods
>>> s.add(s3, fill_value=0)
a    13.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
a    -1.0
b    -7.0
c     9.0
d     1.0
>>> s.div(s3, fill_value=4)
a    0.857143
b   -1.250000
c   -3.500000
d    1.333333
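
The alignment behaviour above can be reproduced with the short script below. It rebuilds `s` and `s3` from the values shown and, as an addition to the article's examples, uses `mul`, which accepts `fill_value` in the same way; `total`, `filled`, and `product` are illustrative names.

```python
import pandas as pd

s = pd.Series([6, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Plain + aligns on index labels; 'b' exists only in s, so the result there is NaN.
total = s + s3
print(total.isna().tolist())   # [False, True, False, False]

# fill_value substitutes for the missing side before the operation is applied.
filled = s.add(s3, fill_value=0)
print(filled.tolist())         # [13.0, -5.0, 5.0, 7.0]

# mul works the same way; 1 is the neutral fill value for multiplication.
product = s.mul(s3, fill_value=1)
print(product.tolist())        # [42.0, -5.0, -14.0, 12.0]
```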

In/Out

In this section, you’ll learn how to read a CSV file, an Excel file, and the result of a SQL query into Python using Pandas. You will also learn how to export a Data Frame from Pandas to a CSV file, an Excel file, and a SQL table.

  • Read CSV file
>>> pd.read_csv('file.csv')
  • Write to CSV file
>>> df.to_csv('myDataFrame.csv')
  • Read Excel file
>>> pd.read_excel('file.xlsx')
  • Write to Excel file
>>> df.to_excel('dir/myDataFrame.xlsx')
  • Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
  • Read SQL Query
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql('SELECT * FROM my_table;', engine)
>>> pd.read_sql_table('my_table', engine)
  • Write to SQL table
>>> df.to_sql('myDF', engine)
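
To try the read and write calls without creating any files, here is a small round-trip sketch. It makes two assumptions that differ from the snippets above: the CSV goes through an in-memory `io.StringIO` buffer instead of a file on disk, and the SQL part uses a plain `sqlite3` connection from the standard library instead of a SQLAlchemy engine. Excel is left out because it needs an extra engine package.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# CSV round trip through an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves the row labels out of the file
buffer.seek(0)                   # rewind so read_csv starts at the beginning
from_csv = pd.read_csv(buffer)
print(from_csv['Even'].tolist())   # [2, 4, 6]

# SQL round trip using an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
df.to_sql('myDF', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM myDF', conn)
conn.close()
print(from_sql['Odd'].tolist())    # [1, 3, 5]
```

Without `index=False`, `to_csv` and `to_sql` also write the row index, which then comes back as an extra column when the data is read again.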

Python is the top dog when it comes to data science for now and in the foreseeable future. Knowledge of Pandas, one of its most powerful libraries, is often a requirement for Data Scientists today.

Use this cheat sheet as a guide in the beginning and come back to it when needed, and you’ll be well on your way to mastering the Pandas library.

