Pandas Series: A Lightweight Intro

Daksh Gupta
Towards Data Science
6 min read · Oct 6, 2018

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Pandas provides two main data structures:

  1. Pandas DataFrame
  2. Pandas Series

We’ll look at Pandas Series in this post.

Note: I highly recommend reading my earlier post on Pandas DataFrame before going ahead with this one, as it will make Pandas Series easier to understand.

What is a Series?

Technically, Pandas Series is a one-dimensional labeled array capable of holding any data type.

In layman's terms, a Pandas Series is nothing but a column in an Excel sheet. As depicted in the picture below, each of the columns Name, Age and Designation represents a Series.

Pandas Series

So, in terms of Pandas data structures, a Series represents a single column in memory, which is either independent or belongs to a Pandas DataFrame.

Note: A Series can have its own independent existence without being part of a DataFrame.

How to Create a Series?

A Pandas Series can be created out of a Python list or a NumPy array. Keep in mind that, unlike a Python list, a Series has a single data type (dtype) for all of its values; if the values are of mixed types, they are stored under the generic object dtype. This makes a NumPy array a natural candidate for creating a Pandas Series.

Here is how we can use both of the above to create a Pandas Series

import numpy as np
import pandas as pd
series_list = pd.Series([1,2,3,4,5,6])
series_np = pd.Series(np.array([10,20,30,40,50,60]))

and here is what they look like

Result of → series_list = pd.Series([1,2,3,4,5,6])
Result of → series_np = pd.Series(np.array([10,20,30,40,50,60]))
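The point about a Series holding a single data type can be checked through its dtype attribute. The sketch below is only an illustration (the variable names are mine, not from the examples above): it builds three small Series and prints the dtype pandas chooses for each.

```python
import pandas as pd

# A homogeneous list of integers gets an integer dtype.
ints = pd.Series([1, 2, 3])

# Mixing integers and floats upcasts every value to float.
mixed_numbers = pd.Series([1, 2.5, 3])

# Mixing numbers and strings falls back to the generic object dtype.
mixed_objects = pd.Series([1, "two", 3.0])

print(ints.dtype)
print(mixed_numbers.dtype)
print(mixed_objects.dtype)
```

So a list with mixed content is still accepted, but the whole Series shares one dtype.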

Just as when creating a Pandas DataFrame, the Series also generates row index numbers by default, which is a sequence of incremental numbers starting from 0.

As you might have guessed, it's possible to supply our own row index values while creating a Series. We just need to pass the index parameter, which takes a list or a NumPy array with the same number of elements as the data.

The example below uses a NumPy generated Sequence

series_index = pd.Series(
    np.array([10,20,30,40,50,60]),
    index=np.arange(0,12,2)
)
Result of → series_index = pd.Series(np.array([10,20,30,40,50,60]), index=np.arange(0,12,2) )

The example below uses strings as the row index

series_index = pd.Series(
    np.array([10,20,30,40,50,60]),
    index=['a', 'b', 'c', 'd', 'e', 'f']
)
Result of → series_index = pd.Series(np.array([10,20,30,40,50,60]), index=['a', 'b', 'c', 'd', 'e', 'f'])
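Once a custom row index is in place, values can be looked up by label. The following is a small sketch under my own naming (values, labels, value_at_c and length_error are illustrative); it also shows that pandas rejects an index whose length differs from the data.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30, 40, 50, 60])
labels = ["a", "b", "c", "d", "e", "f"]
series_index = pd.Series(values, index=labels)

# Values can now be looked up by label instead of by position.
value_at_c = series_index["c"]
print(value_at_c)  # 30

# The index needs exactly one label per value; otherwise pandas raises.
try:
    pd.Series(values, index=["a", "b"])
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)  # ValueError
```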

We can access the row index of the Series using

series_index.index

which returns a pandas Index object (not a plain NumPy array), irrespective of whether we passed a list or a NumPy array while creating the Series
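To see what kind of object .index actually holds, it can be inspected directly. This is a minimal sketch assuming a reasonably recent pandas (Index.to_numpy() needs version 0.24 or later); the names idx, labels and default_idx are illustrative.

```python
import numpy as np
import pandas as pd

series_index = pd.Series(np.array([10, 20, 30]), index=["a", "b", "c"])

# .index is a pandas Index object, whatever was passed in.
idx = series_index.index
print(type(idx))

# An explicit conversion gives a NumPy array of the labels.
labels = idx.to_numpy()
print(type(labels))

# Without an explicit index, pandas uses a RangeIndex starting at 0.
default_idx = pd.Series([1, 2, 3]).index
print(default_idx)
```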

Creating Pandas Series from python Dictionary

As we saw while creating a Pandas DataFrame, it is extremely easy to create a DataFrame out of Python dictionaries, as keys map to column names while values correspond to lists of column values.

So how does it map while creating the Pandas Series?

If we create a Series from a python dictionary, the key becomes the row index while the value becomes the value at that row index.

As an example, let’s see what happens to a very simple dictionary whose values are single numbers

t_dict = {'a': 1, 'b': 2, 'c': 3}
# Creating a Series out of the above dict
series_dict = pd.Series(t_dict)

And here is what the output looks like

Result of → Code block Above

Things don’t change if the values in the dictionary contain a list of items. The list items remain part of a single row index as in the case below

t_dict = {'a': [1,2,3], 'b': [4,5], 'c': 6, 'd': "Hello World"}
# Creating a Series out of the above dict
series_dict = pd.Series(t_dict)
Result of → series_dict = pd.Series(t_dict)
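The mapping from dictionary keys to row index values can be verified in a few lines. The sketch below reuses the dictionary from the example above and only adds print calls for inspection.

```python
import pandas as pd

t_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': 6, 'd': "Hello World"}
series_dict = pd.Series(t_dict)

# The dictionary keys become the row index.
print(list(series_dict.index))  # ['a', 'b', 'c', 'd']

# A list value is kept whole as the single value at its row index.
print(series_dict['a'])         # [1, 2, 3]

# One row per key, regardless of how long each value is.
print(len(series_dict))         # 4
```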

Getting a Series out of a Pandas DataFrame

Though a Pandas Series is extremely useful by itself for data analysis and provides many helper functions, most of the time the analytic requirements will force us to use DataFrame and Series together.

Let’s first create a Pandas DataFrame, as we did in the earlier post on Pandas DataFrame

my_dict = {
    'name': ["a", "b", "c", "d", "e"],
    'age': [10, 20, 30, 40, 50],
    'designation': ["CEO", "VP", "SVP", "AM", "DEV"]
}
df = pd.DataFrame(
    my_dict,
    index=[
        "First -> ",
        "Second -> ",
        "Third -> ",
        "Fourth -> ",
        "Fifth -> "])

And here is what the resultant DataFrame looks like

Result of → DataFrame creation from dictionary

DataFrame provides two ways of accessing a column, i.e. by using the dictionary syntax df['column_name'] or the attribute syntax df.column_name. Each time we use either of these to get a column, we get a Pandas Series.

In the example above, we can get a Series (i.e. a single column) just by accessing the column

series_name = df.name
series_age = df.age
series_designation = df.designation
series_name
series_age
series_designation
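The two access styles can be compared side by side. This is a small sketch with a cut-down DataFrame (the names by_key and by_attr are illustrative): both styles return the same Series, whose name is the column name.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [10, 20, 30],
})

by_key = df['age']   # dictionary-style access
by_attr = df.age     # attribute-style access

print(type(by_key).__name__)   # Series
print(by_key.equals(by_attr))  # True
print(by_key.name)             # age
```

Attribute-style access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method, so the dictionary syntax is the safer default.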

Getting the Series by iterating through columns of a DataFrame

What if we don’t know the name of the columns?

Pandas DataFrame is iterable and we can iterate through individual columns to get the series

series_col = []
for col_name in df.columns:
    series_col.append(df[col_name])
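As a variation on the loop above, DataFrame.items() yields each column name together with its Series, which is handy when the names are needed later. A minimal sketch with a cut-down DataFrame (series_by_name is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["a", "b", "c"],
    'age': [10, 20, 30],
})

# DataFrame.items() yields (column name, Series) pairs.
series_by_name = {col_name: series for col_name, series in df.items()}

print(list(series_by_name))                  # ['name', 'age']
print(type(series_by_name['age']).__name__)  # Series
```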

Creating DataFrame using the Series (Standalone or combination)

A Pandas DataFrame is nothing but a collection of one or more Series. We can generate a DataFrame from a single Series or by combining multiple Series

For example, let’s generate a DataFrame by combining series_name and series_age

df_from_series = pd.DataFrame([series_name, series_age])

and, perhaps to your surprise, the resultant DataFrame looks like this

df_from_series

Yes, the row indexes of the Series become the columns, while the Series names (the original column names) become the row index values. You can think of this as similar to the transpose of a matrix. This is true even if we provide a single Series to create a DataFrame

df_from_series_single = pd.DataFrame([series_name])
df_from_series_single

However, this doesn’t happen when we remove the list / array notation around the Series. For example

df_from_series_single = pd.DataFrame(series_name)

will preserve the Series name as the column name, along with its row indexes

df_from_series_single

NOTE: Unfortunately, this form is limited to a single Series, as the DataFrame constructor doesn’t take more than one Series as separate arguments
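If the goal is to keep several Series as columns, two commonly used options are pd.concat with axis=1 and a dictionary of Series. The sketch below is an illustration with shortened data, not part of the original walkthrough; df_rows, df_cols and df_dict are names I made up.

```python
import pandas as pd

index = ["First", "Second", "Third"]
series_name = pd.Series(["a", "b", "c"], index=index, name="name")
series_age = pd.Series([10, 20, 30], index=index, name="age")

# A list of Series: each Series becomes a row.
df_rows = pd.DataFrame([series_name, series_age])

# pd.concat along axis=1: each Series becomes a column.
df_cols = pd.concat([series_name, series_age], axis=1)

# A dict of Series also produces one column per Series.
df_dict = pd.DataFrame({"name": series_name, "age": series_age})

print(df_rows.shape)  # (2, 3)
print(df_cols.shape)  # (3, 2)
print(df_dict.shape)  # (3, 2)
```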

DataFrame creation Behaviour with Python Dict

The same behaviour is observed when we pass Python dictionaries inside a list to create a DataFrame. Let’s look at t_dict = {'a': 1, 'b': 2, 'c': 3}, which we created earlier

ds = pd.DataFrame([t_dict])

and the resultant DataFrame looks like this

ds

where the keys are represented as columns; they would have been represented as the row index had we created a Series

We can even combine multiple copies of t_dict to create a DataFrame

ds = pd.DataFrame([t_dict, t_dict], index=[1,2])
ds
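The contrast between the two constructors can be checked directly. In this sketch (as_series and as_frame are illustrative names), the same dictionary ends up as a row index in a Series and as columns in a DataFrame built from a list of dictionaries.

```python
import pandas as pd

t_dict = {'a': 1, 'b': 2, 'c': 3}

# In a Series, the keys become the row index.
as_series = pd.Series(t_dict)

# In a DataFrame built from a list of dicts, the keys become columns
# and each dict becomes one row.
as_frame = pd.DataFrame([t_dict, t_dict], index=[1, 2])

print(list(as_series.index))   # ['a', 'b', 'c']
print(list(as_frame.columns))  # ['a', 'b', 'c']
print(as_frame.shape)          # (2, 3)
```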

Series Helper Functions

Just like pandas DataFrame, Series also has multiple sets of helper functions for data analysis.

Please note that the helper functions that work on a DataFrame column also work on a Pandas Series, since a column is a Series. Some examples are

# Getting the mean of a Series
series_age.mean()
# Getting the size of the Series
series_age.size
# Getting all unique items in a series
series_designation.unique()
# Getting a python list out of a Series
series_name.tolist()
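For readers who want to run these helpers without building the DataFrame first, here is a standalone sketch. The data is made up for illustration (note the repeated designations, so that unique() has something to do), and the result variable names are mine.

```python
import pandas as pd

series_age = pd.Series([10, 20, 30, 40, 50], name="age")
series_designation = pd.Series(
    ["CEO", "VP", "SVP", "VP", "CEO"], name="designation"
)

mean_age = series_age.mean()   # arithmetic mean of the values
n_items = series_age.size      # number of elements
# unique() keeps the order of first appearance.
unique_designations = list(series_designation.unique())
ages_as_list = series_age.tolist()

print(mean_age)             # 30.0
print(n_items)              # 5
print(unique_designations)  # ['CEO', 'VP', 'SVP']
print(ages_as_list)         # [10, 20, 30, 40, 50]
```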

Iterating over Series

Just like many other data structures in Python, it’s possible to iterate over a Series using a simple for loop

for value in series_name:
    print(value)

We can also iterate over the row index of the Series

for row_index in series_name.keys():
    print(row_index)
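When both the row index and the value are needed in the same loop, Series.items() yields them as pairs. A minimal sketch with shortened data (pairs is an illustrative name):

```python
import pandas as pd

series_name = pd.Series(["a", "b", "c"], index=["First", "Second", "Third"])

# Series.items() yields (row index, value) pairs in order.
pairs = []
for row_index, value in series_name.items():
    pairs.append((row_index, value))
    print(row_index, value)
```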

That’s all about basic usage of Pandas Series.

Thanks for reading…!!!

Daksh
