Manipulating the data with Pandas using Python.

Let us calculate my part-time earnings with Pandas DataFrame using Python.

Tanu N Prabhu
Towards Data Science

--

Credits: GeoSpatial Training Services

Before getting started let me introduce you Pandas, Pandas is a python library which provided high-performance, easy to use data structures such as series, Data Frame and Panel for data analysis tools for Python programming language. In order to use the pandas library and its data structures all, you have to do it to install it and import it. See the documentation of the Pandas library for more better understanding and installing guidance.

Steps for calculating the part-time earnings.

  1. Importing the required (pandas) libraries.
  2. Storing values such as Date, Time Worked, and Money Earned in a DataFrame.
  3. Adding more rows to the existing DataFrame (updating the rows of the DataFrame).
  4. Calculating the sum of Money earned and the total duration worked.
  5. Plotting the bar graph with Date vs Money Earned.
  6. Including a search option in order to search for the respective dates worked.
  7. Finally adding the payroll option.

Let’s get started !!!

1. Importing the required (pandas) libraries.

In this tutorial we will only use the pandas library to perform the below calculations, the pandas library itself will provide us the option of calculating the sum and plotting a bar graph, you need not import matplotlib to plot a graph, pandas library will provide you the option of plotting a bar graph. This is would be very important to know to plot a bar graph using pandas DataFrame. You don’t have to install any pandas library if you are using Google Colab notebook then just import it. Otherwise, you have to manually install it in your command prompt by saying pip install pandas. The reason for aliasing (pd) is that I don’t have to write to pandas every single time when I want to use any methods by aliasing I can write pd.method name instead.

# Importing pandas library.
import pandas as pd

2. Storing values such as Date, Time Worked, and Money Earned in a DataFrame.

# Creating a data frame df.
df = pd.DataFrame({'Date':['11/05/19', '12/05/19', '19/05/19', '25/05/19', '26/05/19', '1/06/19'],'Time Worked': [3, 3, 4, 3, 3, 4],'Money Earned': [33.94, 33.94, 46, 33.94, 33.94, 46]})
# Head displays only the top 5 rows from the data frame.
df.head()
Storing the values in a data frame.

In this step I categorize all the data as Data, Time Worked and Money Earned into 3 columns. The Date column shows the date of the work in dd/mm/yy format and it will be stored as a String, the Time Worked shows the total amount of work done in a day (hours) stored as an integer, and the Money Earned showed the total money earned in a day (CAD dollar) it would be stored as an integer. Here for one hour, the minimum wage is 11.51 CAD. All these are just raw data, which is later stored in pandas DataFrame and allocated to a variable df. To do this just use “pd.DataFrame” and pass in all the data, by doing this the pandas will automatically convert the raw data into a DataFrame. I am using the head () because the data frame contains 10 rows of data so if I print them then they probably look big and cover most of the page, so instead the head() displays the top 5 data from the data frame.

3. Adding more rows to the existing DataFrame (updating the rows of the DataFrame)

In this step we will learn how to append or add more rows to the existing data frame, this is an important step because often many times you have to update your data frame by adding more rows, in this example I first create a new data frame called df2, and then call the append ( ) by passing the df2 as a parameter. You have to append the new data frame to the existing data frame like df.append(df2) (existing. append(new data frame)), now inside the append function we have some other parameters as ignore_index = True, this prevents the data frame from appending new index, so in this example all the index are in a continuous fashion (incrementing), and the next parameter is a sort = False this is because we don’t want to sort the data according to the index, otherwise our data would be completely a mixture, you can play with these parameters by changing the values as False and True respectively and note the difference. Last, the new appended data frame is stored into a new variable df.

# Adding more rows
df2 = pd.DataFrame({‘Date’: [‘10/06/19’, ‘12/06/19’, ‘14/06/19’],
‘Time Worked’: [3, 4, 3],
‘Money Earned’: [33.94, 46, 33.94]})
df2
Storing the values in a new data frame df2
# Appending the rows of the old data frame to the new data frame.df = df.append(df2, ignore_index=True, sort = False)
df.head()
Appending the rows of the old data frame to the new data frame.

4. Calculating the sum of Money earned and the total duration worked

This step is pretty much straightforward because we are just getting the sum of the “Money Earned” and “Time Worked” columns to do this all you have to do is just use the sum () which will return the sum of all the data from the columns. I’m just using the round () for the Total_earnings just to get the precise values. Make sure you pass the correct column name inside df, because if the column names mismatch then it can put you in trouble. Finally, I print the results in a more readable way.

Total_earnings = df[‘Money Earned’].sum()
Total_time = df[‘Time Worked’].sum()
print(“You have earned total of ====>” ,round(Total_earnings),“CAD”)
print(“ — — — — — — — — — — — — — — — — — — — — — — — — — — — ”)
print(“You have worked for a total of ====>”, Total_time, “hours”)
Printing the result of total earnings and total time worked.

5. Plotting the bar graph with Total duration vs Money Earned

As I mentioned earlier, to plot a graph, you don’t have to import matplot library, pandas have a plot () which will help you plot a graph up to a certain extent. I have used the plot () and passed the “Date” and the “Money Earned” as x and y values (because you need x and y values to plot a graph ;) and I want a bar graph so I used the kind as bar, you can also use line, scatter to the keyword kind. We then get a beautiful bar graph with all the values plotted according to our expectations.

# Plotting a bar graph using pandas library.
df.plot(x =’Date’, y=’Money Earned’, kind = ‘bar’)
Plotting the graph of Date vs Money Earned.

Shown above is not the best way to plot a graph, but I have shown you guys that pandas library can help you plot a graph, sometimes it’s handy to use this method where there would be fewer data and fewer calculations all you have to do is plot a graph with x and y values.

6. Including a search option in order to search for the respective dates worked.

This is an additional step, just to give it a feel like a database I threw in a few features, such as a search option. In real time projects, this is a handy feature where you often have to search for the data, there you cannot search it manually, so with the help of this below code snippet you just get the job done. In this example I have used the str.contains ( ) for the search operation and passed the data as a parameter, now the data is a variable which contains the user entered data to be searched. When you type in the date the str.contains() searches for the entered date and them displays the date and the corresponding values from the data frame. This would be helpful when you want to search for a particular data and the get the time and money earned, you can just type in the date and get the results quick rather than manually finding the required date.

# Including a search option.
date = input(“Enter the date you want to search ===> “)
df[df[‘Date’].str.contains(date)]
The result of the search option.

7. Finally adding the payroll option

This is more like an optional (bonus) step because this generates payroll for the entered data, this is not an industry level payroll generator, rather a simple payroll generator with a different function and print statements. The logic is very simple to understand all I have done is taken the name, hours and rate as user inputs and then multiplied the rate and hours and stored them in the total_money and enclosed it in a function.

# Function payroll contains calculation of total money.
def payroll():
name = input(“Enter the name of the employee: ==> “)
hours = int(input(“Enter the hours worked by the employee ==>“))
rate = float(input(“Enter the pay rate for one hour ==> “))
total_money = hours * rate print(“The total money earned by “, name, “for working “, hours, “hours”, “is ===> “, round(total_money), “CAD”)
The result of the payroll function.

Hence this is how you can calculate the part-time earnings using a pandas data frame with a python. I know this is easy to understand because I have given as much as an explanation of the code I could and the rest you can practice on your own. This is a good start if you know little about pandas library, these are some basic methods that should be at your fingertips. I’m going to stop here and let you practice the code. If you have any doubts, let me know in the comments section, or drop a mail to tanunprabhu95@gmail.com just let me know your issue and I will definitely solve your problem. See you have a good day !!!!

--

--

MSc in Computer Science | He/Him | Tech and pet enthusiast | Don’t believe me, read a couple of my writings | Writing since June 19, 2019 |