
Pandas is an incredibly useful Python library for reading data, generating statistics, transforming data, and much more. In this post, I will discuss three Pandas methods that I use very often as a data scientist.
Let’s get started!
READING DATA
The first method we will discuss is Pandas’ CSV reader. If we have a ‘.csv’ file, we can use ‘read_csv()’ to read the data into a Pandas data frame, a tabular data structure with labeled rows and columns. For our example, let’s read the Reddit – Data is Beautiful CSV file, which can be found here, into a Pandas data frame:
import pandas as pd
df = pd.read_csv("r_dataisbeautiful_posts.csv")
To show that we’ve successfully read in the data, we can print the first five rows:
print(df.head())
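If you don’t have the Reddit file at hand, the same call works on any file-like object. The sketch below assumes a tiny made-up CSV whose column names (‘title’, ‘score’, ‘num_comments’) only mimic the real dataset. It also shows two optional ‘read_csv()’ parameters, ‘usecols’ and ‘nrows’, which limit which columns and how many rows are loaded.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names only mimic the Reddit dataset.
csv_text = """title,score,num_comments
Chart A,120,75
Chart B,8,3
Chart C,56,50
Chart D,230,12
"""

# read_csv accepts a path or any file-like object, such as StringIO.
df_sample = pd.read_csv(io.StringIO(csv_text))
print(df_sample.head())

# usecols keeps only the listed columns; nrows stops after that many data rows.
df_small = pd.read_csv(
    io.StringIO(csv_text), usecols=["title", "num_comments"], nrows=2
)
print(df_small)
```

Loading only the columns and rows you need can save memory on large files.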

There are similar functions for other file formats. For example, ‘read_html()’ reads tables from ‘.html’ files, ‘read_excel()’ reads ‘.xls’ and ‘.xlsx’ files, ‘read_parquet()’ reads ‘.parquet’ files, and there are many more. You can find a full list of IO tools in Pandas here.
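Because each format has its own reader, it can be handy to pick one from the file extension. Here is a minimal sketch of a hypothetical helper, ‘read_by_extension()’, that does this with a dictionary. It assumes that the optional dependencies those readers rely on (an Excel engine such as xlrd or openpyxl, or pyarrow for Parquet) are installed when those formats are used; the demo only exercises CSV and JSON, which need nothing beyond Pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical helper: map a file extension to the matching pandas reader.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xls": pd.read_excel,        # needs an Excel engine such as xlrd
    ".xlsx": pd.read_excel,       # needs an Excel engine such as openpyxl
    ".parquet": pd.read_parquet,  # needs pyarrow or fastparquet
}


def read_by_extension(path):
    """Read a file into a DataFrame, choosing the reader from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader registered for {suffix!r}")
    return READERS[suffix](path)


# Demo with formats that need no optional dependencies (made-up data).
sample = pd.DataFrame({"title": ["Chart A", "Chart B"], "num_comments": [75, 3]})
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "posts.csv"
    json_path = Path(tmp) / "posts.json"
    sample.to_csv(csv_path, index=False)
    sample.to_json(json_path)
    from_csv = read_by_extension(csv_path)
    from_json = read_by_extension(json_path)

print(from_csv)
print(from_json)
```

‘read_html()’ is left out on purpose: it returns a list of data frames (one per table) rather than a single one.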
Now we will move on to selecting data with Pandas.
SELECTING DATA
Suppose we want to select the first 500 rows of the Reddit – Data is Beautiful data set we read in previously. We can use the ‘.iloc[]’ indexer, which selects rows by integer position, to select the first 500 rows. Let’s also print the length of the data frame before and after selection:
print("Length of data frame before row selection: ", len(df))
df = df.iloc[:500]
print("Length of data frame after row selection: ", len(df))
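‘.iloc[]’ works on integer positions, not on index labels. This matters once the row labels are no longer 0, 1, 2 and so on, for example after filtering. The sketch below uses a small made-up frame with custom labels to contrast it with the label-based ‘.loc[]’. Note that a ‘.loc[]’ slice includes its end label, while an ‘.iloc[]’ slice excludes its end position.

```python
import pandas as pd

# Made-up data with non-default row labels, as you might get after filtering.
df = pd.DataFrame(
    {"num_comments": [75, 3, 50, 12, 90]},
    index=[10, 20, 30, 40, 50],
)

# iloc: positions 0 and 1 (the end position 2 is excluded).
first_two = df.iloc[:2]

# loc: labels 10 through 30 (the end label is included).
up_to_label_30 = df.loc[:30]

# head(n) selects the same rows as iloc[:n].
same_as_head = df.head(2).equals(df.iloc[:2])

print(first_two)
print(up_to_label_30)
print(same_as_head)
```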

We can also filter rows based on the values in a column of our choice by passing a boolean condition to ‘.loc[]’. Let’s keep only the rows where ‘num_comments’, the number of comments, is greater than or equal to 50, and print the length of the data frame before and after filtering:
print("Length of data frame before filtering: ", len(df))
df = df.loc[df.num_comments >= 50]
print("Length of data frame after filtering: ", len(df))
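Boolean conditions can be combined with ‘&’ (and) and ‘|’ (or). Each condition needs its own parentheses because those operators bind more tightly than comparisons. ‘.query()’ expresses the same filter as a string. The data and the ‘score’ column below are made up for illustration.

```python
import pandas as pd

# Made-up data; 'score' is a hypothetical second column.
df = pd.DataFrame(
    {
        "title": ["Chart A", "Chart B", "Chart C", "Chart D"],
        "score": [120, 8, 56, 230],
        "num_comments": [75, 3, 50, 12],
    }
)

# Parentheses are required: & binds more tightly than >= and >.
popular = df.loc[(df["num_comments"] >= 50) & (df["score"] > 100)]

# The same filter written as a query string.
popular_q = df.query("num_comments >= 50 and score > 100")

print(popular)
print(popular.equals(popular_q))
```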

We can also print the set of unique values in ‘num_comments’ to verify that the filter worked:
print(set(df['num_comments']))

We see that all of the unique values are greater than or equal to 50.
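Printing a set works for a quick look, but on a large data frame the output can get long, and a set is unordered. A more compact check, sketched below on made-up data, is to look at the column’s minimum or to assert that the condition holds for every row.

```python
import pandas as pd

# Made-up data standing in for the Reddit posts.
df = pd.DataFrame({"num_comments": [75, 3, 50, 12, 90]})
df = df.loc[df["num_comments"] >= 50]

# The smallest remaining value should be at least 50.
print(df["num_comments"].min())

# .all() is True only if the condition holds for every row.
assert (df["num_comments"] >= 50).all()

# sorted() gives an ordered view of the unique values, unlike a set.
print(sorted(df["num_comments"].unique()))
```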
WRITING DATA TO A FILE
Similar to the suite of tools available for reading data in Pandas, there are several methods for writing data to a file. Suppose we want to save the filtered data frame from the previous section. We can use the ‘.to_csv()’ method to write it to a new ‘.csv’ file:
df.to_csv("new_file_r_dataisbeautiful_posts.csv")
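By default, ‘.to_csv()’ also writes the row index as the first column. When the file is read back, that column shows up as an extra ‘Unnamed: 0’ column unless you pass ‘index=False’ when writing (or ‘index_col=0’ when reading). The sketch below uses in-memory buffers and made-up data to show the difference.

```python
import io

import pandas as pd

# Made-up filtered data; the index labels are no longer 0, 1, 2.
df = pd.DataFrame({"num_comments": [75, 50, 90]}, index=[0, 2, 4])

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
read_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
read_without_index = pd.read_csv(without_index)

print(read_with_index.columns.tolist())
print(read_without_index.columns.tolist())
```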
We can then read the new file back into a data frame:
new_df = pd.read_csv("new_file_r_dataisbeautiful_posts.csv")
Then we print its length:
print("Length of new data frame: ", len(new_df))

The new data frame has the same length as the filtered one, which is what we expect.
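Matching lengths is a good first check, but it doesn’t confirm that the values survived the round trip. A stricter check, sketched here on made-up data written to a temporary file, compares the two frames with ‘pd.testing.assert_frame_equal()’. Because the filtered frame keeps its original, non-contiguous index labels while the reloaded one starts at 0, the sketch resets the index before comparing, and it writes with ‘index=False’ so no extra column appears.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data standing in for the Reddit posts.
df = pd.DataFrame(
    {
        "title": ["Chart A", "Chart B", "Chart C", "Chart D"],
        "num_comments": [75, 3, 50, 12],
    }
)
filtered = df.loc[df["num_comments"] >= 50]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "filtered_posts.csv"
    filtered.to_csv(path, index=False)
    new_df = pd.read_csv(path)

# Same length as the filtered frame.
print(len(new_df) == len(filtered))

# Raises AssertionError if any value, dtype, or column name differs.
pd.testing.assert_frame_equal(filtered.reset_index(drop=True), new_df)
```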
CONCLUSIONS
To summarize, in this post we discussed three useful methods in Pandas. We discussed how to use ‘read_csv()’ to read ‘.csv’ files into a data frame. Next, we went over how to select rows by position with ‘.iloc[]’ and filter them by column values with ‘.loc[]’. Finally, we walked through how to write the filtered data frame to a new ‘.csv’ file with ‘.to_csv()’. I hope you found this post useful and interesting. The code from this post is available on GitHub. Thank you for reading!