The Web Scraping Template

Web scraping is the process of extracting data from the web automatically: you retrieve data from web pages and store it in whatever form you want.
In this post, I will share the template I use to save time by not writing the same things over and over. I use the Python programming language for web scraping.
Disclaimer: This template doesn’t work on every website, because not all websites are built the same way. However, it works most of the time. This post is not a tutorial.
TL;DR
If you just want to see my template, you can check it out here.
Load required libraries
The first step is to load all the required libraries. I am using the BeautifulSoup library for web scraping, along with the pandas and requests libraries. Before importing, make sure they are all installed on your system.
!pip3 install bs4 pandas requests
Parsing
Now all the required libraries are loaded. To scrape data from the web, we first request the website. Once the request succeeds, the entire content of the page is available, and we can pass it to bs4, which parses the HTML into a form we can search.
Simply add the URL that you want to scrape and run the cell.
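A minimal sketch of this step. In the real notebook you would fetch a live page with requests.get(); here a small inline HTML table (with made-up contents) stands in so the snippet runs offline:

```python
from bs4 import BeautifulSoup

# In the real template you would fetch a live page, e.g.:
#   import requests
#   response = requests.get(url)
#   html = response.text
# This hypothetical inline table stands in for the page content.
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

# Parse the raw HTML so we can search it with find()/find_all()
soup = BeautifulSoup(html, "html.parser")
```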
Extracting the required elements
Now we can use the soup.find() and soup.find_all() methods to search for the required tags on the page. Usually, my target is a table where the data is stored. First, I always search for the headings, which can usually be found in th tags. So let’s find them and store them in a Python list.
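Collecting the headings can be sketched like this, using the same hypothetical sample table as a stand-in for a scraped page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for a scraped page
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The text of every <th> cell goes into a plain Python list
headings = [th.text.strip() for th in soup.find_all("th")]
# headings -> ["Name", "Score"]
```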
Now our headings are stored in a list named headings. Next, let’s find the table body: each row usually lives in a tr tag, with the individual values in td tags.
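A sketch of extracting the rows, again with the hypothetical sample table. Rows without td cells (such as the heading row) are skipped:

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for a scraped page
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# One inner list per <tr> row; skip rows with no <td> cells
content = []
for row in soup.find_all("tr"):
    cells = [td.text.strip() for td in row.find_all("td")]
    if cells:
        content.append(cells)
# content -> [["Alice", "90"], ["Bob", "85"]]
```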
Now we have the headings and the content. It’s time to store them in a DataFrame. Here, I created a data frame called data.
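Building the DataFrame from the two lists can be sketched as follows (the headings and content values are the hypothetical samples from the previous steps):

```python
import pandas as pd

# Values gathered in the previous steps (hypothetical samples)
headings = ["Name", "Score"]
content = [["Alice", "90"], ["Bob", "85"]]

# One column per heading, one row per <tr> of the table
data = pd.DataFrame(content, columns=headings)
```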
Finally, we have the data ready for future use. I like to perform some data analysis before saving the data to a CSV file.
Data Analysis
It is essential to analyze your data. Using pandas, we can explore it with methods like head(), describe(), and info(). Apart from that, you can check the column names.
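The inspection step looks roughly like this (the data frame here is a small hypothetical sample):

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data
data = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": ["90", "85"]})

print(data.head())      # first few rows
data.info()             # dtypes and non-null counts
print(data.describe())  # summary statistics
print(data.columns)     # column names
```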
Once you have analyzed the data, you might want to clean it. This step is optional, as it is not always required when you are creating a dataset. However, sometimes it is needed.
Data Cleaning
This template includes a few data cleaning steps, such as removing unwanted symbols from the data and renaming columns. You can add more steps if you want to.
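Those two cleaning steps might look like this. The symbols and column names below are hypothetical examples, not the post's actual data:

```python
import pandas as pd

# Hypothetical scraped values with unwanted symbols
data = pd.DataFrame({"Name": ["Alice*", "Bob*"], "Score ($)": ["90", "85"]})

# Strip the unwanted symbol and convert the numeric column
data["Name"] = data["Name"].str.replace("*", "", regex=False)
data["Score ($)"] = data["Score ($)"].astype(int)

# Rename columns to friendlier names
data = data.rename(columns={"Score ($)": "score"})
```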
Now our data is ready to save.
Save Data into CSV
Let’s save the data to a CSV file. You only need to change the file name.
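Saving is a one-liner with pandas; the file name "output.csv" below is a placeholder you would replace:

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned data
data = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})

# index=False keeps the row index out of the file;
# change "output.csv" to whatever name you want
data.to_csv("output.csv", index=False)
```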
That’s it for this post. Scraping data from the web is not limited to this template; there is plenty more you can do, but it all depends on the website. This template covers the few steps you would otherwise repeat every time, which saves you time. You can find the complete Jupyter Notebook here.
Thanks for reading. Happy web scraping!
If you like my work and want to support me, I’d greatly appreciate it if you follow me on my social media channels:
- The best way to support me is by following me on Medium.
- Subscribe to my new YouTube channel.
- Sign up on my email list.
In case you have missed my Kaggle step-by-step guide:
Getting started with Titanic Kaggle | Part 1
Getting started with Titanic Kaggle | Part 2
In case you have missed my Python series:
- Day 0: Introduction to Challenge
- Day 1: Python Basics – 1
- Day 2: Python Basics – 2
- Day 3: Python Basics – 3
- Day 4: Python Basics – 4
- Day 5: Python Basics – 5
I hope you will like my other articles.