Web Scraping/Harvesting

A Step-by-Step Guide to Downloading Manga Comics Using Python

Technology can be useful for solving our day-to-day problems.

Narhari Motivaras
Towards Data Science
4 min read · Apr 30, 2020


For those who aren’t familiar with manga and anime, let me clear things up. In Japanese culture, manga are comics that tell stories through cartoon-style characters. An anime is usually made from a manga once it has been published and has become popular enough to be worth animating and making money from. In Japan, people of all ages read manga.

One day, I started watching the One-punch anime because a friend of mine, who is an anime stan, recommended that I binge-watch it during this pandemic. I don’t watch anime often; I only watch the ones that are recommended to me and that are highly rated on IMDb. It has two seasons so far, and I finished both of them. More seasons are still to come, but because of the pandemic, anime production in Japan has stopped. So I told my friend that I had completed both seasons and couldn’t hold my thirst while waiting for the third one. He told me there is a website where you can read the manga. It’s called mangapanda.com.

I saw that the site had too many pop-up ads, there to generate revenue, and they were too distracting. I searched for One-punch on the site and read for a while, but I got fed up because every page shows the main screen along with ads, which gets annoying.

Source

As a computer geek, I started analyzing the website with the web developer tools that ship with most desktop browsers, opened by pressing CTRL+SHIFT+I. I found that the site's main item container, which holds the manga image, had an HTTPS link to a .jpg file.

Source: (Press CTRL+SHIFT+I on the website and select the image to see this menu)

Clicking on that link opens the image in the browser with no ads. But opening the developer tools and then clicking the link every time sounds tedious, right? So a thought came to my mind: is there a way to extract this image along with the ones that follow it? It turns out web scraping can help in these circumstances. I had heard of it but never had time to use it.

The time has come to use our brains and coding skills to solve the problem.

For this purpose I am using Google Colab. If you aren’t familiar with it, check this out!

Step 1:

Import the necessary libraries to set ourselves up.

Source: Carbon (beautifying) + Colab (code)

Step 2:

We will store all the HTTPS image links in img[].

Source: Carbon (beautifying) + Colab (code)

There are two for loops in total, which run through each section of the manga.

Source: Screenshot from my laptop

The first for loop is for the part number, and the second for loop is for the section number. In the above example, it is part number 135 and section 5 of the One-punch comic.

In the above code I have used range(1,2) to extract the first part of the comic, and range(1,200) in the second loop, as this comic doesn’t have more than 200 pages in any part.
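The two nested loops can be sketched as a small URL generator. This is a minimal sketch rather than the notebook's exact code: the `BASE_URL` placeholder and the `<base>/<part>/<page>` layout are assumptions based on the "part 135, section 5" example, and `page_urls` is a hypothetical helper name.

```python
from itertools import product

# Assumed URL layout: <base>/<part number>/<page number>.
# Placeholder only: copy the real base URL from your browser's address bar.
BASE_URL = "https://www.mangapanda.com/your-manga-slug"


def page_urls(base_url, parts=range(1, 2), pages=range(1, 200)):
    """Yield one candidate URL per (part, page) pair, part by part."""
    # product() behaves like the two nested for loops:
    # parts is the outer loop, pages is the inner loop.
    for part, page in product(parts, pages):
        yield f"{base_url}/{part}/{page}"
```

With the article's ranges, `range(1, 2)` yields only part 1 and `range(1, 200)` yields pages 1 to 199, so widen either range to cover more of the comic.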

Get the page at each link using the requests library and, if the URL exists, parse it using the Beautiful Soup library and store the parsed content in page_content. This page_content holds all the page information in the form of HTML tags.
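The fetch-and-parse step might look roughly like this. It is a sketch under assumptions: `fetch_html` and `parse_page` are hypothetical helper names, "the URL exists" is interpreted here as an HTTP 200 response, and the 10-second timeout and the built-in `html.parser` backend are my choices, not something the article specifies.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Return the page's HTML text, or None if the URL does not exist."""
    response = requests.get(url, timeout=timeout)
    # Assumption: any status other than 200 means there is no such page.
    if response.status_code != 200:
        return None
    return response.text


def parse_page(html):
    """Parse raw HTML into a BeautifulSoup tree (the article's page_content)."""
    return BeautifulSoup(html, "html.parser")
```

A page would then be handled with `html = fetch_html(url)` followed by `page_content = parse_page(html)` whenever `html` is not `None`.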

Our image link is in one of the script tags in page_content. So, we extract all the script tags and append them to a list called row_data. We find that index number 2 holds our image, so we extract the image link using a regex and append it to img[]. This was the difficult part for me, as I was not familiar with regular expressions.

Source: Carbon (beautifying) + Colab (code)

Now we have all the image links in img[], so all that is left is to download them using the files library, which we imported at the start (from google.colab import files).
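The regex part of the script-tag step described above can be sketched in isolation. The pattern is an assumption about how the link appears inside the script text (an https URL ending in .jpg, possibly with JavaScript-escaped slashes), and `extract_image_url` is a hypothetical helper name; the notebook's actual pattern may differ.

```python
import re

# Assumed shape of the link: starts with "https:", ends with ".jpg",
# and contains no quotes or whitespace in between.
IMAGE_URL = re.compile(r'https:[^"\'\s]+?\.jpg')


def extract_image_url(script_text):
    """Return the first .jpg link found in a script tag's text, or None."""
    match = IMAGE_URL.search(script_text)
    if match is None:
        return None
    # URLs inside JavaScript strings are often written as https:\/\/...,
    # so turn each escaped slash back into a plain one.
    return match.group(0).replace("\\/", "/")
```

In the article's flow this would be called as `extract_image_url(str(row_data[2]))`, where `row_data = page_content.find_all("script")`. Index 2 is what the author observed on this site and may well differ elsewhere.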

Source: Screenshot from my laptop

(Note: If you are using another environment, the method for downloading the images will be different. You can download them using wget.)
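If you would rather stay in Python than shell out to wget, the standard library can do the same job. This sketch swaps in `urllib.request.urlretrieve` for wget. The `download_images` name, the `manga` output folder, and the `001.jpg`, `002.jpg` naming scheme are assumptions made for illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_images(urls, out_dir="manga"):
    """Save each URL as 001.jpg, 002.jpg, ... and return the saved paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        # Zero-padded names keep the pages in reading order when sorted.
        path = target / f"{index:03d}.jpg"
        urlretrieve(url, str(path))
        saved.append(path)
    return saved
```

Calling `download_images(img)` on the list built earlier would leave the pages in a local `manga` folder, in the order they were collected.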

Now you can take all these images, make a PDF, and start reading manga without any ads. Hooray!!
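The article does not say which tool to use for the PDF. One possible way, assuming the Pillow library is installed, is to let it write a multi-page PDF directly. `images_to_pdf` is a hypothetical helper name.

```python
from pathlib import Path

from PIL import Image


def images_to_pdf(image_paths, pdf_path):
    """Combine the images, in the given order, into one multi-page PDF."""
    # PDF pages need RGB data, so convert every image first.
    pages = [Image.open(path).convert("RGB") for path in image_paths]
    if not pages:
        raise ValueError("no images to convert")
    first, rest = pages[0], pages[1:]
    # save_all + append_images writes the remaining pages after the first.
    first.save(pdf_path, "PDF", save_all=True, append_images=rest)
    return Path(pdf_path)
```

Passing the downloaded files in sorted order, for example `images_to_pdf(sorted(Path("manga").glob("*.jpg")), "part1.pdf")`, keeps the pages in sequence.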

If you want to download your favorite manga, go to the mangapanda website, get the URL, and paste it into the URL variable.

Link to colab notebook

I am sure it will be useful to you and that you will take something away from this article. Till then, happy coding!!
