
Creating custom image datasets for Deep Learning projects.

Some useful browser extensions to download image datasets

Image by Author

This weekend I created a simple fruit classifier for my preschool kid. It is a simple image classification app that predicts the fruit in the image. I presented it as a game for my son to see who predicts the name first – Computer or Human :). Here is a preview of the app.

For this app, I needed to download images of many fruits to train an image classifier. I discovered a few browser extensions in the process, which make it pretty easy to bulk download the images, and I have compiled and presented them in this article.

However, before you begin using the extensions, there are two crucial things to keep in mind:

Copyright issues

Do not download any image in violation of its copyright terms. In many cases, you cannot reproduce copyrighted images without the owner’s permission. Images downloaded in this article are meant only for educational purposes.

Download Settings

Make sure the ‘Ask where to save each file before downloading’ option is not selected in your browser’s download settings; otherwise, the downloader will ask for your permission for every file it downloads, which is not desirable. The clip below demonstrates how to access the option.


This article is part of a complete series on finding good datasets. Here are all the articles included in the series:

Part 1: Getting Datasets for Data Analysis tasks – Advanced Google Search

Part 2: Useful sites for finding datasets for Data Analysis tasks

Part 3: Creating custom image datasets for Deep Learning projects

Part 4: Import HTML tables into Google Sheets effortlessly

Part 5: Extracting tabular data from PDFs made easy with Camelot.

Part 6: Extracting information from XML files into a Pandas dataframe

Part 7: 5 Real-World datasets for honing your Exploratory Data Analysis skills


Let’s now look at some of the useful tools to download images easily:

1. Fatkun Batch Download Image

Fatkun Batch Download Image is a powerful and handy browser extension to download images from the web. Some of its capabilities are:

  • Possible to filter images based on resolution or link
  • Create Custom rules to download desired images, and
  • Ability to batch rename and bulk download images

🔗 Link to download the extension

Usage

Let’s now download images of apples, since we want to build a fruit classifier. Since it is easier to show than write about the process, I have included a short video that walks through the download step by step.


2. Imageye – Image downloader

Imageye is another browser extension that allows you to download all images on a web page. Imageye also gives you the following capabilities:

  • Filtering images based on pixel width and height. You can also filter images based on their URL.
  • Like Fatkun, you can bulk download all the images at once or manually select the ones you want to download.

🔗 Link to download the extension

Usage


3. Download All Images

This Chrome extension downloads all images from a web page and packs them into a zip file. It cannot filter images based on their sizes but is excellent for batch downloading images from sites like Unsplash, which only hosts images. It analyzes the current browser page to identify images and then downloads them into a single zip file. Start image download by clicking the extension icon in the top right corner. It will give you an estimate of how long it will take to finish.
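Once the extension has produced the archive, the images still need to land in a per-class folder before they can be used for training. Here is a minimal sketch of that step in Python, using only the standard library. The archive name (images.zip), the target folder, the helper name extract_images, and the set of extensions treated as images are all assumptions made for illustration, not something the extension guarantees.

```python
import zipfile
from pathlib import Path

# Extensions treated as images here (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def extract_images(zip_path, dest_dir):
    """Extract image files from zip_path into dest_dir, flattening folders.

    Hypothetical helper for illustration. Returns the list of written paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename).name  # drop any folder structure
            if info.is_dir() or Path(name).suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip directories and non-image files
            target = dest / name
            target.write_bytes(archive.read(info))
            written.append(target)
    return written

if __name__ == "__main__":
    # Hypothetical archive and folder names.
    if Path("images.zip").exists():
        files = extract_images("images.zip", "fruits/Apple")
        print(f"Extracted {len(files)} images")
```

Because the sketch flattens the archive, two files with the same name in different folders would overwrite each other; for a quick dataset that is usually acceptable, but it is worth keeping in mind.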

🔗 Link to download the extension

Usage


4. ImageAssistant Batch Image Downloader

ImageAssistant Batch Image Downloader is an image extractor for sniffing, analyzing, and batch downloading images from a web page. It is pretty flexible and offers many ways to customize the image download. For instance, you can extract the pictures on a web page, prefetch image links, or batch extract the URLs of the images. Additionally, a picture filter lets you restrict the displayed pictures by file extension or by resolution.

🔗 Link to download the extension

Usage


5. The Fastai way

The last method doesn’t use any browser extension. I picked it up from Zachary Mueller’s Practical-Deep-Learning-for-Coders-2.0 resource, which he has shared on GitHub. The code is adapted from the work of Francisco Ingham and Jeremy Howard, which in turn is inspired by Adrian Rosebrock.

The method requires you to install fastai, a deep learning library, because it relies on some of the library’s built-in functions. To understand what is happening under the hood, you need some knowledge of the library, especially the data block API. Explaining that is out of the scope of this article, but I will quickly go through the steps required to download the images:

  • Go to Google Images and search for the images you are interested in. Scroll down until you find the images you want to download. Let’s say we are interested in finding images of apples and mangoes.
  • Open the JavaScript ‘Console’ in Chrome/Firefox, paste the following code lines, and execute them. This will get all the URLs of the images and save them in a CSV file. Repeat the process for every category. Now you will have two CSV files, i.e., apple.csv and mango.csv.
// Collect the URL of every image thumbnail on the results page.
// Google's markup changes over time, so the '.rg_i' selector may need updating.
urls = Array.from(document.querySelectorAll('.rg_i')).map(el => el.hasAttribute('data-src') ? el.getAttribute('data-src') : el.getAttribute('data-iurl'));

// Open the URLs, one per line, as a CSV download
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
  • Next, create a folder for each category of images that you want to download.
folders = ['Apple','Mango']
files = ['apple.csv','mango.csv']
  • Finally, download the images
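from fastai.vision.all import *  # provides Path, L, download_images, verify_images

classes = ['Apple','Mango']
path = Path('fruits')
path.mkdir(parents=True, exist_ok=True)
for i, n in enumerate(classes):
   print(n)
   path_f = Path(files[i])  # CSV of URLs for this class
   download_images(path/n, path_f, max_pics=50)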
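  • Verify that the downloaded images are valid
imgs = L()
for n in classes:
   print(n)
   path_n = path/n
   # verify_images returns the files that could not be opened as images
   imgs += verify_images(path_n.ls())
# Optionally delete the failed files before training: imgs.map(Path.unlink)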

  • Display the images

fruits = DataBlock(blocks=(ImageBlock, CategoryBlock),
       get_items=get_image_files,
       splitter=RandomSplitter(0.2),
       get_y=parent_label,
       item_tfms=RandomResizedCrop(460),
       batch_tfms=[*aug_transforms(size=224,max_warp=0),Normalize.from_stats(*imagenet_stats)])   
dls = fruits.dataloaders(path,  bs=32)
dls.show_batch(max_n=9)
Downloaded images (Image by Author)
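For readers who want to see what the download step amounts to without fastai, here is a minimal standard-library sketch. It is not fastai’s implementation of download_images; the helper name download_from_url_file, the file naming scheme, and the .jpg suffix are assumptions made for illustration. It reads one URL per line from a file such as apple.csv and saves up to max_pics of them into a folder, skipping any URL that fails.

```python
import urllib.request
from pathlib import Path

def download_from_url_file(url_file, dest_dir, max_pics=50, timeout=4):
    """Download up to max_pics URLs listed one per line in url_file.

    Hypothetical helper for illustration; failed downloads are skipped.
    Returns the list of files that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    lines = Path(url_file).read_text().splitlines()
    urls = [u.strip() for u in lines if u.strip()]
    saved = []
    for i, url in enumerate(urls[:max_pics]):
        target = dest / f"{i:08d}.jpg"  # assumed naming scheme and suffix
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                target.write_bytes(response.read())
        except (OSError, ValueError) as err:
            # URLError is a subclass of OSError; ValueError covers malformed URLs
            print(f"Skipping {url}: {err}")
            continue
        saved.append(target)
    return saved

if __name__ == "__main__":
    # One CSV of URLs and one folder per class, as in the download step.
    for name, url_file in [("Apple", "apple.csv"), ("Mango", "mango.csv")]:
        if Path(url_file).exists():
            files = download_from_url_file(url_file, Path("fruits") / name)
            print(name, len(files))
```

The sketch downloads sequentially to stay short; for a few hundred images that is usually fast enough, and any files that turn out not to be valid images can still be caught by a verification pass afterwards.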

Here is a video showing the entire process:


Conclusion

In this article, we saw various ways to gather image data for building deep learning models. You can either use browser extensions or write code to get the same results. Whichever method you choose, please be mindful of the restrictions and the copyright issues. Also, do not forget to use these tools to gather data for your next project. In the meantime, the other articles in this series, listed earlier in this article, may also be useful.

