
Getting Datasets for Data Analysis tasks – Advanced Google Search

Utilising Google Search more effectively for finding data

Image by Author

"Data! Data! Data!" he cried impatiently. "I cannot make bricks without clay."

Sherlock Holmes in "The Adventure of the Copper Beeches," Sir Arthur Conan Doyle


The importance of data in a data science process cannot be emphasized enough. The outcomes of a data analysis task reflect the kind of data that has been fed into it. However, getting the data is sometimes a big pain point in itself. Recently, I did a short course titled Data Journalism and Visualization with Free Tools, and some great resources were shared through that course. I’ll be sharing some of the valuable tips through a set of articles. In these articles, I’ll try to highlight some ways you can find data on the internet for free and then use it to create something meaningful.


This article is part of a complete series on finding good datasets. Here are all the articles included in the series:

Part 1: Getting Datasets for Data Analysis tasks – Advanced Google Search

Part 2: Useful sites for finding datasets for Data Analysis tasks

Part 3: Creating custom image datasets for Deep Learning projects

Part 4: Import HTML tables into Google Sheets effortlessly

Part 5: Extracting tabular data from PDFs made easy with Camelot.

Part 6: Extracting information from XML files into a Pandas dataframe

Part 7: 5 Real-World datasets for honing your Exploratory Data Analysis skills


Advanced Google Search

Let’s begin with advanced Google Search, one of the most common ways to access publicly available datasets. By merely typing the name of the required dataset in the search bar, we can access a plethora of resources. However, here is a simple trick that can ease this process to a great extent and help you find files of a specific type on the internet.


1. Using the file type (extension) of the file to be downloaded

Let’s say we have a task at hand to find healthcare-related data in CSV format. A CSV (comma-separated values) file stores tabular data as plain text. To get such files, go to the Google search bar and type the following:

filetype:<extension of the file to be downloaded> <category of data> data
For example: filetype:csv healthcare data
Image by Author

Google will list the links that most closely match the query. Most of the time, these will be direct links to the specific files on the sites, which can then be downloaded onto the local system and analyzed later.
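The query format above can also be assembled programmatically. The following is a minimal sketch, not part of the original workflow: the helper names build_filetype_query and google_search_url are hypothetical, and the only assumption about Google is that a results page can be opened through the usual https://www.google.com/search?q=... address.

```python
from urllib.parse import urlencode


def build_filetype_query(extension: str, topic: str) -> str:
    """Return a query such as 'filetype:csv healthcare data' (hypothetical helper)."""
    # The operator is written without spaces around the colon;
    # a leading dot in the extension (".csv") is dropped.
    return f"filetype:{extension.lstrip('.').lower()} {topic.strip()}"


def google_search_url(query: str) -> str:
    """Return a URL that opens the Google results page for the query."""
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_filetype_query("csv", "healthcare data")
print(query)
print(google_search_url(query))
```

The printed URL can be pasted into a browser, or passed to the standard library's webbrowser.open, to see the same results as typing the query by hand.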


2. Using the file type and the site name

If you want to narrow down your search further, then this option will come in handy. Specifying only the file type will return files from many different sites. However, if you want to find data on a specific website, you can mention it too in the search bar, as follows:

filetype:<extension of the file to be downloaded> site:<website> <category of data>
For example: filetype:xlsx site:who.int health
Image by Author

All the results will now come only from the WHO website, which narrows down the search considerably.
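As a rough illustration of combining the two operators, here is a small self-contained sketch. The function name build_site_query is hypothetical, and the search address is assumed to be the usual https://www.google.com/search?q=... form.

```python
from urllib.parse import urlencode


def build_site_query(extension: str, site: str, topic: str) -> str:
    """Combine the filetype: and site: operators into one query (hypothetical helper)."""
    parts = [
        f"filetype:{extension.lstrip('.').lower()}",  # e.g. filetype:xlsx
        f"site:{site.strip()}",                       # e.g. site:who.int
        topic.strip(),                                # free-text category of data
    ]
    return " ".join(parts)


site_query = build_site_query("xlsx", "who.int", "health")
site_url = "https://www.google.com/search?" + urlencode({"q": site_query})
print(site_query)
print(site_url)
```

Changing only the site argument makes it easy to run the same file-type search against several data providers in turn.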


Files compatible with the search command

What are the different kinds of files compatible with the search command? This information can be accessed easily through the settings on the homepage as follows:

  • Click Settings > Advanced Search
  • Scroll down to the file type option and look for the available types. You’ll see there are a lot of options, including PDF and PPT files.
Image by Author

Conclusion

In this article, we looked at ways to find our desired datasets faster and more efficiently via a standard Google search. We looked at how merely adding a file type and a site name could help filter the results more effectively. These techniques can be handy when we know what kind of data we are looking for.


Originally published at parulpandey.com.

