
"Data! Data! Data!" he cried impatiently. "I cannot make bricks without clay."
Sherlock Holmes in "The Adventure of the Copper Beeches," Sir Arthur Conan Doyle
The importance of data in a data science process cannot be overstated. The outcome of a data analysis task reflects the kind of data that has been fed into it. However, getting the data is sometimes a pain point in itself. Recently, I took a short course titled Data Journalism and Visualization with Free Tools, and some great resources were shared through that course. I’ll be passing on some of its most valuable tips through a set of articles. In these articles, I’ll highlight some ways you can find data on the internet for free and then use it to create something meaningful.
This article is part of a complete series on finding good datasets. Here are all the articles included in the series:
Part 1: Getting Datasets for Data Analysis tasks – Advanced Google Search
Part 2: Useful sites for finding datasets for Data Analysis tasks
Part 3: Creating custom image datasets for Deep Learning projects
Part 4: Import HTML tables into Google Sheets effortlessly
Part 5: Extracting tabular data from PDFs made easy with Camelot.
Part 6: Extracting information from XML files into a Pandas dataframe
Part 7: 5 Real-World datasets for honing your Exploratory Data Analysis skills
Advanced Google Search
Let’s begin with advanced Google search, one of the most common ways to access publicly available datasets. Merely typing the name of the required dataset in the search bar already surfaces a plethora of resources. However, here is a simple trick that eases this process to a great extent and helps you find files of a specific type on the internet.
1. Using the file type (extension) of the file to be downloaded
Let’s say we have a task at hand to find healthcare-related data in CSV format. CSV stands for comma-separated values, a plain-text format that stores data in tabular form. To get such files, go to the Google search bar and type the following:
filetype:<extension of the file to be downloaded> <category of data> data
For example: filetype:csv healthcare data

Google will list the links that most closely match the query. Most of the time, these will be direct links to the files themselves, which can then be downloaded onto the local system and analyzed later.
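For readers who prefer to script their searches, here is a minimal Python sketch of the same idea. It only assembles the query string and a search URL. The function names build_filetype_query and google_search_url are hypothetical helpers introduced for illustration, and the https://www.google.com/search?q=... URL pattern is an assumption not mentioned in the original article.

```python
from urllib.parse import urlencode


def build_filetype_query(extension: str, topic: str) -> str:
    """Return a search query restricted to one file type (hypothetical helper)."""
    # Accept both "csv" and ".csv" by dropping a leading dot.
    ext = extension.strip().lstrip(".").lower()
    # No space is allowed between the operator, the colon, and the extension.
    return f"filetype:{ext} {topic.strip()}"


def google_search_url(query: str) -> str:
    """Return a search URL for the query (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_filetype_query("csv", "healthcare data")
print(query)
print(google_search_url(query))
```

Pasting the printed query into the search bar, or opening the printed URL in a browser, should give the same kind of results as typing the query by hand.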
2. Using the file type and the site name
If you want to narrow down your search further, this option will come in handy. Specifying only the file type returns files from many different sites. However, if you want to find data on a specific website, you can add the site to the query too, as follows:
filetype:<extension of the file to be downloaded> site:<website> <category of data>
For example: filetype:xlsx site:who.int health

All the results will now come only from the WHO website (who.int), which narrows down the search results considerably.
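The combined query can be sketched in Python as well. This is an illustrative sketch rather than an official API: build_dataset_query is a hypothetical helper, and the search URL pattern is again an assumption. Both operators are optional here, so the same function covers a plain search, a file-type search, and a file-type plus site search.

```python
from typing import Optional
from urllib.parse import urlencode


def build_dataset_query(
    topic: str,
    extension: Optional[str] = None,
    site: Optional[str] = None,
) -> str:
    """Combine optional filetype: and site: operators with a topic (hypothetical helper)."""
    parts = []
    if extension:
        # Accept both "xlsx" and ".xlsx".
        parts.append("filetype:" + extension.strip().lstrip(".").lower())
    if site:
        # The site is given as a bare domain, for example who.int.
        parts.append("site:" + site.strip())
    parts.append(topic.strip())
    return " ".join(parts)


who_query = build_dataset_query("health", extension="xlsx", site="who.int")
# Assumed URL pattern; the query itself can also be pasted into the search bar.
who_url = "https://www.google.com/search?" + urlencode({"q": who_query})
print(who_query)
print(who_url)
```

Keeping the operators optional makes it easy to start broad and then tighten the search one operator at a time.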
Files compatible with the search command
What are the different kinds of files compatible with the search command? This information can be accessed easily through the settings on the homepage as follows:
- Click Settings > Advanced Search
- Scroll down to the file type option and look for the available types. You’ll see there are a lot of options, including pdf and ppt file types.

Conclusion
In this article, we looked at ways to find our desired datasets faster and more efficiently via a standard Google search. We saw how merely adding a file extension and a site name to the query can filter the results more effectively. These techniques are handy when we know what kind of data we are looking for.
Originally published at parulpandey.com.