
Great Python Libraries for accessing Public Data

Accessing Public Data with Python

Photo by Skitterphoto on Pexels

There are many great APIs that give users access to public data. This data can be used to generate signals for financial models, analyze public sentiment on a particular topic, and find trends in public behavior. In this post, I will briefly discuss two Python libraries that allow you to access public data. Specifically, I’ll discuss Tweepy and Pytrends.

Getting Public Tweets with Tweepy

Tweepy is a Python library that gives you access to public tweets through the Twitter API. Once you have created a Twitter developer account and application, you can use Tweepy to pull tweets containing keywords that you specify. For example, if you are interested in gauging public sentiment on the current state of the election, you can pull recent tweets that contain the keyword ‘election’. Tweepy returns the tweet text, which can then be used to compute sentiment scores. Beyond politics, this approach applies to a variety of industry verticals, including healthcare, finance, retail, and entertainment.

A previous article I wrote, Patient Sentiment for Pharmaceutical Drugs from Twitter, used tweets to analyze public sentiment about popular pharmaceutical drugs. In another article, Analysis of Tweets about the Joker (2019 film) in Python, I analyzed public sentiment about the 2019 movie Joker. I encourage you to check out these tutorials and perform some data curation and analysis of your own. The steps for applying for a Twitter developer account and creating a Twitter application are outlined [here](https://tweepy.readthedocs.io/en/latest/getting_started.html). Documentation for Tweepy can be found here.
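As a minimal sketch of the keyword search described above, the snippet below assumes Tweepy v4 or later and its `tweepy.Client.search_recent_tweets` method, which wraps the Twitter API v2 recent-search endpoint (roughly the last seven days of tweets). The helper names (`build_query`, `fetch_recent_tweet_texts`, `make_client`) and the `TWITTER_BEARER_TOKEN` environment variable are illustrative choices of our own, not part of Tweepy. Access levels for the Twitter (now X) API have changed over time, so whether your app may call this endpoint depends on its plan.

```python
import os
from typing import List


def build_query(keyword: str, lang: str = "en", exclude_retweets: bool = True) -> str:
    """Build a Twitter API v2 search query string for a keyword (hypothetical helper)."""
    parts = [keyword, f"lang:{lang}"]
    if exclude_retweets:
        # "-is:retweet" is a v2 search operator that filters out retweets.
        parts.append("-is:retweet")
    return " ".join(parts)


def fetch_recent_tweet_texts(client, keyword: str, max_results: int = 10) -> List[str]:
    """Return the text of recent tweets matching `keyword`.

    `client` is assumed to be a tweepy.Client (Tweepy v4+).
    """
    if not 10 <= max_results <= 100:
        # The recent-search endpoint accepts 10 to 100 results per request.
        raise ValueError("max_results must be between 10 and 100")
    response = client.search_recent_tweets(
        query=build_query(keyword), max_results=max_results
    )
    # response.data is None when no tweets match the query.
    return [tweet.text for tweet in (response.data or [])]


def make_client(env_var: str = "TWITTER_BEARER_TOKEN"):
    """Create a tweepy.Client from a bearer token stored in an environment variable."""
    import tweepy  # imported lazily so the helpers above work without Tweepy installed

    return tweepy.Client(bearer_token=os.environ[env_var])


# Example usage (requires `pip install tweepy` and a bearer token from your app):
#   client = make_client()
#   for text in fetch_recent_tweet_texts(client, "election", max_results=10):
#       print(text)
```

Passing the client in as a parameter keeps the query logic separate from authentication, so the returned list of texts can be fed straight into whatever sentiment scorer you prefer.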

Using Pytrends to Pull Trending Topics Data

Pytrends is a Python library that gives you access to data on how much a keyword or topic is being searched on Google. As with Tweepy, you provide keyword and location information, and Pytrends returns a time series of normalized Google search interest, scaled from 0 to 100 relative to the peak for the chosen region and time period. This has applications in retail and finance as well, and it can also be fun to see which keywords are trending in different regions. For a friendly introduction to Pytrends, check out Coronavirus Google Trends in Python and Choosing a Halloween Costume using the Google Trends API in Python. The documentation for Pytrends can be found here.
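Here is a hedged sketch of that workflow, assuming the `pytrends.request.TrendReq` class with its `build_payload(kw_list, timeframe=..., geo=...)` and `interest_over_time()` methods. The helper names (`build_trends_request`, `fetch_interest_over_time`, `make_trend_req`) are hypothetical. Pytrends is an unofficial client, so requests may be rate-limited or break if Google changes its backend.

```python
from typing import Dict, List, Sequence


def build_trends_request(
    keywords: Sequence[str], geo: str = "", timeframe: str = "today 12-m"
) -> Dict[str, object]:
    """Collect keyword arguments for TrendReq.build_payload (hypothetical helper).

    geo is a region code such as "US" or "US-NY" ("" means worldwide);
    timeframe uses Google Trends syntax, e.g. "today 12-m" or "2020-01-01 2020-06-30".
    """
    kw_list: List[str] = list(keywords)
    if not 1 <= len(kw_list) <= 5:
        # Google Trends compares at most five terms in a single request.
        raise ValueError("provide between 1 and 5 keywords")
    return {"kw_list": kw_list, "geo": geo, "timeframe": timeframe}


def fetch_interest_over_time(pytrends, keywords, geo="", timeframe="today 12-m"):
    """Return a DataFrame of 0-100 search interest, one column per keyword."""
    pytrends.build_payload(**build_trends_request(keywords, geo, timeframe))
    df = pytrends.interest_over_time()
    # pytrends adds an "isPartial" flag column for incomplete periods;
    # drop it so only the keyword columns remain.
    if "isPartial" in df.columns:
        df = df.drop(columns="isPartial")
    return df


def make_trend_req():
    """Create a pytrends session (requires `pip install pytrends`)."""
    from pytrends.request import TrendReq  # imported lazily

    return TrendReq(hl="en-US", tz=360)


# Example usage:
#   pytrends = make_trend_req()
#   df = fetch_interest_over_time(pytrends, ["halloween costume"], geo="US")
#   print(df.tail())
```

When several keywords are requested together, their values share one scale, so 100 marks the single highest point across all of them for that region and time range.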

Conclusion

To summarize, in this post we discussed two great Python libraries that can be used to pull public data. The Tweepy library is great for pulling tweets from Twitter, whose text can then be used for sentiment analysis and trend analysis. The Pytrends library is great for analyzing Google search trends for specific time periods and regions around the world. I hope you found this post interesting. Thank you for reading!
