
Introduction
In my previous post, 5 Interesting Python Libraries That You Might Have Missed, I talked about five underrated Python libraries that I had rarely heard about, one of which is the Wikipedia API. As I did some more reading on this library, I found that it is way cooler than I had expected. In this article, I will share some examples of using this simple, convenient and useful library.
What to expect
As the Wikipedia API can retrieve almost all the content of a Wikipedia page, we do not need to rely heavily on data scraping techniques here. Fetching data takes just a line of code with the Wikipedia API.
_However, one important note to remember is that this library was never intended for advanced use. Therefore, as the documentation suggests, if you intend to do any serious scraping projects or automated requests, consider alternatives such as Pywikipediabot or other MediaWiki API wrappers, which offer more advanced features._
Alright, first, install this cool library and let’s see what this package can bring us.
!pip install wikipedia
import wikipedia
How it works
1. Getting the summary of a specific keyword
If you wish to get a particular number of summary sentences for any topic, just pass that number as an argument to the summary() function. For example, I'm trying to figure out what Covid-19 is in 4 sentences.
2. Searching article titles
The search() function helps us find all titles that contain a specific keyword. For instance, if I want to get all article titles relating to "KFC", I pass "KFC" to the search function.
As a result, a list of all Wikipedia articles that include information about KFC is retrieved.
You can also specify how many titles you want returned.
3. Suggesting keywords
In case you have something in mind to look up but cannot remember exactly what it is, consider the suggest() method. The function returns a related, suggested title.
Suppose I want to find the exact name of the German Chancellor, but I do not remember how her name is spelled. I can write what I remember, "Angela Markel", and let suggest() do the rest for me.
As you can see, the function returns the correct spelling, "Angela Merkel".
4. Extracting content
If you wish to extract all the content of a Wiki page as plain text, try the content attribute of the page object.
For example, let's get the "History of KFC" article. The result doesn't include pictures or tables, just plain text. Simple, right?

You can even create a loop to fetch the contents of different articles related to a topic of your choice by combining search() and page().content. Let's try fetching several articles about Oprah Winfrey.
5. Extracting the URL of a page
You can easily extract the URL of any Wikipedia page with the url attribute of the page object.
6. Extracting reference URLs
You can even extract all the reference URLs on a Wikipedia page with the page object and a different attribute this time: references.
A list of external reference URLs is extracted.

7. Getting page categories
What if I want to figure out how my article is categorized by Wikipedia? Another property of the page object is used here: categories. I will try to find out all the categories of the above article, "History of KFC".
8. Extracting page images
Images can also be retrieved with a single line of code. Using page().images, you get the links to the images. Continuing with my example, I will try to get the second picture from the "History of KFC" page.
Look what I got here:
Output:
https://upload.wikimedia.org/wikipedia/commons/b/b1/Col_Sanders_Restaurant.png
The link gets you to Sanders’ Restaurant!

9. Changing the output language
The output can be switched to any language, provided the page exists in that language, using the set_lang() method. A little off topic, but I think this is a great way to learn new languages: you can try different languages to understand a specific paragraph, and the translations are right on your screen.
That is how you can fetch the summary of "Vietnam" in whichever language you set.
Last words
This is quite interesting, right? Wikipedia is one of the largest sources of information on the Internet and a natural place for data gathering. With the various features of the Wikipedia API, this becomes much easier.
If you have any interesting libraries, please do not hesitate to share with me.