How to Identify Gaps in Public Discourse Using SEO and Text Mining

Dmitry Paranyushkin
Towards Data Science
Jul 24, 2018

Search engine optimization (SEO) is normally used to push content to the top of the Google search results page. However, using the data available today together with text analysis and visualization techniques, we can apply the SEO approach to something much more interesting: discovering the gaps in public discourse. Those gaps are discrepancies between what people are looking for and what they actually find. Once we identify these gaps, we can produce relevant content that closes the gap between demand and supply, be it in political discourse, a niche market, or scientific research.

The core of the approach that we propose is based on a combination of text network analysis, visualization techniques, and text mining.

Question #1: What Is The State of the Current Discourse?

The first step is to identify the objectives and interests. For example, in one of our previous studies we focused on the topic of “text mining”. Our goal was to attract the audience searching for this topic on the internet to the website of our research lab, Nodus Labs, which offers the InfraNodus text network visualization tool.

We used InfraNodus to visualize the current search results for “text mining” — what people actually see when they look for it on Google.

Text network visualization of Google search results for “Text Mining” — what people actually find when they look for this search query, visualized using InfraNodus

Unsurprisingly, we found a dominant topical cluster of
text — mining — analytics

as well as other topical clusters such as
process — analyze — large
and
data — unstructured — analyse
and
pattern — turn — interesting

This shows that when people search for “text mining” on Google, what they actually find is mostly general descriptions of what it’s good for (identifying patterns in unstructured data) and how it works.

Of course, to get a more complete picture it is better to look at several relevant keywords rather than one. For example, in our case we could add “data mining”, “text analytics” etc.
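To make the idea of a text network more concrete, here is a minimal sketch of how topical clusters like the ones above could be approximated: words become nodes, and two words are linked when they appear close to each other in the text. This is not the InfraNodus implementation. The stopword list, the window size, and the sample snippets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "it", "that", "with", "on"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def cooccurrence_edges(texts, window=4):
    """Count how often two different words appear within `window` tokens of each other."""
    edges = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    edges[tuple(sorted((word, other)))] += 1
    return edges

def weighted_degree(edges):
    """Sum edge weights per node as a rough measure of how prominent a term is."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Made-up snippets standing in for the descriptions shown in search results.
snippets = [
    "Text mining is the process of analyzing large collections of text.",
    "Text mining turns unstructured data into interesting patterns.",
    "Text analytics and text mining analyze unstructured text data.",
]

edges = cooccurrence_edges(snippets)
degree = weighted_degree(edges)
top_terms = [term for term, _ in degree.most_common(5)]
print(top_terms)
```

Adding more keywords simply means passing more snippets to `cooccurrence_edges`, so the resulting graph reflects a broader slice of the discourse.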

Question #2: What Do People Actually Look For?

Once we have identified the state of the current discourse, we can find out what people actually look for when they want to get into that discourse. This will help us see if there’s any discrepancy between what people want and what they actually get, so we can later target that discrepancy with new content, ideas, products, or services.

In order to do that, we can use the Google Keyword Planner tool, which is available as part of the AdWords platform. First, we need to see a list of search terms relevant to our query “text mining”.

These include “text analytics”, “sentiment analysis”, “text data mining” etc. Then we can download them as a CSV file, open it in Google Sheets, then copy and paste the keywords column of the table into InfraNodus to visualize a graph of the main terms and the connections between them:
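If you prefer to skip the spreadsheet step, the keywords column can also be extracted with a short script. The sketch below assumes the export has a header row with a column named `Keyword`; the actual column names and layout of your file may differ, so check the header first. The sample data is made up.

```python
import csv
import io

# Made-up sample standing in for a keyword export.
# The column names are assumptions; check the header of your own file.
sample_csv = """Keyword,Avg. monthly searches
text analytics,1000
sentiment analysis,5000
text data mining,500
"""

def read_keywords(file_obj, column="Keyword"):
    """Return the non-empty values of one column from a CSV file object."""
    reader = csv.DictReader(file_obj)
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]

keywords = read_keywords(io.StringIO(sample_csv))

# One keyword per line, ready to paste into a text analysis tool.
keywords_text = "\n".join(keywords)
print(keywords_text)
```

For a real file, replace `io.StringIO(sample_csv)` with `open("your_export.csv", newline="", encoding="utf-8")`, adjusting the encoding if your export uses a different one.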

What people search for when they look for “text mining” — related queries from Google Keywords Tool visualized with InfraNodus

We can see that the terms “text mining” and “data analysis” are quite prominent in the list of keywords that people use when they search for this topic. Those are quite self-evident, so we can remove them (and others) from the graph to see what topics are left when we remove all the ones we already know about:

“Text mining” associated search queries with “text”, “analytics”, “analysis”, “data”, “mining” removed from the graph

We can now see that when people search for “text mining” (and associated searches), there’s a topical cluster in their search queries, which is made up of:

software — tool — online

This means that people look for online software tools to do text mining, which points to a potential gap both in the content and in the market for text processing tools.
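The filtering step described above can be sketched in a few lines: drop the terms we already know about and count what is left. The list of removed terms comes from the example above; the queries themselves are made up for illustration.

```python
import re
from collections import Counter

# Terms removed from the graph in the example above.
KNOWN_TERMS = {"text", "analytics", "analysis", "data", "mining"}

# Made-up queries standing in for the exported list of related searches.
queries = [
    "text mining software",
    "text analysis tool online",
    "data mining software tool",
    "online text analytics tool",
]

def remaining_terms(queries, known_terms):
    """Count the words that are left once the already-known terms are removed."""
    counts = Counter()
    for query in queries:
        for word in re.findall(r"[a-z]+", query.lower()):
            if word not in known_terms:
                counts[word] += 1
    return counts

counts = remaining_terms(queries, KNOWN_TERMS)
print(counts.most_common())
```

With this sample input, only "software", "tool", and "online" survive the filter, which mirrors the cluster found in the graph.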

Question #3: What’s the Difference Between What People Look For and What They Find?

Now that we’ve identified a special topical cluster of what people look for when they search for “text mining”, we can compare it with the text graph of Google search results. In order to do that, we can use the InfraNodus comparison feature. It shows us which terms from the second graph (what people look for) are not present in the first graph (the search results people get), and ranks them according to their importance:

The black nodes on the graph are the terms that people look for (in relation to “text mining”) but don’t really find in Google search results — identifying a gap and, thus, a potential content / product / service opportunity.

These terms are:
web — retrieval — application — algorithm

As we can see, there’s a need for web-based applications for text retrieval and for various text mining algorithms that could be used online.
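At its simplest, this comparison is a set difference: take the terms people search for, remove the ones that already appear in the search results, and rank the rest. The sketch below illustrates that idea only. How InfraNodus measures importance is not described here, so the scores are hypothetical placeholders, as are the function name and the sample data.

```python
def find_gaps(demand_scores, supply_terms):
    """Return the demand terms missing from supply, most important first."""
    missing = {
        term: score
        for term, score in demand_scores.items()
        if term not in supply_terms
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(missing, key=lambda term: (-missing[term], term))

# Hypothetical importance scores for terms in the search queries (demand).
demand_scores = {
    "text": 10,
    "mining": 9,
    "data": 6,
    "web": 4,
    "retrieval": 3,
    "application": 2,
    "algorithm": 1,
}

# Terms that appear in the search results graph (supply).
supply_terms = {
    "text", "mining", "analytics", "process", "analyze",
    "large", "data", "unstructured", "pattern",
}

gaps = find_gaps(demand_scores, supply_terms)
print(gaps)
```

Any importance measure can be plugged in as the score, for example how often a term occurs in the queries or how connected it is in the graph.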

Therefore, if we want to create new content that is relevant, faces low competition, and fulfils a need that other content does not, we need to write about “text mining” in relation to web-based applications and algorithms. This is what people look for but don’t find.

Moreover, this also shows us a potentially interesting gap in the market, a niche that we could fill with a new product or service.

Of course, a more thorough study would need to include all the search terms around “text mining”. Still, the simplified example above can be applied to any public discourse to identify areas where new content would add value, based on the audience’s needs.

If you would like to try this approach in action, you can set up a new account on www.infranodus.com. To import Google search results, go to infranodus.com/google.
