NLP and POS-based chunking to generate Amazon-style key phrases from reviews

Using a regex parser based on grammar to extract key phrases

Divya Choudhary
Towards Data Science


The goal of this article is to introduce the concept of POS chunking with the example of Amazon review tags.

I am planning to upgrade from a 2017 Moto G5 Plus to a new phone. While researching, I went through a lot of phones listed on Amazon and scoured their reviews.

Screenshot captured by Author

Just like me, you have probably noticed the list of tags above the verbose reviews. These tags saved me a lot of time by highlighting the most talked-about points regarding the phone.

Key phrases extracted from reviews. Screenshot by Author

This piqued my interest: how do they do it?

To start with, we need to extract the phrases of interest from this sea of raw text.

We will make use of a concept in Natural Language Processing known as chunking to divide each sentence into smaller segments of interest.

I used the Amazon reviews dataset from my post on “Building a Search Engine using SQL”.

I highly recommend you read that post, but you can follow the rest of this article without it.

The rest of the post contains the following topics:

1. POS Tagging

2. Identifying POS of interest

3. Defining and accessing the chunk

4. Interesting use cases of chunking

1. POS Tagging

No need to whip out your Wren and Martin grammar book yet. You don’t need to be an expert in English grammar to identify parts of speech; a basic grasp of grammar is sufficient. Thankfully, the NLTK library does the tagging for us.
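As a minimal sketch of this step, assuming NLTK is installed: the function names `download_models` and `tokenize_and_tag`, and the `tokenizer` and `tagger` parameters, are my own illustration and not taken from the original code. The NLTK calls themselves (`nltk.download`, `nltk.word_tokenize`, `nltk.pos_tag`) are the library's standard entry points.

```python
import nltk


def download_models():
    """Fetch tokenizer and tagger data once.

    Resource names differ across NLTK versions, so both the older and the
    newer names are requested; names unknown to your version are skipped.
    """
    for resource in ("punkt", "punkt_tab",
                     "averaged_perceptron_tagger",
                     "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)


def tokenize_and_tag(review, tokenizer=nltk.word_tokenize, tagger=nltk.pos_tag):
    """Split a review into tokens and pair each token with a POS tag.

    The tokenizer and tagger are parameters (an assumption of this sketch)
    so that a different model can be swapped in.
    """
    tokens = tokenizer(review)
    return tokens, tagger(tokens)
```

After calling `download_models()` once, `tokenize_and_tag("Battery life is very good")` returns the token list and a list of `(word, tag)` pairs using Penn Treebank tags. The exact tags can vary with the NLTK version and model.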

Output looks like:

['Battery', 'life', 'is', 'very', 'good', '(', 'I', 'am', 'not', 'a', 'gamer', ')', 'Display', 'is', 'fantastic..', 'I', 'have', 'used', 'galaxy', 's9', 'plus', 'in', 'the', 'past', 'and', 'I', 'love', 'the', 'display', 'of', 'this', 'phone..']

POS tagged words = [('Battery', 'NNP'), ('life', 'NN'), ('is', 'VBZ'), ('very', 'RB'), ('good', 'JJ'), ('(', '('), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'), ('a', 'DT'), ('gamer', 'NN'), (')', ')'), ('Display', 'NNP'), ('is', 'VBZ'), ('fantastic..', 'JJ'), ('I', 'PRP'), ('have', 'VBP'), ('used', 'VBN'), ('galaxy', 'NN'), ('s9', 'NN'), ('plus', 'CC'), ('in', 'IN'), ('the', 'DT'), ('past', 'JJ'), ('and', 'CC'), ('I', 'PRP'), ('love', 'VBP'), ('the', 'DT'), ('display', 'NN'), ('of', 'IN'), ('this', 'DT'), ('phone..', 'NN')]

That's nice, but which ones interest me?

2. Identifying POS of interest

I benefited greatly from this online POS tagger because it color-codes the parts of speech.

Review Example 1

Pick a review from Amazon and paste it into the online POS tagger.

Colorful POS output from online tagger

For the human eye, recognizing patterns in color comes more naturally than recognizing them in constructs like POS tags.

Hmm… what patterns emerge?

Grey and orange words seem to convey features. That is an Adjective-Noun combination.

Let’s see a few more examples.

Review Example 2
Looking for more patterns

Grey-Grey combinations pop out too, meaning nouns in a sequence.

But also…

Grey and orange with other colors in between, meaning a Noun and an Adjective with other POS in between.

So we go through multiple examples and identify a few POS patterns to extract from the reviews.

This brings us to…

3. Defining and accessing the chunk

It’s best to define multiple patterns to extract the most out of the review text.
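A minimal sketch of what such patterns can look like with NLTK's `RegexpParser`. The three rules and their labels (`ADJ_NOUN`, `NOUN_ADJ`, `NOUN_SEQ`) are my assumptions based on the patterns identified in the previous section, not the exact grammar used for the screenshots. The tagged words are copied from the tagger output shown earlier.

```python
import nltk

# Illustrative tag patterns (assumed, not the author's exact grammar).
# <JJ.*> matches any adjective tag (JJ, JJR, JJS); <NN.*> matches any noun
# tag (NN, NNS, NNP, NNPS); + means "one or more"; {...} marks the chunk.
PATTERNS = {
    "ADJ_NOUN": "ADJ_NOUN: {<JJ.*><NN.*>+}",              # adjective, then noun(s)
    "NOUN_ADJ": "NOUN_ADJ: {<NN.*>+<VB.*|RB.*>+<JJ.*>}",  # noun(s) ... adjective
    "NOUN_SEQ": "NOUN_SEQ: {<NN.*><NN.*>+}",              # two or more nouns in a row
}

# One parser per pattern: inside a single grammar, words claimed by an
# earlier rule are no longer available to later rules.
PARSERS = {label: nltk.RegexpParser(rule) for label, rule in PATTERNS.items()}

# Tags taken from the POS tagger output shown earlier in this article.
tagged = [("Battery", "NNP"), ("life", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("good", "JJ")]

# The result is a tree whose NOUN_ADJ subtrees hold the matched words.
tree = PARSERS["NOUN_ADJ"].parse(tagged)
print(tree)
```

Keeping the patterns in separate parsers is a design choice of this sketch: here "Battery life" can be reported as a noun sequence and "Battery life is very good" as a noun-adjective chunk, even though the two overlap.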

For a full tutorial on creating POS regex, please see the following link:

Let us look at the output by accessing the chunks.

For the input review:

It’s an awesome phone. Cameras are wonderful. I m very happy with its overall performance. Battery seems to last longer after a days use. Nice display.

Chunk 1: Adjective followed by a Noun

Chunk 2: Noun and Adjective with other POS in between

Chunk 3: Sequence of Nouns
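The three chunk types above can be pulled out of the parse tree roughly as follows. This is a sketch under stated assumptions: the POS tags for the input review are hand-assigned for illustration (a real tagger may label some words differently), and the rules and the helper `extract_phrases` are hypothetical names, not the original code.

```python
import nltk

# Hand-assigned Penn Treebank tags for the review above (illustrative only).
tagged_review = [
    ("It", "PRP"), ("'s", "VBZ"), ("an", "DT"), ("awesome", "JJ"),
    ("phone", "NN"), (".", "."),
    ("Cameras", "NNS"), ("are", "VBP"), ("wonderful", "JJ"), (".", "."),
    ("I", "PRP"), ("m", "VBP"), ("very", "RB"), ("happy", "JJ"),
    ("with", "IN"), ("its", "PRP$"), ("overall", "JJ"),
    ("performance", "NN"), (".", "."),
    ("Battery", "NN"), ("seems", "VBZ"), ("to", "TO"), ("last", "VB"),
    ("longer", "RBR"), ("after", "IN"), ("a", "DT"), ("days", "NNS"),
    ("use", "NN"), (".", "."),
    ("Nice", "JJ"), ("display", "NN"), (".", "."),
]

# Hypothetical rules, one per chunk type listed above.
RULES = {
    "ADJ_NOUN": "{<JJ.*><NN.*>+}",              # Chunk 1
    "NOUN_ADJ": "{<NN.*>+<VB.*|RB.*>+<JJ.*>}",  # Chunk 2
    "NOUN_SEQ": "{<NN.*><NN.*>+}",              # Chunk 3
}


def extract_phrases(tagged_tokens, label, rule):
    """Parse with a single-rule grammar and return each chunk as a string."""
    tree = nltk.RegexpParser(f"{label}: {rule}").parse(tagged_tokens)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


phrases = {label: extract_phrases(tagged_review, label, rule)
           for label, rule in RULES.items()}
for label, found in phrases.items():
    print(label, found)
```

With these hand-assigned tags, the adjective-noun rule picks up "awesome phone", "overall performance" and "Nice display", the noun-adjective rule picks up "Cameras are wonderful", and the noun-sequence rule picks up "days use".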

Let’s see a few more examples from other reviews:

There you have it: a list of feature descriptors for a product, extracted from its reviews.

Let us give chunking its well-deserved credit by looking at some interesting use cases.

4. Interesting use cases of chunking

The ability to parse text based on groups of words that convey a certain type of meaning opens up a host of possibilities. Continue reading to sample a few:

Use case 1: I want to scour customer care call transcripts and identify the common complaints customers call about, so that I can automate those frequent workflows.

Looks like a <Verb> <other POS> <Noun> kind of pattern will work here.
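A rough sketch of that verb-to-noun pattern, assuming the words allowed in between are determiners, pronouns and adjectives. The utterances are hypothetical and hand-tagged, not taken from a real transcript, and the label `REQUEST` and the helper `requests_in` are names I made up for illustration.

```python
from collections import Counter

import nltk

# Hypothetical, hand-tagged caller utterances (not from a real transcript).
utterances = [
    [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("cancel", "VB"),
     ("my", "PRP$"), ("subscription", "NN")],
    [("Please", "UH"), ("reset", "VB"), ("the", "DT"), ("password", "NN")],
    [("I", "PRP"), ("need", "VBP"), ("to", "TO"), ("cancel", "VB"),
     ("my", "PRP$"), ("subscription", "NN")],
]

# A verb, then optional determiners, pronouns or adjectives, then noun(s).
parser = nltk.RegexpParser("REQUEST: {<VB.*><DT|PRP.*|JJ.*>*<NN.*>+}")


def requests_in(tagged_tokens):
    """Return the lower-cased REQUEST chunks found in one tagged utterance."""
    tree = parser.parse(tagged_tokens)
    return [
        " ".join(word.lower() for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "REQUEST")
    ]


# Count how often each request appears across all utterances.
counts = Counter(phrase for u in utterances for phrase in requests_in(u))
print(counts.most_common())
```

Counting the extracted phrases gives a first view of which requests come up most often. Here "cancel my subscription" appears twice and "reset the password" once.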

Identifying patterns for Customer Care call center use case

Use case 2: Extract a doctor’s suggestions from consultation notes

Doctor consultation notes
Create new pattern

The extracted chunks read:

  1. Use Cocyx cusion
  2. Apply ice
  3. Continue physiotherapy
  4. strengthening exercises

Once we have the key phrases, we can run a clustering algorithm to group similar phrases together (e.g., “good battery” and “high-performance battery”). I’ll cover that in a later post.

For today, I hope you leave knowing the power of having a tool like Chunking in your NLP toolkit.

Thanks for reading. Let me know how you plan to use POS chunking.

I’m available in the comments section and on LinkedIn.
