Linguistic Knowledge in Natural Language Processing

Hafidz Zulkifli
Towards Data Science
Aug 26, 2018


Ever since diving into Natural Language Processing (NLP), I’ve wanted to write a high-level introduction to it: partly to give some structure to my own understanding, and partly to offer a perspective on the field other than the currently popular approach of doing NLP with Deep Learning.

Stages of analysis in NLP

NLP Pyramid — Coursera Natural Language Processing

Traditionally, a sentence is analyzed in the following stages, each one giving progressively deeper insight into it.

1. Morphology

At this stage we care about the words that make up the sentence: how they are formed, and how they change depending on their context. Some examples include:

  • Prefixes/suffixes
  • Singularization/pluralization
  • Gender detection
  • Word inflection (modifying a word to express different grammatical categories such as tense, case, voice, etc.). Inflection of verbs is called conjugation, and inflection of nouns, pronouns, adjectives, etc. is called declension.
  • Lemmatization (reducing a word to its base form, or lemma; roughly the reverse of inflection)
  • Spell checking
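To make a couple of these tasks concrete, here is a minimal, stdlib-only sketch of lemmatization and pluralization for English. The irregular-form table, the suffix rules, and the function names `lemmatize` and `pluralize` are hypothetical illustrations; real lemmatizers rely on much larger dictionaries (for example WordNet) rather than a handful of hand-written rules.

```python
# Toy English morphology: a small irregular-form table plus suffix rules.
# Every entry and rule here is an illustrative assumption, not a real library.

IRREGULAR_LEMMAS = {
    "went": "go", "gone": "go", "was": "be", "were": "be",
    "children": "child", "mice": "mouse", "better": "good",
}

# (suffix, replacement) pairs, tried in order, more specific suffixes first.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("xes", "x"),    # boxes   -> box
    ("ches", "ch"),  # churches -> church
    ("shes", "sh"),  # dishes  -> dish
    ("ies", "y"),    # studies -> study
    ("ing", ""),     # walking -> walk
    ("ed", ""),      # walked  -> walk
    ("s", ""),       # cats    -> cat
]


def lemmatize(word: str) -> str:
    """Guess the base form (lemma) of an English word."""
    w = word.lower()
    if w in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[w]
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: len(w) - len(suffix)] + replacement
            # Skip rules that would leave a too-short stem ("is", "bed", "sing").
            if len(candidate) >= 3:
                return candidate
    return w


def pluralize(noun: str) -> str:
    """Form a regular English plural (irregular nouns are not handled)."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes -ies (city -> cities); vowel + y keeps -ys (day -> days).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"
```

For example, `lemmatize("studies")` gives `"study"` and `pluralize("city")` gives `"cities"`. Rules this crude break down quickly (`lemmatize("running")` gives `"runn"`), which is one reason dictionary-based lemmatizers exist.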

2. Syntax (Parsing)

In this stage, we focus on the relationships between the words within a sentence, that is, on how the sentence is constructed.

In a way, syntax is what we usually refer to as grammar — NLP For Hackers

To derive this understanding, syntactic analysis is usually done at the sentence level, whereas morphological analysis is done at the word level. When we build dependency trees or tag parts of speech, we are essentially analyzing the syntax of the sentence.
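As a rough illustration of sentence-level analysis, the sketch below tags each word with a part of speech from a tiny lexicon and then runs the CKY algorithm over a toy grammar in Chomsky Normal Form. Note that CKY produces a constituency (phrase-structure) tree rather than the dependency tree mentioned above; the grammar, the lexicon, and the function names `cky_parse` and `build_tree` are assumptions made up for this example.

```python
from itertools import product

# Hypothetical toy lexicon: word -> possible part-of-speech tags.
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "chased": {"V"}, "saw": {"V"},
    "in": {"P"},
}

# Hypothetical binary rules A -> B C, stored as (B, C) -> {A}.
RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}


def build_tree(table, i, j, label):
    """Follow backpointers to rebuild the subtree for `label` over words[i:j]."""
    back = table[i][j][label]
    if isinstance(back, str):  # leaf: a POS tag over a single word
        return (label, back)
    k, left, right = back
    return (label, build_tree(table, i, k, left), build_tree(table, k, j, right))


def cky_parse(words):
    """Return one parse tree rooted at S as nested tuples, or None."""
    n = len(words)
    # table[i][j] maps each label covering words[i:j] to a backpointer.
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Part-of-speech step: fill single-word spans from the lexicon.
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), set()):
            table[i][i + 1][tag] = word
    # Combine adjacent spans bottom-up, shortest spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    for parent in RULES.get((left, right), set()):
                        # Keep only the first derivation found for each label.
                        table[i][j].setdefault(parent, (k, left, right))
    if n == 0 or "S" not in table[0][n]:
        return None
    return build_tree(table, 0, n, "S")
```

Calling `cky_parse("the dog chased a cat".split())` returns a nested tuple with `S` at the root, split into a noun phrase (`the dog`) and a verb phrase (`chased a cat`), while `cky_parse("dog the chased".split())` returns `None`. When a sentence is ambiguous (for example, where "in the park" attaches in "the dog saw a cat in the park"), this sketch simply keeps the first derivation it finds.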

3. Semantics

Once we’ve understood the syntactic structure, we are better prepared to get into the “meaning” of the sentence (for a fun read on what meaning can actually mean in NLP, head over here to dive into a Twitter discussion on the subject).

Some examples of tasks performed at this stage include:

  • Named Entity Recognition (NER)
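Named Entity Recognition can be illustrated with the simplest possible approach: a gazetteer (a lookup list of known names) matched greedily against the tokens, longest match first. This is a hedged sketch, not how modern statistical or neural NER systems work; the gazetteer entries, the entity labels, and the `tag_entities` function are hypothetical.

```python
# Hypothetical gazetteer: lowercased token sequences mapped to entity types.
GAZETTEER = {
    ("ada", "lovelace"): "PERSON",
    ("new", "york"): "LOCATION",
    ("new", "york", "times"): "ORGANIZATION",
    ("google",): "ORGANIZATION",
}
MAX_LEN = max(len(name) for name in GAZETTEER)


def tag_entities(tokens):
    """Return (start, end, type) spans, preferring the longest gazetteer match."""
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest possible span first so "New York Times" beats "New York".
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + length])
            if key in GAZETTEER:
                match = (i, i + length, GAZETTEER[key])
                break
        if match:
            entities.append(match)
            i = match[1]  # continue after the matched entity
        else:
            i += 1
    return entities
```

For example, `tag_entities("Ada Lovelace never visited New York".split())` returns `[(0, 2, 'PERSON'), (4, 6, 'LOCATION')]`, i.e. token spans with their types. A pure lookup cannot recognize names it has never seen, which is where learned models come in.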
