
Vector Space Models

Notes from Natural Language Processing Specialization Course 1, Week 3

Natural Language Processing Notes

Photo by NASA on Unsplash

Continuing our Natural Language Processing Notes series, you may have noticed I skipped Week 2. This was not an accident: I have already written considerable notes on Bayes' Theorem and Naive Bayes (links below), and that is all that changes from Week 1 to Week 2 (the algorithm we use to predict the sentiment of the tweet).

Marginal, Joint and Conditional Probabilities explained By Data Scientist

Algorithms From Scratch: Naive Bayes Classifier


What is a Vector Space Model?

Vector space models are algebraic models that are often used to represent text (although they can represent any object) as a vector of identifiers. With these models, we are able to identify whether various texts are similar in meaning, regardless of whether they share the same words.

Figure 1: Example of how words may share similar words but have different meanings and vice versa (Image by Author)

The idea is based on a famous saying by an English linguist (and a leading figure in British linguistics during the 1950s) named John Rupert Firth…

"You shall know a word by the company it keeps" – J.R.Firth

There are numerous instances where we may decide to employ a vector space model, for instance:

  • Information Filtering
  • Information Retrieval
  • Machine Translation
  • Chatbots

And many more!

In general, Vector space models allow us to represent words and documents as vectors.

Word By Word & Word By Doc

For us to represent our text as vectors we may decide to use a word-by-word or word-by-doc design. Performing this task involves first creating a co-occurrence matrix.

Although how we perform each task is quite similar, we will discuss each design one at a time, nonetheless, the objective is the same. We want to go from our co-occurrence matrix to a vector representation.

Figure 2: Mapping of our co-occurrence matrix to a vector representation (Image By Author)

Word By Word: This design counts the number of times words occur within a certain distance k.

Figure 3: Example of word by word co-occurrence matrix where k=2 (Image By Author)

In the word-by-word design, each word's vector has n entries, where n ranges from 1 up to the size of the vocabulary.
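As a minimal sketch of the word-by-word design, the snippet below counts how often word pairs co-occur within a window of k words, using a made-up tokenized sentence (the tokens and the helper name are illustrative, not from the course):

```python
from collections import defaultdict

def word_by_word_counts(tokens, k=2):
    """Count co-occurrences of word pairs within a window of size k."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        # Look at the neighbours up to k positions to the right;
        # incrementing both orderings keeps the matrix symmetric.
        for neighbour in tokens[i + 1 : i + 1 + k]:
            counts[(word, neighbour)] += 1
            counts[(neighbour, word)] += 1
    return counts

tokens = "i like simple data apples and simple data".split()
counts = word_by_word_counts(tokens, k=2)
print(counts[("simple", "data")])  # "data" falls within 2 words of "simple" twice → 2
```

Each row of the resulting matrix (all counts sharing the same first word) is that word's vector representation.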

Word By Doc: This design counts the number of times words from the vocabulary appear in documents belonging to certain categories.

Figure 4: Example of word by doc co-occurrence matrix (Image By Author)
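A word-by-doc matrix can be sketched the same way: count each vocabulary word per document category, so every category becomes a column vector. The mini-corpus and category names below are hypothetical, chosen only to mirror the idea in Figure 4:

```python
from collections import Counter

# Hypothetical mini-corpus: each category maps to its documents' tokens.
corpus = {
    "Entertainment": "film film music concert data".split(),
    "Economy": "data trade market data film".split(),
    "Machine Learning": "data model data network model".split(),
}

vocabulary = ["data", "film"]

# Each category becomes a column vector of word counts.
vectors = {
    category: [Counter(tokens)[word] for word in vocabulary]
    for category, tokens in corpus.items()
}
print(vectors["Economy"])  # counts of "data" and "film" in the Economy documents → [2, 1]
```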

Using these vector representations, we can now represent our text or documents in vector space. This is perfect because in vector space we can determine the relationships between types of documents, such as their similarity.

Figure 5: Representing word by doc in vector space (Image By Author)

Euclidean Distance

A similarity metric we may use to determine how far apart two vectors are from one another is the Euclidean distance, which is simply the length of the straight line connecting the two vectors.

Figure 6: Formula for Euclidean Distance (Image By Author)

Let's use the formula in Figure 6 to determine which documents are most similar, using our vector representations from Figure 5.

Figure 7: Calculating the Euclidean distances (Image By Author)

The results tell us that the Economy and Machine Learning documents are the most similar, since distance-based metrics treat lower values as indicating greater similarity. That said, it is important to note that the Euclidean distance is not scale-invariant, so it is often recommended to scale your data first.
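The Euclidean distance is straightforward to compute by hand. The toy category vectors below are hypothetical (not the exact counts from Figure 5), picked so that Economy and Machine Learning sit close together:

```python
import math

def euclidean_distance(v, w):
    """Length of the straight line connecting two vectors."""
    return math.sqrt(sum((vi - wi) ** 2 for vi, wi in zip(v, w)))

# Toy category vectors (hypothetical counts, not the ones from Figure 5).
economy = [9000.0, 500.0]
machine_learning = [8000.0, 1000.0]
entertainment = [500.0, 7000.0]

print(euclidean_distance(economy, machine_learning))  # smaller value => more similar
print(euclidean_distance(economy, entertainment))
```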

Cosine Similarity

The problem with the Euclidean distance is that it is biased by size differences in the representations. Therefore, we may decide to use the cosine similarity, which determines how similar two texts are using the angle between their vectors.

Figure 8: Formula for Cosine similarity (Image By Author)

Cosine similarity is one of the most popular similarity metrics used in NLP. To calculate it, we take the cosine of the angle between two vectors.

Figure 9: Calculating cosine similarities (Image By Author)

When the cosine value is equal to 0, the two vectors are orthogonal to one another and have no match, whereas a cosine value closer to 1 implies a greater match between the two vectors (since the angle between them is smaller). Therefore, from our results, Economy and Machine Learning are the most similar. Read more about the cosine similarity metric on Wikipedia.
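The cosine similarity formula from Figure 8 can be sketched with plain Python. The category vectors are again hypothetical; note that Economy and Machine Learning score close to 1 even though their magnitudes differ, which is exactly the scale-insensitivity that the Euclidean distance lacks:

```python
import math

def cosine_similarity(v, w):
    """cos(theta) = (v . w) / (||v|| * ||w||)."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi ** 2 for vi in v))
    norm_w = math.sqrt(sum(wi ** 2 for wi in w))
    return dot / (norm_v * norm_w)

# Toy category vectors (hypothetical counts).
economy = [9000.0, 500.0]
machine_learning = [8000.0, 1000.0]
entertainment = [500.0, 7000.0]

print(cosine_similarity(economy, machine_learning))  # close to 1: small angle
print(cosine_similarity(economy, entertainment))     # closer to 0: large angle
```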

Manipulating Words in Vector Space

By performing some simple vector arithmetic, we are able to infer unknown representations among words.

For instance, suppose we know the relationship between two similar words, such as King and Man. To find the vector representation of the word "Queen", we first capture that relationship by subtracting the vectors (i.e. King – Man), then add the resulting vector to the vector representation of Woman, and finally infer that the vector representation most similar to the result (which would be Queen in this instance) is the vector we wanted to find.

Note: For more on this, read Mikolov et al., 2013, Distributed Representations of Words and Phrases and their Compositionality.

Figure 10: Visual Representation (Image by Author)
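The King – Man + Woman ≈ Queen analogy can be sketched with a tiny hand-picked set of 2-D embeddings. The vectors below are made up purely for illustration (real word embeddings have hundreds of dimensions and are learned from data), chosen so the gender offset is consistent:

```python
import math

# Hypothetical 2-D embeddings chosen so the King - Man offset matches Queen - Woman.
embeddings = {
    "king":  [9.0, 3.0],
    "man":   [5.0, 1.0],
    "woman": [5.5, 4.0],
    "queen": [9.5, 6.0],
    "apple": [1.0, 8.0],
}

def closest_word(target, exclude):
    """Return the vocabulary word whose vector is nearest to `target`."""
    candidates = [w for w in embeddings if w not in exclude]
    return min(candidates, key=lambda w: math.dist(embeddings[w], target))

# king - man + woman ≈ queen
offset = [k - m for k, m in zip(embeddings["king"], embeddings["man"])]
target = [w + o for w, o in zip(embeddings["woman"], offset)]
print(closest_word(target, exclude={"king", "man", "woman"}))  # → queen
```

Excluding the input words from the search mirrors common practice with real embeddings, where the nearest neighbour of the result is often one of the query words themselves.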

Wrap Up

In conclusion, we may use vector space models to represent our text or documents in vector space, and when our data is in vector space, we can use the vectors to determine the relationships between text (or documents).

Let’s keep the conversation going on LinkedIn…

Kurtis Pykes – AI Writer – Towards Data Science | LinkedIn

