GloVe + Fun - Boring

Ludi Rehak
Towards Data Science
2 min read · Nov 12, 2017

GloVe, an acronym derived from Global Vectors, is an unsupervised learning method for representing words as vectors. It trains on word co-occurrence statistics of a corpus, with an objective function specially chosen to encode meaningful differences between word vectors. One of the more famous examples is that the result of “king” + “woman” - “man” is most similar to the vector for “queen”. The learner manages to quantify the concepts of royalty and gender so that they can be decomposed and reassembled to arrive at the word that is the female analogue of king.
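
For reference, the objective as given in the GloVe paper (Pennington, Socher and Manning, 2014) is a weighted least-squares fit of word-vector dot products to log co-occurrence counts. The notation below is the paper's, not something defined in this post: X_ij counts how often word j appears in the context of word i, w and w̃ are the word and context vectors, b and b̃ are bias terms, V is the vocabulary size, and f is a weighting function that damps very frequent pairs.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

Because dot products are fit to log counts, differences between word vectors roughly track log ratios of co-occurrence probabilities, which is the property the vector arithmetic relies on.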

I wanted to see whether these word vectors could power a smarter thesaurus. Today, if you want a word with a meaning similar to some word “x”, you google “synonyms of x”. It would be more efficient, and would narrow the search space, to specify a direction for the desired synonym. For example, if you’re looking for a word that means “inexperienced” but without the negative connotation, you might prefer “fresh” or “new” over “inept”.

To that end, I downloaded word vectors pre-trained on a dataset composed of 2014 Wikipedia articles and Gigaword 5, a collection of newswire text. I then added and subtracted word vectors and looked up the words most similar, as measured by cosine similarity, to the resulting vector. Here are some of the analogies along the good-bad axis. As shown in the table, “scheme” + “good” - “bad” ≈ “plan”, and reversing the signs, “scheme” - “good” + “bad” ≈ “scam”, correctly capturing semantics along the spectrum of goodness.

              + good - bad    - good + bad
scheme        plan            scam
intelligence  knowledge       spy
naive         idealistic      stupid
possibility   potential       consequences
dream         wish            nightmare
notion        concept         stereotype
bold          courageous      risky
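
The lookups behind a table like the one above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code used for this post: the function names `load_glove` and `nearest`, the decision to exclude the input words from the results, and the file name in the usage comment are all illustrative choices.

```python
import numpy as np


def load_glove(path):
    """Parse a GloVe text file: each line is a word followed by its float components."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.array([float(v) for v in values], dtype=np.float32)
    return vectors


def nearest(vectors, plus, minus, topn=3):
    """Rank words by cosine similarity to sum(plus) - sum(minus)."""
    query = sum(vectors[w] for w in plus) - sum(vectors[w] for w in minus)
    query = query / np.linalg.norm(query)
    # Assumption: the words used to build the query are not useful answers, so skip them.
    exclude = set(plus) | set(minus)
    scores = {
        word: float(vec @ query / np.linalg.norm(vec))
        for word, vec in vectors.items()
        if word not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)[:topn]


# Hypothetical usage; the file name is an assumption about which download was used:
# vectors = load_glove("glove.6B.300d.txt")
# print(nearest(vectors, plus=["scheme", "good"], minus=["bad"]))
# print(nearest(vectors, plus=["scheme", "bad"], minus=["good"]))
```

Scoring every word in a Python loop is slow for a full vocabulary; stacking the vectors into one matrix and taking a single matrix-vector product would be the natural next step.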

Some analogies on the fun-boring axis were amusing.

              + fun - boring    - fun + boring
presentation  showcase          powerpoint
woman         girl              housewife
acquaintance  friend            coworker
music         pop               orchestral
test          challenge         exam
premiere      festival          opera

And here are some examples from the elegant-clumsy axis.

              + elegant - clumsy    - elegant + clumsy
persuade      convince              pressuring
walked        strolled              stumbled
threw         tossed                hurled
placed        adorned               mishandled
stylish       sleek                 uninspired

According to Richard Socher’s lecture, there is no mathematical proof guaranteeing that these analogies fall out of the model. In practice, the analogies were sometimes correct, but often they were not. The inklings of a smarter thesaurus are here, but it may take a future method that better captures semantic relationships.
