What is the difference between Word2Vec and GloVe?

Word2Vec is a feed-forward neural-network-based model for learning word embeddings. The skip-gram variant is modelled as predicting the context given a specific word: it takes each word in the corpus as input, passes it through a hidden layer (the embedding layer), and from there predicts the surrounding context words. Once trained, the embedding for a particular word is obtained by feeding the word as input and taking the hidden-layer activation as the final embedding vector.
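
To make the "hidden layer = embedding" point concrete, here is a toy sketch of a single skip-gram SGD step. The vocabulary size, embedding dimension, and the training pair are illustrative assumptions, not values from any real setup:

```python
import numpy as np

V, D = 10, 4                                 # vocabulary size, embedding dim (toy values)
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))    # input weights: the embedding layer
W_out = rng.normal(scale=0.1, size=(D, V))   # output weights over context words

def train_pair(center, context, lr=0.05):
    """One SGD step on a (center word, context word) index pair."""
    h = W_in[center]                         # hidden layer = embedding lookup
    scores = h @ W_out                       # unnormalized scores for every word
    p = np.exp(scores - scores.max())
    p /= p.sum()                             # softmax over the vocabulary
    p[context] -= 1.0                        # gradient of cross-entropy w.r.t. scores
    W_in[center] -= lr * (W_out @ p)         # backprop into the embedding row
    W_out[...] -= lr * np.outer(h, p)        # backprop into the output weights

train_pair(center=3, context=7)
print(W_in[3])   # the (partially trained) embedding vector for word 3
```

After training on many (center, context) pairs, the row `W_in[w]` is exactly the embedding vector described above.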

GloVe (Global Vectors) is based on matrix factorization of the word-context matrix. It first constructs a large (words × contexts) matrix of co-occurrence information: for each "word" (the rows), you count how frequently it appears in each "context" (the columns) over a large corpus. The number of "contexts" is of course large, since it is essentially combinatorial in size. This matrix is then factorized to yield a lower-dimensional (words × features) matrix, where each row is the vector representation of a word. In general, this is done by minimizing a "reconstruction loss", which seeks the lower-dimensional representation that explains most of the variance in the high-dimensional data.
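
As a quick illustration of the first step, here is a minimal sketch of building a (word × context) co-occurrence matrix with a symmetric window. The toy corpus and window size are assumptions chosen only for the example:

```python
import numpy as np

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

window = 2
N = np.zeros((len(vocab), len(vocab)))    # n_uv: co-occurrence counts
for sent in tokens:
    for i, w in enumerate(sent):
        # every word within `window` positions counts as a context
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                N[idx[w], idx[sent[j]]] += 1

print(vocab)
print(N)
```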

The loss function is given by:

    \[\sum_{u\in W} \sum_{v \in W} f(n_{uv})\left(\langle\phi_u, \theta_v\rangle + b_u + b'_v - \log n_{uv}\right)^2\]

As you can see, the loss is a squared loss, weighted by the function f(n_{uv}) shown in the figure below. This weighting ensures that overly frequent words, such as stop-words, do not dominate the objective. Here n_{uv} is the number of times u and v occur together, \phi_u and \theta_v are the vectors from the matrix factorization task, and b_u and b'_v are bias terms. Note that instead of \log n_{uv}, one can also use the PMI between u and v.

[Figure: weighting function f(n_{uv}) used in the GloVe squared loss]
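
The objective above is easy to compute directly. The sketch below uses the notation of the formula (\phi_u, \theta_v, b_u, b'_v, n_{uv}); the weighting follows the form in the GloVe paper, f(x) = (x/x_max)^α for x < x_max and 1 otherwise (the paper uses α = 0.75, x_max = 100), while all sizes and counts here are toy assumptions:

```python
import numpy as np

V, D = 10, 4                                       # toy vocabulary size, embedding dim
rng = np.random.default_rng(0)
phi = rng.normal(scale=0.1, size=(V, D))           # word vectors phi_u (rows)
theta = rng.normal(scale=0.1, size=(V, D))         # context vectors theta_v
b = np.zeros(V)                                    # word biases b_u
b_prime = np.zeros(V)                              # context biases b'_v
N = rng.integers(0, 50, size=(V, V)).astype(float) # toy co-occurrence counts n_uv

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs, caps very frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(phi, theta, b, b_prime, N):
    mask = N > 0                                   # the sum runs over observed pairs
    pred = phi @ theta.T + b[:, None] + b_prime[None, :]
    log_n = np.log(N, where=mask, out=np.zeros_like(N))
    return np.sum(f(N) * mask * (pred - log_n) ** 2)

print(glove_loss(phi, theta, b, b_prime, N))
```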

The original GloVe paper (Pennington, Socher & Manning, EMNLP 2014) can be found here.

Word vectors represent, in an abstract way, different facets of the meaning of a word. Some notable properties are:

  1. Such word vectors are good at answering analogy questions, since the relationship between two words is captured by the offset between their vectors: famously, king − man + woman ≈ queen.
  2. We can also use element-wise addition of word vectors to ask questions such as 'German + airlines' (see the sketch after this list).
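
Both properties are exposed by standard tooling. Here is a hedged sketch using gensim's `KeyedVectors`; the file path `vectors.bin` is a hypothetical placeholder for any pretrained word2vec-format model:

```python
from gensim.models import KeyedVectors

# Load pretrained vectors (hypothetical path; any word2vec-format file works)
wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Analogy via vector arithmetic: king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Element-wise addition: 'German' + 'airlines'
print(wv.most_similar(positive=["German", "airlines"], topn=3))
```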

[Figure: analogy using word2vec word vectors]

More intuition behind the difference between word2vec and GloVe.

 
