Suppose you build word vectors (embeddings) where each word vector has dimension equal to the vocabulary size (V) and the feature values are the positive PMI (pPMI) between the corresponding words. What are the problems with this approach, and how can you resolve them?

Please read here to understand what PMI and pPMI are.

Problems: As the vocabulary size (V) is large, these vectors will be very high-dimensional. They will also be sparse, since a given word will not have co-occurred with every other word in the vocabulary.

Resolution: Dimensionality reduction, using approaches like Singular Value Decomposition (SVD) of the term-document matrix…
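As a rough illustration, here is a minimal NumPy sketch of this pipeline on a toy word-word co-occurrence matrix; the counts, vocabulary size, and choice of k are made up for the example, and a real pipeline would apply a truncated SVD to a large sparse V × V matrix.

```python
import numpy as np

# Toy word-word co-occurrence counts (rows/cols index the same vocabulary).
# In practice this matrix is V x V and very sparse.
counts = np.array([
    [0, 4, 1, 0],
    [4, 0, 2, 1],
    [1, 2, 0, 3],
    [0, 1, 3, 0],
], dtype=float)

total = counts.sum()
p_xy = counts / total                 # joint probabilities
p_x = counts.sum(axis=1) / total      # marginal probabilities

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_xy / np.outer(p_x, p_x))
ppmi = np.maximum(pmi, 0)             # positive PMI: clip negative values to 0
ppmi[np.isnan(ppmi)] = 0              # guard against 0/0 entries

# Dimensionality reduction: keep the top-k singular vectors as dense embeddings.
k = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :k] * S[:k]         # each row is now a dense k-dimensional word vector
print(embeddings)
```

The sparse V-dimensional pPMI rows are thereby replaced with dense k-dimensional vectors, addressing both the size and the sparsity problems.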

What is negative sampling when training the skip-gram model?

Recap: The Skip-Gram model tries to represent each word in a large text corpus as a lower-dimensional vector in a space of K dimensions, such that similar words are closer to each other. This is achieved by training a feed-forward network where we try to predict the context words given a specific word, i.e., …
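Below is a minimal NumPy sketch of a single skip-gram update with negative sampling. The sizes (V, K, NEG), the learning rate, and the counts are all placeholder values for illustration; the idea is that instead of computing a full softmax over V words, each true (center, context) pair is contrasted against a handful of sampled "noise" words via a logistic loss.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, NEG = 1000, 50, 5               # vocab size, embedding dim, negatives per pair
W_in = rng.normal(0, 0.01, (V, K))    # center-word embeddings
W_out = rng.normal(0, 0.01, (V, K))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, noise_dist, lr=0.025):
    """One SGD step of skip-gram with negative sampling for a single
    (center, context) pair: score the true context word against NEG
    sampled noise words with a binary logistic loss."""
    negatives = rng.choice(V, size=NEG, p=noise_dist)
    targets = np.concatenate(([context], negatives))
    labels = np.array([1.0] + [0.0] * NEG)    # 1 = real pair, 0 = noise

    v_c = W_in[center]                        # (K,)
    u = W_out[targets]                        # (NEG+1, K)
    scores = sigmoid(u @ v_c)                 # predicted P(real pair)

    grad = scores - labels                    # gradient of the logistic loss
    W_in[center] -= lr * (grad @ u)
    W_out[targets] -= lr * np.outer(grad, v_c)  # duplicate samples ignored in this sketch

# Noise distribution: unigram counts raised to the 3/4 power (word2vec's choice).
counts = rng.integers(1, 100, V).astype(float)
noise = counts ** 0.75
noise /= noise.sum()

train_pair(center=42, context=7, noise_dist=noise)
```

This turns each training step from an O(V) softmax into an O(NEG) binary classification, which is what makes training on large corpora tractable.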

What is the difference between Word2Vec and GloVe?

Word2Vec is a feed-forward neural network based model for finding word embeddings. The Skip-gram variant, modelled as predicting the context given a specific word, takes as input each word in the corpus, sends it through a hidden layer (the embedding layer), and from there predicts the context words. Once trained, the embedding for a particular…
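To make that architecture concrete, here is a small NumPy sketch of the forward pass described above; the vocabulary size, dimensions, and weights are placeholders. The key point is that a one-hot input simply selects one row of the hidden-layer weight matrix, so after training that row serves as the word's embedding.

```python
import numpy as np

V, K = 10, 4                             # toy vocabulary size and embedding dimension
rng = np.random.default_rng(1)
W_hidden = rng.normal(0, 0.01, (V, K))   # input -> hidden (the embedding layer)
W_output = rng.normal(0, 0.01, (K, V))   # hidden -> output over the vocabulary

def skipgram_forward(word_idx):
    """Forward pass: a one-hot input just selects a row of W_hidden,
    so the hidden activation is that word's embedding; the output layer
    then scores every vocabulary word as a candidate context word."""
    h = W_hidden[word_idx]               # equivalent to one_hot @ W_hidden
    logits = h @ W_output
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()           # softmax over context-word candidates

context_probs = skipgram_forward(3)
embedding_of_word_3 = W_hidden[3]        # the learned word vector for word 3
```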