## How do you evaluate word vectors?

Word vectors, whether derived from word2vec, GloVe, or co-occurrence statistics, need to be evaluated to measure how well they perform. This can be done in two major ways: Intrinsic evaluation is used when word vectors are built or evaluated on a specific or intermediate subtask. Such evaluations are fast to compute…
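
The visible text does not name a specific intrinsic task, so the sketch below uses word analogies ("man is to king as woman is to ?"), one commonly used intrinsic test. It is a minimal illustration, assuming hypothetical 3-dimensional toy vectors and hypothetical helper names (`analogy`, `analogy_accuracy`); real evaluations use trained vectors and a large benchmark of analogy questions.

```python
import numpy as np

# Hypothetical toy embeddings; real ones would come from word2vec, GloVe, etc.
EMB = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
    "apple": np.array([0.9, 0.5, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c.

    The three query words are excluded from the candidates.
    """
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(questions, emb):
    """Fraction of (a, b, c, expected) tuples answered correctly."""
    hits = sum(analogy(a, b, c, emb) == d for a, b, c, d in questions)
    return hits / len(questions)
```

With these toy vectors, `analogy("man", "king", "woman", EMB)` returns `"queen"`, because `king - man + woman` lands exactly on the `queen` vector.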

## What are the challenges in building word embeddings from tweets vs. Wikipedia data? Can Wikipedia data be used to build embeddings for words in Twitter data?

Twitter data differs from Wikipedia data in a number of ways:

- Twitter data, in the form of tweets, is very noisy due to:
  - Spelling errors
  - Abbreviations
  - Code mixing, as multiple languages are used
  - Grammatical mistakes
- Tweets are very short in comparison to a typical sentence in Wikipedia or news articles. This could be…
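
One way to gauge whether Wikipedia-trained embeddings can cover Twitter data is to measure the out-of-vocabulary (OOV) rate of tweet tokens against the Wikipedia vocabulary. The sketch below assumes a deliberately simple regex tokenizer and hypothetical names (`tokenize`, `oov_rate`, `wiki_vocab`); the vocabulary and tweet are made-up examples, not real data.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens, keeping a leading # or @.

    A deliberately simple tokenizer; real tweet tokenizers also handle
    emoji, URLs, and elongated words like 'sooooo'.
    """
    return re.findall(r"[#@]?\w+", text.lower())

def oov_rate(texts, vocab):
    """Fraction of tokens in texts that are missing from vocab."""
    tokens = [t for text in texts for t in tokenize(text)]
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocab)
    return missing / len(tokens)

# Hypothetical vocabulary from an embedding model trained on Wikipedia.
wiki_vocab = {"the", "game", "was", "great", "tonight"}
tweets = ["The game was gr8 tonite #lol"]
print(oov_rate(tweets, wiki_vocab))
```

Here the misspelling `gr8`, the abbreviation `tonite`, and the hashtag `#lol` have no Wikipedia vector, so half of the tokens are uncovered. This is the kind of noise listed above.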

## Suppose you build word vectors (embeddings) where each word vector has dimension equal to the vocabulary size (V) and the feature values are the pPMI between corresponding words. What are the problems with this approach, and how can you resolve them?

Please read here to understand what PMI and pPMI are.

Problems:

- As the vocabulary size (V) is large, these vectors will be very large.
- They will be sparse, since a word may not have co-occurred with all possible words.

Resolution:

- Dimensionality reduction using approaches like Singular Value Decomposition (SVD) of the term document matrix…
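
A minimal NumPy sketch of both the problem and the fix: build a V x V pPMI matrix from co-occurrence counts, then compress it with SVD into dense k-dimensional vectors. The toy count matrix and the helper names `ppmi` and `svd_embeddings` are hypothetical. The text above mentions SVD of the term document matrix; this sketch applies the same idea to the word-context pPMI matrix described in the question. Scaling the left singular vectors by the singular values is one common choice, not the only one.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI matrix from a V x V word-context co-occurrence count matrix."""
    p_wc = counts / counts.sum()                  # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)         # marginal P(w), shape (V, 1)
    p_c = p_wc.sum(axis=0, keepdims=True)         # marginal P(c), shape (1, V)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0                  # zero counts give log(0) = -inf; treat as 0
    return np.maximum(pmi, 0.0)                   # keep only positive associations

def svd_embeddings(m, k):
    """Dense k-dimensional word vectors from the top-k singular triplets of m."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]                       # one row per word, scaled by singular values

# Hypothetical symmetric co-occurrence counts for a 4-word vocabulary.
counts = np.array([[0, 4, 1, 0],
                   [4, 0, 1, 0],
                   [1, 1, 0, 3],
                   [0, 0, 3, 0]], dtype=float)
P = ppmi(counts)                 # V x V and mostly zeros (sparse)
E = svd_embeddings(P, k=2)       # V x 2 and dense
print(E.shape)
```

For a realistic vocabulary, the pPMI matrix would be stored as a sparse matrix and reduced with a truncated solver such as `scipy.sparse.linalg.svds` rather than a dense `np.linalg.svd`.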

## What is negative sampling when training the skip-gram model?

Recap: The skip-gram model tries to represent each word in a large text as a lower-dimensional vector in a K-dimensional space such that similar words are close to each other. This is achieved by training a feed-forward network to predict the context words given a specific word, i.e., …
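
A minimal NumPy sketch of one skip-gram-with-negative-sampling (SGNS) update, assuming the usual formulation: for a center word $c$ with input vector $v_c$, a true context word $o$, and $k$ sampled negative words, minimize $-\log\sigma(u_o \cdot v_c) - \sum_{i=1}^{k}\log\sigma(-u_i \cdot v_c)$. Negatives are drawn from unigram counts raised to the 3/4 power, as in the original word2vec work. The function names, learning rate, and the choice to never sample the true context word as a negative are simplifying assumptions for illustration, not the reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, normalized to probabilities."""
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()

def sgns_step(W_in, W_out, center, context, noise_p, rng, k=5, lr=0.025):
    """One SGD step on -log s(u_o.v_c) - sum_k log s(-u_k.v_c); updates W_in, W_out in place."""
    # Simplification: never draw the true context word as a negative.
    p = noise_p.copy()
    p[context] = 0.0
    p /= p.sum()
    negatives = rng.choice(len(p), size=k, p=p)

    ids = np.concatenate(([context], negatives))  # row 0 is the positive pair
    labels = np.zeros(k + 1)
    labels[0] = 1.0

    v = W_in[center].copy()        # center ("input") vector v_c
    u = W_out[ids]                 # (k+1, d) context ("output") vectors, a copy
    scores = sigmoid(u @ v)        # sigma(u . v) for the positive and each negative
    loss = -np.log(scores[0]) - np.log(1.0 - scores[1:]).sum()

    g = scores - labels            # dLoss/d(u . v) for every row
    W_in[center] -= lr * (g @ u)
    np.subtract.at(W_out, ids, lr * np.outer(g, v))  # accumulates repeated negative ids
    return loss
```

Each step touches only $k + 1$ output vectors instead of all V, which is the point of negative sampling compared with a full softmax over the vocabulary. For example, repeatedly calling `sgns_step(W_in, W_out, 1, 2, noise, rng, lr=0.1)` on a toy 10-word vocabulary drives $\sigma(u_2 \cdot v_1)$ toward 1 while pushing the sampled negatives toward 0.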

## What is the difference between word2vec and GloVe?

Word2vec is a feed-forward neural network model for learning word embeddings. The skip-gram model, framed as predicting the context given a specific word, takes each word in the corpus as input, passes it through a hidden layer (the embedding layer), and from there predicts the context words. Once trained, the embedding for a particular…
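
The forward pass described above can be sketched in a few lines, assuming a plain (non-hierarchical) softmax output layer and hypothetical weight matrices `W_in` (V x d, the embedding layer) and `W_out` (V x d). The sketch shows why the hidden layer is called the embedding layer: multiplying a one-hot input by `W_in` simply selects that word's row.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def skipgram_forward(word_id, W_in, W_out):
    """Return (hidden, probs): the word's embedding and P(context | word) over the vocabulary."""
    V = W_in.shape[0]
    one_hot = np.zeros(V)
    one_hot[word_id] = 1.0
    hidden = one_hot @ W_in      # (d,) identical to W_in[word_id], i.e. the embedding
    scores = W_out @ hidden      # (V,) one score per candidate context word
    return hidden, softmax(scores)

# Hypothetical random weights for a 6-word vocabulary and 4-dimensional embeddings.
rng = np.random.default_rng(0)
W_in = rng.normal(size=(6, 4))
W_out = rng.normal(size=(6, 4))
hidden, probs = skipgram_forward(3, W_in, W_out)
```

After training, the rows of `W_in` are typically kept as the word embeddings. The softmax here costs O(V) per prediction, which is why large-vocabulary training relies on approximations such as negative sampling.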