How do you evaluate word vectors?

Word vectors, whether derived from word2vec, GloVe, or raw co-occurrence statistics, need to be evaluated for quality. This can be done in two major ways, as described below. Intrinsic evaluations are used when word vectors are built for, or evaluated on, a specific or intermediate subtask. Such evaluations are fast to compute…
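A common intrinsic evaluation is the word-analogy task (e.g., king - man + woman ≈ queen). Below is a minimal sketch using gensim's pretrained GloVe vectors; the model name and the word triples are illustrative choices, not part of the original answer.

```python
# Intrinsic evaluation sketch: word analogies with pretrained GloVe vectors.
# "glove-wiki-gigaword-50" is one of gensim's downloadable models (an
# illustrative choice, not prescribed by the answer above).
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

# Analogy: king - man + woman should land close to queen.
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g., [('queen', 0.85...)]

# Word-similarity check: cosine similarity between related vs. unrelated words.
print(model.similarity("car", "truck"))   # relatively high
print(model.similarity("car", "banana"))  # relatively low
```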

What are ELMo-based embeddings?

ELMo embeddings are deep contextual embeddings that take all hidden layers of the network into account: they use a learned linear combination of all layers of a deep pre-trained neural network, instead of just the last layer, to obtain embeddings that model both the syntactic and semantic characteristics of word use, as well as polysemy…
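The layer mixing can be sketched as a softmax-weighted sum of the per-layer representations, scaled by a learned scalar. Here is a minimal PyTorch sketch; the tensor shapes and names are hypothetical stand-ins for the biLM's per-layer outputs.

```python
# Sketch of ELMo-style layer mixing: a softmax-weighted sum over all biLM
# layers, scaled by a learned scalar gamma. Shapes and names are illustrative.
import torch

num_layers, seq_len, dim = 3, 10, 1024                 # e.g., token layer + 2 biLSTM layers
hidden_states = torch.randn(num_layers, seq_len, dim)  # stand-in for biLM outputs

# Learned parameters (trained jointly with the downstream task).
s = torch.nn.Parameter(torch.zeros(num_layers))  # per-layer mixing logits
gamma = torch.nn.Parameter(torch.ones(1))        # overall scale

weights = torch.softmax(s, dim=0)                # normalized layer weights
# Weighted sum over layers -> one contextual vector per token.
elmo_embedding = gamma * (weights[:, None, None] * hidden_states).sum(dim=0)
print(elmo_embedding.shape)  # torch.Size([10, 1024])
```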

What are the challenges in building word embeddings from tweets versus Wikipedia data? Can Wikipedia data be used to build embeddings for words in Twitter data?

Twitter data differs from Wikipedia data in a number of ways. Twitter data, in the form of tweets, is very noisy for the following reasons:
- Spelling errors
- Abbreviations
- Code mixing, since multiple languages are used in a single tweet
- Grammatical mistakes
Tweets are also very short compared to a typical sentence on Wikipedia or in news articles. This could be…
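As a rough illustration of the normalization this noise forces on a tweet pipeline, here is a hypothetical sketch; the regex rules and the abbreviation table are made up for illustration, not a complete preprocessor.

```python
# Minimal tweet-normalization sketch covering a few of the noise sources above.
# The regexes and the abbreviation table are illustrative, not exhaustive.
import re

ABBREVIATIONS = {"u": "you", "gr8": "great", "idk": "i do not know"}

def normalize_tweet(tweet: str) -> str:
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "<url>", tweet)   # replace links
    tweet = re.sub(r"@\w+", "<user>", tweet)          # replace mentions
    tweet = re.sub(r"#(\w+)", r"\1", tweet)           # strip '#' but keep the word
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)      # "soooo" -> "soo"
    tokens = [ABBREVIATIONS.get(t, t) for t in tweet.split()]
    return " ".join(tokens)

print(normalize_tweet("@bob idk if u saw this gr8 thread soooo cool #NLP http://t.co/x"))
# -> "<user> i do not know if you saw this great thread soo cool nlp <url>"
```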

Suppose you build word vectors (embeddings) where each word vector has dimensionality equal to the vocabulary size (V) and feature values equal to the pPMI between the corresponding words. What are the problems with this approach, and how can you resolve them?

Please read here to understand what PMI and pPMI are.
Problems
- Since the vocabulary size (V) is large, these vectors will be very high-dimensional.
- They will be sparse, since a word will not have co-occurred with most other words.
Resolution
- Dimensionality reduction using approaches like Singular Value Decomposition (SVD) of the term-document matrix…
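To make the SVD resolution concrete, here is a minimal sketch that builds PPMI from a toy co-occurrence matrix and keeps only the top-k singular directions; the matrix and the dimensions are made up for illustration.

```python
# Sketch: truncated SVD over a (toy) PPMI word-word co-occurrence matrix,
# reducing V-dimensional sparse vectors to dense k-dimensional ones.
import numpy as np

rng = np.random.default_rng(0)
V, k = 1000, 50                                        # vocabulary size, target dim
counts = rng.poisson(0.05, size=(V, V)).astype(float)  # toy co-occurrence counts

# PPMI: positive pointwise mutual information.
total = counts.sum()
p_xy = counts / total
p_x = counts.sum(axis=1, keepdims=True) / total
p_y = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_xy / (p_x * p_y))
ppmi = np.maximum(pmi, 0)        # clip negatives (and -inf from zero counts) to 0
ppmi = np.nan_to_num(ppmi)       # 0/0 cells become 0

# Truncated SVD: keep the top-k singular directions as dense embeddings.
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
embeddings = U[:, :k] * S[:k]    # each row is a dense k-dim word vector
print(embeddings.shape)          # (1000, 50)
```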

What is negative sampling when training the skip-gram model?

Recap: the skip-gram model tries to represent each word in a large text as a lower-dimensional vector in a space of K dimensions, such that similar words are close to each other. This is achieved by training a feed-forward network in which we try to predict the context words given a specific center word, i.e.,…
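With negative sampling, instead of normalizing a softmax over the whole vocabulary, each (center, context) pair is trained against a few randomly drawn "negative" words. Below is a minimal numpy sketch of the per-pair loss; the vectors and the sampling table are illustrative placeholders.

```python
# Sketch of the skip-gram negative-sampling loss for one (center, context) pair:
#   L = -log sigmoid(v_ctx . v_c) - sum over negatives of log sigmoid(-v_neg . v_c)
# The embedding matrices and counts here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, K, num_neg = 10_000, 100, 5
W_in = rng.normal(0, 0.1, (V, K))    # center-word vectors
W_out = rng.normal(0, 0.1, (V, K))   # context-word vectors

# Negatives are drawn from the unigram distribution raised to the 3/4 power.
counts = rng.integers(1, 1000, V).astype(float)
probs = counts ** 0.75
probs /= probs.sum()

def negative_sampling_loss(center_id, context_id):
    v_c = W_in[center_id]
    negatives = rng.choice(V, size=num_neg, p=probs)
    pos = -np.log(sigmoid(W_out[context_id] @ v_c))    # push true pair together
    neg = -np.log(sigmoid(-W_out[negatives] @ v_c)).sum()  # push negatives apart
    return pos + neg

print(negative_sampling_loss(center_id=42, context_id=7))
```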

How do you design a system that reads a natural language question and retrieves the closest FAQ answer?

There are multiple approaches to FAQ-based question answering:
- Keyword-based search (information-retrieval approach): tag each question with keywords; extract keywords from the query and retrieve all relevant question-answer pairs. Easy to scale with appropriate indexes (reverse indexing).
- Lexical matching approach: word-level overlap between the query and the question. These approaches might be harder to…
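Here is a minimal sketch of the lexical-matching approach using TF-IDF cosine similarity over the FAQ questions; the FAQ entries are made up for illustration.

```python
# Sketch of lexical FAQ retrieval: embed the FAQ questions with TF-IDF and
# return the answer whose question best matches the user query.
# The FAQ data below is a made-up example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq = [
    ("How do I reset my password?", "Go to Settings > Account > Reset password."),
    ("How can I delete my account?", "Email support to request account deletion."),
    ("What payment methods are accepted?", "We accept cards and PayPal."),
]
questions = [q for q, _ in faq]

vectorizer = TfidfVectorizer()
question_vecs = vectorizer.fit_transform(questions)

def answer(query: str) -> str:
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, question_vecs)[0]
    return faq[sims.argmax()][1]   # answer for the most similar question

print(answer("forgot my password, how to change it?"))
# -> "Go to Settings > Account > Reset password."
```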

What are the different ways of representing documents?

Bag of words: commonly called BoW, this approach creates a vocabulary of words and represents the document as a count vector. The number of dimensions equals the vocabulary size, with each dimension holding the number of times a specific word occurred in the document. Sometimes, TF-IDF is used to reduce the dimensionality of the…
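A quick BoW sketch with scikit-learn's CountVectorizer; the documents are made up for illustration.

```python
# Bag-of-words sketch: build a vocabulary over toy documents and represent
# each document as a count vector (one dimension per vocabulary word).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
# ['cat' 'chased' 'dog' 'mat' 'on' 'sat' 'the']
print(X.toarray())
# [[1 0 0 1 1 1 2]
#  [1 1 1 0 0 0 2]]
```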