What are ELMo-based embeddings?

ELMo embeddings are deep contextual embeddings that take all hidden layers of a pre-trained deep neural network into account: they use a learned linear combination of all layers, instead of just the last layer, to obtain superior embeddings that model both syntactic and semantic characteristics of word use, as well as polysemy …
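As a rough illustration of that weighted combination (the layer outputs, weights, and dimensions below are made up, not taken from a real model), an ELMo-style embedding for a token is a scaled, softmax-weighted sum of its per-layer hidden states:

```python
import numpy as np

# Hypothetical per-layer representations of one token:
# L layers, each of dimension D (values are random stand-ins).
L, D = 3, 1024
layer_outputs = [np.random.randn(D) for _ in range(L)]  # h_0 ... h_{L-1}

# Task-specific softmax-normalized scalar weights s_j and a scale gamma;
# in the ELMo paper both are learned jointly with the downstream task.
raw_weights = np.random.randn(L)
s = np.exp(raw_weights) / np.exp(raw_weights).sum()  # softmax over layers
gamma = 1.0

# ELMo embedding: gamma * sum_j s_j * h_j
elmo_embedding = gamma * sum(s_j * h_j for s_j, h_j in zip(s, layer_outputs))
```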

Suppose you build word vectors (embeddings) with each word vector having dimensions equal to the vocabulary size (V) and feature values given by the pPMI between the corresponding words. What are the problems with this approach, and how can you resolve them?

Please read here to understand what PMI and pPMI are. Problems: as the vocabulary size (V) is large, these vectors will be large. They will also be sparse, since a word will not have co-occurred with every other possible word. Resolution: dimensionality reduction using approaches like singular value decomposition (SVD) of the term-document matrix…
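A minimal sketch of that resolution, with a toy co-occurrence matrix standing in for real counts: build the pPMI matrix, then keep the top-k singular directions from an SVD as dense word vectors.

```python
import numpy as np

# Toy word-word co-occurrence counts (illustrative values only).
counts = np.array([[0, 2, 1],
                   [2, 0, 3],
                   [1, 3, 0]], dtype=float)

total = counts.sum()
p_xy = counts / total                          # joint probabilities
p_x = counts.sum(axis=1, keepdims=True) / total
p_y = counts.sum(axis=0, keepdims=True) / total

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_xy / (p_x * p_y))
ppmi = np.maximum(pmi, 0)                      # positive PMI: clip negatives
ppmi[~np.isfinite(ppmi)] = 0                   # zero counts -> log(0) -> 0

# Truncated SVD keeps the top-k singular directions as dense embeddings.
k = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :k] * S[:k]                  # dense k-dimensional vectors
```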

What is negative sampling when training the skip-gram model?

Recap: the skip-gram model tries to represent each word in a large text as a lower-dimensional vector in a space of K dimensions, such that similar words are closer to each other. This is achieved by training a feed-forward network where we try to predict the context words given a specific word, i.e., …
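A hedged sketch of the negative-sampling objective for a single (center, context) pair follows; the matrices, indices, and the uniform negative sampler are illustrative stand-ins (word2vec actually samples negatives from a smoothed unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 10_000, 100                       # vocabulary size, embedding dim
W_in = rng.normal(scale=0.1, size=(V, K))   # input (center-word) embeddings
W_out = rng.normal(scale=0.1, size=(V, K))  # output (context-word) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

center, context = 42, 128                # one observed skip-gram pair
negatives = rng.integers(0, V, size=5)   # 5 words sampled as "noise"

v_c = W_in[center]
# Negative-sampling loss for this pair:
# -log sigma(v_ctx . v_c) - sum over negatives of log sigma(-v_neg . v_c)
loss = -np.log(sigmoid(W_out[context] @ v_c))
loss -= np.sum(np.log(sigmoid(-W_out[negatives] @ v_c)))
```

Instead of normalizing over the whole vocabulary with a softmax, the model only has to score one true context word against a handful of sampled negatives, which makes each training step cheap.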

How do you design a system that reads a natural language question and retrieves the closest FAQ answer?

There are multiple approaches for FAQ-based question answering. Keyword-based search (information-retrieval approach): tag each question with keywords, extract keywords from the query, and retrieve all relevant question-answer pairs; this is easy to scale with appropriate indexes (reverse indexing). Lexical matching approach: word-level overlap between the query and the question. These approaches might be harder to…
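As one concrete (assumed) instantiation of the lexical-matching idea, here is a small scikit-learn sketch that indexes FAQ questions with TF-IDF and returns the answer of the closest question; the FAQ entries are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy FAQ: (question, answer) pairs.
faq = [
    ("How do I reset my password?", "Go to Settings > Reset password."),
    ("How do I delete my account?", "Contact support to delete your account."),
]
questions = [q for q, _ in faq]

vectorizer = TfidfVectorizer()
question_vectors = vectorizer.fit_transform(questions)

def answer(query: str) -> str:
    """Return the answer whose FAQ question is closest to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, question_vectors)[0]
    return faq[scores.argmax()][1]

print(answer("forgot my password"))  # -> "Go to Settings > Reset password."
```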

What are the different ways of representing documents?

Bag of words: commonly called BoW, this creates a vocabulary of words and represents the document as a count vector. The number of dimensions equals the vocabulary size, where each dimension holds the number of times a specific word occurred in the document. Sometimes, TF-IDF is used to reduce the dimensionality of the…
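A minimal BoW sketch using scikit-learn's CountVectorizer (the toy corpus is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each document becomes a V-dimensional count vector,
# where V is the vocabulary size built from the corpus.
docs = ["the cat sat on the mat", "the dog sat"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # shape: (n_docs, V)

print(vectorizer.get_feature_names_out())   # the vocabulary
print(X.toarray())                          # e.g. "the" counted twice in doc 0
```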

What are popular ways of dimensionality reduction in NLP tasks? Do you think this is even important?

One of the simplest ways of representing a document as a vector is bag of words (BoW). Though it is the simplest approach, it leads to high-dimensional vectors given the large vocabulary size. Some common ways of performing dimensionality reduction in NLP are: TF-IDF: term frequency-inverse document frequency is the best…
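One common reduction pipeline, sketched here under assumed toy data, is TF-IDF vectors followed by truncated SVD (latent semantic analysis), which yields dense low-dimensional document vectors:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "word embeddings capture meaning",
    "dimensionality reduction shrinks sparse vectors",
    "bag of words vectors are sparse and high dimensional",
]

tfidf = TfidfVectorizer().fit_transform(docs)      # sparse, V-dimensional
svd = TruncatedSVD(n_components=2, random_state=0)
dense_docs = svd.fit_transform(tfidf)              # dense, 2-dimensional
print(dense_docs.shape)                            # (3, 2)
```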