- Why are bigrams, or n-grams in general, important in NLP tasks such as sentiment classification or spam detection? Why is it worth extracting them explicitly?
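For example, a minimal sketch of extracting unigram and bigram features with scikit-learn (the toy sentences are illustrative):

```python
# A minimal sketch: unigram + bigram features for a sentiment/spam-style
# classifier with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["not good at all", "very good indeed"]  # toy examples

# ngram_range=(1, 2) adds bigrams such as "not good", which carry
# signal that the unigrams alone ("not", "good") lose.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
```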
- What are ELMo-based embeddings?
- What are the advantages and disadvantages of using a bag-of-words feature vector?
- Why are NLP models prone to overfitting?
- How do you detect sarcasm?
- How do you evaluate word vectors?
- How do you design a system that reads a natural language question and retrieves the closest FAQ answer?
- What is PMI (pointwise mutual information)?
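For reference, with probabilities estimated from corpus counts, the PMI between words x and y is:

\[ \mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} \]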
- What is a language model? Why do you need a language model?
- Why is smoothing applied in a language model?
- What is speaker segmentation in speech recognition? How do you use it?
- You are building a natural language search box for a website. How do you accommodate spelling errors?
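One minimal sketch, using only the Python standard library and a hypothetical list of in-vocabulary index terms, is fuzzy matching against the search index's vocabulary:

```python
# A minimal sketch: map a misspelled query term to the closest
# in-vocabulary term with the standard library's difflib.
import difflib

vocabulary = ["sneakers", "speakers", "sweaters"]  # hypothetical index terms

def correct(term: str) -> str:
    # get_close_matches ranks candidates by SequenceMatcher similarity.
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else term

print(correct("sneekers"))  # -> "sneakers"
```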
- What are the different independence assumptions in HMMs and Naive Bayes?
- How do you deal with dataset imbalance in a problem like spam filtering?
- What are the advantages and disadvantages of using Naive Bayes for spam detection?
- What is cross-entropy loss in deep learning?
- What are the different ways of representing documents?
- What is the difference between translation and transliteration?
- How would you build a smart-reply feature for an app like Gmail or LinkedIn?
- How would you build an auto-suggest feature for a messaging app or Google Search?
- What can you say about the most frequent and the rarest words? Why are they important or unimportant?
- What are knowledge graphs? When would you need a knowledge graph over, say, a database to store information?
- How do you measure the performance of a language model?
- Given the following two sentences, how do you determine if Teddy is a person or not? “Teddy bears are on sale!” and “Teddy Roosevelt was a great President!”
- Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states for a sequence of text using a brute-force algorithm?
- Explain latent Dirichlet allocation (LDA). Where is it typically used?
- Which would you care more about for a spam-filtering problem: precision or recall?
- You are given a set of documents and asked to find the prevalent topics in them. How do you go about it?
- How many parameters does an HMM have?
- If the average length of a sentence is 100 across all documents, should we build a 100-gram language model?
- What is the formula for tf-idf? Why do we use ‘log’ in the idf formula?
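One common variant (exact definitions of tf and idf differ across textbooks and toolkits) is:

\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log \frac{N}{\mathrm{df}(t)} \]

where N is the total number of documents and df(t) is the number of documents containing term t.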
- If you don’t have a stop-word dictionary or are working on a new language, what approach would you take to remove stop words?
- How do you deal with out-of-vocabulary (OOV) words at run time when you have built a language model?
- What are some knowledge graphs you know of? How do they differ?
- What will happen if you do not convert all characters to a single case (either lower or upper) during the pre-processing step of an NLP algorithm?
- Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?
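A minimal sketch of one standard fix, add-one (Laplace) smoothing, over toy counts (the corpus here is illustrative):

```python
# A minimal sketch: add-one (Laplace) smoothing for a bigram model.
from collections import Counter

corpus = ["the cat sat", "the dog sat"]  # toy corpus
sents = [s.split() for s in corpus]

unigram_counts = Counter(w for sent in sents for w in sent)
bigram_counts = Counter(pair for sent in sents for pair in zip(sent, sent[1:]))
V = len(unigram_counts)  # vocabulary size

def p_laplace(w2: str, w1: str) -> float:
    # Every bigram, seen or unseen, gets a non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(p_laplace("sat", "cat"))  # seen bigram   -> 0.4
print(p_laplace("dog", "cat"))  # unseen bigram -> 0.2, not 0.0
```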
- What is the difference between paraphrasing and textual entailment?
- Suppose you build word vectors (embeddings) where each word vector has dimensionality equal to the vocabulary size (V) and the feature values are the positive PMI (PPMI) scores between the corresponding words. What are the problems with this approach, and how can you resolve them?
- What are some common tools available for NER (named entity recognition)?
- Say you have generated a language model using bag of words (BoW) with one-hot encoding, and your training set has many sentences with the word “good” but none with the word “great”. For a sentence such as “Have a great day”, p(great) = 0.0 under this training set. How can you solve this problem by leveraging the fact that “good” and “great” are similar words?
- How can you increase the recall of a search query’s results (on a search engine or an e-commerce site) without changing the algorithm?
- What is the difference between stemming and lemmatisation?
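A minimal sketch of the contrast using NLTK (assumes `nltk` is installed and `nltk.download("wordnet")` has been run):

```python
# Stemming chops suffixes heuristically, so the result need not be a
# real word; lemmatization maps to a dictionary form using WordNet.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                  # "studi" (not a word)
print(lemmatizer.lemmatize("studies"))          # "study"
print(lemmatizer.lemmatize("better", pos="a"))  # "good" (via WordNet)
```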
- What are common tools for speech recognition? What are the advantages and disadvantages of each?
- Where would you not want to remove stop words?
- What is negative sampling when training the skip-gram model?
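For reference, the skip-gram negative-sampling objective for a (center, context) pair (w, c) with k negative samples drawn from a noise distribution P_n is (following Mikolov et al.):

\[ \log \sigma\!\left(v_{c}^{\top} v_{w}\right) + \sum_{i=1}^{k} \mathbb{E}_{n_{i} \sim P_{n}}\!\left[ \log \sigma\!\left(-v_{n_{i}}^{\top} v_{w}\right) \right] \]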
- You have come up with a spam classifier. How do you measure accuracy?
- How are long-term dependencies maintained while building a language model?
- How do you train an HMM in practice?
- How do you find the most probable sequence of POS tags from a sequence of text?
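The standard answer is the Viterbi algorithm, which replaces the O(K^T) brute-force search with O(T·K^2) dynamic programming. A minimal numpy sketch (the matrices here are toy placeholders):

```python
# A minimal sketch of Viterbi decoding for an HMM POS tagger.
# pi: initial tag probabilities, A: tag transitions, B: emissions.
import numpy as np

def viterbi(obs, pi, A, B):
    K, T = A.shape[0], len(obs)
    delta = np.zeros((T, K))           # best log-prob of a path ending in each tag
    psi = np.zeros((T, K), dtype=int)  # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)  # K x K
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final tag.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy example: 2 tags, 2 word types.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 0], pi, A, B))
```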
- What are the advantages and disadvantages of using rule-based approaches in NLP?
- What are the challenges in building word embeddings from tweets versus from Wikipedia data? Can Wikipedia data be used to build embeddings for words in Twitter data?
- Can text generation be modelled with regression? Why do we need a language model?
- Why is named entity recognition hard?
- What is the difference between word2vec and GloVe?
- What is shallow parsing?
- What order of Markov assumption does an n-gram model make?
- What is the significance of n-grams in a language model?
- You are trying to cluster documents using a bag-of-words method. Typically, words like “if”, “of”, and “is” are not great features. How do you make sure you are leveraging the more informative words during feature engineering?
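One minimal sketch is to weight terms by tf-idf, so that words appearing in almost every document are down-weighted automatically (scikit-learn assumed available; the documents are toy examples):

```python
# tf-idf down-weights words that occur in most documents, so
# "the"/"is"-style terms contribute little to clustering distance.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat is on the mat",
    "the dog is in the house",
    "stock markets fell sharply today",
]
# max_df can additionally drop terms that occur in nearly all documents.
X = TfidfVectorizer(max_df=0.9).fit_transform(docs)  # rows are L2-normalized
print(X.shape)
```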
- Which is better for feature extraction: character n-grams or word n-grams? Why?
- Can you find the antonyms of a word given a large enough corpus (for example, black => white or rich => poor)? If yes, how? If not, justify your answer.
- What is perplexity? Where do you typically use perplexity?
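For reference, on a test sequence w_1 … w_N, the perplexity of a model p is:

\[ \mathrm{PP}(w_{1} \ldots w_{N}) = p(w_{1} \ldots w_{N})^{-\frac{1}{N}} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_{i} \mid w_{1} \ldots w_{i-1}) \right) \]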
- How do you generate text using the HMM model given below?
\[ p(x, y) = p(x|y)\,p(y) = \prod_{t=1}^{T} p(x_{t}|y_{t})\,p(y_{t}|y_{t-1}) \]
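Reading the factorization left to right suggests ancestral sampling: draw each state y_t from p(y_t | y_{t-1}), then each word x_t from p(x_t | y_t). A minimal sketch with toy parameters:

```python
# Generate text from an HMM by ancestral sampling, following
# p(x, y) = prod_t p(x_t | y_t) p(y_t | y_{t-1}).
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])               # p(y_1)
A = np.array([[0.7, 0.3], [0.4, 0.6]])  # p(y_t | y_{t-1})
B = np.array([[0.9, 0.1], [0.2, 0.8]])  # p(x_t | y_t)
words = ["hello", "world"]              # toy vocabulary

def generate(T: int) -> str:
    y = rng.choice(2, p=pi)  # sample the first state
    out = []
    for _ in range(T):
        out.append(words[rng.choice(2, p=B[y])])  # emit x_t ~ p(x_t | y_t)
        y = rng.choice(2, p=A[y])                 # step y_{t+1} ~ p(. | y_t)
    return " ".join(out)

print(generate(5))
```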
- What are popular ways of doing dimensionality reduction in NLP tasks? Do you think this is even important?
- What are the state-of-the-art techniques for machine translation?
- What are the drawbacks of an n-gram language model?
- What is monolingual text alignment? How do you go about it?