- What is PMI?
- What is the precision-recall tradeoff?
- Why is named entity recognition hard?
- How do you design a system that reads a natural language question and retrieves the closest FAQ answer?
- Why might your machine learning model not work in production?
- What are ELMo-based embeddings?
- Which would you care more about, precision or recall, for a spam filtering problem?
- Can you give an example of a classifier with high bias and high variance?
- How do you evaluate word vectors?
- What is the most efficient way of serialising machine learning models?
- How does the KNN algorithm work? What are the advantages and disadvantages of KNN?
- What are some common tools available for NER (named entity recognition)?
- What are popular ways of dimensionality reduction in NLP tasks? Do you think this is even important?
- You are given some documents and asked to find the prevalent topics in them. How do you go about it?
- I have designed a 2-layer deep neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions, with a sigmoid at the final layer. I use a data visualization tool and see that the decision boundary is in the shape of a sine curve. I have tried to train with 200 data points with known class labels and see that the training error is too high. What do I do?
- What are the evaluation metrics for a multi-class classification problem?
- What is Bayes error? What is the best approximation to Bayes error?
- What is cross entropy loss in deep learning?
- What are the challenges in building word embeddings from tweets versus from Wikipedia data? Can Wikipedia data be used to build embeddings for words in Twitter data?
- What is the difference between parametric and nonparametric models?
- How do you measure the performance of a language model?
- Why do ensemble methods have a better chance of giving a better model than an individual model?
- You are building a natural language search box for a website. How do you accommodate spelling errors?
- How do you do error analysis efficiently in machine learning?
- I have used a 4-layer fully connected network to learn a complex classifier boundary. I have used tanh activations throughout, except in the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional and I initialized my weights to 0 to begin with). I see that my network is unable to fit the training data, leading to a high training error. What is the first thing I should try?
- What is the difference between word2vec and GloVe?
- What are the different ways of preventing overfitting in a deep neural network? Explain the intuition behind each.
- What are knowledge graphs? When would you need a knowledge graph over, say, a database to store information?
- Why is logistic regression a linear classifier?
- How do bias and variance errors get introduced?
- How do you detect sarcasm?
- What are the commonly used activation functions? When are they used?
- Why do you typically see overflow and underflow when implementing ML algorithms?
- What is the complexity of the Viterbi algorithm?
- What are the challenges of an imbalanced dataset in machine learning?
- Overfitting is a result of which of the following causes:
- What is the PageRank algorithm?
- What is the difference between deep learning and machine learning?
- How do you decide whether to reduce bias error or variance error?
- What are the drawbacks of an n-gram language model?
- How do you train an HMM in practice?
- What is speaker segmentation in speech recognition? How do you use it?
- What are the drawbacks of oversampling the minority class in an imbalanced-class machine learning problem?
- What are some knowledge graphs you know? What is different between them?
- How do you deal with dataset imbalance in a problem like spam filtering?
- How will you build an auto-suggestion feature for a messaging app or Google search?
- You have come up with a spam classifier. How do you measure accuracy?
- Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states from a sequence of text using a brute-force algorithm?
- How do you deploy machine learning models in production?
- With the maximum likelihood estimate, are we guaranteed to find a global minimum?
- Why are NLP-related models prone to overfitting?
- How do you detect outliers in data? How do you deal with them?
- What are overfitting and underfitting? Why do they occur? How do you overcome them?
- What is the problem with random or uniform sampling of the test set from the entire dataset?
- What is the best strategy for choosing an evaluation metric?
- If the average length of a sentence is 100 in all documents, should we build a 100-gram language model?
- Which of the following data problems is solved using stratified sampling?
- Can text generation be modelled with regression? Why do we need a language model?
- What are the different ways of representing documents?
- How many parameters are there in an HMM?
- What is negative sampling when training the skip-gram model?
- Why do you need a training set, a test set and a validation set?
- What are the advantages and disadvantages of using naive Bayes for spam detection?
- What is monolingual text alignment? How do you go about it?
- Error analysis in supervised machine learning
- Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?
- How do you measure the quality of machine translation?
- What is a dev set in machine learning? What are its requirements?
- What is the bias-variance trade-off?
- What is perplexity? Where do you typically use perplexity?
- How can you increase the recall of search query results (on a search engine or e-commerce site) without changing the algorithm?
- Suppose you build word vectors (embeddings) with each word vector having the vocabulary size (V) as its dimension and the PPMI between corresponding words as its feature values. What are the problems with this approach and how can you resolve them?
- Machine Learning Evaluation Metrics
- How do you handle incorrectly labeled samples in the training or dev set?
- How do you handle missing data in an ML algorithm?
- Why don’t we tune hyper-parameters using the test set, and why do we need a separate validation set?
- What are the different independence assumptions in HMMs and naive Bayes?
- What is stratified sampling and why is it important?
- How do you generate text using the HMM given below:
  $$p(x, y) = p(x \mid y)\,p(y) = \prod_{t=1}^{T} p(x_{t} \mid y_{t})\,p(y_{t} \mid y_{t-1})$$
- How do you serialise and deserialise a machine learning model after training?
- How do you eliminate underfitting?
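
For the HMM text-generation question above, one common reading of the factorization is ancestral sampling: draw each state y_t from p(y_t | y_{t-1}), then draw the word x_t from p(x_t | y_t). The following is only a minimal sketch under assumptions. The tag set, the `<s>` start state standing in for y_0, the probability tables, and the helper names `sample` and `generate` are all hypothetical and invented for illustration.

```python
import random

# Hypothetical toy parameters, invented purely for illustration.
START = "<s>"  # assumed start state playing the role of y_0

# transition[a][b] approximates p(y_t = b | y_{t-1} = a)
transition = {
    START: {"DET": 0.7, "NOUN": 0.3},
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.6, "NOUN": 0.4},
    "VERB": {"DET": 0.5, "NOUN": 0.5},
}

# emission[s][w] approximates p(x_t = w | y_t = s)
emission = {
    "DET": {"the": 0.6, "a": 0.4},
    "NOUN": {"dog": 0.5, "cat": 0.5},
    "VERB": {"sees": 0.5, "chases": 0.5},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(length, seed=None):
    """Sample a (states, words) pair of the given length from the toy HMM."""
    rng = random.Random(seed)
    state = START
    states, words = [], []
    for _ in range(length):
        state = sample(transition[state], rng)      # y_t ~ p(y_t | y_{t-1})
        states.append(state)
        words.append(sample(emission[state], rng))  # x_t ~ p(x_t | y_t)
    return states, words


states, words = generate(5, seed=0)
print(" ".join(words))
```

In a real system the two tables would be estimated from data rather than written by hand, and a stop state could replace the fixed `length` argument.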