- I have designed a 2-layer neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions, with a sigmoid at the final layer. Using a data-visualization tool, I see that the decision boundary is shaped like a sine curve. I have trained on 200 data points with known class labels, and the training error is too high. What do I do?
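One way to see the core issue this question targets: with linear activations, stacked layers collapse into a single linear map, so the network before the sigmoid is just a linear classifier and can never fit a sine-shaped boundary. A minimal NumPy sketch (the dimensions are arbitrary choices of mine, not from the question):

```python
import numpy as np

# Two "layers" with linear activations collapse into one linear map:
# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 3)), rng.standard_normal(2)
W2, b2 = rng.standard_normal((1, 2)), rng.standard_normal(1)

x = rng.standard_normal(3)
two_layer = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layer, collapsed))  # depth adds no expressive power here
```

The fix, accordingly, is a nonlinear activation (e.g. tanh or ReLU) in the hidden layer.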
- What is the difference between deep learning and machine learning?
- What is the best strategy for choosing an evaluation metric?
- Why do you typically see overflow and underflow when implementing ML algorithms?
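A concrete instance of the problem this question is about: exponentiating large logits overflows `float64`, while the standard max-shift trick computes the same softmax safely. A small sketch (function names are mine):

```python
import numpy as np

def softmax_naive(z):
    e = np.exp(z)                       # exp overflows for large z
    return e / e.sum()

def softmax_stable(z):
    e = np.exp(z - z.max())             # shift so the largest exponent is 0
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(z)            # exp(1000) -> inf, then inf/inf -> nan
stable = softmax_stable(z)

print(np.isnan(naive).any())            # True: the naive version broke
print(stable)                           # same math, computed safely
```

The shift works because softmax is invariant to adding a constant to every logit.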
- What are ELMo-based embeddings?
- Given the following two sentences, how do you determine if Teddy is a person or not? “Teddy bears are on sale!” and “Teddy Roosevelt was a great President!”
- How do you evaluate word vectors?
- How is long-term dependency maintained while building a language model?
- What are the different ways of preventing overfitting in a deep neural network? Explain the intuition behind each.
- Suppose you build word vectors (embeddings) where each vector's dimension equals the vocabulary size (V) and the feature values are the pPMI between the corresponding words. What are the problems with this approach, and how can you resolve them?
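One standard resolution this question points toward is truncated SVD over the pPMI matrix: V-dimensional pPMI vectors are huge and sparse, while the top singular directions give dense, low-dimensional embeddings. A sketch (the matrix below is a random stand-in, not real co-occurrence counts):

```python
import numpy as np

# A V x V pPMI matrix is huge and sparse; truncated SVD yields dense,
# d-dimensional embeddings that preserve most of its structure.
rng = np.random.default_rng(0)
V, d = 1000, 50
ppmi = np.maximum(rng.standard_normal((V, V)), 0)  # stand-in for a real pPMI matrix

U, S, _ = np.linalg.svd(ppmi, full_matrices=False)
embeddings = U[:, :d] * S[:d]          # keep the top-d singular directions
print(embeddings.shape)                # (1000, 50)
```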
- What are the challenges in building word embeddings from tweets versus from Wikipedia data? Can Wikipedia data be used to build embeddings for words in Twitter data?
- How do you do error analysis efficiently in machine learning?
- What are the commonly used activation functions? When is each used?
- What is Bayes error? What is the best approximation to Bayes error?
- What is negative sampling when training the skip-gram model?
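The objective being asked about can be sketched for a single (center, context) pair: push the true pair's score up and k sampled "noise" words down, instead of normalizing over the whole vocabulary. All vectors below are random stand-ins and `neg_sampling_loss` is my own helper name:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_center, u_context, u_negatives):
    """Skip-gram negative-sampling objective for one pair:
    maximize log sigma(u_o . v_c) + sum_k log sigma(-u_k . v_c),
    returned negated so it is a loss to minimize."""
    pos = np.log(sigmoid(u_context @ v_center))
    neg = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    return -(pos + neg)

rng = np.random.default_rng(0)
d, k = 50, 5                               # embedding dim, negatives per pair
v_c = rng.standard_normal(d) * 0.1         # center-word vector
u_o = rng.standard_normal(d) * 0.1         # true context-word vector
u_neg = rng.standard_normal((k, d)) * 0.1  # k vectors from a noise distribution
print(neg_sampling_loss(v_c, u_o, u_neg))
```

In word2vec the negatives are drawn from a unigram distribution raised to the 3/4 power, not uniformly as the random stand-ins above might suggest.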
- What are the optimization algorithms typically used in a neural network?
- Can you give an example of a classifier with high bias and high variance?
- Given a deep learning model, what are the considerations for setting the mini-batch size?
- How do you handle incorrectly labeled samples in the training or dev set?
- I have used a 4-layer fully connected network to learn a complex classifier boundary, with tanh activations throughout except the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations on 100K examples (my data points are 3-dimensional, and I initialized my weights to 0). I see that my network is unable to fit the training data, leading to high training error. What is the first thing I should try?
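A minimal sketch of why the zero initialization in this question is fatal (a 2-layer stand-in with random labels, just to show the mechanism): with all-zero weights the tanh hidden layer outputs zeros, so the gradients through the weights are exactly zero and nothing ever moves.

```python
import numpy as np

# With all-zero initialization and tanh, h = tanh(0) = 0, so the gradients
# of W1 and W2 are exactly zero: the weights never move and nothing is learned.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = rng.integers(0, 2, 100)

W1, b1 = np.zeros((3, 4)), np.zeros(4)     # all-zero init, 4 hidden units
W2, b2 = np.zeros((4, 1)), np.zeros(1)

for _ in range(50):                        # a few gradient steps
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    g = (p - y[:, None]) / len(X)          # dL/dlogit for sigmoid + log loss
    dW2 = h.T @ g
    dh = g @ W2.T * (1 - h**2)
    dW1 = X.T @ dh
    W2 -= 0.1 * dW2; W1 -= 0.1 * dW1
    b2 -= 0.1 * g.sum(0); b1 -= 0.1 * dh.sum(0)

print(np.allclose(W1, 0), np.allclose(W2, 0))  # still all zeros after training
```

The first fix to try, accordingly, is random (e.g. Xavier/Glorot) weight initialization to break symmetry.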
- What is cross entropy loss in deep learning?
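The quantity this question asks about is the mean negative log-likelihood of the true class. A small sketch over hard labels (`cross_entropy` is my own helper name; the probabilities are made up):

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class:
    L = -(1/N) * sum_i log p_i[y_i]."""
    p = np.clip(probs, eps, 1.0)           # guard against log(0)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])        # predicted class distributions
labels = np.array([0, 1])                  # true class indices
print(cross_entropy(probs, labels))        # -(log 0.7 + log 0.8) / 2 ~ 0.290
```

The loss is low when the model puts high probability on the correct class and grows without bound as that probability approaches zero.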