- I have designed a two-layer neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions, with a sigmoid at the final layer. Using a data-visualization tool, I see that the decision boundary is shaped like a sine curve. I have trained on 200 data points with known class labels, and the training error is too high. What do I do?
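  A minimal sketch of the issue behind this question (the weights and sample points below are arbitrary placeholders): with linear hidden activations, any stack of layers collapses to a single affine map, so the pre-sigmoid score is linear in the input and the decision boundary can never be sine-shaped.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  # 2-D input -> 2 linear hidden units -> 1 output logit (arbitrary weights)
  W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
  W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(1, 1))

  x = rng.normal(size=(2, 5))          # 5 sample points

  # Forward pass with identity (linear) hidden activation
  logits_deep = W2 @ (W1 @ x + b1) + b2

  # The whole network collapses to one affine layer:
  W, b = W2 @ W1, W2 @ b1 + b2
  logits_flat = W @ x + b

  assert np.allclose(logits_deep, logits_flat)
  ```

  Since the score is affine in x, the boundary (score = 0) is a straight line; replacing the hidden activation with a nonlinearity such as tanh or ReLU is what allows a curved boundary.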
- Given the following two sentences, how do you determine if Teddy is a person or not? “Teddy bears are on sale!” and “Teddy Roosevelt was a great President!”
- What are the optimization algorithms typically used in a neural network?
- What is negative sampling when training the skip-gram model?
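  As a hint of what this question is after, here is a sketch of the skip-gram negative-sampling loss for one (center, context) pair, assuming k negative words drawn from a noise distribution (the embeddings below are random placeholders):

  ```python
  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def sgns_loss(v_center, u_context, u_negatives):
      """Skip-gram negative-sampling loss for one (center, context) pair.

      v_center:    (d,) input embedding of the center word
      u_context:   (d,) output embedding of the true context word
      u_negatives: (k, d) output embeddings of k sampled negative words
      """
      pos = np.log(sigmoid(u_context @ v_center))               # pull true pair together
      neg = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))  # push negatives apart
      return -(pos + neg)

  rng = np.random.default_rng(0)
  loss = sgns_loss(rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8)))
  assert loss > 0
  ```

  The point of the trick: instead of a softmax over the whole vocabulary, each update touches only the true context word and k sampled negatives, making training cost independent of vocabulary size.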
- What are the commonly used activation functions? When are they used?
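  Definitions of the most common choices, sketched in NumPy (the comments summarize typical usage, not hard rules):

  ```python
  import numpy as np

  def sigmoid(z):                  # squashes to (0, 1); typical for binary-classification outputs
      return 1.0 / (1.0 + np.exp(-z))

  def tanh(z):                     # squashes to (-1, 1); zero-centered hidden activation
      return np.tanh(z)

  def relu(z):                     # max(0, z); common default for hidden layers in deep nets
      return np.maximum(0.0, z)

  def leaky_relu(z, alpha=0.01):   # small negative slope helps avoid "dead" ReLU units
      return np.where(z > 0, z, alpha * z)

  z = np.array([-2.0, 0.0, 2.0])
  ```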
- What are ELMo-based embeddings?
- Given a deep learning model, what are the considerations in setting the mini-batch size?
- Can you give an example of a classifier with high bias and high variance?
- How are long-term dependencies maintained while building a language model?
- I have used a four-layer fully connected network to learn a complex classifier boundary. I used tanh activations throughout, except in the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations on 100K examples (my data points are 3-dimensional, and I initialized all my weights to 0). I see that my network is unable to fit the training data, leading to high training error. What is the first thing I should try?
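  A sketch of the problem this question points at, using a tiny one-hidden-layer version with made-up dimensions: with all weights at zero and tanh hidden units, every hidden activation is zero, so the weight gradients in both layers vanish and only the output bias ever updates — the network stays a constant predictor no matter how long it trains. The usual first fix is random (e.g. Xavier) initialization.

  ```python
  import numpy as np

  # Tiny 1-hidden-layer net with ALL weights initialized to zero.
  d_in, d_hidden = 3, 4
  W1, b1 = np.zeros((d_hidden, d_in)), np.zeros(d_hidden)
  W2, b2 = np.zeros((1, d_hidden)), np.zeros(1)

  x, y = np.array([0.5, -1.0, 2.0]), 1.0

  # Forward pass
  h = np.tanh(W1 @ x + b1)               # every hidden unit outputs 0
  p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid(0) = 0.5

  # Backward pass (binary cross-entropy)
  dlogit = p - y
  dW2 = dlogit * h                       # zero, because h is zero
  dh = (W2.T * dlogit).ravel()           # zero, because W2 is zero
  dW1 = np.outer(dh * (1 - h ** 2), x)   # zero as well

  # Both weight gradients vanish: the weights never leave zero.
  assert np.allclose(dW1, 0.0) and np.allclose(dW2, 0.0)
  ```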
- Suppose you build word vectors (embeddings) with each word vector having dimension equal to the vocabulary size (V), and feature values given by the positive PMI (PPMI) between the corresponding words. What are the problems with this approach, and how can you resolve them?
- What are the different ways of preventing over-fitting in a deep neural network? Explain the intuition behind each.
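  One of the standard answers, inverted dropout, sketched in NumPy (keep_prob and the shapes are illustrative): randomly zeroing hidden units discourages co-adaptation, and dividing survivors by keep_prob keeps the expected activation scale unchanged, so nothing needs rescaling at test time.

  ```python
  import numpy as np

  def dropout_forward(h, keep_prob, rng, train=True):
      """Inverted dropout: zero each unit with prob 1 - keep_prob, rescale survivors."""
      if not train:
          return h                                   # identity at test time
      mask = (rng.random(h.shape) < keep_prob) / keep_prob
      return h * mask

  rng = np.random.default_rng(0)
  h = np.ones((4, 1000))
  out = dropout_forward(h, keep_prob=0.8, rng=rng)

  # Surviving units are scaled to 1/0.8 = 1.25; the mean activation stays near 1.
  assert np.all((out == 0.0) | np.isclose(out, 1.25))
  assert abs(out.mean() - 1.0) < 0.05
  ```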