Suppose you build word vectors (embeddings) where each word vector has the vocabulary size (V) as its dimension and the pPMI between the corresponding words as its feature values. What are the problems with this approach, and how can you resolve them?

Please read up on PMI and pPMI first if you are not familiar with them.

Problems: As the vocabulary size (V) is large, these vectors will be very high-dimensional. They will also be sparse, since a given word will not have co-occurred with every other word in the vocabulary.

Resolution: Dimensionality reduction, using approaches like Singular Value Decomposition (SVD) of the term-document matrix…
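A minimal sketch of this resolution, assuming a toy corpus, a ±1-word co-occurrence window, and k = 2 target dimensions (all illustrative choices): it builds the pPMI matrix and then truncates its SVD to obtain dense, low-dimensional embeddings.

```python
# Sketch: pPMI matrix + truncated SVD (toy corpus and window size are assumptions).
import numpy as np

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["cat", "and", "dog"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Word-word co-occurrence counts with a +/-1 word window.
C = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# pPMI(w, c) = max(0, log( P(w, c) / (P(w) * P(c)) ))
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log((C / total) / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)  # -inf from zero counts is clipped to 0

# Truncated SVD: keep only the top-k singular vectors as the embeddings.
k = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :k] * S[:k]  # dense V x k matrix instead of sparse V x V
print(embeddings.shape)        # (V, 2)
```

The truncated matrix is both low-dimensional (k instead of V columns) and dense, addressing both problems above.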

What are the different ways of preventing over-fitting in a deep neural network? Explain the intuition behind each.

L2 norm regularization: pushes the weights closer to zero, which prevents overfitting by penalizing large weights. L1 norm regularization: also pushes the weights toward zero and additionally induces sparsity in the weights; this is a less common form of regularization. Dropout regularization: drops some of the hidden units at random during training, ensuring the network does not overfit by…
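As a concrete illustration, here is a minimal sketch in Keras (an assumed framework choice; the input width, layer sizes, regularization strengths, and dropout rate are all arbitrary) that applies all three regularizers to a small binary classifier:

```python
# Sketch: L2, L1, and dropout regularization in one small classifier.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),                                # 20 assumed input features
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2: shrinks weights toward zero
    layers.Dropout(0.5),                                     # dropout: randomly zeroes 50% of units
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-5)),  # L1: shrinks weights and induces sparsity
    layers.Dense(1, activation="sigmoid"),                   # binary classifier head
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```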

I have designed a 2-layer deep neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions, with a sigmoid at the final layer. Using a data visualization tool, I see that the decision boundary is in the shape of a sine curve. I have tried training with 200 data points with known class labels, and the training error is too high. What do I do?

a. Increase the number of units in the hidden layer
b. Increase the number of hidden layers
c. Increase the data set size
d. Change the activation function to tanh
e. Try all of the above

The answer is d. When I use a linear activation function, the deep neural network is realizing a linear combination of linear functions, which leads to modeling only…
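A small NumPy sketch (with illustrative random weights) makes the intuition concrete: with a linear hidden activation, the 2-unit network collapses algebraically to a single logistic regression, so it can never produce a sine-shaped boundary, whereas tanh keeps the network genuinely non-linear.

```python
# Sketch: a linear hidden layer collapses to one linear model; tanh does not.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # hidden layer (2 units)
w2, b2 = rng.normal(size=2), rng.normal()              # output layer

def forward(x, hidden_activation):
    h = hidden_activation(x @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ w2 + b2)))            # sigmoid output

x = rng.normal(size=(5, 2))

# Linear hidden activation: the whole network equals one logistic regression,
# because (x @ W1 + b1) @ w2 + b2 = x @ (W1 @ w2) + (b1 @ w2 + b2).
linear_out = forward(x, lambda z: z)
collapsed = 1 / (1 + np.exp(-(x @ (W1 @ w2) + (b1 @ w2 + b2))))
print(np.allclose(linear_out, collapsed))              # True

# tanh hidden activation: non-linear, so the boundary can actually bend.
tanh_out = forward(x, np.tanh)
```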