Suppose you build word vectors (embeddings) in which each vector has as many dimensions as the vocabulary size (V), and each feature value is the pPMI between the corresponding pair of words. What are the problems with this approach, and how can you resolve them?

Please read here to understand what PMI and pPMI (positive PMI) are.
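For quick reference, the standard definitions can be written as below. The symbols w (a word), c (a context word) and P (probabilities estimated from co-occurrence counts) are notation chosen here, not taken from the post; the base of the logarithm is a convention, and base 2 is a common choice.

```latex
\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)},
\qquad
\mathrm{pPMI}(w, c) = \max\bigl(\mathrm{PMI}(w, c),\ 0\bigr)
```

Pairs that never co-occur have P(w, c) = 0, so their PMI is negative infinity and their pPMI is clipped to 0.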


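Problems:

  1. Because the vocabulary size (V) is large, each vector is very high-dimensional, which makes the vectors expensive to store and to compute with.
  2. The vectors are sparse, because most words never co-occur with most other words, so most pPMI entries are zero.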
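To get a feel for the size problem, here is a rough back-of-the-envelope calculation. The vocabulary size, the float width and the reduced dimension K are all hypothetical values picked for illustration.

```python
# Hypothetical numbers, chosen only to illustrate the scale of the problem.
V = 50_000           # assumed vocabulary size
K = 300              # assumed reduced dimensionality
BYTES_PER_FLOAT = 4  # 32-bit floats

# One V-dimensional vector per word, stored densely: V * V entries.
dense_bytes = V * V * BYTES_PER_FLOAT

# One K-dimensional vector per word: V * K entries.
reduced_bytes = V * K * BYTES_PER_FLOAT

print(f"dense V x V matrix:   {dense_bytes / 1e9:.1f} GB")
print(f"reduced V x K matrix: {reduced_bytes / 1e6:.1f} MB")
```

With these assumed values the dense matrix needs 10 GB while the reduced one needs 60 MB, even though most of the dense entries would be zero anyway.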


Resolution:

  1. Dimensionality reduction, using approaches such as:
    1. Singular Value Decomposition (SVD) of the V × V pPMI (word-context) matrix, truncated to obtain a K-dimensional approximation, with K much smaller than V.
    2. Other matrix factorisation techniques.
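The SVD route can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the co-occurrence counts are made up, the helper names `ppmi_matrix` and `reduce_with_svd` are hypothetical, and a real vocabulary would call for a sparse matrix and a truncated solver rather than a full dense SVD.

```python
import numpy as np


def ppmi_matrix(counts):
    """Positive PMI from a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)   # marginals P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)   # marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0            # zero counts give -inf or nan
    return np.maximum(pmi, 0.0)             # clip negative PMI to zero


def reduce_with_svd(matrix, k):
    """Return k-dimensional row embeddings from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]                 # scale directions by singular values


# Made-up symmetric co-occurrence counts for a 5-word vocabulary.
counts = np.array([
    [0, 4, 1, 0, 0],
    [4, 0, 2, 0, 1],
    [1, 2, 0, 3, 0],
    [0, 0, 3, 0, 5],
    [0, 1, 0, 5, 0],
])

ppmi = ppmi_matrix(counts)
embeddings = reduce_with_svd(ppmi, k=2)
print("pPMI shape:", ppmi.shape, "zero fraction:", np.mean(ppmi == 0))
print("embedding shape:", embeddings.shape)
```

Each row of `embeddings` is a dense 2-dimensional vector for one word, replacing the sparse 5-dimensional pPMI row. Scaling by the singular values is one common choice; some variants use only `u[:, :k]` or a fractional power of the singular values.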

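Possible follow-up question: What information is lost in approximating a V-dimensional word representation with a K-dimensional one? Answer: Truncated SVD keeps the K directions with the largest singular values and discards the rest, so what is lost is the variation along the discarded directions. By the Eckart-Young theorem, this is the best possible rank-K approximation of the pPMI matrix in the least-squares (Frobenius norm) sense, and the squared reconstruction error equals the sum of the squared discarded singular values.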
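The relationship between the discarded singular values and the reconstruction error can be checked numerically. The sketch below uses a small random non-negative matrix as a stand-in for a pPMI matrix; the matrix, its size and the value of `k` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a pPMI matrix: random values clipped at zero (an assumption).
m = np.maximum(rng.normal(size=(6, 6)), 0.0)

u, s, vt = np.linalg.svd(m, full_matrices=False)

k = 2
# Rank-k reconstruction from the k largest singular values.
m_k = (u[:, :k] * s[:k]) @ vt[:k, :]

# Frobenius-norm error of the approximation ...
error = np.linalg.norm(m - m_k, ord="fro")
# ... which should match the norm of the discarded singular values.
discarded = np.sqrt(np.sum(s[k:] ** 2))
# Share of the total squared singular values that is kept.
retained = np.sum(s[:k] ** 2) / np.sum(s ** 2)

print("reconstruction error:", error)
print("from discarded singular values:", discarded)
print("fraction retained:", retained)
```

The first two printed numbers agree up to floating-point error, and `retained` gives a simple way to judge how much of the matrix a given K preserves.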

