Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

  1. Recall that the bigram model for a sentence \(w = w_{1} w_{2} \ldots w_{k}\) can be expressed as:
  2.     \[p(w)\,=\prod_{i=1}^{k+1} p(w_{i} \mid w_{i-1}),\] where \(w_{0}\) is the start-of-sentence token <s> and \(w_{k+1}\) is the end-of-sentence token </s>, so the product runs over all \(k+1\) bigrams of the padded sentence.

  3. The following scenarios can lead to a zero probability in the above expression:
    1. Out-of-vocabulary (OOV) words – words in the test data that were never seen during training. Any probability term involving an OOV word is 0.0, which drives the entire product to zero.
      1. This is handled by replacing rare or unseen words with an UNK token in both the training and test sets and adding UNK to the vocabulary (see the UNK sketch after this list).
    2. Bigrams (n-grams, in the case of an n-gram language model) that never occur in the training set but do appear in the test set. For example, if the training set contains only the single sentence “This is the only sentence in the corpus” and you need the probability of the test sequence “this is the sentence in the corpus”, then p(sentence | the) = 0.0 because the bigram “the sentence” does not occur in the training set, even though the test sequence is intuitively very plausible given the training data (the first sketch after this list reproduces this).
      1. This problem is addressed by smoothing techniques, e.g., adding a constant to both the numerator and the denominator so that unseen n-grams receive a small but non-zero probability instead of zero.
      2. For example, Laplace (add-one) smoothing is the special case of add-k smoothing with k = 1; in practice k is often chosen in the range 0 < k ≤ 1.
      3. Instead of \(p(w_{i}|w_{i-1}) = \frac{count\,of\,w_{i-1}w_{i}}{count\,of\,w_{i-1}}\), take \(p(w_{i}|w_{i-1}) = \frac{count\,of\,w_{i-1}w_{i} + 1}{count\,of\,w_{i-1}+V}\), where V is the vocabulary size (a smoothed version of the toy model is sketched after this list).
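
Below are a few minimal Python sketches (not from the original post) that illustrate the points above on the toy corpus. First, an unsmoothed bigram model; the <s> / </s> boundary tokens and the function names are illustrative choices. It shows how a single unseen bigram drives the whole sentence probability to zero:

```python
from collections import Counter

# Toy training corpus from the example above (lower-cased for simplicity).
tokens = ["<s>"] + "this is the only sentence in the corpus".split() + ["</s>"]

unigram_counts = Counter(tokens[:-1])             # counts of the history w_{i-1}
bigram_counts = Counter(zip(tokens, tokens[1:]))  # counts of the pair (w_{i-1}, w_i)

def bigram_prob(prev, word):
    """Maximum-likelihood estimate p(word | prev), with no smoothing."""
    if unigram_counts[prev] == 0:
        return 0.0  # unseen history (e.g. an OOV word) -> zero probability
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence):
    """Product of bigram probabilities over the sentence, including boundary tokens."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(bigram_prob("the", "sentence"))                       # 0.0 -- bigram never seen in training
print(sentence_prob("this is the sentence in the corpus"))  # 0.0 -- one zero term kills the product
```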
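Next, a sketch of the UNK replacement step (assumed preprocessing, not code from the post; the <UNK> token name and the min_count threshold are illustrative):

```python
from collections import Counter

def build_vocab(training_sentences, min_count=1):
    """Vocabulary = words seen at least min_count times in training, plus an <UNK> token."""
    counts = Counter(w for s in training_sentences for w in s.lower().split())
    vocab = {w for w, c in counts.items() if c >= min_count}
    vocab.add("<UNK>")
    return vocab

def replace_oov(sentence, vocab):
    """Map every word outside the vocabulary to <UNK>."""
    return " ".join(w if w in vocab else "<UNK>" for w in sentence.lower().split())

vocab = build_vocab(["This is the only sentence in the corpus"])
print(replace_oov("this is another sentence", vocab))  # 'this is <UNK> sentence'
```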
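Finally, a sketch of the add-k smoothed estimate from the formula above (again illustrative; with k = 1 this is Laplace smoothing, and V here is simply the number of distinct tokens, boundary markers included):

```python
from collections import Counter

tokens = ["<s>"] + "this is the only sentence in the corpus".split() + ["</s>"]
unigram_counts = Counter(tokens[:-1])
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(set(tokens))  # vocabulary size (9 distinct tokens in this toy corpus)

def smoothed_bigram_prob(prev, word, k=1.0):
    """Add-k estimate: (count(prev word) + k) / (count(prev) + k * V)."""
    return (bigram_counts[(prev, word)] + k) / (unigram_counts[prev] + k * V)

print(smoothed_bigram_prob("the", "sentence"))  # 1/11 ~= 0.09 -- small but non-zero
```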
