Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

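Recall that the bigram model can be expressed as:

$$P(w_1, w_2, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}), \qquad P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i)}{C(w_{i-1})}$$

where $w_0$ is the sentence-start token <START> and $C(\cdot)$ is a count over the training corpus. The following scenarios can lead to a zero probability in the above expression: Out-of-vocabulary (OOV) words – such words may not be present during training, and hence any probability term involving an OOV word will be 0.0, making the entire product zero. This is solved…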
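
To make the zero-probability case concrete, here is a minimal sketch of a maximum-likelihood bigram model. It assumes whitespace tokenization and a `<START>` token prepended to each sentence; the toy corpus and the function names (`train_bigram_mle`, `bigram_prob`, `sentence_prob`) are hypothetical. Each factor is the count ratio C(w_{i-1}, w_i) / C(w_{i-1}), so a single zero count zeroes the whole product.

```python
from collections import Counter

START = "<START>"  # sentence-start token, prepended to every sentence


def train_bigram_mle(sentences):
    """Count history tokens and bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = [START] + sentence.split()
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts


def bigram_prob(prev, curr, context_counts, bigram_counts):
    """MLE estimate P(curr | prev) = C(prev, curr) / C(prev)."""
    if context_counts[prev] == 0:
        # prev never seen as a history (e.g. an OOV word): treat as 0
        return 0.0
    return bigram_counts[(prev, curr)] / context_counts[prev]


def sentence_prob(sentence, context_counts, bigram_counts):
    """Product of bigram probabilities; one zero factor makes the result 0."""
    tokens = [START] + sentence.split()
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, curr, context_counts, bigram_counts)
    return prob


# Hypothetical toy corpus
ctx, bi = train_bigram_mle(["i like nlp", "i like cats"])
print(sentence_prob("i like nlp", ctx, bi))   # 0.5
print(sentence_prob("i like dogs", ctx, bi))  # 0.0: "dogs" is OOV
print(sentence_prob("i cats", ctx, bi))       # 0.0: (i, cats) never observed
```

Here "dogs" never appeared in training, so P(dogs | like) = 0. The sentence "i cats" contains only known words, but the bigram (i, cats) was never observed, so its MLE estimate is also 0 and the whole sentence gets probability 0.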

Why is smoothing applied in language models?

Smoothing is applied for the following reason: some n-grams may appear in the test set without being present in the training set. For example, if the training corpus is … and you need to find the probability of a sequence like …, where <START> is the token applied…
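
As one illustration of smoothing, the sketch below applies add-one (Laplace) smoothing to bigram counts: P(w_i | w_{i-1}) = (C(w_{i-1}, w_i) + 1) / (C(w_{i-1}) + V), where V is the vocabulary size. This is only one of several smoothing techniques and not necessarily the one the answer above goes on to describe. Mapping OOV words to an `<UNK>` token, the toy corpus, and all function names are assumptions for illustration.

```python
import math
from collections import Counter

START, UNK = "<START>", "<UNK>"  # hypothetical special tokens


def train_bigram_counts(sentences):
    """Build the vocabulary (training words + <UNK>) and bigram counts."""
    vocab = {UNK}
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = [START] + words
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return vocab, context_counts, bigram_counts


def normalize(token, vocab):
    """Map out-of-vocabulary words to <UNK>; keep <START> as-is."""
    return token if token == START or token in vocab else UNK


def laplace_bigram_prob(prev, curr, vocab, context_counts, bigram_counts):
    """Add-one estimate: (C(prev, curr) + 1) / (C(prev) + V)."""
    prev, curr = normalize(prev, vocab), normalize(curr, vocab)
    V = len(vocab)  # number of possible next tokens, including <UNK>
    return (bigram_counts[(prev, curr)] + 1) / (context_counts[prev] + V)


def sentence_log_prob(sentence, vocab, context_counts, bigram_counts):
    """Sum of log probabilities; every term is finite since no estimate is 0."""
    tokens = [START] + sentence.split()
    return sum(
        math.log(laplace_bigram_prob(prev, curr, vocab, context_counts, bigram_counts))
        for prev, curr in zip(tokens, tokens[1:])
    )


# Hypothetical toy corpus; V = 5 (i, like, nlp, cats, <UNK>)
vocab, ctx, bi = train_bigram_counts(["i like nlp", "i like cats"])
print(laplace_bigram_prob("like", "nlp", vocab, ctx, bi))   # 2/7
print(laplace_bigram_prob("like", "dogs", vocab, ctx, bi))  # 1/7 via <UNK>
print(sentence_log_prob("i like dogs", vocab, ctx, bi))     # finite, not -inf
```

Every bigram now receives at least one pseudo-count, so no factor in the product is zero, and the smoothed distribution over next tokens still sums to 1. Add-one smoothing is easy to check by hand, but it is known to shift too much probability mass to unseen bigrams when V is large.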