A language model is a probability distribution over sequences of words, given by the chain rule:

p(w_1, w_2, ..., w_n) = p(w_1) p(w_2 | w_1) p(w_3 | w_1, w_2) ... p(w_n | w_1, ..., w_{n-1})

It lets us measure the relative likelihood of different phrases, which is useful in many NLP tasks such as speech recognition, machine translation, POS tagging, and parsing.
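As a concrete illustration of this factorization, the sketch below scores a short sequence with the chain rule. The conditional probabilities are made-up toy values, not estimates from any corpus.

```python
# A minimal sketch of the chain-rule factorization, using a hand-built
# table of conditional probabilities (toy values, for illustration only):
# p(the, cat, sat) = p(the) * p(cat | the) * p(sat | the, cat)
cond_prob = {
    ("the",): 0.20,               # p(the)
    ("the", "cat"): 0.05,         # p(cat | the)
    ("the", "cat", "sat"): 0.10,  # p(sat | the, cat)
}

def sequence_prob(words):
    """Probability of a word sequence via the chain rule."""
    prob = 1.0
    for i in range(len(words)):
        prob *= cond_prob[tuple(words[: i + 1])]
    return prob

print(sequence_prob(["the", "cat", "sat"]))  # 0.20 * 0.05 * 0.10 = 0.001
```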
Example: In any generative model where a target sequence is generated from a source sequence, for instance in machine translation, the best target sequence is the one maximizing

p(target_seq | source_seq) ∝ p(target_seq, source_seq) = p(target_seq) p(source_seq | target_seq)
Here p(target_seq) is the language model, while p(source_seq | target_seq) is the translation model, whose form depends on the specific statistical approach to machine translation.
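The sketch below illustrates this noisy-channel decomposition. lm_logprob and tm_logprob are hypothetical stand-ins for a trained language model and translation model, and the scores are toy values chosen only to show the ranking.

```python
# A sketch of noisy-channel decoding for machine translation. The two
# scoring functions below are hypothetical stand-ins: lm_logprob scores
# log p(target_seq) under a language model, tm_logprob scores
# log p(source_seq | target_seq) under a translation model.
def lm_logprob(target_seq):
    toy = {"the house is small": -4.0, "the small house is": -9.0}
    return toy[target_seq]

def tm_logprob(source_seq, target_seq):
    toy = {"the house is small": -2.5, "the small house is": -2.0}
    return toy[target_seq]

def decode(source_seq, candidates):
    """Pick argmax p(target) * p(source | target), computed in log space."""
    return max(candidates,
               key=lambda t: lm_logprob(t) + tm_logprob(source_seq, t))

candidates = ["the house is small", "the small house is"]
print(decode("das Haus ist klein", candidates))  # -> "the house is small"
```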
Another Example: In speech recognition, the task is to convert a sequence of sounds into a sequence of words. The language model distinguishes between acoustically similar phrases based on the relative likelihood of each phrase occurring. For example, "I am eating an ice cream" is far more likely than "I am eating and I scream", even though the two sound nearly identical. The language model assigns different probabilities to these two sequences, and the recognizer chooses the one with the higher probability.
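A minimal sketch of this reranking step, assuming a scoring function lm_logprob that returns a sentence's log-probability under some trained language model (the values below are made up for illustration):

```python
# A sketch of how a recognizer might rerank acoustically similar
# transcriptions. lm_logprob is a hypothetical stand-in for any trained
# language model; the log-probabilities are toy values.
def lm_logprob(sentence):
    toy = {
        "i am eating an ice cream": -7.0,
        "i am eating and i scream": -13.0,
    }
    return toy[sentence]

hypotheses = ["i am eating an ice cream", "i am eating and i scream"]
best = max(hypotheses, key=lm_logprob)
print(best)  # -> "i am eating an ice cream"
```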
Common language models include n-gram models, where each word depends on the previous n-1 words (unigram or bag-of-words, bigram, trigram), HMM-based models, and neural language models.
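As a concrete example of the simplest of these, here is a minimal bigram model estimated by maximum likelihood from a toy corpus. Real systems smooth the counts so that unseen bigrams do not get probability zero; this sketch skips smoothing for clarity.

```python
from collections import Counter

# A minimal bigram model: maximum-likelihood estimates from bigram and
# unigram counts over a tiny toy corpus (no smoothing, so any unseen
# bigram would get probability zero).
corpus = [
    ["<s>", "i", "am", "eating", "an", "ice", "cream", "</s>"],
    ["<s>", "i", "am", "reading", "a", "book", "</s>"],
    ["<s>", "an", "ice", "cream", "melts", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus
                  for i in range(len(sent) - 1))

def bigram_prob(prev, word):
    """p(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(words):
    """p(sentence) under the bigram model, from <s> to </s>."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob(["i", "am", "eating", "an", "ice", "cream"]))  # 1/6
```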