## What are the drawbacks of an n-gram language model?

An n-gram language model is a non-deep-learning method for building a language model. For a 3-gram (trigram) model, which conditions on the previous two words, the probability of a word $w$ following the sequence “word_1 word_2” is given by $P(w \mid \text{word}_1\,\text{word}_2) = \frac{\text{count}(\text{word}_1\,\text{word}_2\,w)}{\text{count}(\text{word}_1\,\text{word}_2)}$, where “word_1 word_2” is an ordered sequence of two…
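Here is a minimal sketch of that count-based (maximum likelihood) estimate. It assumes whitespace-tokenized text. The toy corpus and the helper name `trigram_prob` are hypothetical.

```python
from collections import Counter

def trigram_prob(tokens, word_1, word_2, w):
    """MLE estimate of P(w | word_1 word_2) = count(word_1 word_2 w) / count(word_1 word_2)."""
    # Count every ordered pair and triple of consecutive tokens
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    context = bigram_counts[(word_1, word_2)]
    if context == 0:
        # Unseen context: the ratio is undefined; returning 0.0 is a simple convention chosen here
        return 0.0
    return trigram_counts[(word_1, word_2, w)] / context

# Hypothetical toy corpus
corpus = "the cat sat on the mat the cat sat on the sofa".split()
print(trigram_prob(corpus, "on", "the", "mat"))  # 0.5: "on the" occurs twice, "on the mat" once
```

Any trigram that never occurs in the corpus gets probability 0. This sparsity is one of the main weaknesses of count-based n-gram models.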

## Can text generation be modelled with regression? Why do we need a language model?

To restate the question: given the sentence “I am about to complete this”, can regression be used to predict the next word? No, it cannot be modeled as a regression task, and there are multiple reasons. Any form of temporal (sequential) data, such as text, has dependencies or correlations between consecutive and even non-consecutive…

## How do you generate text using the HMM given below?

One possible interpretation of the latent variables in the HMM is that they are POS tags. We will go with this interpretation for simplicity, though the latent states could mean other things as well. To generate text using an HMM, we need to know the transition matrix (the probability of going from one tag…
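The generative process can be sketched as ancestral sampling:

1. Draw a first tag.
2. Emit a word from that tag's emission distribution.
3. Move to the next tag using the transition probabilities.
4. Repeat.

This is a minimal sketch under stated assumptions. The tag set, the initial, transition, and emission probabilities, and the explicit `END` state used to stop generation are all hypothetical. None of them are taken from the model in the question.

```python
import random

# Hypothetical toy HMM: latent states are POS tags; all probabilities are made up.
initial = {"DET": 0.8, "NOUN": 0.2}
transition = {                      # "END" is an assumed stop state, never emitted
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.7, "END": 0.3},
    "VERB": {"DET": 0.6, "END": 0.4},
}
emission = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"cat": 0.5, "dog": 0.5},
    "VERB": {"sees": 0.6, "chases": 0.4},
}

def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def generate(rng, max_len=20):
    tag = sample(initial, rng)                        # 1. pick the first hidden tag
    words = []
    while tag != "END" and len(words) < max_len:
        words.append(sample(emission[tag], rng))      # 2. emit a word from the current tag
        tag = sample(transition[tag], rng)            # 3. transition to the next tag
    return " ".join(words)

print(generate(random.Random(0)))  # e.g. a short sentence such as "the cat sees a dog"
```

Passing a seeded `random.Random` makes runs reproducible. The `max_len` cap guards against very long samples.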

## What order of Markov assumption does an n-gram model make?

An n-gram model makes an order n-1 Markov assumption. This assumption implies that, given the previous n-1 words, the probability of a word $w_i$ is independent of the words before $w_{i-n+1}$. Suppose we have k words in a sentence. Using the chain rule, their joint probability can be expressed as follows:

$$P(w_1, w_2, \ldots, w_k) = \prod_{i=1}^{k} P(w_i \mid w_1, \ldots, w_{i-1})$$

Now, the Markov assumption can be used to make…
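For reference, here is the standard form of that approximation. Each chain-rule factor keeps only the previous n-1 words. Positions before $w_1$ are treated as start-of-sentence padding, which is an assumption about boundary handling.

$$
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
\quad\Longrightarrow\quad
P(w_1, \ldots, w_k) \approx \prod_{i=1}^{k} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
$$

For example:

- With n = 2 (bigram), each factor becomes $P(w_i \mid w_{i-1})$.
- With n = 3 (trigram), each factor becomes $P(w_i \mid w_{i-2}, w_{i-1})$.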

## How is long-term dependency maintained while building a language model?

Language models can be built using the following popular methods:

- Using an n-gram language model: n-gram language models make an assumption about the value of n. The larger the value of n, the longer the dependency. Refer to “What is the significance of n-grams in a language model?” for further reading.
- Using a hidden Markov model (HMM): an HMM maintains long…

## What is the significance of n-grams in a language model?

An n-gram is a sequence of n consecutive words/tokens. In general, n-grams are used for one of two purposes. They can preserve word ordering, or they can indicate what level of dependency is assumed in order to simplify the modeling task. With a bag-of-words representation, n-grams come in handy for preserving the ordering between words, but for language modeling, they signify the…
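A minimal sketch of n-gram extraction follows, assuming whitespace tokenization; the helper name `ngrams` is hypothetical. It also shows how bigrams recover ordering information that a plain bag of words (unigram counts) loses.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-consecutive-token tuples of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = "dog bites man".split()
b = "man bites dog".split()
print(ngrams(a, 2))                                 # [('dog', 'bites'), ('bites', 'man')]
print(Counter(ngrams(a, 1)) == Counter(ngrams(b, 1)))  # True: identical bag of words
print(Counter(ngrams(a, 2)) == Counter(ngrams(b, 2)))  # False: bigrams keep local order
```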

## Given a bigram language model, in what scenarios do we encounter zero probabilities? How should we handle these situations?

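Recall that the bigram model can be expressed as:

$$P(w_1, w_2, \ldots, w_k) \approx \prod_{i=1}^{k} P(w_i \mid w_{i-1}), \qquad P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}\,w_i)}{\text{count}(w_{i-1})}$$

where $w_0$ is a start-of-sentence token. The following scenarios can lead to zero probability in the above expression:

- Out-of-vocabulary (OOV) words: such words may not be present during training. Any probability term involving an OOV word will therefore be 0.0, making the entire product zero. This is solved…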
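A minimal sketch of how an unsmoothed bigram model assigns zero probability. It shows the OOV case and, as a second illustration, a bigram of known words that never appeared in training. The `<s>`/`</s>` boundary tokens, the toy training sentences, and the helper names are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams; <s> and </s> mark sentence boundaries (an assumption)."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token except </s> starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_prob(sent, contexts, bigrams):
    """Unsmoothed MLE product of P(w_i | w_{i-1}) over the sentence."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0                            # OOV context: count is 0, no estimate possible
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

model = train_bigram(["I like tea", "I like coffee"])
print(sentence_prob("I like tea", *model))   # 0.5
print(sentence_prob("I like milk", *model))  # 0.0: "milk" is OOV
print(sentence_prob("like tea", *model))     # 0.0: bigram ("<s>", "like") never seen
```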

## Why is smoothing applied in a language model?

Smoothing is applied for the following reason: some n-grams may appear in the test set but not in the training set. For example, suppose the training corpus is … and you need to find the probability of a sequence like … where `<START>` is the token applied…
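One common smoothing technique is add-one (Laplace) smoothing. The excerpt does not say which method it goes on to use, so treat this as an illustrative choice. The sketch estimates $P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}\,w_i) + 1}{\text{count}(w_{i-1}) + V}$, where $V$ is the vocabulary size. Three details are assumptions made for this sketch:

- the `<s>`/`</s>` boundary tokens;
- the `<unk>` bucket for unseen words;
- the helper names.

```python
from collections import Counter

def train_bigram_add_one(sentences):
    """Count bigrams with <s>/</s> boundaries (an assumption) and build the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<unk>"}
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])          # every token except </s> starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])              # tokens that can be predicted
    return contexts, bigrams, vocab

def add_one_prob(prev, cur, contexts, bigrams, vocab):
    """Laplace-smoothed P(cur | prev) = (count(prev cur) + 1) / (count(prev) + V)."""
    # Map unseen words to the hypothetical <unk> bucket so they still receive probability mass
    prev = prev if prev in contexts else "<unk>"
    cur = cur if cur in vocab else "<unk>"
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))

model = train_bigram_add_one(["I like tea", "I like coffee"])
print(add_one_prob("like", "tea", *model))   # 0.25  = (1 + 1) / (2 + 6)
print(add_one_prob("like", "milk", *model))  # 0.125 = (0 + 1) / (2 + 6), no longer zero
```

Because every count is incremented by one and the denominator grows by $V$, the smoothed probabilities for a given context still sum to 1 over the vocabulary.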

## How do you measure the performance of a language model?

While building a language model, we try to estimate the probability of a sentence or a document. Given sequences (sentences or documents) like $w_1, w_2, \ldots, w_k$, the language model (here, a bigram language model) for each sequence will be:

$$P(w_1, w_2, \ldots, w_k) \approx \prod_{i=1}^{k} P(w_i \mid w_{i-1})$$

Once we apply Maximum Likelihood Estimation (MLE), we should have a value for the term $P(w_i \mid w_{i-1})$. Perplexity…
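The excerpt stops before defining perplexity. The sketch below uses the standard definition for a bigram model over $N$ predicted tokens: $\text{PP} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log P(w_i \mid w_{i-1})\right)$. The probability table is hypothetical; imagine it came from MLE plus smoothing. Every probability must be non-zero for the logarithm to be defined.

```python
import math

def perplexity(tokens, bigram_prob):
    """exp(-(1/N) * sum_i log P(w_i | w_{i-1})) over the N predicted tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = 0.0
    for prev, cur in pairs:
        log_prob += math.log(bigram_prob[(prev, cur)])  # assumes every probability is > 0
    return math.exp(-log_prob / len(pairs))

# Hypothetical smoothed bigram probabilities
probs = {("<s>", "I"): 0.5, ("I", "like"): 0.5, ("like", "tea"): 0.25, ("tea", "</s>"): 0.5}
print(perplexity(["<s>", "I", "like", "tea", "</s>"], probs))  # ≈ 2.38, i.e. (1 / 0.03125) ** (1/4)
```

A lower perplexity means the model assigns higher probability to the held-out text, so lower is better when comparing models on the same test set.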

## What is a language model? Why do you need a language model?

A language model is a probability distribution over sequences of words, given by $P(w_1, w_2, \ldots, w_k)$. It enables us to measure the relative likelihood of different phrases. Measuring the likelihood of a sequence of words is useful in many NLP tasks such as speech recognition, machine translation, POS tagging, parsing, and so on. Example: In…
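To make "relative likelihood" concrete, here is a minimal sketch of how a speech recognizer might use a bigram language model to choose between two acoustically similar transcriptions. The bigram probabilities and the `<s>`/`</s>` boundary tokens are invented for illustration. Scores are summed in log space to avoid floating-point underflow on long sentences.

```python
import math

# Hypothetical bigram probabilities; a real model would estimate these from a corpus.
bigram_prob = {
    ("<s>", "recognize"): 0.01, ("recognize", "speech"): 0.2, ("speech", "</s>"): 0.3,
    ("<s>", "wreck"): 0.001, ("wreck", "a"): 0.05, ("a", "nice"): 0.01,
    ("nice", "beach"): 0.02, ("beach", "</s>"): 0.3,
}

def log_prob(sentence):
    """Sum of log P(w_i | w_{i-1}) with <s>/</s> boundary tokens (an assumption)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(bigram_prob[(p, c)]) for p, c in zip(tokens, tokens[1:]))

candidates = ["recognize speech", "wreck a nice beach"]
print(max(candidates, key=log_prob))  # recognize speech
```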