How do you find the most probable sequence of POS tags from a sequence of text?

  1. This problem can be solved with an HMM.
  2. Using an HMM involves finding the transition probabilities (what is the probability of going from one POS tag to another and emission/output probabilities (what is the probability of observing a word given a POS tag) as explained in the question How do you train an hMM.
  3. Once hMM is trained using a large enough corpus, then use the Viterbi algorithm to find the most probable sequence of tags.

Interview Tip: Note that one might be in hurry to answer Viterbi algorithm. It is true that Viterbi algorithm is used, but we don’t start with Viterbi rather first train an HMM model and then apply the Viterbi algorithm with the learnt transition and emission matrices to answer the question of finding the POS tags.

Leave a Reply

Your email address will not be published. Required fields are marked *