What is monolingual text alignment? How do you go about it?

Given a source sentence and a target sentence, textual alignment is the process of matching segments of the source sentence with the corresponding segments of the target sentence. When the two sentences are in the same language, it is called monolingual alignment.

Alignment can be performed at different levels of granularity:

  • Word-level alignment: individual words of the source sentence are aligned with individual words of the target sentence.

                   We are ready to begin the program

                   We are set to start the program

  • Chunk-level alignment: chunks (short phrases) in the source sentence are aligned with chunks in the target sentence. The first step is to segment both sentences into meaningful chunks.

                   [A man] [reclines] [with a baby in his lap]

                   [A man] [sits in a chair] [holding a baby]   
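One simple way to score candidate chunk pairs is by word overlap. The sketch below is illustrative only: it reuses the hand-segmented chunks from the example, and `jaccard`, `align_chunks`, and the `threshold` value are hypothetical names and settings chosen for this post, not part of any particular library.

```python
# The chunks from the example, segmented by hand (an assumption); a real system
# would produce them with a chunker (shallow parser).
source_chunks = [["A", "man"], ["reclines"], ["with", "a", "baby", "in", "his", "lap"]]
target_chunks = [["A", "man"], ["sits", "in", "a", "chair"], ["holding", "a", "baby"]]


def jaccard(a: list[str], b: list[str]) -> float:
    """Word-overlap similarity of two chunks: size of the intersection of their
    lowercased word sets divided by the size of the union."""
    sa = {w.lower() for w in a}
    sb = {w.lower() for w in b}
    return len(sa & sb) / len(sa | sb)


def align_chunks(src: list[list[str]], tgt: list[list[str]], threshold: float = 0.2) -> list[tuple[int, int]]:
    """Link each source chunk to its highest-overlap target chunk if that overlap
    reaches the (hypothetical) threshold. Several source chunks may share a target."""
    pairs = []
    for i, s in enumerate(src):
        scores = [jaccard(s, t) for t in tgt]
        j = max(range(len(tgt)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j))
    return pairs


for i, j in align_chunks(source_chunks, target_chunks):
    print(" ".join(source_chunks[i]), "<->", " ".join(target_chunks[j]))
```

On this pair, [reclines] stays unaligned because it shares no words with [sits in a chair]. Pure lexical overlap misses paraphrases like this, which is why practical aligners add semantic similarity (synonym resources or word embeddings) to the scoring.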

Some common techniques for performing these alignments:

  • Brute-force, heuristic-based techniques: compute the similarity between each chunk (or word) in the source sentence and each one in the target sentence, then greedily assign pairings.
  • CRF formulations are commonly used to model alignment problems, with the actual alignment computed through the Viterbi algorithm.
  • More recently, attention-based RNNs and CNNs have been used to solve this problem.
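The first two techniques can be sketched on the word-level example sentence pair. This is a minimal sketch under stated assumptions: `similarity` is a hypothetical toy lexicon standing in for a real similarity measure, and `viterbi_align` uses hand-set HMM-style scores (the log similarity of each word pair plus a penalty for jumping away from the next target position) instead of a trained CRF. The Viterbi dynamic program that decodes it is the same kind of search a linear-chain CRF uses; all names and parameter values here are illustrative.

```python
import math

# Hypothetical toy lexicon (an assumption): identical words score 1.0, listed
# synonyms 0.8, everything else 0.0. A real aligner would use paraphrase
# resources or embeddings instead.
SYNONYMS = {frozenset(("ready", "set")), frozenset(("begin", "start"))}


def similarity(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if frozenset((a, b)) in SYNONYMS:
        return 0.8
    return 0.0


def greedy_align(src: list[str], tgt: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Brute-force heuristic: score every (i, j) pair, then repeatedly take the
    highest-scoring pair whose source and target words are both still unaligned."""
    scored = sorted(
        ((similarity(s, t), i, j) for i, s in enumerate(src) for j, t in enumerate(tgt)),
        reverse=True,
    )
    used_src, used_tgt, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # every remaining pair scores even lower
        if i not in used_src and j not in used_tgt:
            pairs.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(pairs)


def viterbi_align(src: list[str], tgt: list[str], jump_penalty: float = 0.5,
                  floor: float = 1e-3) -> list[tuple[int, int]]:
    """Sequence model: source word i is labeled with a target position a_i.
    Score = sum of log emit(i, a_i) + sum of trans(a_{i-1}, a_i), where trans
    penalizes any jump other than a_i = a_{i-1} + 1. Viterbi returns the
    highest-scoring label sequence. Assumes both sentences are non-empty."""
    n, m = len(src), len(tgt)

    def emit(i: int, j: int) -> float:
        return math.log(max(similarity(src[i], tgt[j]), floor))

    def trans(prev: int, cur: int) -> float:
        return -jump_penalty * abs(cur - prev - 1)

    best = [[emit(0, j) for j in range(m)]]  # best[i][j]: top score with a_i = j
    back = [[0] * m]  # back[i][j]: the a_{i-1} that achieved best[i][j]
    for i in range(1, n):
        row, ptr = [], []
        for j in range(m):
            k = max(range(m), key=lambda p: best[i - 1][p] + trans(p, j))
            row.append(best[i - 1][k] + trans(k, j) + emit(i, j))
            ptr.append(k)
        best.append(row)
        back.append(ptr)

    # Trace the back-pointers from the best final label.
    j = max(range(m), key=lambda p: best[n - 1][p])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return list(enumerate(path))


source = "We are ready to begin the program".split()
target = "We are set to start the program".split()
print(greedy_align(source, target))
print(viterbi_align(source, target))
```

On this pair both functions recover the one-to-one diagonal alignment, with ready/set and begin/start linked through the toy synonyms. The greedy version can leave a word unaligned when nothing clears `threshold`, whereas this Viterbi version labels every source word; sequence models for alignment typically add a NULL target state to handle that case.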

More formal reading on monolingual alignment can be found here. See also: A CNN-based approach for modeling sentence pairs.
