Machine translation can be done with any of the following techniques:
- Rule-based machine translation (older technique): Uses a dictionary between the words of the two languages, along with syntactic, semantic and morphological analysis of the source sentence to determine context. Linguistic rules are defined to translate a specific word in a given context into the target language. One advantage of this approach is that it doesn’t require a parallel corpus. The drawback is that the rules are hard to define, and they have to be written separately for every language pair.
- Statistical machine translation: This technique works with parallel corpora that are aligned sentence by sentence (sometimes word by word). The IBM models, a series of increasingly complex probabilistic models that learn word-to-word translation and alignment, were very popular for a long time. Basically, if S is the source sentence, we want to find the target sentence T that maximizes the conditional probability P(T|S).
- Neural machine translation: This is the technique employed by Google Translate and the state of the art, built on sequence-to-sequence models. Here is a tutorial on neural machine translation. This is the paper by Bengio on neural machine translation.
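
To make the statistical approach a bit more concrete, here is a minimal sketch of the simplest IBM model (Model 1) trained with EM. By Bayes' rule, maximizing P(T|S) is the same as maximizing P(S|T) P(T), so the sketch estimates word-level probabilities t(s|t) of a source word given a target word. The function name `train_ibm_model1` and the three-sentence German-English corpus are made up for illustration, and the NULL word used in the full model is left out to keep it short, so treat this as a rough illustration rather than a faithful implementation.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(source_word | target_word) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {s for src, _ in pairs for s in src}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (source, target) word links
        total = defaultdict(float)  # expected number of links per target word

        # E-step: spread each source word over the target words of its sentence.
        for src, tgt in pairs:
            for s in src:
                norm = sum(t[(s, w)] for w in tgt)
                for w in tgt:
                    delta = t[(s, w)] / norm
                    count[(s, w)] += delta
                    total[w] += delta

        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(s, w): c / total[w] for (s, w), c in count.items()})

    return t


# Hypothetical toy parallel corpus: (source sentence, target sentence).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

t = train_ibm_model1(pairs)
for (s, w), p in sorted(t.items(), key=lambda kv: -kv[1])[:4]:
    print(f"t({s} | {w}) = {p:.3f}")
```

Even on this tiny corpus, the co-occurrence pattern is enough for EM to shift probability mass toward the intuitive word pairs (for example "das" with "the"), which is the word-alignment idea the IBM models build on.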