

 

OmegaSoft Development Blog

History of Machine Translation

12 January 2007 - 10:04 PM

 

Machine translation (also known as MT) lets someone use a computer to translate a block of text automatically, without any human intervention.

 

I'm just going to note some of the development steps between the first translators and where the field is heading.

 

Keywords: SL, source language; TL, target language

 

Word-for-word translations

Takes each word in the SL sentence and replaces it with its TL counterpart.

 

Right away, this is obviously not the best method, as the TL might not have an equivalent counterpart. Also, some languages order their words differently from English, while others use additional words.
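As a rough illustration of the idea, here is a minimal Python sketch of word-for-word translation. The tiny English-to-Spanish lexicon and the function name word_for_word are made up for this example, not taken from any real system.

```python
# Hypothetical toy English -> Spanish lexicon, for illustration only.
LEXICON = {
    "the": "el",
    "black": "negro",
    "cat": "gato",
    "sleeps": "duerme",
}


def word_for_word(sentence, lexicon):
    """Replace each SL word with its TL counterpart, keeping the SL word order."""
    translated = []
    for word in sentence.lower().split():
        # A word with no counterpart is passed through, flagged with '?'.
        translated.append(lexicon.get(word, "?" + word))
    return " ".join(translated)


print(word_for_word("The black cat sleeps", LEXICON))
# prints: el negro gato duerme
```

The output keeps the English word order, whereas Spanish would normally say "el gato negro duerme". That is exactly the word-ordering problem described above.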

 

A reasonable MT system needs a good knowledge of both the source language and the target language, especially their similarities and differences.

 

The problems you face when translating text are: dealing with morphology, lexical ambiguity, structural ambiguity, multi-word units, language differences, and dealing with meaning.

 

Direct MT

 

This method is easy to implement.

 

It is similar to word-for-word translation, but it also translates phrase-to-phrase, and then attempts to reorder ambiguous sentences.

 

But the problems persist, because it does not analyse the linguistic information of the SL before translation.

 

While it is a robust method, it only focuses on one language pair, and is quite often uni-directional.
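A direct MT step could be sketched roughly as follows: a greedy longest-match phrase lookup, followed by one local reordering rule. The phrase table, the ADJECTIVES and NOUNS sets, and the function name are all assumptions invented for this example (English to Spanish only, one direction).

```python
# Hypothetical English -> Spanish phrase table; keys are tuples of SL words.
PHRASES = {
    ("of", "course"): ["por", "supuesto"],
    ("the",): ["el"],
    ("black",): ["negro"],
    ("cat",): ["gato"],
}
# Hypothetical TL word lists used by the single reordering rule below.
ADJECTIVES = {"negro"}
NOUNS = {"gato"}


def direct_translate(sentence):
    words = sentence.lower().split()
    max_len = max(len(key) for key in PHRASES)
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase first, then fall back to shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in PHRASES:
                out.extend(PHRASES[key])
                i += n
                break
        else:
            # No entry at all: copy the SL word through unchanged.
            out.append(words[i])
            i += 1
    # Local reordering: "adjective noun" becomes "noun adjective".
    j = 0
    while j < len(out) - 1:
        if out[j] in ADJECTIVES and out[j + 1] in NOUNS:
            out[j], out[j + 1] = out[j + 1], out[j]
            j += 2
        else:
            j += 1
    return " ".join(out)


print(direct_translate("Of course the black cat"))
# prints: por supuesto el gato negro
```

Note that everything here is tied to one language pair, and nothing analyses the SL sentence beyond matching strings, which is where this approach runs out of steam.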

 

Transfer-Based MT

 

It looks at the SL first of all, capturing the linguistic information about each sentence.

 

It then maps the SL components to their TL counterparts, taking into consideration the linguistic information previously acquired.

 

This method is bi-directional.

 

It places much more focus on the required language analysis, to see what is actually being translated, so it can map it a little more accurately. It effectively examines the difference between the two languages.
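The three stages (analysis, transfer, generation) might look something like the Python sketch below. It is only a toy: the part-of-speech table, the lexical transfer table and all function names are assumptions for this example, and a real system would capture far richer linguistic information than a single tag per word.

```python
# Hypothetical SL analysis table: English word -> part of speech.
SL_POS = {"the": "DET", "black": "ADJ", "cat": "NOUN", "sleeps": "VERB"}
# Hypothetical lexical transfer table: English word -> Spanish word.
LEXICAL_TRANSFER = {"the": "el", "black": "negro", "cat": "gato", "sleeps": "duerme"}


def analyse(sentence):
    """Analysis: attach linguistic information (here only a POS tag) to each SL word."""
    return [(word, SL_POS.get(word, "UNK")) for word in sentence.lower().split()]


def transfer(analysed):
    """Transfer: map SL structure to TL structure using the tags, then map the words."""
    reordered = []
    i = 0
    while i < len(analysed):
        is_adj_noun = (
            i + 1 < len(analysed)
            and analysed[i][1] == "ADJ"
            and analysed[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            # Structural rule: English ADJ NOUN becomes Spanish NOUN ADJ.
            reordered.extend([analysed[i + 1], analysed[i]])
            i += 2
        else:
            reordered.append(analysed[i])
            i += 1
    return [(LEXICAL_TRANSFER.get(word, word), pos) for word, pos in reordered]


def generate(transferred):
    """Generation: produce the TL sentence from the transferred structure."""
    return " ".join(word for word, _ in transferred)


print(generate(transfer(analyse("The black cat sleeps"))))
# prints: el gato negro duerme
```

The difference from the direct approach is that the reordering rule works on the analysed SL structure (the tags), not on the translated words.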

 

Interlingual MT Systems

 

A step in the right direction: it produces more accurate results than the previously described methods.

 

It uses an intermediate language between the SL and TL, thus forcing two translations during the process.

 

And it is theoretically bi-directional.

 

This intermediate layer is known as an interlingua.

 

It is also rather good for translating between various language pairs.
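To make the two-step idea concrete, here is a small Python sketch where the intermediate representation is just a list of language-neutral concept labels. The VOCAB tables, the concept names and the function names are invented for this example and do not come from any real interlingua.

```python
# Hypothetical vocabularies: word -> language-neutral concept label.
VOCAB = {
    "en": {"the": "DEF", "cat": "CAT", "sleeps": "SLEEP"},
    "es": {"el": "DEF", "gato": "CAT", "duerme": "SLEEP"},
    "fr": {"le": "DEF", "chat": "CAT", "dort": "SLEEP"},
}


def to_interlingua(sentence, lang):
    """First translation: SL words -> concept labels (unknown words raise KeyError)."""
    return [VOCAB[lang][word] for word in sentence.lower().split()]


def from_interlingua(concepts, lang):
    """Second translation: concept labels -> TL words."""
    reverse = {concept: word for word, concept in VOCAB[lang].items()}
    return " ".join(reverse[concept] for concept in concepts)


def translate(sentence, source, target):
    return from_interlingua(to_interlingua(sentence, source), target)


print(translate("The cat sleeps", "en", "fr"))
# prints: le chat dort
print(translate("el gato duerme", "es", "en"))
# prints: the cat sleeps
```

Adding a fourth language to this sketch only means adding one more vocabulary table, rather than a new table for every language pair.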

 

While it is a step in the right direction, it's not quite perfect, as it fails to understand ungrammatical or erroneous input.

 

Example-Based Machine Translation (EBMT)

 

This is where the future is!

 

When humans translate something (unless they are experts or speak more than one language natively), they will use a bilingual dictionary, which lists the SL words and their TL equivalents. It also shows different TL words for each SL word, based on a select number of examples provided in the entry.

 

This is effectively how EBMT works. It uses a corpus of bilingual examples. It then puts the SL into context and looks up the appropriate TL example, based on a 'best match'.
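The 'best match' lookup could be sketched in Python like this. The three-sentence corpus is made up, and word overlap (Jaccard similarity) is only one simple stand-in for whatever matching measure a real EBMT system would use.

```python
# Hypothetical bilingual corpus of (SL example, TL example) pairs.
EXAMPLES = [
    ("the cat sleeps", "el gato duerme"),
    ("the dog eats", "el perro come"),
    ("i drink water", "bebo agua"),
]


def similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences, from 0.0 to 1.0."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def best_match(sentence, examples):
    """Return the example pair whose SL side is most similar to the input."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))


print(best_match("the black cat sleeps", EXAMPLES))
# prints: ('the cat sleeps', 'el gato duerme')
```

The lookup only retrieves the closest example. A full system would still need to adapt the retrieved TL text to cover the parts of the input that did not match ("black" in this case).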

 

The corpus (the multi-lexical database) groups example terms based on their semantic similarity, essentially using translation templates to translate structurally similar sentences.
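A translation template can be pictured as a sentence pair with a slot in it. The Python sketch below assumes a single hand-written English-to-Spanish template and a tiny slot lexicon, both invented for this example. In a real system the templates would be derived from the corpus rather than typed in.

```python
import re

# Hypothetical template: the captured group is the slot, "{}" is where it goes in the TL.
TEMPLATES = [
    (re.compile(r"^i have a (\w+)$"), "tengo un {}"),
]
# Hypothetical lexicon of words that may fill the slot.
SLOT_LEXICON = {"cat": "gato", "dog": "perro"}


def translate_with_template(sentence):
    """Translate a sentence that is structurally similar to a known template."""
    text = sentence.lower().strip()
    for pattern, tl_template in TEMPLATES:
        match = pattern.match(text)
        if match and match.group(1) in SLOT_LEXICON:
            return tl_template.format(SLOT_LEXICON[match.group(1)])
    # No template (or no slot word) matched.
    return None


print(translate_with_template("I have a dog"))
# prints: tengo un perro
```

One template covers every sentence of the same shape, so "I have a cat" and "I have a dog" need only one stored example between them.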

 

EBMT is robust. It deals with the problems encountered by all the methods discussed previously.

 

Its strengths lie in: not being domain specific, needing no complex analysis rules, automatic alignment of terms, support for multiple languages (not just one language pair), and easy integration into basic MT models.

 

While it corrects all the problems identified and offers more strengths, it also brings some new weaknesses and problems not encountered by previous methods.

 

For example: it needs a good range of bilingual texts (data), the data needs cleaning up every so often by a human, the data may need fine alignment, and the calculations can be fairly lengthy, as it is searching through thousands of different examples.

 

Simple unambiguous sentences may be better suited to more primitive methods, as they would be more efficient, but only where the SL is fairly straightforward.

 

Seb Harvey

 



