I feel the need, the need for… machine translation

Following an interesting late-night discussion with fellow linguists, I feel the need to chat a little more about MT. The argument was that you cannot translate unless you understand meaning. I don’t believe that’s true. I believe the number of possible word and letter combinations is finite – very large, but finite nonetheless. So, all you need is a computer system that can handle an incredible amount of data – say, billions of lines of text – plus some kind of clever AI engine that calculates probabilities. If we can eloquently translate the weather forecast – and I have not cross-checked this, so let’s just say that we can – then why should a system not be able to tackle a much more complex text?

Enter Google. A recent NYT article states that Google is using a so-called statistical approach and a few hundred billion words to create a model of a language (Source). This sounds very plausible to me. Now, there are some obvious design flaws in this. If the system checks thousands or millions of passages and their human-generated translations, who is to say that the human translation was flawless to begin with? But when I read this machine translation of The Little Prince, it is almost – eerily – better than its human equivalent! To be fair, the other machine translations in that same article did not impress me at all – but The Little Prince did.
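
To make the statistical idea a little more concrete, here is a minimal sketch of how a system might learn word translations from nothing but sentence pairs. It is not Google's system, which the article does not describe in detail. It is IBM Model 1, a classic textbook word-alignment model, reduced to its core (no NULL word), and the three-sentence German-English corpus and the function names `train_model1` and `best_translation` are invented for illustration.

```python
from collections import defaultdict

# Toy parallel corpus (invented for illustration): (source words, target words).
CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]


def train_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with expectation-maximisation."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with no knowledge: every target word is equally likely.
    t = defaultdict(lambda: uniform)  # keyed by (target_word, source_word)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Share each target word among the source words in the
                # same sentence, in proportion to the current estimates.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_model1(CORPUS)
    for word in ("das", "Haus", "Buch", "ein"):
        print(word, "->", best_translation(table, word))
```

On this toy corpus the model should pair das with the, Haus with house, Buch with book, and ein with a, without ever being told what any word means. It also inherits whatever is in the human translations it learns from, which is exactly the design flaw mentioned above.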

So, in conclusion, I believe that decent-quality MT translation is not that far off, maybe in the next 10-15 years. But don’t quit your day job just yet – a sophisticated or literary text will always need to be proofread and fact-checked by a human. Only change is constant – we just need to adapt and change the way we, as translators, work. And that’s not necessarily a bad thing.

  1. @Bell_Blog: So, in conclusion, I believe that decent-quality MT translation is not that far off, maybe in the next 10-15 years.

    The term ‘decent quality’ is subjective. Many organisations successfully use MT now.

    @Bell_Blog: But don’t quit your day job just yet – a sophisticated or literary text will always need to be proofread and fact-checked by a human.

    MT is not suitable for literary text. With literary text, writers play with words. For some literary text, MT will never be able to supply the correct translation, because ambiguity is deliberate.

  2. Are we there yet? Do you know an organisation where MT is used without human intervention, such as on a website? Does it sound ‘fluent’ or still like a translation?

    Not sure about literary text… You know what they say about monkeys and typewriters – it’s only a matter of time 😉 I still believe that a human proof-reader will be needed, but I think MT might eventually get pretty close.

  3. “Microsoft’s Knowledge Base materials have been translated into nine languages by MSR-MT. This approach lowered the cost barrier to obtaining customized, higher-quality MT and Microsoft’s support group is now able to provide usable translations for its entire online KB. It can also keep current with updates and additions on a weekly basis – something that was previously unthinkable both in terms of time and expense.” (http://research.microsoft.com/en-us/projects/mt/)

    I guess that the Microsoft translations are not fluent. I do not know of MT software that produces fluent translations.

    If you read Spanish or Norwegian, read the machine translations that are on http://www.international-english.co.uk/mt-evaluation.html. Professional translators evaluated translations from English to Spanish, English to Norwegian, and English to Welsh. For the Spanish translation and the Norwegian translation, most of the translators say that the text is understandable. Frequently, the text is not fluent. (The machine translation from English to Welsh is not satisfactory.)

    Google is good, but customised software can give better translations. Usually, customised software is used for a particular domain of knowledge. (The software ‘knows’ whether the word ‘bank’ means a financial institution or a river bank.) Frequently, organisations that use customised MT also use controlled language. Each word has a specific meaning, and text is optimised for MT. For some basic rules, see Muegge’s ‘Rules for Machine Translation’ (http://www.muegge.cc/controlled-language.htm).

  4. I love the idea of controlled language (there was an interesting article about it in ATA magazine) – really difficult to write at first (I guess), but 100% unambiguous and “relatively” easily translated because of it – maybe using MT. Thanks for the link!

  5. To write with a controlled language is difficult. However, software can help a writer to conform to the rules. See ‘ASD Simplified Technical English’ (http://www.techscribe.co.uk/techw/se.htm).

    An alternative to controlled language is ‘Global English’ (http://www.globalenglishstyle.com). Global English was developed at SAS. In 2008, SAS had revenue of US $2.26 billion.

    Kohl works at SAS. Kohl gives detailed grammatical guidelines about how to optimise English for machine translation. For my review of Kohl’s ‘The Global English style guide’, see http://www.techscribe.co.uk/ta/global-english-style-guide.htm.

  6. Hi Babellon:

    I’m glad you are blogging again.

    I have a suggestion for you for a really cheap alternative to expensive Adobe software, but I don’t want to discuss it here. Ask me about it at some point if you remember. I have been using it for years.

    Another thing that I would like to mention in connection with your boundless optimism with regard to machine translation: I was delighted by the machine translation tool available on the World Intellectual Property Organization (WIPO) website. It is a new feature that was added just last month, and it is based on Google Translate (a statistical model).

    So I put it to the test: I selected a short paragraph from a Japanese patent at random, translated it myself, and then ran it through the MT function. Based on my experience, the results are even worse than what I am used to from the MT tool available on the Japan Patent Office website, which I believe is based on the Systran model. I would not trust preselected samples of tests of MT capabilities. The people who preselect these things are hardly neutral observers. They are salesmen who are likely to cheat. Do your own test: have a German passage translated by software and then translate it yourself. You’ll see what I mean.

    You can read about my test of Google Translate on my blog.
