On machine translation…

My attention was directed a couple of years ago to New York’s Metropolitan Transportation Authority, which had apparently decided to forgo human translation and instead offer non-English-speaking visitors machine translations of its site pages. Presumably this was done as a cost-cutting move, despite the fact that the resulting translations were, by the MTA’s own modest admission, “imperfect.”

On a page devoted to electric car evacuation procedures, for example, the term “Long Island Rail Road” was rendered in Russian as “Длинняя дорога рельса острова” (literally – and it’s hard to do this because the endings make no sense – “Long road of island rail”), and from the same page (devoted, by the way, to safety issues), there was this:

Whatever the emergency, rest assured that your LIRR crew will keep you informed and assist you in any way necessary.

Аварийная ситуация, остальные убедили что ваш экипаж LIRR будет держать вас после того как он сообщен и помогать вы в любой дороге обязательно.

(Believe me, the Russian is a terrible mishmash of Cyrillic words that fails to communicate even the gist of what was intended. Ultimately it is a waste of pixels.)

I went back to the site a little while ago to see whether any changes had been made since my visit two years ago. There had.

The MTA still firmly believes in machine translation, but has switched from whatever it had been using to Google Translate. Now, for the examples cited above, no attempt is made (and properly so) to translate “Long Island Rail Road” and the translation offered for the sentence shown in the quote above has improved, though it still falls short of a colloquial, grammatically correct translation:

Независимо от чрезвычайных ситуаций, будьте уверены, что ваш LIRR экипаж будет информировать Вас и помочь Вам в любой ситуации необходимо.

The gist is there, certainly.

Though the overall trend has been toward improvement, machine translation still has a long way to go. And while some feel that machine translation will never amount to much, progress in a related area of artificial intelligence should not be ignored.

The idea of programming computers to play chess was depicted in broad brush strokes at roughly the same time as the idea of machine-based language translation. Both eventually became part of a coordinated interest in machine-based intelligence.

In the early years of computer chess, it was commonly thought that truly successful programs would have to replicate the way humans think. (Contrary to what you might expect, human grandmasters typically consider no more than two or three different moves in any given position, and rarely calculate exhaustive variations to any great depth.)

It turned out, however, that massive brute-force searches, together with refinements in the way programs evaluated chess positions, were able to make up for the lack of imagination. In fact, research into selecting likely “candidate moves” the way humans do didn’t advance very far.
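For the curious, the brute-force idea can be sketched in a few lines of Python. This is only a toy illustration, not the code of any actual chess program: the names `minimax` and `best_move` and the tiny hand-made tree are my own assumptions, and the numbers at the leaves stand in for whatever a real program's position-evaluation function would return.

```python
def minimax(node, maximizing):
    """Search every line of play and return the best achievable score."""
    # A leaf is a plain number: the (assumed) static evaluation of a position.
    if isinstance(node, (int, float)):
        return node
    # Otherwise, examine every reply; the opponent moves next.
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


def best_move(tree):
    """Return the index of the first move whose worst-case outcome is best."""
    scores = [minimax(child, maximizing=False) for child in tree]
    return scores.index(max(scores))


# Hypothetical position: three candidate moves, each with two possible replies.
tree = [[3, 5], [2, 9], [0, 7]]

print(minimax(tree, maximizing=True))  # value of the position for the side to move
print(best_move(tree))                 # index of the move to play
```

Nothing here tries to guess which moves "look promising." Every branch is examined, and the quality of the result depends entirely on how deep the search goes and how good the leaf evaluations are.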

So it may very well be that we’re merely awaiting a spectacular paradigm shift in the field of machine translation – and perhaps also in the very way humans use language in an increasingly binary world – that will usher in an era of acceptable machine translation. The best information and smart money indicate that currently, the statistical approach – such as the one apparently used by Google – yields only slightly better results than traditional methods.

How long will it take machine translators to achieve the same kind of success as chess-playing algorithms, which went from playing beginner-level games in the early 1970s to beating the reigning World Champion by the end of the century?

That remains to be seen, but in my opinion, the day is still far off, if the state of machine-translation art is any indication.
