
Lost in Translation – statistical learning isn’t a panacea
BBC Future published an excellent feature article on the status quo of machine translation.
It is another good example of our naivety: we think we can eventually achieve our goal of bringing peoples closer through technology. For centuries we’ve been working on that, with telephones, Skype, FB, and now Google Translate. History has proved these kinds of attempts futile. The sarcastic tone at the end of the article, borrowed from The Hitchhiker’s Guide to the Galaxy, sounds well grounded.
What the BBC article focuses on is a criticism of the current statistical learning approach and the challenges it raises, including the vicious loop Google may fall into when using massive web text to train its translator: it might be learning from its own translations (inaccurate and unnatural, as our experience shows, as does the now infamous Malaysian website).
Natural language modeling and manipulation (including translation and understanding) was a hot topic in the early days of AI and has regained its status since the advances in statistical learning. I argue it will probably remain a challenging problem deep into the unforeseeable future. Perhaps a real breakthrough can only occur when data such as our speech and pictures are truly conceptually grasped by a computer, when it not only crunches numbers and reasons logically but also understands emotions. Will computers need to have emotions of their own in order to be empathetic?
Do we really need that colossal amount of information – as much as Google now has – to master a language, or two? Is it possible to build a life-long learning machine, much like an infant who grows into maturity? Our speech is certainly shaped by memory, but it has its own dynamics and uncertainties. Will a largely chaotic system that combines memory, pattern recognition, reasoning, and other “mental” processes be able to make a better translator?
Machine learning has come a long way, but the dawn has not arrived yet.


