What is machine translation?
Machine translation (MT) is the use of intelligent software capable of translating large amounts of source data into various languages. Broadly speaking, there are three main categories of machine translation:
So-called “generic” machine translation is a solution designed for non-specialized texts. It refers to translation engines, such as Google’s, that translate written text from one language to another but are not focused on any area of specialisation. This type of translation is used by individuals or companies to instantly translate very short texts. Generic machine translation tends to contain more grammatical and syntactical errors and is much less accurate than “custom” machine translation.
“Custom” machine translation requires that the translation software be “trained” or adapted to recognize expressions specific to a field, industry or company. Using basic statistical or rule-based translation technologies, “custom” machine translation offers companies a high level of accuracy with large word counts. However, even where these engines are coupled with pre-processing of data and a friendly user interface, the quality of the translation may be compromised by the limitations of the translation technology used.
“Corporate” machine translation seems to be the next generation of “augmented” machine translation engines. These engines use a sophisticated technology that reproduces style, formatting and terminology more faithfully than other solutions. “Corporate” machine translation meets the needs of today’s international business, such as high-volume, high-speed localisation and real-time multilingual communications.
Machine Translation: “Generic”, “Custom” or “Corporate” Machine Translation?
While all three categories described above may use statistically based engines, there are significant differences in performance. “Generic” machine translation engines ingest huge amounts of data in the hope that quality will improve over time. They do not have technologies that allow the syntactic or stylistic subtleties of a company’s corporate language to be added to the translations. “Custom” machine translation engines are tailor-made solutions in which the engines are “taught” or “fed” with specific data. They usually provide better quality translations, but that quality depends directly on the quality of the input data. They also have limitations due to the “data dilution” effect and the machine translation technology used.
By comparison, “corporate” machine translation, thanks to its augmented technologies, is designed to overcome some of the limitations of statistically based engines, such as formatting problems or problems of compliance with the style guidelines set by the company.
What are the main types of machine translation?
Rule-based machine translation systems use a set of rules, developed manually by experts, that map the structures of the source language onto the target language. The human factor in rule-based systems helps to provide fairly good machine translations with predictable results. However, this manual work makes rule-based systems very expensive and very complex to implement and maintain. As rules are added and updated, these systems risk generating ambiguities, and the quality of their translations gets worse.
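To make the idea of hand-written rules concrete, here is a minimal sketch of a rule-based translator from English to Spanish. The lexicon, the part-of-speech tags and the single reordering rule are all hypothetical and written only for illustration; a real rule-based system would hold many thousands of entries and rules.

```python
# Toy rule-based translator (English -> Spanish).
# LEXICON and the reordering rule are hypothetical, hand-written examples.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "new": ("nuevo", "ADJ"),
}


def translate_rule_based(sentence: str) -> str:
    # Step 1: dictionary lookup; unknown words pass through with an UNK tag.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]

    # Step 2: apply one structural rule: English "ADJ NOUN" becomes "NOUN ADJ".
    output = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][1] == "ADJ"
            and tagged[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            output.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)


print(translate_rule_based("the red car is new"))
```

The output is predictable because every step is an explicit rule, which is the strength described above. The cost is also visible: every new word, gender agreement or exception needs another manually written entry or rule.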
Statistical machine translation systems use computer algorithms to select the statistically most likely translation from millions of possible combinations. The statistical models are made up of words and phrases learned automatically from bilingual segments, from which bilingual databases of translations are built. The strength of statistical systems lies in their level of automation, as new engines can be generated by taking advantage of their learning capacity. The result is fast response times and low costs for implementing and managing these statistical models. The major drawback of this type of engine is the “data dilution” effect caused by the scarcity of data available to “feed” or “train” these data-driven systems.
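The statistical approach can be sketched in a few lines, under heavy simplifications. The aligned phrase pairs below are hypothetical stand-ins for a bilingual corpus, the probabilities are plain relative frequencies, and the translation step is a greedy longest-match lookup. Production systems add a language model and a search over many competing hypotheses, which this sketch leaves out.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs standing in for a bilingual corpus.
ALIGNED_PHRASES = [
    ("bank", "banco"),
    ("bank", "banco"),
    ("bank", "orilla"),
    ("river bank", "ribera"),
    ("bank account", "cuenta bancaria"),
]


def train_phrase_table(pairs):
    """Estimate p(target | source) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }


def translate(sentence, table, max_len=3):
    """Greedy longest-match translation using the most probable target."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                candidates = table[phrase]
                output.append(max(candidates, key=candidates.get))
                i += n
                break
        else:
            # No phrase matched: copy the unknown word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


table = train_phrase_table(ALIGNED_PHRASES)
print(translate("bank", table))
print(translate("river bank", table))
print(translate("bank account", table))
```

Nothing here is written by a linguist: the table is learned from the data alone, which is why such engines are cheap to build. It also shows the dependence on data, since a phrase that never appears in the training pairs is simply copied through untranslated.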
Hybrid Machine Translation. To address quality and time-to-market limitations, many rule-based engine developers are improving their technology by combining it with statistical machine translation, creating hybrid solutions. Hybrid systems offer considerable quality improvements, but they inherit the high costs of rule-based systems. Added to this is the complexity of combining both systems.
The new generation. The new “augmented” machine translation engines are extending the capabilities of statistical machine translation and overcoming its limitations. These new statistical machine translation solutions introduce sophisticated data pre-processing (language transformation) systems, language optimisation technologies and terminology management solutions. This type of engine aims to incorporate the quality improvements available in hybrid systems.
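One common way to combine pre-processing with terminology management is to protect approved terms with placeholders before translation and restore the approved target terms afterwards. The sketch below illustrates that idea only; it is not the method of any particular vendor. The glossary, the placeholder format and the `toy_engine` function (a word-for-word stand-in for a real engine) are all assumptions.

```python
import re

# Hypothetical company glossary: source term -> approved target term.
GLOSSARY = {
    "control panel": "panel de control",
    "AcmeCloud": "AcmeCloud",  # brand name that must never be translated
}

# Hypothetical word-for-word dictionary used by the stand-in engine.
TOY_DICTIONARY = {"open": "abra", "the": "el", "in": "en"}


def protect_terms(text, glossary):
    """Pre-processing: replace glossary terms with opaque placeholders."""
    mapping = {}
    # Longest terms first, so overlapping terms resolve predictably.
    for index, term in enumerate(sorted(glossary, key=len, reverse=True)):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            placeholder = f"__TERM{index}__"
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = glossary[term]
    return text, mapping


def toy_engine(text):
    """Stand-in for a real MT engine; unknown tokens pass through unchanged."""
    return " ".join(TOY_DICTIONARY.get(word.lower(), word) for word in text.split())


def restore_terms(text, mapping):
    """Post-processing: swap placeholders for the approved target terms."""
    for placeholder, target in mapping.items():
        text = text.replace(placeholder, target)
    return text


def translate_with_glossary(text):
    protected, mapping = protect_terms(text, GLOSSARY)
    return restore_terms(toy_engine(protected), mapping)


print(translate_with_glossary("Open the control panel in AcmeCloud"))
```

The engine never sees the protected terms, so it cannot mistranslate them, and the company's approved wording is guaranteed to appear in the output.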
Who is machine translation for?
Today, machine translation is spreading among users and companies. Many consumers use it for on-the-spot translations, and large multinationals incorporate it to communicate with customers and employees located anywhere in the world.
Machine translation is most useful for large multinational companies that need to translate very large amounts of content, such as user manuals and complex multilingual websites whose content is continuously updated. Large corporations starting a localisation process use machine translation to regularly localise content into several languages. Other companies offering user help and support through chats or social networking sites use machine translation to provide a more customised user experience.
Machine translation is also used by some translation agencies, always in combination with post-editing and proofreading by human translators.
Is machine translation “good enough”?
We should ask ourselves: good enough for what? In most cases, even if the quality of the machine translation is good, a post-editing stage by a human translator will be required. For example, if the content is to be published, it is essential that the result of the automatic translation be reviewed and edited by a human. In other cases, such as real-time communications or translations from which we only need to get an idea of what is being talked about, an “understandable” machine translation will be good enough, and there will be no time or means for a human to edit the translation.
Corporate machine translation, aimed at large corporations with a more or less controlled language and predefined style guidelines, is capable of producing machine translations of sufficient quality that human involvement can be kept to a minimum, thus improving cost-effectiveness. Augmented machine translation technologies are capable of producing translations specific to a particular company, market and brand. This is done by using rule-based and statistical machine translation, and by aligning large volumes of translations and content translated by the company itself. Obviously, the use of these translation engines is not cheap.