Machine translation
Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational …
On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus statistical and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
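The contrast between plain word substitution and a domain-restricted phrase table can be sketched as follows. This is a minimal illustration, not how any particular MT product works: the lexicon entries, the phrase table, and the function names `substitute_words` and `translate_with_phrases` are all invented for the example.

```python
# Toy English-to-French lexicon. Every entry is an assumption made for this sketch.
GENERAL_LEXICON = {
    "it": "il",
    "is": "est",
    "raining": "pleut",
    "cats": "chats",
    "and": "et",
    "dogs": "chiens",
}

# Hypothetical weather-domain phrase table: whole phrases map to whole phrases,
# which limits the scope of allowable substitutions.
WEATHER_PHRASES = {
    ("it", "is", "raining", "cats", "and", "dogs"): "il pleut des cordes",
}


def substitute_words(sentence, lexicon):
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


def translate_with_phrases(sentence, phrases, lexicon):
    """Prefer the longest known phrase at each position, else fall back to single words."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        for length in range(len(words) - i, 0, -1):
            chunk = tuple(words[i:i + length])
            if chunk in phrases:
                output.append(phrases[chunk])
                i += length
                break
        else:
            # No phrase starts here, so translate the single word.
            output.append(lexicon.get(words[i], words[i]))
            i += 1
    return " ".join(output)


print(substitute_words("It is raining cats and dogs", GENERAL_LEXICON))
print(translate_with_phrases("It is raining cats and dogs", WEATHER_PHRASES, GENERAL_LEXICON))
```

The first call translates the idiom word by word and produces an ungrammatical string, while the second recognises the whole phrase and returns its idiomatic counterpart.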
Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.
History
The idea of machine translation may be traced back to the 17th century. In 1629, …
The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971); and Xerox used SYSTRAN to translate technical manuals (1978). Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation. Various MT companies were launched, including Trados (1984), which was the first to develop and market translation memory technology (1989). The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991).
MT on the web started with SYSTRAN offering free translation of small texts (1996), followed by …
The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. Warren Weaver wrote an important memorandum "Translation" in 1949. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (University of London) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.
René Descartes
Translation process
The human translation process may be described as:
1) Decoding the meaning of the source text; and
2) Re-encoding this meaning in the target language.
Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.
Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.
In its most general application, this is beyond current technology. Though it works much faster, no automated translation program or procedure, with no human participation, can produce output even close to the quality a human translator can produce. What it can do, however, is provide a general, though imperfect, approximation of the original text, getting the "gist" of it (a process called "gisting"). This is sufficient for many purposes, including making the best use of the finite and expensive time of a human translator, which can be reserved for those cases in which total accuracy is indispensable.
This problem may be approached in a number of ways, through the evolution of which accuracy has improved.
Approaches
Machine translation can use a method based on linguistic rules, which means that words will …
It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
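The parse, intermediate representation, and generation stages described above can be sketched with a toy transfer-based pipeline. Everything here is an assumption made for illustration: the `NounPhrase` structure, the three-word input format, the tiny lexicon, and the single reordering rule. Real systems need far larger lexicons and rule sets, and this sketch ignores agreement and morphology entirely.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Symbolic intermediate representation of a determiner-adjective-noun phrase."""
    determiner: str
    adjective: str
    noun: str


# Hypothetical English-to-French lexicon, invented for this sketch.
LEXICON = {"the": "le", "black": "noir", "cat": "chat"}


def analyse(text):
    """Parse a three-word English noun phrase into the intermediate representation."""
    determiner, adjective, noun = text.lower().split()
    return NounPhrase(determiner, adjective, noun)


def transfer(phrase):
    """Lexical transfer: map each source word to its target-language entry."""
    return NounPhrase(
        LEXICON[phrase.determiner],
        LEXICON[phrase.adjective],
        LEXICON[phrase.noun],
    )


def generate(phrase):
    """Target-side rule of this toy grammar: the adjective follows the noun."""
    return f"{phrase.determiner} {phrase.noun} {phrase.adjective}"


def translate(text):
    """Run the three stages in order: analysis, transfer, generation."""
    return generate(transfer(analyse(text)))


print(translate("the black cat"))
```

Keeping analysis, transfer, and generation as separate functions mirrors the division of labour in the description above: only `transfer` and `generate` depend on the target language.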
Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.
To translate between closely related languages, the technique referred to as rule-based machine translation may be used.
Approaches
Rule-based
Transfer-based machine translation
Interlingual
Dictionary-based
Statistical
Example-based
Hybrid MT
Neural MT
Applications
While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted …
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. The MOLTO project, for example, coordinated by the University of Gothenburg, received more than 2.375 million euros project support from the EU to create a reliable translation tool that covers a majority of the EU languages. The further development of MT systems comes at a time when budget cuts in human translation may increase the EU's dependency on reliable MT programs. The European Commission contributed 3.072 million euros (via its ISA programme) for the creation of MT@EC, a statistical machine translation program tailored to the administrative needs of the EU, to replace a previous rule-based machine translation system.
Google has claimed that promising results were obtained using a proprietary statistical machine translation engine. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English had an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute of Standards and Technology.
With the recent focus on terrorism, the military sources in the United States have been investing …
The notable rise of social networking on the web in recent years has created yet another niche for machine translation software, in utilities such as Facebook and in instant messaging clients such as Skype, GoogleTalk and MSN Messenger, allowing users who speak different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs and PDAs. Because of their portability, such instruments have come to be designated mobile translation tools; they enable mobile business networking between partners speaking different languages and facilitate both foreign language learning and unaccompanied travel to foreign countries without the mediation of a human translator.
Despite being labelled an unworthy competitor to human translation in 1966 by the Automatic Language Processing Advisory Committee put together by the United States government, machine translation has now improved to such a level that its application in online collaboration and in the medical field is being investigated. In the Ishida and Matsubara lab of Kyoto University, methods of improving the accuracy of machine translation as a support tool for inter-cultural collaboration in today's globalized society are being studied. The application of this technology in medical settings where human translators are absent is another topic of research; however, difficulties arise because accurate translation is critical to medical diagnosis.
Evaluation
There are many factors that affect how machine translation systems are evaluated. These factors include …
Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better.[48] The same applies to technical documents, which can be more easily translated by SMT because of their formal language.
In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.[49]
There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges[50] to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems.[51] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.[52]
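To give a feel for how an automated metric such as BLEU scores a translation, here is a simplified, single-reference, sentence-level sketch. It assumes whitespace tokenisation and no smoothing, and the function names are chosen for this example; published evaluation tools differ in tokenisation, smoothing, and corpus-level aggregation, so the numbers it produces are not comparable with reported BLEU scores.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped precisions for n = 1..max_n, times a brevity penalty."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    precisions = [
        modified_precision(cand_tokens, ref_tokens, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        # Without smoothing, a single zero precision makes the whole score zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(cand_tokens) > len(ref_tokens):
        brevity_penalty = 1.0
    else:
        # Candidates no longer than the reference are penalised for brevity.
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return brevity_penalty * math.exp(log_mean)


print(bleu("the cat sat on the mat today", "the cat sat on the mat"))
```

The clipping step is what stops a candidate from scoring well simply by repeating a common reference word, and the brevity penalty stops very short candidates from scoring well on precision alone.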
Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded …
In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.[48] The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.
Using machine translation as a teaching tool
Although there have been concerns about machine translation's accuracy, …