Achieving Human Parity on Automatic Chinese to English News Translation

arXiv:1803.05567

Publication

Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft’s machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.

Publication Downloads

Translator Human Parity Data

March 14, 2018

Human evaluation results and translation output for the Translator Human Parity Data release, as described in this blog post. The release contains all human evaluation results and translations related to our paper "Achieving Human Parity on Automatic Chinese to English News Translation", published on March 14, 2018. We have released this data to 1) allow external validation of our claim of having achieved human parity and 2) foster future research by providing two additional human references for the Reference-WMT test set.

The package includes 1) two new references for newstest2017, one based on human translation from scratch (Reference-HT), the other based on human post-editing (Reference-PE); 2) human parity translations generated by our research systems Combo-4, Combo-5, and Combo-6, as well as translation output from the online machine translation service Online-A-1710, collected on October 16, 2017; and 3) all data points collected in our human evaluation campaigns, including annotations for Subset-1, Subset-2, Subset-3, and Subset-4.

For each data point, we share the (anonymized) annotator ID, segment ID, system ID, type ID (either TGT or CHK, where a CHK item is a repeated judgment of a TGT item), raw score r in [0, 100], and annotation start and end times. Additionally, we share the combined data for the Meta-1 campaign on Subset-1.
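The fields listed above suggest a simple record per judgment. The following is a minimal sketch of how such records might be represented and summarized. It is not the release's actual file format or the paper's scoring procedure. The `Judgment` class, its field names, both helper functions, the rule for matching a CHK item to its TGT item (same annotator, segment, and system), and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Judgment:
    """One human judgment; field names are hypothetical, not the release schema."""
    annotator_id: str
    segment_id: str
    system_id: str
    type_id: str       # "TGT" or "CHK" (CHK repeats an earlier TGT judgment)
    raw_score: float   # raw score r in [0, 100]
    start_time: float  # annotation start, in seconds (assumed unit)
    end_time: float    # annotation end, in seconds (assumed unit)


def mean_raw_score_by_system(judgments):
    """Average the raw scores of TGT judgments for each system."""
    scores = defaultdict(list)
    for j in judgments:
        if j.type_id == "TGT":
            scores[j.system_id].append(j.raw_score)
    return {system: mean(values) for system, values in scores.items()}


def repeat_differences(judgments):
    """Absolute score difference between each CHK judgment and its TGT judgment.

    Assumption: a CHK item matches the TGT item with the same annotator,
    segment, and system.
    """
    first = {
        (j.annotator_id, j.segment_id, j.system_id): j.raw_score
        for j in judgments
        if j.type_id == "TGT"
    }
    diffs = []
    for j in judgments:
        key = (j.annotator_id, j.segment_id, j.system_id)
        if j.type_id == "CHK" and key in first:
            diffs.append(abs(j.raw_score - first[key]))
    return diffs


# Invented sample values, for illustration only (not taken from the release).
sample = [
    Judgment("a1", "s1", "sys-A", "TGT", 70.0, 0.0, 10.0),
    Judgment("a1", "s1", "sys-A", "CHK", 74.0, 20.0, 30.0),
    Judgment("a1", "s2", "sys-A", "TGT", 80.0, 30.0, 42.0),
    Judgment("a2", "s1", "sys-B", "TGT", 60.0, 0.0, 8.0),
]

system_means = mean_raw_score_by_system(sample)
consistency = repeat_differences(sample)
```

Averaging only TGT items keeps repeated judgments from being counted twice, while the CHK differences give a rough indication of how consistent an annotator is with their own earlier score.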

Found in Translation: Achieving Human Parity on Chinese to English News Translation

Machine translation has made rapid advances in recent years. In this talk, we describe recent advances in Microsoft's neural machine translation system that led to a new state of the art and to human parity when compared to professional human translations. We discuss the technical contributions, the results, and how we defined and accurately measured human parity in translation.