Consistent Human Evaluation of Machine Translation across Language Pairs

arXiv:2205.08533

Authors

Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn, Daniel Licht

Abstract

Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge because of the high variability between human evaluators, which is partly due to subjective expectations of translation quality that differ across language pairs. We propose XSTS, a new metric that is more focused on semantic equivalence, together with a cross-lingual calibration method that enables more consistent assessment.

We demonstrate the effectiveness of these contributions in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
