Learning to Translate with Multiple Objectives
Duh, Kevin and Sudoh, Katsuhito and Wu, Xianchao and Tsukada, Hajime and Nagata, Masaaki

Article Structure

Abstract

We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously.

Introduction

Weight optimization is an important step in building machine translation (MT) systems.

Theory of Pareto Optimality

2.1 Definitions and Concepts

The idea of Pareto optimality comes originally from economics (Pareto, 1906), where the goal is to characterize situations when a change in allocation of goods does not make anybody worse off.

Multi-objective Algorithms

3.1 Computing the Pareto Frontier

Experiments

4.1 Evaluation Methodology

Related Work

Multi-objective optimization for MT is a relatively new area.

Opportunities and Limitations

We introduce a new approach (PMO) for training MT systems on multiple metrics.

Topics

BLEU

Appears in 22 sentences as: BLEU (22)
In Learning to Translate with Multiple Objectives
  1. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality.
    Page 1, “Abstract”
  2. These methods are effective because they tune the system to maximize an automatic evaluation metric such as BLEU, which serves as a surrogate objective for translation quality.
    Page 1, “Introduction”
  3. However, we know that a single metric such as BLEU is not enough.
    Page 1, “Introduction”
  4. For example, while BLEU (Papineni et al., 2002) focuses on word-based n-gram precision, METEOR (Lavie and Agarwal, 2007) allows for stem/synonym matching and incorporates recall.
    Page 1, “Introduction”
  5. Can we really claim that a system is good if it has high BLEU, but very low METEOR?
    Page 1, “Introduction”
  6. Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU , TER, and RIBES are presented in Section 4.
    Page 2, “Introduction”
  7. For example, suppose K = 2, M1(h) computes the BLEU score, and M2(h) gives the METEOR score of h. Figure 1 illustrates the set of vectors {M(h)} in a 10-best list.
    Page 2, “Theory of Pareto Optimality”
  8. If we had used BLEU scores rather than the {0,1} labels in line 8, the entire PMO-PRO algorithm would revert to single-objective PRO.
    Page 4, “Multi-objective Algorithms”
  9. As metrics we use BLEU and RIBES (which demonstrated good human correlation in this language pair (Goto et al., 2011)).
    Page 5, “Experiments”
  10. As metrics we use BLEU and NTER.
    Page 5, “Experiments”
  11. BLEU $= \mathrm{BP} \times \bigl(\prod_{n=1}^{4} \mathrm{prec}_n\bigr)^{1/4}$.
    Page 5, “Experiments”
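
To make item 11 concrete: the formula multiplies the brevity penalty BP by the geometric mean of the four modified n-gram precisions, per the standard BLEU definition (Papineni et al., 2002). A minimal Python sketch with illustrative numbers (the precision values are assumed inputs; no n-gram counting is done here):

    import math

    # Hypothetical sketch of corpus-level BLEU from precomputed statistics.
    def bleu(precisions, hyp_len, ref_len):
        """BP times the geometric mean of the n-gram precisions."""
        bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
        return bp * math.prod(precisions) ** (1.0 / len(precisions))

    # e.g. precisions for n = 1..4, hypothesis slightly shorter than reference:
    print(bleu([0.70, 0.45, 0.30, 0.20], hyp_len=95, ref_len=100))  # ~0.35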


NIST

Appears in 6 sentences as: NIST (6)
In Learning to Translate with Multiple Objectives
  1. Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU, TER, and RIBES are presented in Section 4.
    Page 2, “Introduction”
  2. (2) The NIST task is Chinese-to-English translation with OpenMT08 training data and MT06 as devset.
    Page 5, “Experiments”
  3.         Train  Devset  #Feat  Metrics
     PubMed  0.2M   2k      14     BLEU, RIBES
     NIST    7M     1.6k    8      BLEU, NTER
    Page 6, “Experiments”
  4. Our MT models are trained with the standard phrase-based Moses software (Koehn et al., 2007), with IBM Model 4 alignments, a 4-gram SRILM language model, lexicalized reordering for PubMed, and distance-based reordering for the NIST system.
    Page 6, “Experiments”
  5. Figures 2 and 3 show the results for PubMed and NIST, respectively.
    Page 6, “Experiments”
  6. Figure 3: NIST Results
    Page 6, “Experiments”


evaluation metric

Appears in 6 sentences as: evaluation metric (3) evaluation metrics (3)
In Learning to Translate with Multiple Objectives
  1. These methods are effective because they tune the system to maximize an automatic evaluation metric such as BLEU, which serves as a surrogate objective for translation quality.
    Page 1, “Introduction”
  2. While many alternatives have been proposed, such a perfect evaluation metric remains elusive.
    Page 1, “Introduction”
  3. As a result, many MT evaluation campaigns now report multiple evaluation metrics (Callison-Burch et al., 2011; Paul, 2010).
    Page 1, “Introduction”
  4. Different evaluation metrics focus on different aspects of translation quality.
    Page 1, “Introduction”
  5. It would be a pity if a good evaluation metric could not be used for tuning.
    Page 8, “Related Work”
  6. Leveraging the diverse perspectives of different evaluation metrics has the potential to improve overall quality.
    Page 8, “Opportunities and Limitations”


translation quality

Appears in 5 sentences as: translation quality (5)
In Learning to Translate with Multiple Objectives
  1. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality.
    Page 1, “Abstract”
  2. These methods are effective because they tune the system to maximize an automatic evaluation metric such as BLEU, which serves as a surrogate objective for translation quality.
    Page 1, “Introduction”
  3. Ideally, we want to tune towards an automatic metric that has perfect correlation with human judgments of translation quality.
    Page 1, “Introduction”
  4. Different evaluation metrics focus on different aspects of translation quality.
    Page 1, “Introduction”
  5. We want to build an MT system that does well with respect to many aspects of translation quality.
    Page 1, “Introduction”


MT systems

Appears in 4 sentences as: MT system (1) MT systems (2) MT system’s (1)
In Learning to Translate with Multiple Objectives
  1. Discriminative optimization methods such as MERT (Och, 2003), MIRA (Crammer et al., 2006), PRO (Hopkins and May, 2011), and Downhill-Simplex (Nelder and Mead, 1965) have been influential in improving MT systems in recent years.
    Page 1, “Introduction”
  2. We want to build an MT system that does well with respect to many aspects of translation quality.
    Page 1, “Introduction”
  3. Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space and our goal is to find w leading to hypotheses on the Pareto Frontier.
    Page 3, “Theory of Pareto Optimality”
  4. We introduce a new approach (PMO) for training MT systems on multiple metrics.
    Page 8, “Opportunities and Limitations”


TER

Appears in 3 sentences as: TER (3)
In Learning to Translate with Multiple Objectives
  1. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality.
    Page 1, “Abstract”
  2. TER (Snover et al., 2006) allows arbitrary chunk movements, while permutation metrics like RIBES (Isozaki et al., 2010; Birch et al., 2010) measure deviation in word order.
    Page 1, “Introduction”
  3. Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU, TER, and RIBES are presented in Section 4.
    Page 2, “Introduction”


weight vector

Appears in 3 sentences as: weight vector (2) weight vectors (1)
In Learning to Translate with Multiple Objectives
  1. Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space and our goal is to find w leading to hypotheses on the Pareto Frontier.
    Page 3, “Theory of Pareto Optimality”
  2. For each sentence pair (f, e) in the devset, we first generate an N-best list L = {h} using the current weight vector w (line 5).
    Page 4, “Multi-objective Algorithms”
  3. Input: Devset, max number of iterations I
     Output: A set of (pareto-optimal) weight vectors
     1: Initialize w.
    Page 5, “Multi-objective Algorithms”
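
Taken together, these excerpts outline one pass of PMO-PRO: generate an N-best list per devset sentence under the current w, score each hypothesis with K metrics, and give Pareto-optimal hypotheses a {0,1} label for a PRO-style update. The sketch below is an illustration under assumptions, not the paper's implementation; decode_nbest, score_metrics, and pro_update are hypothetical stand-ins for the decoder, the K metric scorers, and the PRO pairwise-ranking step.

    # Hypothetical sketch of the PMO-PRO outer loop described above.
    def on_frontier(v, vectors):
        """True if no other metric vector dominates v (metrics maximized)."""
        return not any(all(a >= b for a, b in zip(u, v)) and u != v
                       for u in vectors)

    def pmo_pro(devset, w, iterations, decode_nbest, score_metrics, pro_update):
        weights = []                        # output: a set of weight vectors
        for _ in range(iterations):
            labeled = []
            for f, e in devset:             # devset sentence pair (f, e)
                nbest = decode_nbest(f, w)  # N-best list L under current w
                vectors = [tuple(score_metrics(h, e)) for h in nbest]
                # {0,1} label: 1 iff the hypothesis is Pareto-optimal in L.
                # Substituting a single metric's score here (e.g. BLEU)
                # would revert the procedure to single-objective PRO.
                labeled += [(h, 1 if on_frontier(m, vectors) else 0)
                            for h, m in zip(nbest, vectors)]
            w = pro_update(labeled, w)      # PRO-style pairwise-ranking step
            weights.append(w)
        return weights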
