Index of papers in Proc. ACL 2009 that mention
  • TER
Galley, Michel and Manning, Christopher D.
Abstract
Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
Conclusion and future work
We used dependency scores as an extra feature in our MT experiments, and found that our dependency model provides significant gains over a competitive baseline that incorporates a large 5-gram language model (0.92% TER and 0.45% BLEU absolute improvements).
Introduction
In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores.
Machine translation experiments
In the final evaluations, we report results using both TER (Snover et al., 2006) and the original BLEU metric as described in (Papineni et al., 2001).
Machine translation experiments
For BLEU evaluations, differences are significant in four out of six cases, and in the case of TER, all differences are significant.
Machine translation experiments
On the other hand, the difference on MT08 is significant in terms of TER.
TER is mentioned in 7 sentences in this paper.
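The excerpts above report corpus-level TER and BLEU improvements on NIST Chinese-English test sets. As a point of reference only, both metrics can be computed with the open-source sacrebleu package; the sketch below assumes sacrebleu 2.x and hypothetical file names, and is not the evaluation setup used in the paper.

    # Minimal sketch: corpus-level BLEU and TER with sacrebleu (2.x assumed).
    # File names are hypothetical placeholders, one sentence per line.
    from sacrebleu.metrics import BLEU, TER

    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

    hypotheses = read_lines("system_output.txt")   # system translations
    references = read_lines("reference.txt")       # reference translations

    bleu = BLEU().corpus_score(hypotheses, [references])
    ter = TER().corpus_score(hypotheses, [references])   # lower TER is better

    print(f"BLEU = {bleu.score:.2f}")
    print(f"TER  = {ter.score:.2f}")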
Pado, Sebastian and Galley, Michel and Jurafsky, Dan and Manning, Christopher D.
Abstract
We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings.
Experimental Evaluation
We therefore verified that the three nontrivial “baseline” regression models indeed confer a benefit over the default component combination scores: BLEU-1 (which outperformed BLEU-4 in the MetricsMATR 2008 evaluation), NIST-4, and TER (with all costs set to 1).
Experimental Evaluation
We start with the standard TER score and the number of each of the four edit operations.
Introduction
A number of metrics have been designed to account for paraphrase, either by making the matching more intelligent (TER, Snover et al.
TER is mentioned in 4 sentences in this paper.
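The “baseline” regression models mentioned above combine several automatic scores (e.g., BLEU-1, NIST-4, TER) into a single predictor of translation quality. The following is a minimal sketch of that idea using scikit-learn; the per-segment feature values and human judgments are hypothetical placeholders, not the paper's data or regression setup.

    # Sketch: combining per-segment BLEU-1, NIST-4, and TER scores with a
    # linear regression, as a stand-in for a "baseline" combination metric.
    # All numbers below are made-up placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Columns: [BLEU-1, NIST-4, TER] for each segment.
    X = np.array([
        [0.62, 7.1, 0.48],
        [0.55, 6.4, 0.55],
        [0.71, 8.0, 0.39],
    ])
    # Human quality judgments for the same segments (placeholder values).
    y = np.array([3.5, 2.8, 4.1])

    model = LinearRegression().fit(X, y)
    print("learned weights:", model.coef_, "intercept:", model.intercept_)
    print("predicted quality:", model.predict(X))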
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Experiments
Actually, we find that the TER score between two member decoders’ outputs is significantly reduced (as shown in Table 3), which indicates that the outputs become more similar due to the use of consensus information.
Experiments
For example, the TER score between SYS2 and SYS3 of the NIST 2008 outputs is reduced from 0.4238 to 0.2665.
Experiments
Table 3: TER scores between co-decoding translation outputs
TER is mentioned in 3 sentences in this paper.
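The excerpts above use TER as a pairwise similarity measure between two systems' output sets (lower TER between them means more similar outputs). A minimal sketch of that use, treating one system's outputs as the "reference"; the system names, example sentences, and use of sacrebleu 2.x are assumptions, and sacrebleu reports TER on a 0-100 scale rather than the 0-1 values quoted above.

    # Sketch: TER between two systems' outputs as a rough similarity measure.
    # SYS2/SYS3 and the sentences are hypothetical placeholders.
    from sacrebleu.metrics import TER

    sys2_outputs = ["the cat sat on the mat", "he went to the market"]
    sys3_outputs = ["the cat is on the mat", "he went to market"]

    # Treat SYS3's outputs as the "reference" set; lower score = more similar.
    ter = TER().corpus_score(sys2_outputs, [sys3_outputs])
    print(f"TER(SYS2 vs SYS3) = {ter.score:.2f}")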