Abstract | We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set. |
Additional Experiments | As can be seen in Table 4, in the smaller feature set, RM and MERT were the best performers, with the exception that on MT08, MIRA yielded somewhat better (+0.7) BLEU but a somewhat worse (-0.9) TER score than RM. |
Additional Experiments | On the large feature set, RM is again the best performer, except for a tied BLEU score with MIRA on MT08, where it nevertheless shows a clear 1.8 TER gain.
Conclusions and Future Work | Experimentation in statistical MT yielded significant improvements over several other state-of-the-art optimizers, especially in a high-dimensional feature space (up to 2 BLEU and 4.3 TER on average). |
Discussion | RM’s loss was only up to 0.8 BLEU (0.7 TER) from MERT or MIRA, while its gains were up to 1.7 BLEU and 2.1 TER over MIRA. |
Discussion | Small set (BLEU, TER) and large set (BLEU, TER) by optimizer: MERT (0.4, 2.6; -, -), MIRA (0.5, 3.0; 1.4, 4.3), PRO (1.4, 2.9; 2.0, 1.7), RAMPION (0.6, 1.6; 1.2, 2.8)
Discussion | Error Analysis: The inconclusive advantage of RM over MIRA (in BLEU vs. TER scores) on Arabic-English MT08 calls for a closer look. |
Experiments | As can be seen from the results in Table 3, our RM method was the best performer in all Chinese-English tests according to all measures — up to 1.9 BLEU and 6.6 TER over MIRA — even though we only optimized for BLEU (footnote 5). Surprisingly, it seems that MIRA did not benefit as much from the sparse features as RM.
Experiments | The results are especially notable for the basic feature setting — up to 1.2 BLEU and 4.6 TER improvement over MERT — since MERT has been shown to be competitive with small numbers of features compared to high-dimensional optimizers such as MIRA (Chiang et al., 2008). |
Experiments | Footnote 5: In the small feature set, RAMPION yielded similar best BLEU scores, but worse TER.
Abstract | Furthermore, integrated Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction in comparison with the pure SMT system. |
Experiments | In this work, the translation performance is measured with case-insensitive BLEU-4 score (Papineni et al., 2002) and TER score (Snover et al., 2006). |
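As a concrete illustration of what such a score computes, here is a minimal sketch of a case-insensitive sentence-level BLEU-4 in Python. The function name `bleu4` is ours, and this single-reference, sentence-level version only approximates the official corpus-level mteval scoring used in these papers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(hypothesis, reference):
    """Case-insensitive sentence-level BLEU-4 with brevity penalty
    (single reference; the official mteval script is corpus-level
    and multi-reference, so its scores will differ)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, 5):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ng.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(count, ref_ng[g]) for g, count in hyp_ng.items())
        if clipped == 0:
            return 0.0  # no matching n-grams at this order
        log_prec += math.log(clipped / total) / 4
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0 regardless of casing, while a hypothesis with no overlapping n-grams scores 0.0.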
Experiments | In the tables, the best translation results (either in BLEU or TER) at each interval have been marked in bold.
Experiments | It can be seen that TM significantly exceeds SMT at the interval [0.9, 1.0) in TER score, which illustrates why professional translators prefer TM over SMT as their assistant tool.
Introduction | Compared with the pure SMT system, the proposed integrated Model-III achieves 3.48 BLEU points improvement and 2.62 TER points reduction overall. |
Abstract | On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement. |
Experiments | Table columns: MT08 nw (BLEU \ TER), MT08 wb (BLEU \ TER)
Experiments | The best TER and BLEU results on each genre are in bold. |
Experiments | For BLEU, higher scores are better, while for TER, lower scores are better.
Introduction | Experiments show that our approach significantly outperforms both phrase-based (Koehn et al., 2007) and string-to-dependency (Shen et al., 2008) approaches in terms of BLEU and TER.
Introduction | | features | BLEU | TER | |
Introduction | Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly. |
Abstract | Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. |
Conclusion | Experimental results show that our model is stable and improves the baseline system by 0.98 BLEU and 1.21 TER (trained by CRFs) and 1.32 BLEU and 1.53 TER (trained by RNN). |
Experiments | BLEU (Papineni et al., 2001) and TER (Snover et al., 2005) scores are all calculated on lowercased (case-insensitive) text.
Experiments | An Index column is added for score reference convenience (B for BLEU; T for TER).
Experiments | For the proposed model, significance testing results on both BLEU and TER are reported (B2 and B3 compared to B1, T2 and T3 compared to T1). |
Abstract | On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST.
Experimental Setup and Results | Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
Experimental Setup and Results | In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
Introduction | With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task. |
Experiments | Besides the new name-aware MT metric, we also adopt two traditional metrics: TER to evaluate the overall translation performance and Named Entity Weak Accuracy (NEWA) (Hermjakob et al., 2008) to evaluate the name translation performance.
Experiments | TER measures the number of edits required to change a system output into one of the reference translations.
Experiments | TER = (# of edits) / (average # of reference words)
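The definition above can be sketched numerically as word-level edit distance normalized by the average reference length. The helper names below are ours, and this simplified version omits the phrase-shift edits that the full TER also counts as single edits.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens
    (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        cur = [i]
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur.append(min(prev[j] + 1,          # delete from hyp
                           cur[j - 1] + 1,       # insert into hyp
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]

def simple_ter(hypothesis, references):
    """Edits needed to reach the closest reference, divided by the
    average number of reference words. Real TER additionally allows
    block shifts of word sequences, which this sketch omits."""
    hyp = hypothesis.lower().split()
    refs = [r.lower().split() for r in references]
    edits = min(word_edit_distance(hyp, r) for r in refs)
    avg_ref_len = sum(len(r) for r in refs) / len(refs)
    return edits / avg_ref_len
```

For example, dropping one word from a three-word reference costs one deletion, giving a TER of 1/3.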
Name-aware MT Evaluation | Traditional MT evaluation metrics such as BLEU (Papineni et al., 2002) and Translation Edit Rate (TER) (Snover et al., 2006) assign the same weight to all tokens.
Evaluation methodology | In addition to human evaluation, we also ran system-level automatic evaluations using BLEU (Papineni et al., 2001), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005), TER (Snover et al., 2009), and GTM (Turian et al., 2003). |
Results | While TER and GTM are known to provide better correlation with post-editing efforts for English (O’Brien, 2011), free word order and greater data sparseness on the sentence level make TER much less reliable for Russian.
Results | Sentence-level (median, mean, trimmed) and corpus-level values by metric: BLEU (0.357, 0.298, 0.348; 0.833), NIST (0.357, 0.291, 0.347; 0.810), Meteor (0.429, 0.348, 0.393; 0.714), TER (0.214, 0.186, 0.204; 0.619), GTM (0.429, 0.340, 0.392; 0.714)