Discussion | Table 6: CRR translation results (BLEU scores) using different RBMT systems |
Discussion | The BLEU scores are 43.90 and 29.77 for System A and System B, respectively. |
Discussion | If we compare these results with those using only SMT systems, as described in Table 3, translation quality was greatly improved by at least 3 BLEU points, even if the translation ac-
Experiments | Translation quality was evaluated using both the BLEU score proposed by Papineni et al. |
Experiments | The results also show that our translation selection method is very effective, achieving absolute improvements of about 4 and 1 BLEU points on CRR and ASR inputs, respectively.
Experiments | As compared with those in Table 3, the translation quality was greatly improved, with absolute improvements of at least 5.1 and 3.9 BLEU points on CRR and ASR inputs for the system combination results.
Translation Selection | In this paper, we modify the method in Albrecht and Hwa (2007) to only prepare human reference translations for the training examples, and then evaluate the translations produced by the subject systems against the references using BLEU score (Papineni et al., 2002). |
Translation Selection | We use smoothed sentence-level BLEU score to replace the human assessments, where we use additive smoothing to avoid zero BLEU scores when we calculate the n-gram precisions. |
Translation Selection | In the context of translation selection, y is assigned as the smoothed BLEU score.
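The additive smoothing described above can be sketched as follows; the function name, the add-one constant, and the brevity-penalty form are illustrative assumptions, not the authors' exact implementation:

```python
from collections import Counter
import math

def smoothed_sentence_bleu(hypothesis, reference, max_n=4, k=1.0):
    """Sentence-level BLEU with additive smoothing on the n-gram precisions.

    A constant k is added to both the clipped match count and the total
    n-gram count at every order, so no precision is ever exactly zero.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 0)
        # additive smoothing keeps the log finite even with zero matches
        log_prec += math.log((matches + k) / (total + k))
    # standard brevity penalty against the single reference length
    bp = min(1.0, math.exp(1.0 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec / max_n)
```

With k = 1 an exact match still scores 1.0, while a hypothesis shorter than `max_n` tokens gets a nonzero score instead of a hard zero.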
Conclusion | As our decoder accounts for multiple derivations, we extend the MERT algorithm to tune feature weights with respect to BLEU score for max-translation decoding. |
Experiments | Table 2: Comparison of individual decoding speed (seconds/sentence) and BLEU score (case-insensitive).
Experiments | With conventional max-derivation decoding, the hierarchical phrase-based model achieved a BLEU score of 30.11 on the test set, with an average decoding time of 40.53 seconds/sentence. |
Experiments | We found that accounting for all possible derivations in max-translation decoding resulted in a small negative effect on BLEU score (from 30.11 to 29.82), even though the feature weights were tuned with respect to BLEU score.
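The contrast between max-derivation and max-translation decoding can be illustrated with a toy sketch; representing the derivation forest as a flat list of (translation, probability) pairs is an assumption made purely for illustration:

```python
from collections import defaultdict

def max_translation(derivations):
    """Given (translation_string, derivation_probability) pairs, pick the
    translation whose derivation probabilities sum highest (max-translation),
    rather than the translation of the single best derivation (max-derivation).
    """
    totals = defaultdict(float)
    for translation, prob in derivations:
        totals[translation] += prob  # sum over derivations of the same string
    return max(totals, key=totals.get)
```

For example, with derivations [("A", 0.4), ("B", 0.3), ("B", 0.3)], max-derivation decoding returns "A" (highest single derivation), while max-translation decoding returns "B" (highest summed mass, 0.6).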
Introduction | As multiple derivations are used for finding optimal translations, we extend the minimum error rate training (MERT) algorithm (Och, 2003) to tune feature weights with respect to BLEU score for max-translation decoding (Section 4).
Experiments | In Figure 5b, we report the BLEU score of the reordered sentences in the test set relative to the original reference sentences. |
Experiments | Figure 6 presents decoder and BLEU scores as functions of time for the two corpora.
Experiments | MERT is then performed to optimize the BLEU score on a development set. For MERT, we use 40 random initial parameters as well as parameters computed using corpus-based statistics (Tromble et al., 2008).
Experiments | We consider a BLEU score difference to be (a) a gain if it is at least 0.2 points, (b) a drop if it is at most -0.2 points, and (c) no change otherwise.
Experiments | When MBR does not produce a higher BLEU score relative to MAP on the development set, MERT assigns a higher weight to this feature function. |
Introduction | Lattice MBR decoding uses a linear approximation to the BLEU score (Papineni et al., 2001); the weights in this linear loss are set heuristically by assuming that n-gram precisions decay exponentially with n. However, this may not be optimal in practice.
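A minimal sketch of such an exponential-decay heuristic follows; the values of the unigram precision p and decay ratio r, and the exact functional form of the weights, are illustrative assumptions rather than the precise parameterization of Tromble et al. (2008):

```python
def linear_bleu_ngram_weights(p=0.85, r=0.7, max_n=4):
    """Illustrative per-order weights for a linear BLEU approximation,
    assuming n-gram precisions decay exponentially: prec(n) ~ p * r**(n-1).
    Higher-order matches get larger weights to offset their lower precision."""
    return [1.0 / (max_n * p * r ** (n - 1)) for n in range(1, max_n + 1)]
```

Because r < 1, the resulting weights increase monotonically with n, which is the qualitative behavior the heuristic is meant to capture.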
Introduction | We employ MERT to select these weights by optimizing BLEU score on a development set. |
Introduction | In contrast, our MBR algorithm directly selects the hypothesis in the hypergraph with the maximum expected approximate corpus BLEU score (Tromble et al., 2008). |
MERT for MBR Parameter Optimization | We now have a total of N + 2 feature functions, which we optimize using MERT to obtain the highest BLEU score on a training set.
Minimum Bayes-Risk Decoding | Tromble et al. (2008) extended MBR decoding to translation lattices under an approximate BLEU score.
Experimental Results | Table 1: BLEU scores for Viterbi, Crunching, MBR, and variational decoding.
Experimental Results | Table 1 presents the BLEU scores under Viterbi, crunching, MBR, and variational decoding. |
Experimental Results | Moreover, a bigram (i.e., “2gram”) achieves the best BLEU scores among the four different orders of VMs. |
Experiments | Table 3: Comparison of BLEU scores for tree-based and forest-based tree-to-tree models. |
Experiments | Table 3 shows the BLEU scores of tree-based and forest-based tree-to-tree models achieved on the test set over different pruning thresholds. |
Experiments | As the number of rules used increased, the BLEU score increased accordingly.
AL-SMT: Multilingual Setting | The translation quality is measured by TQ for the individual systems M_{F_d→E}; it can be the BLEU score or WER/PER (word error rate and position-independent WER), which induces a maximization or minimization problem, respectively.
AL-SMT: Multilingual Setting | This process is continued iteratively until a certain level of translation quality is met (we use the BLEU score, WER, and PER) (Papineni et al., 2002).
Experiments | The number of weights λ_i is 3 plus the number of source languages, and they are trained using minimum error-rate training (MERT) to maximize the BLEU score (Och, 2003) on a development set.
Experiments | Avg BLEU Score |
Abstract | Trained on 8,975 dependency structures of a Chinese Dependency Treebank, the realizer achieves a BLEU score of 0.8874. |
Experiments | In addition to BLEU score , percentage of exactly matched sentences and average NIST simple string accuracy (SSA) are adopted as evaluation metrics. |
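Of these metrics, the percentage of exactly matched sentences is the simplest to compute; a minimal sketch, with the function name assumed:

```python
def exact_match_pct(hypotheses, references):
    """Percentage of hypothesis strings identical to their reference string.
    Assumes the two lists are aligned sentence by sentence."""
    matches = sum(h == r for h, r in zip(hypotheses, references))
    return 100.0 * matches / len(hypotheses)
```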
Experiments | We observe that the BLEU score is boosted from 0.1478 to 0.5943 by using the RPD method. |
Experiments | All four of the feature functions we tested achieve considerable improvements in BLEU score.
Log-linear Models | BLEU score , a method originally proposed to automatically evaluate machine translation quality (Papineni et al., 2002), has been widely used as a metric to evaluate general-purpose sentence generation (Langkilde, 2002; White et al., 2007; Guo et al. |
Log-linear Models | 3 The BLEU scoring script is supplied by the NIST Open Machine Translation Evaluation at ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl
Abstract | We show that it achieves a statistically significantly higher BLEU score than the baseline system without these features. |
Conclusions | In comparison to a baseline model, we achieve statistically significant improvement in BLEU score . |
Generation Ranking Experiments | We evaluate the string chosen by the log-linear model against the original treebank string in terms of exact match and BLEU score (Papineni et al., 2002).
Generation Ranking Experiments | The difference in BLEU score between the model of Cahill et al. |
Experimental Evaluation | For MCE learning, we selected the reference compression that maximizes the BLEU score (Papineni et al., 2002) from the set of reference compressions, i.e., r̂ = argmax_{r∈R} BLEU(r, R\{r}), and used it as the correct data for training.
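A selection rule of this shape, picking the reference that scores highest against the remaining references, might be sketched as follows; the pluggable `sentence_bleu` argument and the averaging over the other references are illustrative assumptions, not the authors' implementation:

```python
def select_training_reference(references, sentence_bleu):
    """Pick the reference compression scoring highest against the others,
    i.e. argmax over r in R of BLEU(r, R \\ {r}), where the multi-reference
    score is approximated here by averaging pairwise sentence-level BLEU."""
    best_ref, best_score = None, float("-inf")
    for i, r in enumerate(references):
        others = references[:i] + references[i + 1:]
        # average r's score against each remaining reference
        score = sum(sentence_bleu(r, o) for o in others) / len(others)
        if score > best_score:
            best_ref, best_score = r, score
    return best_ref
```

Any sentence-level BLEU function can be passed in; the rule simply favors the reference most representative of the full reference set.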
Results and Discussion | Our method achieved the highest BLEU score . |
Results and Discussion | For example, ‘w/o PLM + Dep’ achieved the second highest BLEU score . |
Results and Discussion | Compared to 'Hori-', 'Hori' achieved a significantly higher BLEU score.
Experiments | In our experiments, all models are optimized with the case-insensitive NIST version of the BLEU score, and we report results using this metric as percentages.
Experiments | Figure 3 shows the BLEU score curves with up to 1000 candidates used for re-ranking. |
Experiments | Figure 4 shows the BLEU scores of a two-system co-decoding as a function of re-decoding iterations. |
Discussion and Future Work | When we visually inspect and compare the outputs of our system with those of the baseline, we observe that improved BLEU score often corresponds to visible improvements in the subjective translation quality. |
Experimental Results | These results confirm that the pairwise dominance model can significantly increase performance as measured by the BLEU score , with a consistent pattern of results across the MT06 and MT08 test sets. |
Experimental Setup | In all experiments, we report performance using the BLEU score (Papineni et al., 2002), and we assess statistical significance using the standard bootstrapping approach introduced by Koehn (2004).
Experiment | The 9% tree sequence rules contribute a 1.17 BLEU point improvement (28.83 vs. 27.66 in Table 1) to FTS2S over FT2S.
Experiment | Even in the 5000-best case, tree sequences still contribute a 1.1 BLEU point improvement (28.89 vs. 27.79).
Experiment | 2) The BLEU scores are very similar to each other when we increase the forest pruning threshold. |