SciSurf: Index of "BLEU" in Proc. ACL 2010

Index of papers in Proc. ACL 2010 that mention

BLEU

Seen in text as:

BLEU (196)
Bleu (6)

Seen in 184 sentences in 14 papers.

1. Tackling Sparse Data Issue in Machine Translation Evaluation

Bojar, Ondřej and Kos, Kamil and Mareċek, David

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	BLEU ) when applied to morphologically rich languages such as Czech.
Introduction	Section 2 illustrates and explains severe problems of a widely used BLEU metric (Papineni et al., 2002) when applied to Czech as a representative of languages with rich morphology.
Introduction	cu-bOJar uedin 0.4 l l l l 0.06 0.08 0.10 0.12 0.14 BLEU
Introduction	Figure l: BLEU and human ranks of systems participating in the English-to-Czech WMT09 shared task.
Problems of BLEU	BLEU (Papineni et al., 2002) is an established language-independent MT metric.
Problems of BLEU	The unbeaten advantage of BLEU is its simplicity.
Problems of BLEU	We plot the official BLEU score against the rank established as the percentage of sentences where a system ranked no worse than all its competitors (Callison-Burch et al., 2009).

BLEU is mentioned in 22 sentences in this paper.

Topics mentioned in this paper:

2. Improving Statistical Machine Translation with Monolingual Collocation

Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.
Experiments on Parsing-Based SMT	Experiments BLEU (%) Joshua 30.05 + Improved word alignments 31.81
Experiments on Parsing-Based SMT	The system using the improved word alignments achieves an absolute improvement of 1.76 BLEU score, which indicates that the improvements of word alignments are also effective to improve the performance of the parsing-based SMT systems.
Experiments on Phrase-Based SMT	We use BLEU (Papineni et al., 2002) as evaluation metrics.
Experiments on Phrase-Based SMT	Experiments BLEU (%) Moses 29.62 + Phrase collocation probability 30.47
Experiments on Phrase-Based SMT	If the same alignment method is used, the systems using CM-3 got the highest BLEU scores.
Experiments on Word Alignment	Experiments BLEU (%) Baseline 29.62 CM-l 30.85 WA-l CM-2 31.28 CM-3 31.48 CM-l 3 l .00 Our methods WA-2 CM-2 3 l .33 CM-3 31.51 CM-l 3 l .43 WA-3 CM-2 31.62 CM-3 31.78
Introduction	The alignment improvement results in an improvement of 2.16 BLEU score on phrase-based SMT system and an improvement of 1.76 BLEU score on parsing-based SMT system.
Introduction	SMT performance is further improved by 0.24 BLEU score.

BLEU is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

3. Constituency to Dependency Translation with Forests

Mi, Haitao and Liu, Qun

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules.
Experiments	We use the standard minimum error-rate training (Och, 2003) to tune the feature weights to maximize the system’s BLEU score on development set.
Experiments	The baseline system extracts 31.9M 625 rules, 77.9M 525 rules respectively and achieves a BLEU score of 34.17 on the test set3.
Experiments	As shown in the third line in the column of BLEU score, the performance drops 1.7 BLEU points over baseline system due to the poorer rule coverage.
Introduction	BLEU
Introduction	Medium data experiments (Section 5) show a statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with less translation rules, this is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.
Model	(2009), their forest-based constituency-to-constituency system achieves a comparable performance against Moses (Koehn et al., 2007), but a significant improvement of +3.6 BLEU points over the 1-best tree-based constituency-to-constituency system.

BLEU is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

4. Training Phrase Translation Models with Leaving-One-Out

Wuebker, Joern and Mauser, Arne and Ney, Hermann

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU .
Alignment	We perform minimum error rate training with the downhill simplex algorithm (Nelder and Mead, 1965) on the development data to obtain a set of scaling factors that achieve a good BLEU score.
Experimental Evaluation	The scaling factors of the translation models have been optimized for BLEU on the DEV data.
Experimental Evaluation	‘ BLEU ‘ TER ‘
Experimental Evaluation	The metrics used for evaluation are the case-sensitive BLEU (Papineni et al., 2002) score and the translation edit rate (TER) (Snover et al., 2006) with one reference translation.
Introduction	Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline.
Introduction	We find that by interpolation with the heuristically extracted phrases translation performance can reach up to 1.4 BLEU improvement over the baseline on the test set.

BLEU is mentioned in 19 sentences in this paper.

Topics mentioned in this paper:

5. Boosting-Based System Combination for Machine Translation

Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	As in other state-of-the-art SMT systems, BLEU is selected as the accuracy measure to define the error function used in MERT.
Background	Since the weights of training samples are not taken into account in BLEUZ, we modify the original definition of BLEU to make it sensitive to the distribution Dt(i) over the training samples.
Background	The modified version of BLEU is called weighted BLE U (WBLEU) in this paper.

BLEU is mentioned in 25 sentences in this paper.

Topics mentioned in this paper:

6. Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

Yeniterzi, Reyyan and Oflazer, Kemal

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores.
Abstract	Our maximal set of source and target side transformations, coupled with some additional techniques, provide an 39% relative improvement from a baseline 17.08 to 23.78 BLEU , all averaged over 10 training and test sets.
Experimental Setup and Results	For evaluation, we used the BLEU metric (Pap-ineni et al., 2001).
Experimental Setup and Results	Wherever meaningful, we report the average BLEU scores over 10 data sets along with the maximum and minimum values and the standard deviation.
Experimental Setup and Results	We can observe that the combined syntax-to-morphology transformations on the source side provide a substantial improvement by themselves and a simple target side transformation on top of those provides a further boost to 21.96 BLEU which represents a 28.57% relative improvement over the word-based baseline and a 18.00% relative improvement over the factored baseline.
Introduction	We find that with the full set of syntax-to-morphology transformations and some additional techniques we can get about 39% relative improvement in BLEU scores over a word-based baseline and about 28% improvement of a factored baseline, all experiments being done over 10 training and test sets.
Syntax-to-Morphology Mapping	We find (and elaborate later) that this reduction in the English side of the training corpus, in general, is about 30%, and is correlated with improved BLEU scores.

BLEU is mentioned in 35 sentences in this paper.

Topics mentioned in this paper:

7. Pseudo-Word for Phrase-Based Machine Translation

Duan, Xiangyu and Zhang, Min and Li, Haizhou

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Results	Statistical significance in BLEU score differences was tested by paired bootstrap re-sampling (Koehn, 2004).
Experiments and Results	BLEU 0.4029 0.3146 NIST 7.0419 8.8462 METEOR 0.5785 0.5335
Experiments and Results	Both SMP and ESSP outperform baseline consistently in BLEU , NIST and METEOR.

BLEU is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

8. Hindi-to-Urdu Machine Translation through Transliteration

Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Evaluation	M Pbo Pbl Pb2 M1 M2 BLEU 14.3 16.25 16.13 18.6 17.05
Evaluation	Both our systems (Model-1 and Model-2) beat the baseline phrase-based system with a BLEU point difference of 4.30 and 2.75 respectively.
Evaluation	The difference of 2.35 BLEU points between M1 and Pbl indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
Final Results	This section shows the improvement in BLEU score by applying heuristics and combinations of heuristics in both the models.
Final Results	BLEU point improvement and combined with all the heuristics (M2H123) gives an overall gain of 1.95 BLEU points and is close to our best results (M1H12).
Final Results	One important issue that has not been investigated yet is that BLEU has not yet been shown to have good performance in morphologically rich target languages like Urdu, but there is no metric known to work better.
Introduction	Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores

BLEU is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

9. Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Echizen-ya, Hiroshi and Araki, Kenji

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	To confirm the effectiveness of noun-phrase chunking, we performed the experiment using a system combining BLEU with our method.
Experiments	In this case, BLEU scores were used as scorewd in Eq.
Experiments	This experimental result is shown as “BLEU with our method” in Tables 2—5.
Introduction	Methods based on word strings (6.9., BLEU (Papineni et al., 2002), NIST(NIST, 2002), METEOR(Banerjee and Lavie., 2005), ROUGE-L(Lin and Och, 2004),

BLEU is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

10. Hierarchical Search for Word Alignment

Riesa, Jason and Marcu, Daniel

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
Conclusion	We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today.
Experiments	BLEU Words .696 45.1 2,538 .674 46.4 2,262
Experiments	Our hypergraph alignment algorithm allows us a 1.1 BLEU increase over the best baseline system, Model-4 grow-diag-final.
Experiments	We also report a 2.4 BLEU increase over a system trained with alignments from Model-4 union.
Related Work	Very recent work in word alignment has also started to report downstream effects on BLEU score.
Related Work	(2009) confirm and extend these results, showing BLEU improvement for a hierarchical phrase-based MT system on a small Chinese corpus.

BLEU is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

11. Bilingual Sense Similarity for Statistical Machine Translation

Chen, Boxing and Foster, George and Kuhn, Roland

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Analysis and Discussion	Table 4: Results ( BLEU %) of Chinese—to—English large data (CE_LD) and small data (CE_SD) NIST task by applying one feature.
Analysis and Discussion	Table 5: Results ( BLEU %) for combination of two similarity scores.
Analysis and Discussion	Table 6: Results ( BLEU %) of using simple features based on context on small data NIST task.
Experiments	Our evaluation metric is IBM BLEU (Papineni et al., 2002), which performs case-insensitive matching of n- grams up to n = 4.
Experiments	Table 2: Results ( BLEU %) of small data Chinese-to-English NIST task.
Experiments	Table 3: Results ( BLEU %) of large data Chinese-to-English NIST task and German-to—English WMT task.

BLEU is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

12. Discriminative Pruning for Discriminative ITG Alignment

Liu, Shujie and Li, Chi-Ho and Zhou, Ming

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Evaluation	Finally, we also do end-to-end evaluation using both F-score in alignment and Bleu score in translation.
Evaluation	HP-DITG using DPDI achieves the best Bleu score with acceptable time cost.
Evaluation	It shows that HP-DITG (with DPDI) is better than the three baselines both in alignment F-score and Bleu score.

BLEU is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

13. A Hybrid Rule/Model-Based Finite-State Framework for Normalizing SMS Messages

Beaufort, Richard and Roekhaut, Sophie and Cougnon, Louise-Amélie and Fairon, Cédrick

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Evaluated in French by 10-fold-cross validation, the system achieves a 9.3% Word Error Rate and a 0.83 BLEU score.
Conclusion and perspectives	Evaluated by tenfold cross-validation, the system seems efficient, and the performance in terms of BLEU score and WER are quite encouraging.
Evaluation	The system was evaluated in terms of BLEU score (Papineni et al., 2001), Word Error Rate (WER) and Sentence Error Rate (SER).
Evaluation	The copy-paste results just inform about the real deViation of our corpus from the traditional spelling conventions, and highlight the fact that our system is still at pains to significantly reduce the SER, while results in terms of WER and BLEU score are quite encouraging.

BLEU is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

14. Error Detection for Statistical Machine Translation Using Linguistic Features

Xiong, Deyi and Zhang, Min and Li, Haizhou

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Corpus ‘ BLEU (%) RCW (%)
Experiments	Table 4: Case-insensitive BLEU score and ratio of correct words (RCW) on the training, development and test corpus.
Experiments	Table 4 shows the case-insensitive BLEU score and the percentage of words that are labeled as correct according to the method described above on the training, development and test corpus.
SMT System	The performance, in terms of BLEU (Papineni et al., 2002) score, is shown in Table 4.

BLEU is mentioned in 4 sentences in this paper.

Topics mentioned in this paper: