SciSurf: Index of "BLEU score" in Proc. ACL 2010

Index of papers in Proc. ACL 2010 that mention

BLEU score

Seen in text as:

BLEU score (48)
BLEU scores (24)
Bleu score (5)

Seen in 75 sentences in 11 papers.

1. Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

Yeniterzi, Reyyan and Oflazer, Kemal

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores .
Experimental Setup and Results	Wherever meaningful, we report the average BLEU scores over 10 data sets along with the maximum and minimum values and the standard deviation.
Experimental Setup and Results	Table 1: BLEU scores for a variety of transformation combinations
Experimental Setup and Results	15Note than in this case, the translations would be generated in the same format, but we then split such postpositions from the words they are attached to, during decoding, and then evaluate the BLEU score .
Introduction	We find that with the full set of syntax-to-morphology transformations and some additional techniques we can get about 39% relative improvement in BLEU scores over a word-based baseline and about 28% improvement of a factored baseline, all experiments being done over 10 training and test sets.
Syntax-to-Morphology Mapping	We find (and elaborate later) that this reduction in the English side of the training corpus, in general, is about 30%, and is correlated with improved BLEU scores .

BLEU score is mentioned in 22 sentences in this paper.

Topics mentioned in this paper:

2. Improving Statistical Machine Translation with Monolingual Collocation

Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.
Conclusion	The improved word alignment results in an improvement of 2.16 BLEU score on a phrase-based SMT system and an improvement of 1.76 BLEU score on a parsing-based SMT system.
Conclusion	When we also used phrase collocation probabilities as additional features, the phrase-based SMT performance is finally improved by 2.40 BLEU score as compared with the baseline system.
Experiments on Parsing-Based SMT	The system using the improved word alignments achieves an absolute improvement of 1.76 BLEU score , which indicates that the improvements of word alignments are also effective to improve the performance of the parsing-based SMT systems.
Experiments on Phrase-Based SMT	If the same alignment method is used, the systems using CM-3 got the highest BLEU scores .
Experiments on Phrase-Based SMT	When the phrase collocation probabilities are incorporated into the SMT system, the translation quality is improved, achieving an absolute improvement of 0.85 BLEU score .
Experiments on Phrase-Based SMT	As compared with the baseline system, an absolute improvement of 2.40 BLEU score is achieved.
Introduction	The alignment improvement results in an improvement of 2.16 BLEU score on phrase-based SMT system and an improvement of 1.76 BLEU score on parsing-based SMT system.
Introduction	SMT performance is further improved by 0.24 BLEU score .

BLEU score is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

3. Constituency to Dependency Translation with Forests

Mi, Haitao and Liu, Qun

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion and Future Work	Using all constituency-to-dependency translation rules and bilingual phrases, our model achieves +0.7 points improvement in BLEU score significantly over a state-of-the-art forest-based tree-to-string system.
Experiments	We use the standard minimum error-rate training (Och, 2003) to tune the feature weights to maximize the system’s BLEU score on development set.
Experiments	The baseline system extracts 31.9M 625 rules, 77.9M 525 rules respectively and achieves a BLEU score of 34.17 on the test set3.
Experiments	As shown in the third line in the column of BLEU score , the performance drops 1.7 BLEU points over baseline system due to the poorer rule coverage.

BLEU score is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

4. Boosting-Based System Combination for Machine Translation

Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	where BLEU(e,-j, r,-) is the smoothed sentence-level BLEU score (Liang et al., 2006) of the translation e with respect to the reference translations r,, and e: is the oracle translation which is selected from {em ..., em} in terms of BLEU(e,-j, r,-).
Background	Figures 2-5 show the BLEU curves on the development and test sets, where the X-aXis is the iteration number, and the Y-aXis is the BLEU score of the system generated by the boosting-based system combination.
Background	The BLEU scores tend to converge to the stable values after 20 iterations for all the systems.

BLEU score is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

5. Discriminative Pruning for Discriminative ITG Alignment

Liu, Shujie and Li, Chi-Ho and Zhou, Ming

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Evaluation	Finally, we also do end-to-end evaluation using both F-score in alignment and Bleu score in translation.
Evaluation	HP-DITG using DPDI achieves the best Bleu score with acceptable time cost.
Evaluation	It shows that HP-DITG (with DPDI) is better than the three baselines both in alignment F-score and Bleu score .

BLEU score is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

6. A Hybrid Rule/Model-Based Finite-State Framework for Normalizing SMS Messages

Beaufort, Richard and Roekhaut, Sophie and Cougnon, Louise-Amélie and Fairon, Cédrick

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Evaluated in French by 10-fold-cross validation, the system achieves a 9.3% Word Error Rate and a 0.83 BLEU score .
Conclusion and perspectives	Evaluated by tenfold cross-validation, the system seems efficient, and the performance in terms of BLEU score and WER are quite encouraging.
Evaluation	The system was evaluated in terms of BLEU score (Papineni et al., 2001), Word Error Rate (WER) and Sentence Error Rate (SER).
Evaluation	The copy-paste results just inform about the real deViation of our corpus from the traditional spelling conventions, and highlight the fact that our system is still at pains to significantly reduce the SER, while results in terms of WER and BLEU score are quite encouraging.

BLEU score is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

7. Tackling Sparse Data Issue in Machine Translation Evaluation

Bojar, Ondřej and Kos, Kamil and Mareċek, David

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	This is confirmed for other languages as well: the lower the BLEU score the lower the correlation to human judgments.
Problems of BLEU	We plot the official BLEU score against the rank established as the percentage of sentences where a system ranked no worse than all its competitors (Callison-Burch et al., 2009).
Problems of BLEU	Figure 3 documents the issue across languages: the lower the BLEU score itself (i.e.
Problems of BLEU	A phrase-based system like Moses (cu-bojar) can sometimes produce a long sequence of tokens exactly as required by the reference, leading to a high BLEU score .

BLEU score is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

8. Hindi-to-Urdu Machine Translation through Transliteration

Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Final Results	This section shows the improvement in BLEU score by applying heuristics and combinations of heuristics in both the models.
Final Results	For other parts of the data where the translators have heavily used transliteration, the system may receive a higher BLEU score .
Introduction	Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores

BLEU score is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

9. Pseudo-Word for Phrase-Based Machine Translation

Duan, Xiangyu and Zhang, Min and Li, Haizhou

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments and Results	Statistical significance in BLEU score differences was tested by paired bootstrap re-sampling (Koehn, 2004).
Experiments and Results	Best ESSP (Wchpwen) is significantly better than baseline (p<0.0l) in BLEU score, best SMP (wdpwen) is significantly better than baseline (p<0.05) in BLEU score .
Experiments and Results	wchpwen is significantly better than baseline (p<0.04) in BLEU score .

BLEU score is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

10. Hierarchical Search for Word Alignment

Riesa, Jason and Marcu, Daniel

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
Conclusion	We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today.
Related Work	Very recent work in word alignment has also started to report downstream effects on BLEU score .

BLEU score is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

11. Training Phrase Translation Models with Leaving-One-Out

Wuebker, Joern and Mauser, Arne and Ney, Hermann

In Proc. ACL 2010, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Alignment	We perform minimum error rate training with the downhill simplex algorithm (Nelder and Mead, 1965) on the development data to obtain a set of scaling factors that achieve a good BLEU score .
Experimental Evaluation	A second iteration of the training algorithm shows nearly no changes in BLEU score , but a small improvement in TER.
Experimental Evaluation	yields a BLEU score slightly lower than with fixed interpolation on both DEV and TEST.

BLEU score is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: