Index of papers in Proc. ACL 2011 that mention
  • BLEU score
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Abstract
The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Experimental results
We substitute our language model and use MERT (Och, 2003) to optimize the BLEU score (Papineni et al., 2002).
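For reference, the BLEU score (Papineni et al., 2002) that MERT optimizes is the geometric mean of modified n-gram precisions scaled by a brevity penalty. A minimal corpus-level sketch in Python, assuming a single reference per hypothesis and no smoothing (function names are illustrative, not from the paper):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: geometric mean of modified n-gram precisions
    times a brevity penalty (single reference per hypothesis)."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        match = total = 0
        for hyp, ref in zip(hypotheses, references):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            # "modified" precision: each hypothesis n-gram is credited
            # at most as often as it occurs in the reference
            match += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += max(len(hyp) - n + 1, 0)
        if match == 0:
            return 0.0  # no smoothing in this sketch
        log_prec_sum += math.log(match / total) / max_n
    hyp_len = sum(len(h) for h in hypotheses)
    ref_len = sum(len(r) for r in references)
    # brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec_sum)

print(corpus_bleu([["the", "cat", "sat", "down"]],
                  [["the", "cat", "sat", "down"]]))  # 1.0
```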
Experimental results
We partition the data into ten pieces: nine pieces are used as training data to optimize the BLEU score (Papineni et al., 2002) by MERT (Och, 2003), and the remaining piece is used to re-rank the 1000-best list and obtain the BLEU score.
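The tuning protocol described above is a 10-fold cross-validation loop. A schematic sketch of that control flow; `tune` and `rerank` are hypothetical stand-ins for the actual MERT optimizer and 1000-best re-ranker:

```python
def cross_validated_bleu(data, tune, rerank, folds=10):
    """Schematic 10-fold protocol: tune weights (e.g. with MERT) on nine
    pieces, then re-rank the held-out piece's 1000-best list and record
    its BLEU score."""
    size = len(data) // folds
    pieces = [data[i * size:(i + 1) * size] for i in range(folds)]
    scores = []
    for held_out in range(folds):
        train = [x for i, p in enumerate(pieces) if i != held_out for x in p]
        weights = tune(train)                    # stand-in for MERT
        scores.append(rerank(pieces[held_out], weights))
    return scores

# dummy callables, just to show the control flow:
print(cross_validated_bleu(list(range(100)),
                           tune=lambda train: None,
                           rerank=lambda piece, w: 0.0))
```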
Introduction
When we apply our language models to the task of re-ranking the N-best list from Hiero (Chiang, 2005; Chiang, 2007), a state-of-the-art parsing-based MT system, we achieve significantly better translation quality measured by the BLEU score and “readability”.
BLEU score is mentioned in 13 sentences in this paper.
Clifton, Ann and Sarkar, Anoop
Conclusion and Future Work
We found that using a segmented translation model based on unsupervised morphology induction and a model that combined morpheme segments in the translation model with a postprocessing morphology prediction model gave us better BLEU scores than a word-based baseline.
Experimental Results
All the BLEU scores reported are for lowercase evaluation.
Experimental Results
No Uni indicates the segmented BLEU score without unigrams.
Experimental Results
We also use a version of the m-BLEU score (Luong et al., 2010), where the BLEU score is computed by comparing the segmented output with a segmented reference translation.
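m-BLEU is ordinary BLEU applied after both hypothesis and reference are split into morpheme segments. A minimal sketch reusing the corpus_bleu function from the earlier snippet; the '+'-delimited segmentation and the Finnish-like example are illustrative assumptions, not the paper's format:

```python
def segment_all(tokens, segment):
    """Split every word into its morpheme segments and flatten."""
    return [m for w in tokens for m in segment(w)]

# illustrative segmenter: morphemes are joined by '+' in the surface form
plus_seg = lambda w: w.split("+")

hyp = "kissa istu+i talo+ssa".split()
ref = "kissa istu+i talo+lla".split()
# BLEU over morpheme segments rather than whole words; ~0.77 here,
# since the shared stem 'talo' earns partial credit:
print(corpus_bleu([segment_all(hyp, plus_seg)],
                  [segment_all(ref, plus_seg)], max_n=2))
```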
Models 2.1 Baseline Models
To compare against the performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Translation and Morphology
Our proposed approaches are significantly better than the state of the art, achieving the highest reported BLEU scores on the English-Finnish Europarl version 3 dataset.
BLEU score is mentioned in 8 sentences in this paper.
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Abstract
Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
Conclusion
Extensive experiments on large-scale English-to-Japanese translation resulted in a significant improvement in BLEU score of 1.8 points (p < 0.01), as compared with our implementation of a strong forest-to-string baseline system (Mi et al., 2008; Mi and Huang, 2008).
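The excerpt does not name the significance test behind p < 0.01; one standard choice for BLEU comparisons is paired bootstrap resampling (Koehn, 2004), sketched here purely under that assumption:

```python
import random

def bootstrap_pvalue(score, sys_a, sys_b, refs, trials=1000, seed=0):
    """Paired bootstrap resampling (Koehn, 2004): resample the test set
    with replacement and count how often system B outscores system A."""
    rng = random.Random(seed)
    n = len(refs)
    b_wins = 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        a = score([sys_a[i] for i in idx], [refs[i] for i in idx])
        b = score([sys_b[i] for i in idx], [refs[i] for i in idx])
        if b > a:
            b_wins += 1
    return 1.0 - b_wins / trials  # small value => B significantly better

# usage, e.g. with the corpus_bleu sketch above:
#   p = bootstrap_pvalue(corpus_bleu, baseline_out, system_out, references)
```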
Experiments
Here, fw denotes function word, DT denotes the decoding time, and the BLEU scores were computed on the test set.
Experiments
the final BLEU scores of C3-T with Min-F and C3-F.
Experiments
Using the composed rule set C3-F in our forest-based decoder, we achieved an optimal BLEU score of 28.89%.
Introduction
(2008) achieved a 3.1-point improvement in BLEU score (Papineni et al., 2002) by including bilingual syntactic phrases in their forest-based system.
Introduction
Using the composed rules of the present study in a baseline forest-to-string translation system results in a 1.8-point improvement in the BLEU score for large-scale English-to-Japanese translation.
BLEU score is mentioned in 8 sentences in this paper.
Ravi, Sujith and Knight, Kevin
Machine Translation as a Decipherment Task
The figure also shows the corresponding BLEU scores in parentheses for comparison (higher scores indicate better MT output).
Machine Translation as a Decipherment Task
Better LMs yield better MT results for both parallel and decipherment training—for example, using a segment-based English LM instead of a 2-gram LM yields a 24% reduction in edit distance and a 9% improvement in BLEU score for EM decipherment.
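The edit-distance figure suggests a Levenshtein comparison between MT output and reference; the excerpt does not say whether it is word- or character-level, so here is a word-level dynamic-programming sketch:

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance via dynamic programming."""
    m, n = len(hyp), len(ref)
    # prev[j] holds the distance between the processed hyp prefix and ref[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]

print(edit_distance("the cat sat".split(), "a cat sat down".split()))  # 2
```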
Machine Translation as a Decipherment Task
Figure 4 plots the BLEU scores versus training sizes for different MT systems on the Time corpus.
BLEU score is mentioned in 5 sentences in this paper.
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Experimental Evaluation
For most models, while likelihood continued to increase gradually for all 100 iterations, BLEU score gains plateaued after 5-10 iterations, likely due to the strong prior information.
Experimental Evaluation
It can also be seen that combining phrase tables from multiple samples improved the BLEU score for HLEN, but not for HIER.
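The excerpt does not spell out how phrase tables from multiple samples are combined; one plausible reading, sketched here strictly as an assumption rather than the paper's procedure, is to pool phrase-pair counts across samples and renormalize:

```python
from collections import Counter, defaultdict

def combine_phrase_tables(samples):
    """Pool phrase-pair counts from several sampled phrase tables and
    renormalize into conditional probabilities p(target | source).
    Each sample is a Counter over (source, target) phrase pairs."""
    pooled = Counter()
    for table in samples:
        pooled.update(table)
    totals = defaultdict(int)
    for (src, tgt), c in pooled.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in pooled.items()}

s1 = Counter({("kissa", "cat"): 2, ("kissa", "the cat"): 1})
s2 = Counter({("kissa", "cat"): 3})
# p('cat'|'kissa') = 5/6, p('the cat'|'kissa') = 1/6
print(combine_phrase_tables([s1, s2]))
```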
Hierarchical ITG Model
This is consistent with the finding of Koehn et al. (2003) that using phrases where max(|e|, |f|) ≤ 3 causes significant improvements in BLEU score, while using larger phrases results in diminishing returns.
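The max(|e|, |f|) ≤ 3 condition is simply a length cap on extracted phrase pairs, as in this one-line filter (the function name is illustrative):

```python
def keep_phrase_pair(e, f, cap=3):
    """Keep a phrase pair only when both the English side e and the
    foreign side f have at most `cap` tokens."""
    return max(len(e), len(f)) <= cap

print(keep_phrase_pair(["the", "cat"], ["kissa"]))               # True
print(keep_phrase_pair(["the", "big", "old", "cat"], ["kissa"]))  # False
```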
Introduction
We also find that it achieves superior BLEU scores over previously proposed ITG-based phrase alignment approaches.
BLEU score is mentioned in 4 sentences in this paper.
Han, Bo and Baldwin, Timothy
Conclusion and Future Work
In normalisation, we compared our method with two benchmark methods from the literature, and achieved the highest F-score and BLEU score by integrating dictionary lookup, word similarity and context support modelling.
Experiments
The 10-fold cross-validated BLEU score (Papineni et al., 2002) over this data is 0.81.
Experiments
Additionally, we evaluate using the BLEU score over the normalised form of each message, as the SMT method can lead to perturbations of the token stream, vexing standard precision, recall and F-score evaluation.
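To illustrate the point about perturbed token streams: when the SMT normaliser splits or merges tokens, position-based precision and recall can no longer pair tokens one-to-one, while BLEU's n-gram matching degrades gracefully. A toy example using the corpus_bleu sketch from earlier (the messages are made up):

```python
gold = "see you later tonight".split()
hyp  = "see you later to night".split()  # normaliser split one token in two

# The token streams now differ in length, so positional token pairing
# breaks, but n-gram overlap still gives partial credit (bigram BLEU,
# since single messages are too short for reliable 4-gram matches):
print(corpus_bleu([hyp], [gold], max_n=2))  # ~0.55
```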
BLEU score is mentioned in 3 sentences in this paper.