Abstract | Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make. |
Abstract | However, by using an SVM ranker to combine the realizer’s model score together with features from multiple parsers, including ones designed to make the ranker more robust to parsing mistakes, we show that significant increases in BLEU scores can be achieved. |
Introduction | With this simple reranking strategy and each of three different Treebank parsers, we find that it is possible to improve BLEU scores on Penn Treebank development data with White & Rajkumar’s (2011; 2012) baseline generative model, but not with their averaged perceptron model. |
Introduction | With the SVM reranker, we obtain a significant improvement in BLEU scores over the averaged perceptron model.
Introduction | Additionally, in a targeted manual analysis, we find that in cases where the SVM reranker improves the BLEU score, improvements to fluency and adequacy are roughly balanced, while in cases where the BLEU score goes down, it is mostly fluency that is made worse (with reranking yielding an acceptable paraphrase roughly one third of the time in both cases). |
Reranking with SVMs 4.1 Methods | In training, we used the BLEU scores of each realization compared with its reference sentence to establish a preference order over pairs of candidate realizations, assuming that the original corpus sentences are generally better than related alternatives, and that BLEU can somewhat reliably predict human preference judgments. |
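Reranking with SVMs 4.1 Methods | The pairwise reduction described above can be sketched as follows; this is an illustrative reimplementation under the usual ranking-SVM formulation, and the feature vectors and function names are assumptions, not taken from the paper.

```python
# A minimal sketch: BLEU scores over candidate realizations induce a
# preference order, which is converted into labeled difference vectors
# (the standard pairwise reduction for training a ranking SVM).
from itertools import combinations

def make_preference_pairs(candidates):
    """candidates: list of (feature_vector, bleu_score) for one input.
    Returns difference vectors labeled +1 if the first candidate is
    preferred (higher BLEU) and -1 otherwise."""
    X, y = [], []
    for (fa, ba), (fb, bb) in combinations(candidates, 2):
        if ba == bb:
            continue  # tied BLEU scores yield no usable preference
        diff = [a - b for a, b in zip(fa, fb)]
        X.append(diff)
        y.append(1 if ba > bb else -1)
    return X, y

# toy example: three candidates, two usable preference pairs
X, y = make_preference_pairs([([1.0, 0.2], 0.9),
                              ([0.5, 0.1], 0.7),
                              ([0.4, 0.9], 0.7)])
```

The resulting (X, y) pairs can then be fed to any linear SVM trainer; at test time, candidates are ranked by the learned weight vector's dot product with their feature vectors.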
Reranking with SVMs 4.1 Methods | The complete model, BBS+dep+nbest, achieved a BLEU score of 88.73, significantly improving upon the perceptron model (p < 0.02). |
Simple Reranking | Table 2: Devset BLEU scores for simple ranking on top of n-best perceptron model realizations |
Simple Reranking | Simple ranking with the Berkeley parser of the generative model’s n-best realizations raised the BLEU score from 85.55 to 86.07, well below the averaged perceptron model’s BLEU score of 87.93. |
Evaluation | In the following sections, we evaluate each of our methods by calculating BLEU scores against the same four sets of three reference translations. |
Evaluation | This allows us to compare the BLEU score achieved by our methods against the BLEU scores achievable by professional translators. |
Evaluation | As expected, random selection yields poor performance, with a BLEU score of 30.52.
Abstract | We present a set of dependency-based pre-ordering rules which improved the BLEU score by 1.61 on the NIST 2006 evaluation data. |
Conclusion | The results showed that our approach achieved a BLEU score gain of 1.61. |
Dependency-based Pre-ordering Rule Set | In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set. |
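Dependency-based Pre-ordering Rule Set | The filtering step above can be sketched as a greedy loop; the `evaluate_bleu` callback is a hypothetical stand-in for retraining/decoding with a given rule set and scoring the development set, not an interface from the paper.

```python
# A hedged sketch of greedy rule filtering: a candidate pre-ordering rule
# is kept only if adding it improves dev-set BLEU over the current best.
def filter_rules(candidate_rules, evaluate_bleu):
    """candidate_rules: iterable of rule objects.
    evaluate_bleu: callable taking a rule list, returning dev-set BLEU."""
    kept = []
    baseline = evaluate_bleu(kept)
    for rule in candidate_rules:
        score = evaluate_bleu(kept + [rule])
        if score > baseline:      # rule helps: keep it and raise the bar
            kept.append(rule)
            baseline = score
    return kept

# toy example: rules "contribute" their value to a fake BLEU score
selected = filter_rules([1, -2, 3], lambda rules: 30 + sum(rules))
```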
Experiments | For evaluation, we used BLEU scores (Papineni et al., 2002). |
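Experiments | For readers unfamiliar with the metric, BLEU (Papineni et al., 2002) combines clipped n-gram precisions with a brevity penalty; the snippet below is a minimal, self-contained reimplementation for illustration, not the official scoring script (which also handles multiple references and corpus-level aggregation).

```python
# A minimal sketch of BLEU: modified (clipped) n-gram precision up to
# 4-grams, combined as a geometric mean, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in hyp.items())  # clipped counts
        total = max(sum(hyp.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # floor avoids log(0)
    # brevity penalty: punish hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = bleu("the cat sat on the mat".split(),
             "the cat sat on a mat".split())
```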
Experiments | It shows the BLEU scores on the test set and the statistics of pre-ordering on the training set, which include the total count of each rule set and the number of sentences they were applied to.
Introduction | Experiment results showed that our pre-ordering rule set improved the BLEU score on the NIST 2006 evaluation data by 1.61. |
Our Approach | Figure 1: BLEU scores vs. k for SumBasic extraction.
Our Approach | As shown in Figure 1, our system’s BLEU scores increase rapidly until about k = 25. |
Conclusion and Future Work | We plan to give different weights to different training examples based on the drop in BLEU score the example can cause if classified incorrectly. |
MT System Selection | We run the 5,562 sentences of the classification training data through our four MT systems and produce sentence-level BLEU scores (with length penalty). |
MT System Selection | We pick the name of the MT system with the highest BLEU score as the class label for that sentence. |
MT System Selection | When there is a tie in BLEU scores, we pick the label of the system that yields the better overall BLEU score among the tied systems.
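MT System Selection | The labeling rule above can be sketched as follows; the system names and scores are illustrative assumptions, not data from the paper.

```python
# A hedged sketch of the class-labeling step: each sentence is labeled
# with the system achieving the highest sentence-level BLEU; ties are
# broken in favor of the tied system with the better overall corpus BLEU.
def pick_label(sentence_bleus, overall_bleus):
    """sentence_bleus / overall_bleus: dicts mapping system name -> BLEU."""
    best = max(sentence_bleus.values())
    tied = [s for s, b in sentence_bleus.items() if b == best]
    # tie-break by overall corpus-level BLEU among the tied systems
    return max(tied, key=lambda s: overall_bleus[s])

# sysA and sysB tie at the sentence level; sysB wins on overall BLEU
label = pick_label({"sysA": 0.42, "sysB": 0.42, "sysC": 0.31},
                   {"sysA": 0.275, "sysB": 0.281, "sysC": 0.290})
```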
Machine Translation Experiments | All differences in BLEU scores between the four systems are statistically significant above the 95% level. |
Machine Translation Experiments | We also report in Table 1 an oracle system selection where we pick, for each sentence, the English translation that yields the best BLEU score.
Conclusion | We observed that this often fails to return the best output in terms of BLEU score, fluency, grammaticality and/or meaning.
Results and Discussion | Figure 6: BLEU scores and Grammar Size (Number of Elementary TAG trees)
Results and Discussion | The average BLEU score is given with respect to all input (All) and to those inputs for which the systems generate at least one sentence (Covered). |
Results and Discussion | In terms of BLEU score, the best version of our system (AUTEXP) outperforms the probabilistic approach of IMS by a large margin (+0.17) and produces results similar to the fully handcrafted UDEL system.
Experiments | BLEU score: We used the Moses support tool multi-bleu to calculate BLEU scores.
Experiments | The BLEU scores in Table 4 show that our system produces simplifications that are closest to the reference.
Experiments | In sum, the automatic metrics indicate that our system produces simplifications that are consistently closest to the reference in terms of edit distance, number of splits, and BLEU score.
Related Work | (2010), namely an aligned corpus of 100/131 EWKP/SWKP sentences, and show that they achieve a better BLEU score.
Experiments | In Table 3, almost all BLEU scores are improved, no matter what strategy is used. |
Experiments | The final BLEU scores on NIST05 and NIST06 are given in Table 4. |
Experiments | BLEU scores on the large-scale training data. |
Experiments & Results | The BLEU scores, not included in the figure but shown in Table 2, show a similar trend.
Experiments & Results | Statistical significance on the BLEU scores was tested using pairwise bootstrap sampling (Koehn, 2004). |
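Experiments & Results | Paired bootstrap resampling (Koehn, 2004) can be sketched as below. This is a simplified illustration that averages per-sentence scores; Koehn's procedure recomputes corpus-level BLEU on each resample, so the per-sentence score lists here are an assumption for brevity.

```python
# A hedged sketch of paired bootstrap resampling: repeatedly resample the
# test set with replacement and count how often system A beats system B.
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """scores_a / scores_b: per-sentence metric scores on the same test set.
    Returns the fraction of resamples in which system A outscores B."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples
```

A win fraction of, say, 0.95 or higher is then read as significance at the 95% level for A over B.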
Experiments & Results | Another discrepancy is found in the BLEU scores of the English→Chinese experiments, where we measure an unexpected drop in BLEU score below the baseline.
Complexity Analysis | It was set to 3 for the monolingual unigram model, and 2 for the bilingual unigram model, which provided slightly higher BLEU scores on the development set than the other settings. |
Complexity Analysis | Table 4 presents the BLEU scores for Moses using different segmentation methods. |
Introduction | • improvement of BLEU scores compared to the supervised Stanford Chinese word segmenter.