Abstract | We show empirical results on the OPUS data: our method yields the best BLEU scores compared to existing approaches, while achieving significant computational speedups (several orders of magnitude faster).
Discussion and Future Work | These, when combined with standard MT systems such as Moses (Koehn et al., 2007) trained on parallel corpora, have been shown to yield some BLEU score improvements.
Experiments and Results | To evaluate translation quality, we use the BLEU score (Papineni et al., 2002), a standard evaluation measure in machine translation.
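Since BLEU recurs throughout these extracts, a minimal sketch of computing it may help. This assumes the sacrebleu library; the hypothesis and reference strings are illustrative placeholders.

```python
# Minimal sketch: corpus-level BLEU (Papineni et al., 2002) via sacrebleu.
# The hypothesis/reference strings below are placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he reads a book"]
references = ["the cat is on the mat", "he is reading a book"]

# corpus_bleu takes a list of hypotheses and a list of reference streams
# (one inner list per reference set, aligned with the hypotheses).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```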
Experiments and Results | We show that our method achieves the best performance (BLEU scores) on this task while being significantly faster than both previous approaches.
Experiments and Results | For both MT tasks, we also report BLEU scores for a baseline system that uses identity translations for common words (words appearing in both the source and target vocabularies) and random translations for other words.
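A rough sketch of such an identity-plus-random baseline; the vocabulary contents, function name, and tokenized input are assumptions for illustration.

```python
# Sketch of the baseline described above (hypothetical names and data):
# copy a source word unchanged if it also appears in the target vocabulary,
# otherwise substitute a random target word.
import random

def baseline_translate(src_tokens, tgt_vocab):
    tgt_list = list(tgt_vocab)
    return [tok if tok in tgt_vocab else random.choice(tgt_list)
            for tok in src_tokens]

tgt_vocab = {"haus", "katze", "buch", "der", "die"}
print(baseline_translate(["der", "cat", "buch"], tgt_vocab))
```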
Experiment Results | We tuned the parameters on the MT06 NIST test set (1664 sentences) and report the BLEU scores on three unseen test sets: MT04 (1353 sentences), MT05 (1056 sentences) and MT09 (1313 sentences). |
Experiment Results | On average the improvement is 1.07 BLEU points (45.66 …) without new phrase-based features and 1.14 BLEU points over the baseline Hiero system.
Phrasal-Hiero Model | Compare BLEU scores of translation using all extracted rules (the first row) and translation using only rules without nonaligned subphrases (the second row). |
Experiments | Each utterance in the test data has more than one response that elicits the same goal emotion, because these responses are used to compute BLEU score (see Section 5.3).
Experiments | We first use BLEU score (Papineni et al., 2002) to perform automatic evaluation (Ritter et al., 2011). |
Experiments | In this evaluation, the system is provided with the utterance and the goal emotion from the test data, and the generated responses are evaluated using BLEU score.
Baseline MT | The scaling factors for all features are optimized with the minimum error rate training (MERT) algorithm to maximize the BLEU score (Och, 2003).
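MERT performs an exact line search over n-best lists; as a hedged stand-in, the sketch below grid-searches a single scaling factor. The n-best layout and the evaluate_bleu callback are illustrative assumptions, not Och's actual algorithm.

```python
# Simplified, MERT-flavoured sketch: choose a scaling factor `lam` for one
# feature so that the rescored 1-best hypotheses maximize corpus BLEU.
# Real MERT (Och, 2003) performs an exact line search, not a grid search.
def rescore_1best(nbest, lam):
    # nbest: list of (hypothesis, base_score, feature_value) triples
    return max(nbest, key=lambda h: h[1] + lam * h[2])[0]

def tune_factor(nbest_lists, references, evaluate_bleu):
    grid = [i / 20.0 - 1.0 for i in range(41)]   # lam in [-1.0, 1.0]
    return max(grid, key=lambda lam: evaluate_bleu(
        [rescore_1best(nb, lam) for nb in nbest_lists], references))
```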
Experiments | In order to investigate the correlation between name-aware BLEU scores and human judgment results, we asked three bilingual speakers to judge our translation output from the baseline system and the NAMT system, on a Chinese subset of 250 sentences (each sentence has two corresponding translations from baseline and NAMT) extracted randomly from 7 test corpora. |
Experiments | We computed the name-aware BLEU scores on the subset and also the aggregated average scores from human judgments. |
Experiments | Furthermore, we calculated three Pearson product-moment correlation coefficients between human judgment scores and name-aware BLEU scores of these two MT systems. |
Name-aware MT Evaluation | Based on the BLEU score, we design a name-aware BLEU metric as follows.
Name-aware MT Evaluation | Finally, the name-aware BLEU score is defined as:
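The extract ends before the formula itself. As a loudly hypothetical sketch only, a name-aware variant could follow BLEU's usual shape with name-weighted n-gram precisions; this is not necessarily the paper's actual definition.

```latex
% Hypothetical shape only -- the extract omits the actual definition.
% \tilde{p}_n: n-gram precisions in which n-grams containing name tokens
% are up-weighted relative to ordinary n-grams; BP: brevity penalty.
\mathrm{BLEU_{NA}} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log \tilde{p}_n\Big)
```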
Conclusion and Future Work | The method assumes that a combined model is derived from a hierarchical Pitman-Yor process with each prior learned separately in each domain, and achieves BLEU scores competitive with traditional batch-based ones. |
Experiment | The BLEU scores reported in this paper are the average of 5 independent runs of independent batch-MIRA weight training, as suggested by Clark et al. (2011).
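Averaging over independent tuning runs is simple to reproduce; a minimal sketch, where the five scores are placeholders rather than the paper's numbers:

```python
# Sketch: mean and spread of BLEU over independent tuning runs, as
# recommended for randomized optimizers by Clark et al. (2011).
from statistics import mean, stdev

run_scores = [45.2, 45.8, 45.5, 45.1, 45.9]   # hypothetical 5-run BLEU
print(f"BLEU = {mean(run_scores):.2f} +/- {stdev(run_scores):.2f}")
```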
Experiment | When comparing hier-combin with pialign-batch, the BLEU scores are slightly higher while the training time is much lower, almost one quarter of that of pialign-batch.
Experiment | Table 4 shows the BLEU scores for the three data sets, in which the order of combining the phrase tables from each domain alternates between ascending and descending similarity to the test data.
Introduction | In the extreme, if the k-best list consists only of a pair of translations ((e*, d*), (e′, d′)), the desirable weight should satisfy the assertion: if the BLEU score of e* is greater than that of e′, then the model score of (e*, d*) with this weight will also be greater than that of (e′, d′). In this paper, a pair (e*, e′) for a source sentence f is called a preference pair for f. Following PRO, we define the following objective function under the max-margin framework to optimize the AdNN model:
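The objective itself is missing from the extract. Assuming it resembles standard PRO-style margin training over preference pairs, a generic max-margin (hinge-loss) form, given here as a sketch rather than the paper's exact objective, would be:

```latex
% Generic max-margin sketch over preference pairs (e^*, e'); s_\theta is the
% model score. An assumed standard form, not necessarily the paper's own.
\min_{\theta}\; \frac{1}{2}\lVert\theta\rVert^{2}
  + C \sum_{(e^{*},\, e')} \max\Bigl(0,\; 1 - \bigl(s_\theta(e^{*}, d^{*}) - s_\theta(e', d')\bigr)\Bigr)
```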
Introduction | to that of Moses: on the NIST05 test set, L-Hiero achieves a BLEU score of 25.1 and Moses achieves 24.8.
Introduction | Since both the MERT and PRO tuning toolkits involve randomness in their implementations, all BLEU scores reported in the experiments are the average of five tuning runs, as suggested by Clark et al. (2011).
Experiments | When REG and linearization are applied on shallowSyn_re with gold shallow trees, the BLEU score is lower (60.57) compared to the system that applies syntax and linearization on deepSyn_re, deep trees with gold REs (BLEU score of 63.9).
Experiments | The revision-based system with disjoint modelling of implicits shows a slight, nonsignificant increase in BLEU score.
Experiments | By contrast, the BLEU score is significantly better for the joint approach.
Experiments | Hence the BLEU scores we get for the baselines may appear lower than those reported in the literature.
Experiments | Table 3 shows the BLEU scores for the three translation tasks UR/AR/FA→EN based on our method against the baselines.
Experiments | For our models, we report the average BLEU score of the 5 independent runs as well as that of the aggregate phrase table generated by these 5 independent runs. |
Additional Experiments | On the large feature set, RM is again the best performer, except, perhaps, a tied BLEU score with MIRA on MT08, but with a clear 1.8 TER gain. |
Discussion | This correlates with our observation that RM’s overall BLEU score is negatively impacted by the brevity penalty (BP), as the BLEU precision scores are noticeably higher.
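For reference, BLEU factors into a brevity penalty times the geometric mean of n-gram precisions (Papineni et al., 2002), which is how high precisions can coexist with a depressed overall score when the BP is small:

```latex
% Standard BLEU decomposition: c = candidate length, r = effective
% reference length, p_n = modified n-gram precisions.
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
  1 & \text{if } c > r,\\
  e^{\,1 - r/c} & \text{if } c \le r.
\end{cases}
```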
Discussion | We also notice that while PRO had the lowest BLEU scores in Chinese, it was competitive in Arabic with the highest number of features. |
Experiments | In the small feature set, RAMPION yielded similar best BLEU scores, but worse TER.
Experiments | In columns 2 and 4, we report the BLEU scores, while in columns 3 and 5, we report the TER scores.
Experiments | Model 2, which conditions POL on OR, provides an additional +0.2 BLEU improvement consistently across the two genres.
Experiments | The inclusion of explicit MOS modeling in Model 4 gives a significant BLEU score improvement of +0.5 but no TER improvement in newswire. |
Experimental Results | The BLEU scores from different systems are shown in Table 10 and Table 11, respectively. |
Experimental Results | Preprocessing of the data with ECs inserted improves the BLEU scores by about 0.6 for newswire and 0.2 to 0.3 for the weblog data, compared to each baseline separately. |
Experimental Results | Table 10: BLEU scores in the Hiero system. |
Proposed Methods 3.1 Egyptian to EG’ Conversion | Phrase merging that preferred phrases learnt from EG’ data over AR data performed the best with a BLEU score of 16.96. |
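The preference-based merging described here can be sketched as a dictionary merge; the table layout, entry format, and names are assumptions for illustration.

```python
# Sketch of preference-based phrase-table merging (hypothetical layout):
# when a source phrase occurs in both tables, keep the EG' entry and drop
# the AR one; phrases unique to either table are carried over unchanged.
def merge_phrase_tables(eg_prime_table, ar_table):
    merged = dict(ar_table)          # start from the AR entries
    merged.update(eg_prime_table)    # EG' entries override on collisions
    return merged

eg_prime = {"ktyr": [("a lot", 0.7)]}
ar = {"ktyr": [("much", 0.5)], "qlm": [("pen", 0.9)]}
print(merge_phrase_tables(eg_prime, ar))
```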
Proposed Methods 3.1 Egyptian to EG’ Conversion | In further analysis, we examined 1% of the sentences with the largest difference in BLEU score . |
Proposed Methods 3.1 Egyptian to EG’ Conversion | Out of these, more than 70% were cases where the EG’ model achieved a higher BLEU score . |
Abstract | Table 8: BLEU scores for several language pairs' systems trained on data from WMT.
Abstract | Table 9: BLEU scores for French-English and English-French before and after adding the mined parallel data to systems trained on data from WMT, including the French-English Gigaword (Callison-Burch et al., 2011).
Abstract | Table 12: BLEU scores for Spanish-English before and after adding the mined parallel data to a baseline Europarl system. |