Index of papers in Proc. ACL 2012 that mention
  • BLEU points
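Every snippet indexed below reports a difference measured in BLEU points, i.e. a change on the 0-100 corpus-BLEU scale. As a reminder of the quantity being compared, here is a minimal sketch of corpus-level BLEU (single reference, n-grams up to 4, no smoothing); it is an illustration only, not the exact scorer used by any of the papers listed.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on the 0-100 scale (single reference, no smoothing)."""
    clipped = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(totals) == 0 or min(clipped) == 0:
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = min(1.0, math.exp(1 - ref_len / hyp_len))  # brevity penalty
    return 100.0 * brevity * math.exp(log_prec)

# A "1.1 BLEU point" improvement means the score rises by 1.1 on this 0-100
# scale, e.g. from 25.3 to 26.4 on the same test set.
hyps = ["the cat sat on the mat".split(), "he read the book".split()]
refs = ["the cat sat on the mat".split(), "he reads the book".split()]
print(round(corpus_bleu(hyps, refs), 2))
```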
He, Xiaodong and Deng, Li
Abstract
The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system.
Abstract
Experiments on the Europarl German-to-English dataset show that the proposed method leads to a 1.1 BLEU point improvement over a strong baseline.
Abstract
Experimental results showed that their approach outperformed a baseline by 0.8 BLEU point when using monotonic decoding, but there was no …
“BLEU points” is mentioned in 7 sentences in this paper.
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
As an example, the model estimated using Lasso for the 75K anchor size exhibits a root mean squared error of 6 BLEU points.
Inferring a learning curve from mostly monolingual data
The average distance is on the same scale as the BLEU score, which suggests that our best curves can predict the gold curve within 1.5 BLEU points on average (the best result being 0.7 BLEU points when the initial points are 1K-5K-10K-20K), which is a telling result.
Inferring a learning curve from mostly monolingual data
For the cases where a slightly larger in-domain “seed” parallel corpus is available, we introduced an extrapolation method and a combined method yielding high-precision predictions: using models trained on up to 20K sentence pairs we can predict performance on a given test set with a root mean squared error in the order of 1 BLEU point at 75K sentence pairs, and in the order of 2-4 BLEU points at 500K.
Introduction
They show that without any parallel data we can predict the expected translation accuracy at 75K segments within an error of 6 BLEU points (Table 4), while using a seed training corpus of 10K segments narrows this error to within 1.5 points (Table 6).
“BLEU points” is mentioned in 5 sentences in this paper; a sketch of the learning-curve extrapolation idea follows below.
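The Kolachina et al. numbers come from fitting a learning-curve family to BLEU scores measured at a few small “anchor” training sizes, extrapolating to larger sizes, and reporting the root mean squared error against the observed (“gold”) curve in BLEU points. The sketch below illustrates that computation; the inverse-power family, the anchor sizes, and all BLEU values are assumptions for illustration, not the paper’s configuration or results.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, c, a, alpha):
    """A simple learning-curve family: BLEU approaches c as data size n grows."""
    return c - a * np.power(n, -alpha)

# Hypothetical anchor measurements: (training size in sentence pairs, BLEU).
anchor_sizes = np.array([1_000, 5_000, 10_000, 20_000], dtype=float)
anchor_bleu = np.array([14.2, 19.8, 22.1, 24.0])

# Fit the curve to the anchors, then extrapolate to larger training sizes.
params, _ = curve_fit(power_law, anchor_sizes, anchor_bleu,
                      p0=[30.0, 100.0, 0.5], maxfev=10_000)

eval_sizes = np.array([75_000, 500_000], dtype=float)
predicted = power_law(eval_sizes, *params)

# Hypothetical "gold" BLEU scores actually measured at those sizes.
gold = np.array([27.5, 30.2])
rmse = float(np.sqrt(np.mean((predicted - gold) ** 2)))
print(f"predicted: {predicted}, RMSE: {rmse:.2f} BLEU points")
```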
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Experiments
By using all the features (last line in the table), we improve the translation performance over the baseline system by 0.87 BLEU point on average.
Experiments
We clearly find that the two rule-topic distributions improve the performance by 0.48 and 0.38 BLEU points over the baseline respectively.
Experiments
Our topic similarity method on monotone rules achieves the largest improvement, 0.6 BLEU points, while the improvement on reordering rules is the smallest among the three types.
Introduction
Experiments on Chinese-English translation tasks (Section 6) show that our method outperforms the baseline hierarchical phrase-based system by +0.9 BLEU points.
“BLEU points” is mentioned in 4 sentences in this paper; a sketch of a topic-similarity feature follows below.
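The Xiao et al. snippets refer to similarity between the topic distribution of a translation rule and that of the document being translated, used as extra decoder features. As a rough, hypothetical illustration (the paper’s actual estimation and similarity measure may differ), such a feature can be sketched as one minus the Hellinger distance between the two distributions:

```python
import numpy as np

def topic_similarity(rule_topics, doc_topics):
    """Hypothetical similarity feature between two topic distributions,
    computed as 1 minus the Hellinger distance (both inputs sum to 1)."""
    rule_topics = np.asarray(rule_topics, dtype=float)
    doc_topics = np.asarray(doc_topics, dtype=float)
    hellinger = np.sqrt(0.5 * np.sum((np.sqrt(rule_topics) - np.sqrt(doc_topics)) ** 2))
    return 1.0 - hellinger

# A rule whose topic distribution matches the document scores higher, so the
# decoder is nudged toward topically appropriate rules.
rule = [0.7, 0.2, 0.1]  # hypothetical rule-topic distribution (3 topics)
doc = [0.6, 0.3, 0.1]   # hypothetical document-topic distribution
print(topic_similarity(rule, doc))  # close to 1.0 for similar distributions
```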
Xiong, Deyi and Zhang, Min and Li, Haizhou
Experiments
The proposed predicate translation models achieve an average improvement of 0.57 BLEU points across the two NIST test sets when all features (lex+sem) are used.
Experiments
When we integrate both lexical and semantic features (lex+sem) described in Section 3.2, we obtain an improvement of about 0.33 BLEU points over the system where only lexical features (lex) are used.
Experiments
We obtain an average improvement of 0.4 BLEU points on the two test sets over the baseline when we incorporate the proposed argument reordering model into our system.
“BLEU points” is mentioned in 4 sentences in this paper; a sketch of log-linear feature integration follows below.
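The Xiong et al. snippets describe adding lexical and semantic predicate-translation features (and an argument reordering model) to an existing system. In phrase-based SMT such models are typically integrated as extra components of a log-linear model whose weights are then tuned; the feature names and values below are made up for illustration and are not the paper’s features.

```python
def loglinear_score(features, weights):
    """Log-linear model score: a weighted sum of feature values.
    The decoder searches for the translation hypothesis that maximizes it."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one translation hypothesis.
features = {
    "lm": -42.3,             # language model log-probability
    "tm": -18.7,             # baseline translation model score
    "pred_trans_lex": -3.1,  # added: lexical predicate-translation feature
    "pred_trans_sem": -2.4,  # added: semantic predicate-translation feature
}
# Hypothetical weights, as produced by MERT-style tuning on a dev set.
weights = {"lm": 0.5, "tm": 0.3, "pred_trans_lex": 0.1, "pred_trans_sem": 0.1}

print(loglinear_score(features, weights))
```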
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Conclusion & Future Work
We showed that this approach can gain up to 2.2 BLEU points over its concatenation baseline and 0.39 BLEU points over a powerful mixture model.
Experiments & Results 4.1 Experimental Setup
In particular, Switching:Max could gain up to 2.2 BLEU points over the concatenation baseline and 0.39 BLEU points over the best-performing baseline (i.e. …
Experiments & Results 4.1 Experimental Setup
… lowest score among the mixture operations; however, after tuning, it learns to bias the weights towards one of the models and hence improves by 1.31 BLEU points.
“BLEU points” is mentioned in 3 sentences in this paper; a sketch of the mixture operations follows below.
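The Razmara et al. snippets compare a concatenation baseline, a linear mixture of translation models, and mixture operations such as “Switching:Max”. The sketch below is a rough reading based on the operation names alone (the paper gives the precise definitions): a linear mixture interpolates the component models’ probabilities for a phrase pair, while switching keeps a single component’s value, here the largest one.

```python
def linear_mixture(probs, lambdas):
    """Interpolate component phrase-table probabilities with tuned weights."""
    return sum(lam * p for lam, p in zip(lambdas, probs))

def switching_max(probs):
    """'Switching' with a max criterion: keep the single largest component probability."""
    return max(probs)

# p(e|f) for one phrase pair according to two component models
# (e.g. in-domain and out-of-domain); values made up for illustration.
probs = [0.42, 0.07]
print(linear_mixture(probs, lambdas=[0.6, 0.4]))  # 0.28
print(switching_max(probs))                       # 0.42
```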
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
However, scaling all features to the full training set shows significant improvements for algorithm 3, and especially for algorithm 4, which gains 0.8 BLEU points over tuning 12 features on the development set.
Experiments
Here tuning large feature sets on the respective dev sets yields significant improvements of around 2 BLEU points over tuning the 12 default features on the dev sets.
Experiments
Another 0.5 BLEU points (test-crawl11) or even 1.3 BLEU points (test-crawl10) are gained when scaling to the full training set using iterative feature selection.
“BLEU points” is mentioned in 3 sentences in this paper; a sketch of joint feature selection follows below.
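The Simianer et al. snippets contrast tuning a dozen dense features on a development set with tuning very large sparse feature sets on the full training data, using iterative feature selection to keep the model compact. A minimal sketch of one selection round, assuming weights are learned in parallel on several data shards: rank features by the l2 norm of their weights across shards and drop all but the top k (the shard weights below are toy values).

```python
import numpy as np

def select_features(shard_weights, k):
    """Keep the k features with the largest l2 norm across shards; zero the rest.

    shard_weights: array of shape (num_shards, num_features), one weight
    vector per parallel shard after an epoch of training.
    """
    norms = np.linalg.norm(shard_weights, axis=0)  # l2 norm per feature
    keep = np.argsort(norms)[::-1][:k]             # indices of the top-k features
    mask = np.zeros(shard_weights.shape[1], dtype=bool)
    mask[keep] = True
    return shard_weights * mask, mask

# Toy example: 3 shards, 6 sparse features; keep the 2 strongest features.
w = np.array([[0.0, 0.9, 0.1, 0.0, 0.4, 0.0],
              [0.1, 1.1, 0.0, 0.0, 0.5, 0.0],
              [0.0, 0.8, 0.2, 0.1, 0.6, 0.0]])
pruned, mask = select_features(w, k=2)
print(mask)  # only the two highest-norm features survive this round
```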