Index of papers in Proc. ACL 2012 that mention
  • BLEU score
He, Xiaodong and Deng, Li
Abstract
The training objective is an expected BLEU score, which is closely linked to translation quality.
Abstract
bold updating), the author proposed a local updating strategy where the model parameters are updated towards a pseudo-reference (i.e., the hypothesis in the n-best list that gives the best BLEU score).
Abstract
In our work, we use the expectation of BLEU scores as the objective.
BLEU score is mentioned in 19 sentences in this paper.
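The pseudo-reference idea quoted above can be sketched in a few lines: pick the n-best hypothesis with the highest sentence-level BLEU against the reference. This is a minimal illustration, not the authors' implementation; the BLEU here is a simplified smoothed BLEU-4 written out only so the sketch is self-contained.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified smoothed sentence-level BLEU, for illustration only."""
    hyp, ref = hyp.split(), ref.split()
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())
        total = max(sum(h.values()), 1)
        # add-one smoothing so a zero higher-order precision does not zero the score
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(log_prec)

def pseudo_reference(nbest, ref):
    """Hypothesis in the n-best list with the best BLEU against the reference."""
    return max(nbest, key=lambda h: sentence_bleu(h, ref))

nbest = ["the cat sat on the mat",
         "a cat sits on a mat",
         "the mat sat on the cat"]
print(pseudo_reference(nbest, "the cat sat on the mat"))
```

The parameters are then updated towards this selected hypothesis rather than the (possibly unreachable) human reference.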
Sun, Hong and Zhou, Ming
Abstract
In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems.
Conclusion
Furthermore, a revised BLEU score that balances between paraphrase adequacy and dissimilarity is proposed in our training process.
Discussion
The first part of iBLEU, which is the traditional BLEU score, helps to ensure the quality of the machine translation results.
Experiments and Results
We show the BLEU score (computed against references) to measure the adequacy and self-BLEU (computed against source sentence) to evaluate the dissimilarity (lower is better).
Experiments and Results
From the results we can see that, when the value of α decreases to place a heavier penalty on self-paraphrase, the self-BLEU score decays rapidly, while as a consequence the BLEU score computed against the references also drops sharply.
Experiments and Results
This is not achievable without joint learning, or with the traditional BLEU score, which does not take self-paraphrase into consideration.
Introduction
The jointly-learned dual SMT system: (1) Adapts the SMT systems so that they are tuned specifically for paraphrase generation purposes, e.g., to increase the dissimilarity; (2) Employs a revised BLEU score (named iBLEU, as it is an input-aware BLEU metric) that measures the adequacy and dissimilarity of the paraphrase results at the same time.
Paraphrasing with a Dual SMT System
Two issues are also raised in (Zhao and Wang, 2010) about using automatic metrics: a paraphrase that changes less gets a larger BLEU score, and the evaluations of paraphrase quality and rate tend to be incompatible.
Paraphrasing with a Dual SMT System
(2005) have shown the capability for measuring semantic equivalency using the BLEU score); BLEU(c, s) is the BLEU score computed between the candidate and the source sentence to measure the dissimilarity.
BLEU score is mentioned in 10 sentences in this paper.
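The sentences above describe iBLEU as balancing adequacy (BLEU against the references) against similarity to the input (BLEU between candidate and source). A hedged sketch of that interpolation follows; the BLEU is a simplified smoothed BLEU-2 and the weight alpha=0.8 is an assumed value, both for illustration only.

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def bleu(cand, ref, max_n=2):
    """Simplified smoothed BLEU-2, a stand-in for a real BLEU implementation."""
    cand, ref = cand.split(), ref.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = _ngrams(cand, n), _ngrams(ref, n)
        overlap = sum(min(k, r[g]) for g, k in c.items())
        log_prec += math.log((overlap + 1) / (max(sum(c.values()), 1) + 1)) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec)

def ibleu(candidate, source, reference, alpha=0.8):
    # reward similarity to the reference, penalize similarity to the source:
    # iBLEU = alpha * BLEU(c, r) - (1 - alpha) * BLEU(c, s)
    return alpha * bleu(candidate, reference) - (1 - alpha) * bleu(candidate, source)

src = "the cat sat on the mat"
ref = "the cat was sitting on the mat"
# a real paraphrase outscores a verbatim copy of the source
print(ibleu(ref, src, ref) > ibleu(src, src, ref))  # prints True
```

Lowering alpha strengthens the self-paraphrase penalty, which is exactly the trade-off discussed in the experiments quoted above.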
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
Our objective is to predict the evolution of the BLEU score on the given test set as a function of the size of a random subset of the training data.
Inferring a learning curve from mostly monolingual data
We first train models to predict the BLEU score at m anchor sizes s1, ..., sm.
Inferring a learning curve from mostly monolingual data
We then perform inference using these models to predict the BLEU score at each anchor, for the test case of interest.
Selecting a parametric family of curves
The values are on the same scale as the BLEU scores.
BLEU score is mentioned in 7 sentences in this paper.
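The anchor-and-extrapolate scheme quoted above can be illustrated with a toy curve fit. The paper selects among several parametric families of learning curves; as a minimal stand-in, this sketch fits a two-parameter log-linear curve BLEU ≈ a + b·log(size) by closed-form least squares to hypothetical anchor predictions, then extrapolates to a larger training size. The anchor values are made up for illustration.

```python
import math

def fit_log_linear(sizes, bleus):
    """Closed-form least-squares fit of BLEU ~ a + b * log(size)."""
    xs = [math.log(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(bleus) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, bleus))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def predict(a, b, size):
    return a + b * math.log(size)

# hypothetical (training size, predicted BLEU) pairs at the anchor sizes
anchors = [(10_000, 18.0), (50_000, 24.0), (250_000, 30.0)]
a, b = fit_log_linear(*zip(*anchors))
print(round(predict(a, b, 1_000_000), 1))  # extrapolated BLEU at 1M sentences
```

A log-linear curve grows without bound, so real learning-curve families (as in the paper) instead use shapes that approach an asymptote; the fitting machinery is the same idea.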
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Ensemble Decoding
In Section 4.2, we compare the BLEU scores of different mixture operations on a French-English experimental setup.
Ensemble Decoding
However, experiments showed that replacing the scores with the normalized scores hurts the BLEU score drastically.
Ensemble Decoding
However, we did not try it, as the BLEU scores we got using the normalization heuristic were not promising, and it would impose a cost in decoding as well.
Experiments & Results 4.1 Experimental Setup
Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
Experiments & Results 4.1 Experimental Setup
This baseline is run three times and the score is averaged over the BLEU scores, with a standard deviation of 0.34.
Experiments & Results 4.1 Experimental Setup
We also reported the BLEU scores when we applied the span-wise normalization heuristic.
BLEU score is mentioned in 6 sentences in this paper.
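Averaging a baseline over several tuning runs, as in the sentence above, is standard practice because MT tuning is noisy; a trivial sketch with hypothetical scores (the 0.34 in the paper is a sample standard deviation over its own three runs):

```python
from statistics import mean, stdev

runs = [34.1, 34.5, 34.8]  # hypothetical BLEU scores from three tuning runs
# stdev() is the sample standard deviation (n - 1 in the denominator)
print(f"{mean(runs):.2f} +/- {stdev(runs):.2f}")
```

Reporting the spread makes it clear whether a BLEU difference between systems exceeds run-to-run variance.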
Pauls, Adam and Klein, Dan
Experiments
The BLEU scores for these outputs are 32.7, 27.8, and 20.8.
Experiments
In particular, their translations had a lower BLEU score, making their task easier.
Experiments
We see that our system prefers the reference much more often than the S-GRAM language model. However, we also note that the ease of the task is correlated with the quality of the translations (as measured in BLEU score).
BLEU score is mentioned in 5 sentences in this paper.
Wu, Xianchao and Sudoh, Katsuhito and Duh, Kevin and Tsukada, Hajime and Nagata, Masaaki
Experiments
training data and do not necessarily follow exactly the tendency of the final BLEU scores.
Experiments
For example, CCG is worse than Malt in terms of P/R, yet has a higher BLEU score.
Experiments
Also, PAS+sem has a lower P/R than Berkeley, yet their final BLEU scores are not statistically different.
BLEU score is mentioned in 5 sentences in this paper.
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
We compare their influence on RankingSVM accuracy, alignment crossing-link number, end-to-end BLEU score , and the model size.
Experiments
is RankingSVM accuracy in percentage on the training data; CLN is the crossing-link number per sentence on the parallel corpus with automatically generated word alignment; BLEU is the BLEU score in percentage on the web test set in the Rank-IT setting (the system with the integrated rank reordering model); leacn denotes the 11 most frequent lexicons in the training corpus.
Experiments
These features also correspond to BLEU score improvements in the end-to-end evaluations.
BLEU score is mentioned in 4 sentences in this paper.
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Discussion
on BLEU score
Experiments
Candidate sentence pairs are selected for the extraction of paraphrase rules if two conditions are satisfied: (1) BLEU(e2i) − BLEU(e1i) > δ1, and (2) BLEU(e2i) > δ2, where BLEU(·) is a function for computing the BLEU score, and δ1 and δ2 are thresholds for balancing the number and the quality of the paraphrase rules.
Extraction of Paraphrase Rules
If a sentence in T2 has a higher BLEU score than the aligned sentence in T1, the corresponding sentences in S0 and S1 are selected as a candidate paraphrase sentence pair, which is used in the following steps of paraphrase extraction.
BLEU score is mentioned in 3 sentences in this paper.
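The two-threshold selection rule quoted above amounts to a simple filter: keep a pair only when the second translation beats the first by a margin and is itself good enough. A hedged sketch follows; toy_bleu, delta1, and delta2 are illustrative stand-ins, not the authors' metric or threshold values.

```python
def toy_bleu(hyp, ref):
    """Unigram-precision stand-in for a real BLEU function, illustration only."""
    h, r = hyp.split(), set(ref.split())
    return sum(1 for w in h if w in r) / max(len(h), 1)

def select_pairs(pairs, delta1=0.05, delta2=0.2):
    """pairs: iterable of (e1, e2, reference).
    Keep (e1, e2) when BLEU(e2) - BLEU(e1) > delta1 and BLEU(e2) > delta2."""
    kept = []
    for e1, e2, ref in pairs:
        b1, b2 = toy_bleu(e1, ref), toy_bleu(e2, ref)
        if b2 - b1 > delta1 and b2 > delta2:
            kept.append((e1, e2))
    return kept

pairs = [("a b x", "a b c", "a b c"),   # e2 clearly better: kept
         ("a b c", "a b c", "a b c")]   # no improvement: filtered out
print(select_pairs(pairs))  # prints [('a b x', 'a b c')]
```

Raising delta1 or delta2 trades rule coverage for rule quality, which is the balance the thresholds are described as controlling.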