Index of papers in Proc. ACL 2013 that mention
  • NIST
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Experiments
We chose to use this data set, rather than more standard NIST test sets, to ensure that we had recent documents in the test set (the most recent NIST test sets contain documents published in 2007, well before our microblog data was created).
Experiments
For this test set, we used 8 million sentences from the full NIST parallel dataset as the language model training data.
Experiments
FBIS   9.4    18.6   10.4   12.3
NIST   11.5   21.2   11.4   13.9
Weibo  8.75   15.9   15.7   17.2
Parallel Data Extraction
Likewise, for the EN-AR language pair, we use a fraction of the NIST dataset, by removing the data originated from UN, which leads to approximately 1M sentence pairs.
NIST is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Foster, George
Abstract
Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation.
Experiments
We carried out experiments in two different settings, both involving data from NIST Open MT 2012.2 The first setting is based on data from the Chinese to English constrained track, comprising about 283 million English running words.
Experiments
The development set (tune) was taken from the NIST 2005 evaluation set, augmented with some web-genre material reserved from other NIST corpora.
Experiments
Table 2: NIST Arabic-English data.
Vector space model adaptation
Table 1: NIST Chinese-English data.
NIST is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Abstract
On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST.
Experimental Setup and Results
Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
Experimental Setup and Results
In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
Incremental Topic-Based Adaptation
REFERENCE TRANSCRIPTIONS: SYSTEM | BLEU↑ | TER↓ | NIST↑
Incremental Topic-Based Adaptation
SYSTEM | BLEU↑ | TER↓ | NIST↑
Introduction
With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
NIST is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zarriess, Sina and Kuhn, Jonas
Experiments
NIST, sentence-level n-gram overlap weighted in favour of less frequent n-grams, as in (Belz et al., 2011)
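For reference, this excerpt describes the NIST metric (Doddington, 2002), which weights each matched n-gram by its information gain so that rarer n-grams count more. A minimal sketch of the information weight as standardly defined:

  Info(w_1 ... w_n) = log_2 [ count(w_1 ... w_{n-1}) / count(w_1 ... w_n) ]

Roughly, the score then sums these weights over the n-grams co-occurring with the references, normalises by the number of system-output n-grams, and applies a brevity penalty.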
Experiments
score for the REG→LIN system comes close to the upper bound that applies linearization on linSynJflae, gold shallow trees with gold REs (BLEUT of 72.4), whereas the difference in standard BLEU and NIST is high.
Experiments
Input | System | BLEU | NIST | BLEUT
NIST is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Analysis
The mass is concentrated along the diagonal, probably because MT05/6/8 was prepared by NIST, an American agency, while the bitext was collected from many sources including Agence France Presse.
Experiments
4.3 NIST OpenMT Experiment
Experiments
However, the bitext5k models do not generalize as well to the NIST evaluation sets as represented by the MT04 result.
Introduction
The first experiment uses standard tuning and test sets from the NIST OpenMT competitions.
NIST is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Abstract
On the NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
Experiments
As for the blind test set, we report the performance on the NIST MT08 evaluation set, which consists of 691 sentences from newswire and 666 sentences from weblog.
Experiments
Table 4 summarizes the experimental results on NIST MT08 newswire and weblog.
Experiments
Table 4: The NIST MT08 results on newswire (nw) and weblog (wb) genres.
NIST is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Abstract
The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
Experimental setup
We use about 10K sentences (180K words) of manual word alignments which were created in house using part of the NIST MT-08 training data to train our baseline reordering model and to train our supervised machine aligners.
Experimental setup
We use a parallel corpus of 3.9M words consisting of 1.7M words from the NIST MT-08 training data set and 2.2M words extracted from parallel news stories on the
Experimental setup
We report results on the (four reference) NIST MT-08 evaluation set in Table 4 for the News and Web conditions.
NIST is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Braslavski, Pavel and Beloborodov, Alexander and Khalilov, Maxim and Sharoff, Serge
Evaluation methodology
In addition to human evaluation, we also ran system-level automatic evaluations using BLEU (Papineni et al., 2001), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005), TER (Snover et al., 2009), and GTM (Turian et al., 2003).
Results
The lower part of Table 2 also reports the results of simulated dynamic ranking (using the NIST rankings as the initial order for the sort operation).
Results
Metric   Sentence-level Median   Sentence-level Mean   Sentence-level Trimmed   Corpus level
BLEU     0.357                   0.298                  0.348                    0.833
NIST     0.357                   0.291                  0.347                    0.810
Meteor   0.429                   0.348                  0.393                    0.714
TER      0.214                   0.186                  0.204                    0.619
GTM      0.429                   0.340                  0.392                    0.714
NIST is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Additional Experiments
For training, we used the non-UN portion of the NIST training corpora, which was segmented using an HMM segmenter (Lee et al., 2003).
Experiments
For training we used the non-UN and non-HK Hansards portions of the NIST training corpora, which were segmented using the Stanford segmenter (Tseng et al., 2005).
Experiments
We used cdec (Dyer et al., 2010) as our hierarchical phrase-based decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 corpus.
NIST is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Yang and Cohn, Trevor
Experiments
Here the training data consists of the non-UN portions and non-HK Hansards portions of the NIST training corpora distributed by the LDC, totalling 303k sentence pairs with 8m and 9.4m words of Chinese and English, respectively.
Experiments
For the development set we use the NIST 2002 test set, and evaluate performance on the test sets from NIST 2003
Experiments
We evaluate on the NIST test sets from 2003 and 2005, and the 2002 test set was used for MERT training.
NIST is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Abstract
As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
Introduction
We evaluate our method on the NIST Chinese-English translation datasets.
Introduction
We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the testsets.
NIST is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nguyen, ThuyLinh and Vogel, Stephan
Experiment Results
The language model is the interpolation of 5-gram language models built from news corpora of the NIST 2012 evaluation.
Experiment Results
We tuned the parameters on the MT06 NIST test set (1664 sentences) and report the BLEU scores on three unseen test sets: MT04 (1353 sentences), MT05 (1056 sentences) and MT09 (1313 sentences).
Experiment Results
We tuned the parameters on the MT06 NIST test set of 1664 sentences and report the results of MT04, MT05 and MT08 unseen test sets.
NIST is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: