Analysis and Discussion | Test set (NIST): CE_LD ’06 ’08; CE_SD ’06 ’08
Analysis and Discussion | Table 4: Results (BLEU%) of Chinese-to-English large data (CE_LD) and small data (CE_SD) NIST task by applying one feature.
Analysis and Discussion | Table 6: Results (BLEU%) of using simple features based on context on small data NIST task. |
Experiments | The first one is the large data condition, based on training data for the NIST 2009 evaluation Chinese-to-English track.
Experiments | We first created a development set which used mainly data from the NIST 2005 test set, and also some balanced-genre web-text from the NIST training material. |
Experiments | Evaluation was performed on the NIST 2006 and 2008 test sets. |
Abstract | Experimental results on data sets for NIST Chinese-to-English machine translation task show that the co-decoding method can bring significant improvements to all baseline decoders, and the outputs from co-decoding can be used to further improve the result of system combination.
Experiments | We conduct our experiments on the test data from the NIST 2005 and NIST 2008 Chinese-to-English machine translation tasks. |
Experiments | The NIST 2003 test data is used for development data to estimate model parameters. |
Experiments | In our experiments all the models are optimized with case-insensitive NIST version of BLEU score and we report results using this metric in percentage numbers. |
Introduction | We will present experimental results on the data sets of NIST Chinese-to-English machine translation task, and demonstrate that co-decoding can bring significant improvements to baseline systems. |
The Three-way Decision Task | The answer key for the three-way decision task was developed at the National Institute of Standards and Technology (NIST) using annotators who had experience as TREC and DUC assessors.
The Three-way Decision Task | NIST assessors annotated all 800 entailment pairs in the test set, with each pair independently annotated by two different assessors. |
The Three-way Decision Task | The three-way answer key was formed by keeping exactly the same set of YES answers as in the two-way key (regardless of the NIST annotations) and having NIST staff adjudicate assessor differences on the remainder. |
Abstract | On the NIST OpenMT12 Arabic-English condition, the NNJM features produce a gain of +3.0 BLEU on top of a powerful, feature-rich baseline which already includes a target-only NNLM.
Introduction | We show primary results on the NIST OpenMT12 Arabic-English condition. |
Introduction | We also show strong improvements on the NIST OpenMT12 Chinese-English task, as well as the DARPA BOLT (Broad Operational Language Translation) Arabic-English and Chinese-English conditions. |
Model Variations | For Arabic word tokenization, we use the MADA-ARZ tokenizer (Habash et al., 2013) for the BOLT condition, and the Sakhr9 tokenizer for the NIST condition. |
Model Variations | We present MT primary results on Arabic-English and Chinese-English for the NIST OpenMT12 and DARPA BOLT conditions. |
Model Variations | 6.1 NIST OpenMT12 Results |
Abstract | We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings.
Expt. 1: Predicting Absolute Scores | Our first experiment evaluates the models we have proposed on a corpus with traditional annotation on a seven-point scale, namely the NIST OpenMT 2008 corpus.4 The corpus contains translations of newswire text into English from three source languages (Arabic (Ar), Chinese (Ch), Urdu (Ur)).
Expt. 1: Predicting Absolute Scores | BLEUR, METEORR, and NISTR significantly predict one language each (all Arabic); TERR, MTR, and RTER predict two languages.
Experimental Evaluation | NISTR consists of 16 features. |
Experimental Evaluation | NIST-n scores (1 ≤ n ≤ 10) and information-weighted n-gram precision scores (1 ≤ n ≤ 4); NIST brevity penalty (BP); and NIST score divided by BP.
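Assuming the component scores are already computed, the 16-dimensional NISTR vector described above can be assembled as follows (a sketch; the feature ordering and the choice of NIST-5 as "the NIST score" are assumptions, not taken from the paper):

```python
def nistr_features(nist_n, info_precisions, bp):
    """Assemble the 16 NISTR features described in the text:
    10 NIST-n scores (n = 1..10), 4 information-weighted n-gram
    precision scores (n = 1..4), the NIST brevity penalty (BP),
    and the NIST score divided by BP (NIST-5 assumed here)."""
    assert len(nist_n) == 10 and len(info_precisions) == 4
    return list(nist_n) + list(info_precisions) + [bp, nist_n[4] / bp]
```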
Expt. 2: Predicting Pairwise Preferences | 1: Among individual metrics, METEORR and TERR do better than BLEUR and NISTR.
Expt. 2: Predicting Pairwise Preferences | NISTR 50.2 70.4 |
Expt. 2: Predicting Pairwise Preferences | Again, we see better results for METEORR and TERR than for BLEUR and NISTR, and the individual metrics do worse than the combination models.
Introduction | Since human evaluation is costly and difficult to do reliably, a major focus of research has been on automatic measures of MT quality, pioneered by BLEU (Papineni et al., 2002) and NIST (Doddington, 2002).
Introduction | BLEU and NIST measure MT quality by using the strong correlation between human judgments and the degree of n-gram overlap between a system hypothesis translation and one or more reference translations. |
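The n-gram overlap at the heart of both metrics can be illustrated with a clipped n-gram precision (a minimal sketch of the shared idea, not the full scoring formula of either BLEU or NIST):

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(hypothesis, references, n):
    """Clipped n-gram precision: each hypothesis n-gram counts at most
    as often as it appears in the most generous reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())
```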
Abstract | Experimental results show that our method significantly improves translation accuracy in the NIST Chinese-to-English translation task compared to a state-of-the-art baseline. |
Experiments | The NIST 2003 dataset is the development data. |
Experiments | The testing data consists of NIST 2004, 2005, 2006 and 2008 datasets. |
Experiments | NIST 2004 |
Introduction | We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task. |
Abstract | Experimental results show that our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
Conclusion and Future Work | We conduct experiments on IWSLT and NIST data, and our method can improve the performance significantly. |
Experiments and Results | We test our method with two data settings: one is the IWSLT data set, the other is the NIST data set.
Experiments and Results | For the NIST data set, the bilingual training data we used is NIST 2008 training set excluding the Hong Kong Law and Hong Kong Hansard. |
Experiments and Results | The baseline results on NIST data are shown in Table 2. |
Introduction | We conduct experiments with IWSLT and NIST data, and experimental results show that our method
Discussion | Table 5: MBR Parameter Tuning on NIST systems |
Experiments | The first one is the constrained data track of the NIST Arabic-to-English (aren) and Chinese-to-English (zhen) translation task.
Experiments | Table 1: Statistics over the NIST dev/test sets. |
Experiments | Our development set (dev) consists of the NIST 2005 eval set; we use this set for optimizing MBR parameters. |
Experimental Results | Group III: contains other important evaluation metrics, which were not considered in the WMT12 metrics task: NIST and ROUGE for both system- and segment-level, and BLEU and TER at segment-level. |
Experimental Results | NIST .817 .842 .875 |
Experimental Results | NIST .214 .172 .206; ROUGE .185 .144 .201
Experimental Setup | To complement the set of individual metrics that participated at the WMT12 metrics task, we also computed the scores of other commonly-used evaluation metrics: BLEU (Papineni et al., 2002), NIST (Doddington, 2002), TER (Snover et al., 2006), ROUGE-W (Lin, 2004), and three METEOR variants (Denkowski and Lavie, 2011): METEOR-ex (exact match), METEOR-st (+stemming) and METEOR-sy (+synonyms). |
Experimental Setup | Combination of five metrics based on lexical similarity: BLEU, NIST, METEOR-ex, ROUGE-W, and TERp-A.
Related Work | The field of automatic evaluation metrics for MT is very active, and new metrics are continuously being proposed, especially in the context of the evaluation campaigns that run as part of the Workshops on Statistical Machine Translation (WMT 2008-2012), and NIST Metrics for Machine Translation Challenge (MetricsMATR), among others. |
Experiments | To demonstrate the effect of the ℓ0-norm on the IBM models, we performed experiments on four translation tasks: Arabic-English, Chinese-English, and Urdu-English from the NIST Open MT Evaluation, and the Czech-English translation from the Workshop on Machine Translation (WMT) shared task.
Experiments | • Chinese-English: selected data from the constrained task of the NIST 2009 Open MT Evaluation.3
Experiments | • Arabic-English: all available data for the constrained track of NIST 2009, excluding United Nations proceedings (LDC2004E13), ISI Automatically Extracted Parallel Text (LDC2007E08), and Ummah newswire text (LDC2004T18), for a total of 5.4+4.3 million words.
Abstract | The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. |
Background | 2 In this paper, we use the NIST definition of BLEU where the effective reference length is the length of the shortest reference translation. |
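A small sketch of how this definition changes the brevity penalty (the "closest" alternative is shown only for contrast; function names are illustrative):

```python
import math

def brevity_penalty(hyp_len, ref_lens, effective="shortest"):
    """BLEU brevity penalty. Under the NIST definition cited in the text,
    the effective reference length is the length of the shortest reference;
    the 'closest' rule picks the reference length nearest the hypothesis."""
    if effective == "shortest":
        r = min(ref_lens)
    else:
        r = min(ref_lens, key=lambda rl: (abs(rl - hyp_len), rl))
    if hyp_len >= r:
        return 1.0
    return math.exp(1.0 - r / hyp_len)
```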
Background | The data set used for weight training in boosting-based system combination comes from the NIST MT03 evaluation set.
Background | The test sets are the NIST evaluation sets of MT04, MT05 and MT06.
Introduction | All the systems are evaluated on three NIST MT evaluation test sets. |
Metric Design Considerations | To evaluate our metric, we conduct experiments on datasets from the ACL-07 MT workshop and NIST |
Metric Design Considerations | Table 4: Correlations on the NIST MT 2003 dataset. |
Metric Design Considerations | 5.2 NIST MT 2003 Dataset |
Experiments | (2) The NIST task is Chinese-to-English translation with OpenMT08 training data and MT06 as devset. |
Experiments | (Task, Train, Devset, #Feat, Metrics): PubMed 0.2M 2k 14 BLEU, RIBES; NIST 7M 1.6k 8 BLEU, NTER
Experiments | Our MT models are trained with standard phrase-based Moses software (Koehn et al., 2007), with IBM M4 alignments, 4-gram SRILM, lexical ordering for PubMed and distance ordering for the NIST system.
Introduction | Experiments on NIST Chinese-English and PubMed English-Japanese translation using BLEU, TER, and RIBES are presented in Section 4. |
Experiments | We chose to use this data set, rather than more standard NIST test sets to ensure that we had recent documents in the test set (the most recent NIST test sets contain documents published in 2007, well before our microblog data was created). |
Experiments | For this test set, we used 8 million sentences from the full NIST parallel dataset as the language model training data. |
Experiments | FBIS 9.4 18.6 10.4 12.3; NIST 11.5 21.2 11.4 13.9; Weibo 8.75 15.9 15.7 17.2
Parallel Data Extraction | Likewise, for the EN-AR language pair, we use a fraction of the NIST dataset, removing the data originating from the UN, which leads to approximately 1M sentence pairs.
Alternatives to Correlation-based Meta-evaluation | NIST 5.70; randOST 5.20; minOST 3.67
Metrics and Test Beds | At the lexical level, we have included several standard metrics, based on different similarity assumptions: edit distance (WER, PER and TER), lexical precision (BLEU and NIST), lexical recall (ROUGE), and F-measure (GTM and METEOR).
Metrics and Test Beds | Table 1: NIST 2004/2005 MT Evaluation Campaigns. |
Metrics and Test Beds | We use the test beds from the 2004 and 2005 NIST MT Evaluation Campaigns (Le and Przybocki, 2005).
Previous Work on Machine Translation Meta-Evaluation | With the aim of overcoming some of the deficiencies of BLEU, Doddington (2002) introduced the NIST metric. |
Previous Work on Machine Translation Meta-Evaluation | Lin and Och (2004) experimented, unlike previous works, with a wide set of metrics, including NIST, WER (Nießen et al., 2000), PER (Tillmann et al., 1997), and variants of ROUGE, BLEU and GTM.
Abstract | We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets. |
Conclusions | We integrate our method into a state-of-the-art phrase-based baseline translation system, i.e., Moses (Koehn et al., 2007), and show that the integrated system consistently improves the performance of the baseline system on various NIST machine translation test sets. |
Experimental Results | We compile a parallel dataset which consists of various corpora distributed by the Linguistic Data Consortium (LDC) for NIST MT evaluation. |
Experimental Results | 4.5.2 BLEU on NIST MT Test Sets |
Experimental Results | Table 7 reports the results on various NIST MT test sets. |
Introduction | We carry out experiments on a state-of-the-art SMT system, i.e., Moses (Koehn et al., 2007), and show that the abbreviation translations consistently improve the translation performance (in terms of BLEU (Papineni et al., 2002)) on various NIST MT test sets. |
Experiments and Results | Additionally, NIST score (Doddington, 2002) and METEOR (Banerjee and Lavie, 2005) are also used to check the consistency of experimental results.
Experiments and Results | BLEU 0.4029 0.3146; NIST 7.0419 8.8462; METEOR 0.5785 0.5335
Experiments and Results | Both SMP and ESSP consistently outperform the baseline in BLEU, NIST and METEOR.
Abstract | We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. |
Conclusion | Automatic evaluation will make testing easier during system development and enable reporting results obtained outside of the cycles of NIST evaluation. |
Introduction | quality and none have been validated on data from NIST evaluations. |
Introduction | We evaluate the predictive power of these linguistic quality metrics by training and testing models on consecutive years of NIST evaluations (data described |
Results and discussion | In both DUC 2006 and DUC 2007, ten NIST assessors wrote summaries for the various inputs. |
Results and discussion | We only report results on the input level, as we are interested in distinguishing between the quality of the summaries, not the NIST assessors’ writing skills. |
Experimental Setup | We trained the system on the NIST MT06 Eval corpus excluding the UN data (approximately 900K sentence pairs). |
Experimental Setup | We used the NIST MT03 test set as the development set for optimizing interpolation weights using minimum error rate training (MERT; (Och and Ney, 2002)). |
Experimental Setup | We carried out evaluation of the systems on the NIST 2006 evaluation test (MT06) and the NIST 2008 evaluation test (MT08). |
Experiments | NIST, sentence-level n-gram overlap weighted in favour of less frequent n-grams, as in (Belz et al., 2011)
Experiments | score for the REG→LIN system comes close to the upper bound that applies linearization on linSynJflae, gold shallow trees with gold REs (BLEUT of 72.4), whereas the difference in standard BLEU and NIST is high.
Experiments | Input System BLEU NIST BLEUT |
Conclusion | o The sense-based translation model is able to substantially improve translation quality in terms of both BLEU and NIST . |
Experiments | We used the NIST MT03 evaluation test data as our development set, and the NIST MT05 as the test set. |
Experiments | We evaluated translation quality with the case-insensitive BLEU-4 (Papineni et al., 2002) and NIST (Doddington, 2002). |
Experiments | System, BLEU(%), NIST: STM (i5w) 34.64 9.4346; STM (i10w) 34.76 9.5114; STM (i15w) - -
Abstract | On an English-to-Iraqi CSLT task, the proposed approach gives significant improvements over a baseline system as measured by BLEU, TER, and NIST . |
Experimental Setup and Results | Table 1 summarizes test set performance in BLEU (Papineni et al., 2001), NIST (Doddington, 2002) and TER (Snover et al., 2006).
Experimental Setup and Results | In the ASR setting, which simulates a real-world deployment scenario, this system achieves improvements of 0.39 (BLEU), -0.6 (TER) and 0.08 (NIST).
Incremental Topic-Based Adaptation | REFERENCE TRANSCRIPTIONS: SYSTEM, BLEU↑, TER↓, NIST↑
Incremental Topic-Based Adaptation | SYSTEM, BLEU↑, TER↓, NIST↑
Introduction | With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task. |
Abstract | Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. |
Experiments | We carried out experiments in two different settings, both involving data from NIST Open MT 2012.2 The first setting is based on data from the Chinese to English constrained track, comprising about 283 million English running words. |
Experiments | The development set (tune) was taken from the NIST 2005 evaluation set, augmented with some web-genre material reserved from other NIST corpora. |
Experiments | Table 2: NIST Arabic-English data. |
Vector space model adaptation | Table 1: NIST Chinese-English data. |
Abstract | On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
Conclusions | The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together yields statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
Experiments and Results | Our development set is NIST 2005 MT evaluation set (1084 sentences), and our test set is NIST 2006 MT evaluation set (1664 sentences). |
Experiments and Results | Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
Introduction | Finally, we conduct large-scale experiments on IWSLT and NIST Chinese-English translation tasks, respectively, and the results demonstrate that our solutions solve the two aforementioned shortcomings successfully. |
Experiments | The second setting uses the non-UN and non-HK Hansards portions of the NIST training corpora with LTM only. |
Experiments | En Zh: FBIS 269K 10.3M 7.9M; NIST 1.6M 44.4M 40.4M
Experiments | 2010) as our decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 tuning corpus using the Margin Infused Relaxed Algorithm (MIRA) (Crammer et al., 2006; Eidelman, 2012). |
Experiments | We adopted three state-of-the-art metrics, BLEU (Papineni et al., 2002), NIST (Doddington et al., 2000) and METEOR (Banerjee and Lavie, 2005), to evaluate the translation quality. |
Experiments | The NIST evaluation campaign data, MT-03 and MT-05, are selected to comprise the MT development data, devMT, and testing data, testMT, respectively.
Experiments | NIST and METEOR over others. |
Experiments | Dataset and SMT Pipeline We use the NIST MT Chinese-English parallel corpus (NIST), restricted to the non-UN and non-HK Hansards portions, as our training dataset.
Experiments | To optimize the SMT system, we tune the parameters on NIST MT06, and report results on three test sets: MT02, MT03 and MT05.2
Experiments | Resources for Prior Tree To build the tree for tLDA and ptLDA, we extract the word correlations from a Chinese-English bilingual dictionary (Denisowski, 1997).4 We filter the dictionary using the NIST vocabulary, and keep entries mapping single Chinese and single English words. |
Abstract | Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity. |
Experiments and Results | We use 2003 NIST Open Machine Translation Evaluation data (NIST 2003) as development data (containing 919 sentences) for MERT and test the performance on NIST 2008 data set (containing 1357 sentences). |
Experiments and Results | NIST Chinese-to-English evaluation data offers four English human translations for every Chinese sentence. |
Experiments and Results | Table 1: iBLEU Score Results (NIST 2008)
Introduction | We test our method on NIST 2008 testing data. |
Conclusions and Future Work | Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set. |
Experiments | We used part of the NIST 2006 Chinese-English large track data as well as some LDC corpora collected for the DARPA GALE program (LDC2005E83, LDC2006E34 and LDC2006G05) as our bilingual training data. |
Experiments | We tuned the weights on NIST MT05 and tested on MT04. |
Introduction | For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system.
Introduction | Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set. |
Abstract | Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets. |
Introduction | competitive phrase-based systems in large-scale experiments such as NIST evaluations.2 This lack of significant difference may not be completely surprising.
Introduction | 2Results of the 2008 NIST Open MT evaluation (http://www.itl.nist.gov/iad/mig/tests/mt/2008/doc/mt08_official_results_v0.html) reveal that, while many of the best systems in the Chinese-English and Arabic-English tasks incorporate synchronous CFG models, score differences with the best phrase-based system were insignificantly small.
Machine translation experiments | For tuning and testing, we use the official NIST MT evaluation data for Chinese from 2002 to 2008 (MT02 to MT08), which all have four English references for each input sentence. |
Machine translation experiments | Table 6 provides experimental results on the NIST test data (excluding the tuning set MT05) for each of the three genres: newswire, web data, and speech (broadcast news and conversation).
Experiments | Table 1: Inter-judge Kappa for the NIST 2008 English-Chinese task
Experiments | 4.2 NIST 2008 English-Chinese MT Task |
Experiments | The NIST 2008 English-Chinese MT task consists of 127 documents with 1,830 segments, each with four reference translations and eleven automatic MT system translations. |
Introduction | The work compared various MT evaluation metrics (BLEU, NIST, METEOR, GTM, 1 - TER) with different segmentation schemes, and found that treating every single character as a token (character-level MT evaluation) gives the best correlation with human judgments.
Abstract | On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement. |
Experiments | As for the blind test set, we report the performance on the NIST MT08 evaluation set, which consists of 691 sentences from newswire and 666 sentences from weblog. |
Experiments | Table 4 summarizes the experimental results on NIST MT08 newswire and weblog. |
Experiments | Table 4: The NIST MT08 results on newswire (nw) and weblog (wb) genres. |
Experiments | Results were evaluated with both BLEU (Papineni et al., 2001) and NIST metrics (NIST, 2002).
Experiments | set (BLEU devtest, BLEU test07, NIST devtest, NIST test07): baseline 18.13 18.05 5.218 5.279; person 18.16 18.17 5.224 5.316
Experiments | The NIST metric clearly shows a significant improvement, because it mostly measures difficult n-gram matches (e.g. due to the long-distance rules we have been dealing with).
Abstract | We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data. |
Conclusion and Future Work | The experimental results show that the proposed approach achieves very promising BLEU improvements and TER reductions on the NIST evaluation data. |
Evaluation | We used the newswire portion of the NIST MT06 evaluation data as our development set, and used the evaluation data of MT04 and MT05 as our test sets.
Introduction | • We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
Experiments | We run an improved version of our 2006 NIST MT Evaluation entry for the Arabic-English “Unlimited” data track.6 The language model is the same one as in the previous section. |
Experiments | We use MT04 data for system development, with MT05 data and MT06 (“NIST” subset) data for blind testing.
Experiments | Overall, our baseline results compare favorably to those reported on the NIST MT06 web site. |
Abstract | Experimental results on the NIST MT-2005 Chinese-English translation task show that our method statistically significantly outperforms the baseline systems. |
Conclusions and Future Work | The experimental results on the NIST MT-2005 Chinese-English translation task demonstrate the effectiveness of the proposed model. |
Experiments | We used sentences with less than 50 characters from the NIST MT-2002 test set as our development set and the NIST MT-2005 test set as our test set. |
Introduction | Experiment results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007). |
Abstract | Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems. |
Conclusion | Finally, we examine our methods on the FBIS corpus and the NIST MT-2003 Chinese-English translation task. |
Experiment | We use the FBIS corpus as the training set, the NIST MT-2002 test set as the development (dev) set and the NIST MT-2003 test set as the test set.
Introduction | We evaluate our method on the NIST MT-2003 Chinese-English translation tasks. |
Abstract | We present a set of dependency-based pre-ordering rules which improved the BLEU score by 1.61 on the NIST 2006 evaluation data. |
Experiments | Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs. |
Experiments | Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs. |
Introduction | Experiment results showed that our pre-ordering rule set improved the BLEU score on the NIST 2006 evaluation data by 1.61. |
Experiments | For the error detection task, we use the best translation hypotheses of NIST MT-02/05/03 generated by MOSES as our training, development, and test corpus respectively. |
SMT System | The translation task is on the official NIST Chinese-to-English evaluation data. |
SMT System | For minimum error rate tuning (Och, 2003), we use NIST MT-02 as the development set for the translation task. |
SMT System | In order to calculate word posterior probabilities, we generate 10,000 best lists for NIST MT-02/03/05 respectively. |
Abstract | The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
Experimental setup | We use about 10K sentences (180K words) of manual word alignments which were created in house using part of the NIST MT-08 training data3 to train our baseline reordering model and to train our supervised machine aligners.
Experimental setup | We use a parallel corpus of 3.9M words consisting of 1.7M words from the NIST MT-08 training data set and 2.2M words extracted from parallel news stories on the
Experimental setup | We report results on the (four reference) NIST MT-08 evaluation set in Table 4 for the News and Web conditions.
Experiments | The dev set comprised mainly data from the NIST 2005 test set, and also some balanced-genre web-text from NIST.
Experiments | Evaluation was performed on NIST 2006 and 2008. |
Experiments | Table 10: Ordering scores (p, I and v) for test sets NIST |
Introduction | • BLEU (Papineni et al., 2002), NIST (Doddington, 2002), WER, PER, TER (Snover et al., 2006), and LRscore (Birch and Osborne, 2011) do not use external linguistic
Abstract | Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU.
Experiments | We train our model on a dataset with ~1.5M sentence pairs from the LDC dataset.2 We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs, respectively) as the test data. |
Experiments | For evaluation, the NIST BLEU script (version 12) with the default settings is used to calculate the BLEU scores. |
Introduction | Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang’s HPB as well as a SAMT-style refined version of HPB.
Abstract | We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. |
Experiments | We present our experiments on the NIST Chinese-English translation tasks. |
Experiments | We used the NIST evaluation set of 2005 (MT05) as our development set, and sets of MT06/MT08 as test sets. |
Experiments | Case-insensitive NIST BLEU (Papineni et al., 2002) was used to measure translation quality.
Analysis | The mass is concentrated along the diagonal, probably because MT05/6/8 was prepared by NIST , an American agency, while the bitext was collected from many sources including Agence France Presse. |
Experiments | 4.3 NIST OpenMT Experiment |
Experiments | However, the bitext5k models do not generalize as well to the NIST evaluation sets as represented by the MT04 result. |
Introduction | The first experiment uses standard tuning and test sets from the NIST OpenMT competitions. |
Abstract | Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). |
Experiments | For English-to-Chinese translation, we used all the allowed training sets in the NIST 2008 constrained track. |
Experiments | For NIST , we filtered out sentences exceeding 80 words in the parallel texts. |
Experiments | Here the training data consists of the non-UN portions and non-HK Hansards portions of the NIST training corpora distributed by the LDC, totalling 303k sentence pairs with 8m and 9.4m words of Chinese and English, respectively. |
Experiments | For the development set we use the NIST 2002 test set, and evaluate performance on the test sets from NIST 2003 |
Experiments | We evaluate on the NIST test sets from 2003 and 2005, and the 2002 test set was used for MERT training. |
Additional Experiments | For training, we used the non-UN portion of the NIST training corpora, which was segmented using an HMM segmenter (Lee et al., 2003). |
Experiments | For training we used the non-UN and non-HK Hansards portions of the NIST training corpora, which was segmented using the Stanford segmenter (Tseng et al., 2005). |
Experiments | We used cdec (Dyer et al., 2010) as our hierarchical phrase-based decoder, and tuned the parameters of the system to optimize BLEU (Papineni et al., 2002) on the NIST MT06 corpus. |
Experiments | In addition to BLEU score, percentage of exactly matched sentences and average NIST simple string accuracy (SSA) are adopted as evaluation metrics. |
Experiments | The average NIST simple string accuracy score reflects the average number of insertion (I), deletion (D), and substitution (S) errors between the output sentence and the reference sentence.
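The I/D/S formulation above can be sketched in code. This is a hypothetical implementation assuming SSA = 1 - (I + D + S) / |reference|, with the error counts obtained from a word-level Levenshtein alignment; the naming convention for insertions versus deletions varies between toolkits.

```python
# Hedged sketch of NIST simple string accuracy (SSA), assuming
# SSA = 1 - (I + D + S) / reference_length, with I/D/S counted via a
# word-level Levenshtein alignment between hypothesis and reference.

def edit_ops(hyp, ref):
    """Count (insertions, deletions, substitutions) turning hyp into ref."""
    m, n = len(hyp), len(ref)
    # dist[i][j] = min edits to turn hyp[:i] into ref[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # extra word in hyp
                             dist[i][j - 1] + 1,         # missing word in hyp
                             dist[i - 1][j - 1] + cost)  # substitution/match
    # Backtrace to split the total edit count into I, D, S
    i, j, ins, dele, sub = m, n, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
                dist[i][j] == dist[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            sub += hyp[i - 1] != ref[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ins, dele, sub

def simple_string_accuracy(hyp, ref):
    ins, dele, sub = edit_ops(hyp.split(), ref.split())
    return 1.0 - (ins + dele + sub) / max(len(ref.split()), 1)
```

For example, a hypothesis differing from the reference in one of three words scores 1 - 1/3 ≈ 0.667.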
Log-linear Models | The BLEU scoring script is supplied by the NIST Open Machine Translation Evaluation at ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl
Abstract | As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets. |
Introduction | We evaluate our method on the NIST Chinese-English translation datasets. |
Introduction | We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the test sets.
Evaluation methodology | In addition to human evaluation, we also ran system-level automatic evaluations using BLEU (Papineni et al., 2001), NIST (Doddington, 2002), METEOR (Banerjee and Lavie, 2005), TER (Snover et al., 2009), and GTM (Turian et al., 2003). |
Results | The lower part of Table 2 also reports the results of simulated dynamic ranking (using the NIST rankings as the initial order for the sort operation). |
Results | Per-metric values, sentence level (Median / Mean / Trimmed) and corpus level: BLEU 0.357 / 0.298 / 0.348, corpus 0.833; NIST 0.357 / 0.291 / 0.347, corpus 0.810; Meteor 0.429 / 0.348 / 0.393, corpus 0.714; TER 0.214 / 0.186 / 0.204, corpus 0.619; GTM 0.429 / 0.340 / 0.392, corpus 0.714.
Experimental Setup | NIST 13.01 / 12.95 / 12.69; Match 27.91 / 27.66 / 26.38; Ling.
Experimental Setup | Model scores: BLEU 0.764 / 0.759 / 0.747; NIST 13.18 / 13.14 / 13.01.
Experimental Setup | We use several standard measures: (a) exact match: how often the model selects the original corpus sentence; (b) BLEU: n-gram overlap between the top-ranked and the original sentence; (c) NIST: a modification of BLEU giving more weight to less frequent n-grams.
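The weighting of less frequent n-grams in the NIST metric comes from per-n-gram information weights (Doddington, 2002). The sketch below shows only that weighting step, not the full metric; the function name and the restriction to counts over the reference set are illustrative assumptions.

```python
# Hedged sketch of the information weights used by the NIST MT metric:
# Info(w1..wk) = log2(count(w1..w_{k-1}) / count(w1..wk)), so rarer
# n-grams receive higher weight. Counts here are taken over a set of
# reference sentences; for unigrams the numerator is the total word count.
from collections import Counter
from math import log2

def ngram_info_weights(references, n=2):
    counts = Counter()
    total_words = 0
    for ref in references:
        words = ref.split()
        total_words += len(words)
        for k in range(1, n + 1):
            for i in range(len(words) - k + 1):
                counts[tuple(words[i:i + k])] += 1
    info = {}
    for gram, c in counts.items():
        if len(gram) == 1:
            info[gram] = log2(total_words / c)
        else:
            info[gram] = log2(counts[gram[:-1]] / c)
    return info
```

A frequent unigram like "the" thus contributes less to the score than a rare content word, which is the behavior the sentence above describes.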
Experimental Setup | We train our English-to-Arabic system using 1.49 million sentence pairs drawn from the NIST 2012 training set, excluding the UN data. |
Experimental Setup | We tune on the NIST 2004 evaluation set (1353 sentences) and evaluate on NIST 2005 (1056 sentences). |
Results | Judging from the output on the NIST 2005 test set, the system uses these discontiguous desegmentations very rarely: only 5% of desegmented tokens align to discontiguous source phrases. |
Abstract | Experimental results show that the proposed method is comparable to supervised segmenters on the in-domain NIST OpenMT corpus and yields a 0.96 BLEU relative increase on the out-of-domain NTCIR PatentMT corpus.
Complexity Analysis | The first bilingual corpus, OpenMT06, was used in the NIST Open Machine Translation 2006 Evaluation.
Complexity Analysis | The data sets of NIST Eval 2002 to 2005 were used as the development data for MERT tuning (Och, 2003).
Introduction | The BABEL task is modeled on the 2006 NIST Spoken Term Detection evaluation (NIST, 2006) but focuses on limited resource conditions.
Results | At our disposal, we have the five BABEL languages — Tagalog, Cantonese, Pashto, Turkish and Vietnamese — as well as the development data from the NIST 2006 English evaluation. |
Term Detection Re-scoring | The primary metric for the BABEL program, Actual Term Weighted Value (ATWV), is defined by NIST using a cost function of the false alarm probability P(FA) and the miss probability P(Miss), averaged over a set of queries (NIST, 2006).
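The cost-function form of ATWV can be sketched as follows. This is a hedged sketch of the common formulation, not the official scoring tool: the per-query tuple layout and the use of speech duration minus true occurrences as the false-alarm trial count are assumptions; beta = 999.9 is the constant used in the NIST 2006 evaluation.

```python
# Hedged sketch of Actual Term Weighted Value (ATWV):
# ATWV = 1 - mean over queries of [ P(Miss) + beta * P(FA) ],
# with P(Miss) = 1 - n_correct / n_true and
# P(FA) = n_false_alarm / (speech_seconds - n_true).

BETA = 999.9  # cost ratio used in the NIST 2006 STD evaluation

def atwv(queries, speech_seconds):
    """queries: list of (n_correct, n_false_alarm, n_true) per query term."""
    twv_sum = 0.0
    scored = 0
    for n_correct, n_fa, n_true in queries:
        if n_true == 0:
            continue  # terms absent from the reference are typically skipped
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_fa / (speech_seconds - n_true)
        twv_sum += 1.0 - (p_miss + BETA * p_fa)
        scored += 1
    return twv_sum / scored if scored else 0.0
```

The large beta makes false alarms far more expensive than misses, which is why re-scoring methods that trade a few extra detections for fewer false alarms can raise ATWV substantially.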
Experiments | For Chinese-to-English translation, we use the parallel data from NIST Open Machine Translation Evaluation tasks. |
Experiments | The NIST 2003 and 2005 test data are taken as the development and test sets, respectively.
Introduction | (2008) and achieved state-of-the-art results as reported in the NIST 2008 Open MT Evaluation workshop and the NTCIR-9 Chinese-to-English patent translation task (Goto et al., 2011; Ma and Matsoukas, 2011). |
Experiments | develop: NIST 2002 (878 sentences, 10 references), NIST 2005 (1,082, 4), NIST 2004 (1,788, 5); test: NIST 2006 (1,664, 4), NIST 2008 (1,357, 4).
Experiments | The system was tested using the Chinese-English MT evaluation sets of NIST 2004, NIST 2006 and NIST 2008. |
Experiments | For development, we used the Chinese-English MT evaluation sets of NIST 2002 and NIST 2005. |
Experiment Results | The language model is the interpolation of 5-gram language models built from news corpora of the NIST 2012 evaluation. |
Experiment Results | We tuned the parameters on the MT06 NIST test set (1664 sentences) and report the BLEU scores on three unseen test sets: MT04 (1353 sentences), MT05 (1056 sentences) and MT09 (1313 sentences). |
Experiment Results | We tuned the parameters on MT06 NIST test set of 1664 sentences and report the results of MT04, MT05 and MT08 unseen test sets. |