Experiments | These languages are selected because they contain non-projective trees and are publicly available from the CoNLL-X webpage. Since the CoNLL-X data we have does not come with development sets, the last 10% of each training set is used for development.
Experiments | Feature selection is done on the English development set.
Experiments | First, all parameters are tuned on the English development set by using grid search on T = [1, .
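A minimal sketch of this kind of dev-set grid search, in the spirit of the tuning described above (not the authors' code): `train_fn` and `dev_score_fn` are hypothetical hooks that train a model with the given parameters and score it on the development set.

```python
# Hypothetical grid-search helper: exhaustively tries every parameter
# combination and keeps the one with the best development-set score.
from itertools import product

def grid_search(grids, train_fn, dev_score_fn):
    best_params, best_score = None, float("-inf")
    for values in product(*grids.values()):
        params = dict(zip(grids.keys(), values))  # one grid point
        model = train_fn(**params)                # train with these settings
        score = dev_score_fn(model)               # evaluate on the dev set
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```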
Selectional branching | Input: D_t: training set, D_d: development set.
Selectional branching | First, an initial model M0 is trained on all data by taking the one-best sequences, and its score is measured by testing on a development set (lines 2-4). |
Experiments of Grammar Formalism Conversion | Table 4: Results of the generative parser on the development set, when trained with various weightings of the CTB training set and CDTPS.
Experiments of Parsing | We used a standard split of CTB for performance evaluation: articles 1-270 and 400-1151 as the training set, articles 301-325 as the development set, and articles 271-300 as the test set.
Experiments of Parsing | We tried the corpus weighting method when combining CDTPS with the CTB training set (abbreviated as CTB for simplicity) as training data, gradually increasing the weight (1, 2, 5, 10, 20, or 50) of CTB to optimize parsing performance on the development set.
Experiments of Parsing | Table 4 presents the results of the generative parser with various weights of CTB on the development set.
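The corpus-weighting experiment above amounts to replicating the in-domain treebank and retraining at each weight; a sketch under that reading follows, with `train` and `dev_score` as hypothetical hooks and the weight grid taken from the text.

```python
# Hypothetical corpus-weighting loop: replicate CTB w times, concatenate
# with the converted CDTPS data, and keep the weight with the best
# development-set score.
def tune_corpus_weight(ctb, cdtps, train, dev_score,
                       weights=(1, 2, 5, 10, 20, 50)):
    best_w, best_s = None, float("-inf")
    for w in weights:
        combined = ctb * w + cdtps   # lists of trees; CTB duplicated w times
        s = dev_score(train(combined))
        if s > best_s:
            best_w, best_s = w, s
    return best_w
```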
Our Two-Step Solution | The number of removed trees will be determined by cross-validation on the development set.
Our Two-Step Solution | The value of λ will be tuned by cross-validation on the development set.
Our Two-Step Solution | Corpus weighting, with the weight tuned on the development set, is exactly such an approach, and it is the one we use for parsing homogeneous treebanks in this paper.
Introduction | If there is a mismatch between the domain of the development set and the test set, domain adaptation can potentially harm performance compared to an unadapted baseline. |
Translation Model Architecture | As a way of optimizing instance weights, Sennrich (2012b) minimizes translation model perplexity on a set of phrase pairs automatically extracted from a parallel development set.
Translation Model Architecture | Cluster a development set into k clusters. |
Translation Model Architecture | 4.1 Clustering the Development Set |
Related Work | The training and development sets were provided in full to task participants.
Related Work | However, we were unable to download all the training and development sets because some tweets were deleted or not available due to modified authorization status. |
Related Work | The tradeoff parameter of ReEmb (Labutov and Lipson, 2013) is tuned on the development set of SemEval 2013. |
Error Detection with a Maximum Entropy Model | To avoid overfitting, we optimize the Gaussian prior on the development set . |
Experiments | We find that the LG parser cannot fully parse 560 sentences (63.8%) in the training set (MT-02), 731 sentences (67.6%) in the development set (MT-05), and 660 sentences (71.8%) in the test set (MT-03).
Experiments | To compare with previous work using word posterior probabilities for confidence estimation, we carried out experiments using wpp estimated from N-best lists with the classification threshold τ, which was optimized on our development set to minimize CER.
Features | This does not mean we do not need a development set.
Features | We do validate our feature selection and other experimental settings on the development set.
Features | We optimize the discrete factor on our development set and find that the optimal value is 1.
Introduction | 3) Divide words into two groups (correct translations and errors) by using a classification threshold optimized on a development set . |
Related Work | We directly output prediction results from our discriminatively trained classifier without optimizing a classification threshold on a distinct development set beforehand. Most previous approaches make decisions based on a pre-tuned classification threshold τ as follows
SMT System | For minimum error rate tuning (Och, 2003), we use NIST MT-02 as the development set for the translation task. |
Experiments | Note that most previous work does not report (or need) a standard development set; hence, for tuning our features and their hyper-parameters, we randomly split the original training data into a training and development set with a 70/30 ratio (and then use the full original training set during testing).
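A seeded random 70/30 split like the one described above can be sketched as follows (the seed and function name are ours, not the paper's).

```python
# Hypothetical helper for a reproducible random train/dev split.
import random

def split_train_dev(examples, dev_ratio=0.3, seed=0):
    items = list(examples)
    random.Random(seed).shuffle(items)       # fixed seed for reproducibility
    cut = int(len(items) * (1 - dev_ratio))  # 70% train, 30% dev
    return items[:cut], items[cut:]
```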
Experiments | Note that the development set is used only for ACE04, because for ACE05 and ACE05-ALL we directly test using the features tuned on ACE04.
Experiments | Table 2: Incremental results for the Web features on the ACE04 development set . |
Semantics via Web Features | As a real example from our development set, the co-occurrence count c12 for the headword pair (leader, president) is 11383, while it is only 95 for the headword pair (voter, president); after normalization and log10, the values are -10.9 and -12.0, respectively.
Semantics via Web Features | Also, we do not constrain the order of h1 and h2, because these patterns can hold for either direction of coreference. As a real example from our development set, the pattern co-occurrence count for the headword pair (leader, president) is 752, while for (voter, president) it is 0.
Semantics via Web Features | We chose the following three context types, based on performance on a development set: |
Experimental Evaluation | The graphs were randomly split into a development set (11 graphs) and a test set (12 graphs).
Experimental Evaluation | The left half depicts methods where the development set was needed to tune parameters, and the right half depicts methods that do not require a (manually created) development set at all. |
Experimental Evaluation | Results on the left were achieved by optimizing the top-K parameter on the development set , and on the right by optimizing on the training set automatically generated from WordNet. |
Learning Entailment Graph Edges | Note that this constant needs to be optimized on a development set . |
Learning Entailment Graph Edges | Importantly, while the score-based formulation contains a parameter λ that requires optimization, this probabilistic formulation is parameter-free and does not utilize a development set at all.
Discussion | In this paper, we have described how MERT can be employed to estimate the weights for the linear loss function to maximize BLEU on a development set . |
Experiments | Our development set (dev) consists of the NIST 2005 eval set; we use this set for optimizing MBR parameters. |
Experiments | MERT is then performed to optimize the BLEU score on a development set; for MERT, we use 40 random initial parameters as well as parameters computed using corpus-based statistics (Tromble et al., 2008).
Experiments | We select the MBR scaling factor (Tromble et al., 2008) based on the development set; it is set to 0.1, 0.01, 0.5, 0.2, 0.5 and 1.0 for the aren-phrase, aren-hier, aren-samt, zhen-phrase, zhen-hier and zhen-samt systems respectively.
Introduction | We employ MERT to select these weights by optimizing BLEU score on a development set . |
Cluster Feature Selection | Our main idea is to learn the best set of prefix lengths, perhaps by validating their effectiveness on a development set.
Cluster Feature Selection | Because this method does not need validation on the development set , it is the laziest but the fastest method for selecting clusters. |
Cluster Feature Selection | Exhaustive Search (ES): ES works by trying every possible combination of the set I and picking the one that works the best for the development set . |
Experiments | The set of prefix lengths that worked the best for the development set was chosen to select clusters. |
Experiments | Interestingly, ES did not always outperform the two statistical methods, which may be due to overfitting to the development set.
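Exhaustive Search as described in the Cluster Feature Selection section enumerates every non-empty subset of candidate prefix lengths and keeps the subset that scores best on the development set; a sketch, with `dev_score` a hypothetical hook that trains and evaluates with the given prefixes:

```python
# Hypothetical exhaustive subset search over cluster prefix lengths.
from itertools import combinations

def exhaustive_prefix_search(candidate_lengths, dev_score):
    best_subset, best = None, float("-inf")
    for k in range(1, len(candidate_lengths) + 1):
        for subset in combinations(candidate_lengths, k):
            s = dev_score(subset)          # development-set performance
            if s > best:
                best_subset, best = subset, s
    return best_subset, best
```

Note that the winning subset is exactly the quantity that can overfit the development set, consistent with the observation above that ES does not always win on test data.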
Experiments | For the semi-supervised system, each test fold was the same one used in the baseline and the other 4 folds were further split into a training set and a development set in a ratio of 7:3 for selecting clusters. |
Beam-Width Prediction | Figure 3 is a visual representation of beam-width prediction on a single sentence of the development set using the Berkeley latent-variable grammar and Boundary FOM. |
Conclusion and Future Work | Table 1: Section 22 development set results for CYK and Beam-Search (Beam) parsing using the Berkeley latent-variable grammar. |
Open/Closed Cell Classification | We found the value λ = 10² to give the best performance on our development set, and we use this value in all of our experiments.
Open/Closed Cell Classification | Figures 2a and 2b compare the pruned charts of Chart Constraints and Constituent Closure for a single sentence in the development set . |
Open/Closed Cell Classification | We compare the effectiveness of Constituent Closure, Complete Closure, and Chart Constraints by decreasing the percentage of chart cells closed until accuracy over all sentences in our development set starts to decline.
Results | In Table 1 we present the accuracy and parse time for three baseline parsers on the development set: exhaustive CYK parsing, beam-search parsing using only the inside score, and beam-search parsing using the Boundary FOM.
Supervised evaluation tasks | training partition sentences, and evaluated their F1 on the development set . |
Supervised evaluation tasks | After each epoch over the training set, we measured the accuracy of the model on the development set . |
Supervised evaluation tasks | Training was stopped after the accuracy on the development set did not improve for 10 epochs, generally about 50-80 epochs total.
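The stopping rule above is standard early stopping with a patience of 10 epochs; a sketch with hypothetical `train_epoch` and `dev_accuracy` callbacks:

```python
# Hypothetical early-stopping loop: halt once dev accuracy has not
# improved for `patience` consecutive epochs.
def train_with_early_stopping(train_epoch, dev_accuracy,
                              patience=10, max_epochs=200):
    best_acc, best_epoch, stale = float("-inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        acc = dev_accuracy()
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_acc
```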
CRF and features | For this purpose the development set was split into training and testing parts.
Evaluation | Evaluation consisted of training the CRF on the whole development set annotated with the induced transformations, and then applying the trained model to tag the evaluation part with transformations.
Evaluation | Observation of the development set suggests that returning the original inflected NPs may be a better baseline. |
Preparation of training data | The whole set was divided randomly into the development set (1105 NPs) and evaluation set (564 NPs). |
Preparation of training data | The development set was enhanced with word-level transformations that were induced automatically in the following manner. |
Preparation of training data | The frequencies of all transformations induced from the development set are given in Tab. |
Related works | For development and evaluation, two subsets of NCP were chosen and manually annotated with NP lemmas: development set (112 phrases) and evaluation set (224 phrases). |
Experimental Setup | We optimized this value on the development set and obtained best results with 5 = 0. |
Results | We optimized the model parameters on a development set consisting of cue-associate pairs from Nelson et al. |
Results | The best performing model on the development set used 500 visual terms and 750 topics and the association measure proposed in Griffiths et al. |
The Attribute Dataset | For most concepts the development set contained a maximum of 100 images and the test set a maximum of 200 images. |
The Attribute Dataset | Concepts with fewer than 800 images in total were split into 1/8 test set, 1/8 development set, and 3/4 training set.
The Attribute Dataset | The development set was used for devising and refining our attribute annotation scheme. |
Syllabification Experiments | Recall that 5K training examples were held out as a development set . |
Syllabification with Structured SVMs | We use a linear kernel, and tune the SVM’s cost parameter on a development set . |
Syllabification with Structured SVMs | While developing our tagging schemes and feature representation, we used a development set of 5K words held out from our CELEX training data. |
Syllabification with Structured SVMs | Our development set experiments suggested that numbering ONC tags increases their performance. |
Experiments | For this experiment, all models were estimated from the training set and evaluated on the development set . |
Experiments | For test set evaluations, we trained on the combination of the training and development sets (§2), to maximize the amount of training data for the final experiments. |
Experiments | We selected a threshold for binarization from a grid of 1001 points from 1 to 4 that maximized the accuracy of binarized predictions from a model trained on the training set and evaluated on the binarized development set.
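Read literally, the footnote above describes a scan over 1001 evenly spaced thresholds in [1, 4]; a sketch under that reading, with prediction scores and binarized gold labels as plain lists:

```python
# Hypothetical threshold scan: pick the cut-off in [1, 4] that maximizes
# accuracy of binarized predictions on the (binarized) development set.
def select_threshold(pred_scores, gold_binary, lo=1.0, hi=4.0, points=1001):
    best_t, best_acc = lo, -1.0
    for i in range(points):
        t = lo + (hi - lo) * i / (points - 1)
        acc = sum((s >= t) == g
                  for s, g in zip(pred_scores, gold_binary)) / len(gold_binary)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```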
System Description | On 10 preliminary runs with the development set , this variance |
System Description | Table 1: Pearson's r on the development set, for our full system and variations excluding each feature type.
Experimental Setup | For efficiency, we limit the sentence length to 70 tokens in training and development sets . |
Experimental Setup | This gives a 99% pruning recall on the CATiB development set . |
Experimental Setup | After pruning, we tune the regularization parameter over {0.1, 0.01, 0.001} on the development sets for the different languages.
Sampling-Based Dependency Parsing with Global Features | In our work we choose α = 0.003, which gives a 98.9% oracle POS tagging accuracy on the CATiB development set.
Experiment | We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
Experiment | We tuned the optimal number of iterations of perceptron training algorithm on the development set . |
Experiment | We trained these three systems on the training set and evaluated them on the development set . |
Experiments on Parsing-Based SMT | The feature weights are tuned on the development set using minimum error rate training.
Experiments on Phrase-Based SMT | We used the NIST MT-2002 set as the development set and the NIST MT-2004 test set as the test set. |
Experiments on Phrase-Based SMT | Koehn's implementation of minimum error rate training (Och, 2003) is used to tune the feature weights on the development set.
Experiments on Word Alignment | For Eq. (11), we also manually labeled a development set of 100 sentence pairs, in the same manner as the test set.
Experiments on Word Alignment | By minimizing the AER on the development set, the interpolation coefficients of the collocation probabilities on CM-1 and CM-2 were set to 0.1 and 0.9.
Improving Phrase Table | For phrases containing only one word, we set a fixed collocation probability that is the average of the collocation probabilities of the sentences on a development set.
Experimental Results 5.1 Data Resources | This book contains eleven practice tests, and we used all the sentence completion questions in the first five tests as a development set , and all the questions in the last six tests as the test set. |
Experimental Results 5.1 Data Resources | To provide human benchmark performance, we asked six native speaking high school students and five graduate students to answer the questions on the development set . |
Experimental Results 5.1 Data Resources | For the LSA-LM, an interpolation weight of 0.1 was used for the LSA score, determined through optimization on the development set.
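The interpolation above can be sketched as a convex combination of the two scores, with the 0.1 weight from the text; whether the combination is applied in log space is our assumption, and the scoring hooks are hypothetical.

```python
# Hypothetical interpolated scorer for sentence-completion candidates:
# mixes an LSA score with an n-gram LM score using a dev-tuned weight.
def best_completion(candidates, lsa_score, lm_score, lsa_weight=0.1):
    return max(candidates,
               key=lambda c: lsa_weight * lsa_score(c)
                             + (1.0 - lsa_weight) * lm_score(c))
```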
Sentence Completion via Latent Semantic Analysis | In practice, a value of 1.2 was selected on the basis of development set results.
Experimental Setup | In our experiments, the development set contains 200 sentences and the test set contains 500 sentences, both of which are randomly selected from the human translations of 2008 NIST Open Machine Translation Evaluation: Chinese to English Task. |
Statistical Paraphrase Generation | = c_dev(+r) / c_dev(r), where c_dev(r) is the total number of unit replacements in the generated paraphrases on the development set.
Statistical Paraphrase Generation | Replacement rate (rr): rr measures the paraphrase degree on the development set, i.e., the percentage of words that are paraphrased.
Statistical Paraphrase Generation | We define rr as: rr = w_dev(r) / w_dev(s), where w_dev(r) is the total number of words in the replaced units on the development set, and w_dev(s) is the number of words of all sentences on the development set.
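Both development-set statistics above reduce to simple counting; a sketch, assuming each replacement record carries a word count and a correctness flag (that data layout is ours, not the paper's):

```python
# Hypothetical computation of precision p = c_dev(+r)/c_dev(r) and
# replacement rate rr = w_dev(r)/w_dev(s) over the development set.
def dev_statistics(replacements, total_sentence_words):
    c_r = len(replacements)                                # c_dev(r)
    c_plus = sum(1 for r in replacements if r["correct"])  # c_dev(+r)
    w_r = sum(r["num_words"] for r in replacements)        # w_dev(r)
    p = c_plus / c_r if c_r else 0.0
    rr = w_r / total_sentence_words                        # w_dev(s) in denom
    return p, rr
```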
Introduction | MERT (Och, 2003), MIRA (Watanabe et al., 2007; Chiang et al., 2008), PRO (Hopkins and May, 2011) and so on, which iteratively optimize a weight such that, after re-ranking a k-best list of a given development set with this weight, the loss of the resulting 1-best list is minimal.
Introduction | where f is a source sentence in a given development set, and ((e*, d*), (e', d')) is a preference pair for f; N is the number of all preference pairs; λ > 0 is a regularizer.
Introduction | Given a development set, we first run pre-training to obtain an initial parameter θ1 for Algorithm 1 in line 1.
Experimental setup | We separated this corpus into three non-overlapping sets: a training set of 500 programs for parameter estimation in topic modeling and LE, a development set of 133 programs for empirical tuning and a test set of 400 programs for performance evaluation. |
Experimental setup | A number of parameters were set through empirical tuning on the development set.
Experimental setup | Figure 1 shows the results on the development set and the test set. |
Dependency-based Pre-ordering Rule Set | 4. Conduct primary experiments, which used the same training set and development set as the experiments described in Section 3.
Dependency-based Pre-ordering Rule Set | In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set . |
Experiments | Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs. |
Experiments | selected from the development set.
Experiments | The evaluation set contained 200 sentences randomly selected from the development set . |
Experiments | For comparison, we used the same test set with 40 newswire articles (672 sentences) as in (Ji and Grishman, 2008; Liao and Grishman, 2010) for the experiments, and randomly selected 30 other documents (863 sentences) from different genres as the development set.
Experiments | We use the harmonic mean of the trigger’s F1 measure and argument’s F1 measure to measure the performance on the development set . |
Experiments | Figure 6 shows the training curves of the averaged perceptron with respect to the performance on the development set when the beam size is 4. |
Building the Resource | To this end, we constructed a development set comprised of a sample of 1,000 derivational families induced using our rules. |
Building the Resource | We also estimated the reliability of derivational rules by analyzing the accuracy of each rule on the development set . |
Evaluation | We have considered a number of string distance measures and tested them on the development set (cf. |
Evaluation | This is based on preliminary experiments on the development set (cf. |
Evaluation | Lemmas included in the development set (Section 4.1) were excluded from sampling. |
Results | the English development set as a function of number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets. |
Results | In Figure 4 we compare early update with LaSO and delayed LaSO on the English development set . |
Results | Table 1 displays the differences in F-measures and CoNLL average between the local and nonlocal systems when applied to the development sets for each language. |
Wambaya grammar | In addition, this grammar has relatively low ambiguity, assigning on average 11.89 parses per item in the development set . |
Wambaya grammar | Of the 92 sentences in this text, 20 overlapped with items in the development set , so the |
Wambaya grammar | The parsed portion of the development set (732 items) constitutes a sufficiently large corpus to train a parse selection model using the Redwoods disambiguation technology (Toutanova et al., 2005). |
Abstract | Following most previous work, e.g. (Collins, 2002) and (Shen et al., 2007), we divide this corpus into training set (sections 0-18), development set (sections 19-21) and the final test set (sections 22-24).
Abstract | Following (Jiang et al., 2008a), we divide this corpus into training set (chapters 1-260), development set (chapters 271-300) and the final test set (chapters 301-325).
Abstract | Experiments in this section are carried out on the development set . |
Error Analysis | We sampled 100 errors randomly from all errors made by our final model (trained on all three datasets with domain adaptation and additional features) on the ARZ development set ; see Table 4. |
Error Analysis | Table 4: Counts of error categories (out of 100 randomly sampled ARZ development set errors). |
Error Analysis | One example of this distinction that appeared in the development set is a pair of alternative segmentations of the word meaning "my topic".
Experiments | F1 scores provide a more informative assessment of performance than word-level or character-level accuracy scores, as over 80% of tokens in the development sets consist of only one segment, with an average of one segmentation every 4.7 tokens (or one every 20.4 characters). |
Experiments | Table 1 contains results on the development set for the model of Green and DeNero and our improvements. |
Experiments | The regularization parameter A is tuned on the development set . |
Experiments | We run all three algorithms for multiple epochs and pick the best epoch based on development set performance. |
Experiments | For the first set of experiments, we use the same division of the corpus as in (Livescu and Glass, 2004; Jyothi et al., 2011) into a 2492—word training set, a 165-word development set , and a 236-word test set. |
Regularization Improves Topic Models | We split each dataset into a training fold (70%), development fold (15%), and a test fold (15%): the training data are used to fit models; the development set is used to select parameters (anchor threshold M, document prior α, regularization weight λ); and final results are reported on the test fold.
Regularization Improves Topic Models | We select α using grid search on the development set.
Regularization Improves Topic Models | 4.1 Grid Search for Parameters on Development Set |
Datasets | In other words, the development set includes documents from July 1995, July 1996, July 1997, etc. |
Datasets | The development set includes 7,300 documents, drawn from July of each year.
Experiments and Results | The λ factor in the joint classifier is optimized on the development set as described in Section 4.3.
Experiments and Results | The features described in this paper were selected solely by studying performance on the development set . |
Learning Time Constraints | Figure 4: Development set accuracy and λ values.
Empirical evaluation | We tuned the L1 regularization strength, developed features, and ran analysis experiments on the development set (averaging across random splits). |
Empirical evaluation | To further examine this, we ran BCFL13 on the development set , allowing it to use only predicates from logical forms suggested by our logical form construction step. |
Empirical evaluation | This improved oracle accuracy on the development set to 64.5%, but accuracy was 32.2%. |
Experiments | Table 5: Experimental results on the English and Chinese development sets with the padding technique and new supervised features added incrementally. |
Experiments | Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser. |
Experiments | on the development sets . |
Experiment | The development set and test set come from the NIST evaluation test data (from 2003 to 2005). |
Experiment | Finally, the development set includes 595 sentences from NIST MT03 and the test set contains 1,786 sentences from NIST MT04 and MT05. |
Experiment | We perform SRL on the source part of the training set, development set and test set by the Chinese SRL system used in (Zhuang and Zong, 2010b). |
Experimental Results and Analysis | We used 10 newswire texts from the ACE 2005 training corpora (from March to May of 2003) as our development set, and then conducted a blind test on a separate set of 40 ACE 2005 newswire texts.
Experimental Results and Analysis | We select the thresholds (dk with k = 1~13) for the various confidence metrics by optimizing the F-measure score of each rule on the development set, as shown in Figures 2 and 3.
Experimental Results and Analysis | The labeled point on each curve shows the best F-measure that can be obtained on the development set by adjusting the threshold for that rule. |
Experiments | To empirically investigate the parameter λ and the convergence of our algorithm aPLSA, we generated five more data sets as development sets.
Experiments | A detailed description of these five development sets, namely tune1 to tune5, is listed in Table 1 as well.
Experiments | We have run experiments on the development sets to investigate how different values of λ affect the performance of aPLSA.
Experiments | We randomly chose 9 documents from the year 2001 for a development set , and 41 documents for testing. |
How Frequent is Unseen Data? | We then record every seen (vd, n) pair during training that is seen two or more times and then count the number of unseen pairs in the NYT development set (1455 tests).
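The measurement above can be sketched as follows: build the set of training pairs seen at least twice, then count the development-set pairs outside it (the pair representation is our assumption).

```python
# Hypothetical unseen-pair rate: fraction of dev pairs not seen at least
# `min_count` times in training.
from collections import Counter

def unseen_rate(train_pairs, dev_pairs, min_count=2):
    counts = Counter(train_pairs)
    seen = {p for p, c in counts.items() if c >= min_count}
    return sum(1 for p in dev_pairs if p not in seen) / len(dev_pairs)
```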
How Frequent is Unseen Data? | Figure 1: Percentage of NYT development set that is unseen when trained on varying amounts of data. |
How Frequent is Unseen Data? | Figure 2: Percentage of subject/object/preposition arguments in the NYT development set that is unseen when trained on varying amounts of NYT data. |
Evaluation | A comparison of the impact of accounting for all derivations in training and decoding (development set).
Evaluation | The effect of the beam width (log-scale) on max-translation decoding (development set).
Evaluation | An informal comparison of the outputs on the development set , presented in Table 4, suggests that the |
Citation Extraction Data | There are 660 citations in the development set and 367 citations in the test set.
Citation Extraction Data | We then use the development set to learn the penalties for the soft constraints, using the perceptron algorithm described in section 3.1. |
Citation Extraction Data | We instantiate constraints from each template in section 5.1, iterating over all possible labels that contain a B prefix at any level in the hierarchy and pruning all constraints with imp(c) < 2.75 calculated on the development set . |
Soft Constraints in Dual Decomposition | We found it beneficial, though it is not theoretically necessary, to learn the constraints on a held-out development set , separately from the other model parameters, as during training most constraints are satisfied due to overfitting, which leads to an underestimation of the relevant penalties. |
Experiments | The data splitting convention of the People's Daily corpus does not reserve a development set, so in the following experiments we simply choose the model after 7 iterations when training on this corpus.
Experiments | Table 4: Error analysis for Joint S&T on the development set of CTB.
Experiments | To obtain further information about what kinds of errors are alleviated by annotation adaptation, we conduct an initial error analysis for Joint S&T on the development set of CTB.
Experiments | Note that the development set was only used for evaluating the trained model to obtain the optimal values of tunable parameters. |
Experiments | For the baseline policy, we varied r in the range [1, 5] and found that setting r = 3 yielded the best performance on the development set for both the small and large training corpus experiments.
Experiments | Optimal balances were selected using the development set . |
Policies for correct path selection | In our experiments, the optimal threshold value r is selected by evaluating the performance of joint word segmentation and POS tagging on the development set.
Phrase-based machine translation | For each alignment model and decoding type we train Moses and use MERT optimization to tune its parameters on a development set . |
Phrase-based machine translation | For Hansards we randomly chose 1000 and 500 sentences from test 1 and test 2 to be testing and development sets respectively. |
Phrase-based machine translation | In principle, we would like to tune the threshold by optimizing BLEU score on a development set , but that is impractical for experiments with many pairs of languages. |
Word alignment results | Figure 7 shows an example of the same sentence, using the same model, where in one case Viterbi decoding was used and in the other case posterior decoding tuned to minimize AER on a development set.
Experiments | For Chinese-English-Spanish translation, we used the development set (devset3) released for the pivot task as the test set, which contains 506 source sentences, with 7 reference translations in English and Spanish. |
Experiments | To be capable of tuning parameters on our systems, we created a development set of 1,000 sentences taken from the training sets, with 3 reference translations in both English and Spanish. |
Experiments | This development set is also used to train the regression learning model. |
Introduction | Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set , both separately and jointly. |
Introduction | We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the testsets. |
Introduction | BLEU and TER scores are calculated on the development set . |
Annotations | Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set . |
Other Languages | (2013) only report results on the development set for the Berkeley-Rep model; however, the task organizers also use a version of the Berkeley parser provided with parts of speech from high-quality POS taggers for each language (Berkeley-Tags). |
Other Languages | On the development set , we outperform the Berkeley parser and match the performance of the Berkeley-Rep parser. |
BBC News Database | the kernel whose value is optimized on the development set . |
BBC News Database | where α is a smoothing parameter tuned on the development set, s_a is the annotation for the latent variable s, and s_d its corresponding document.
BBC News Database | where μ is a smoothing parameter estimated on the development set, 1(w, s_a) is a Boolean variable denoting whether w appears in the annotation s_a, and N_w is the number of latent variables that contain w in their annotations.
Experiments | Figure 3: Learning curve of the averaged perceptron classifier on the CTB development set.
Experiments | We train the baseline perceptron classifier for word segmentation on the training set of CTB 5.0, using the development set to determine the best number of training iterations.
Experiments | Figure 3 shows the learning curve of the averaged perceptron on the development set.
Experiments | While emails and weblogs are used as the development sets , reviews, news groups and Yahoo!Answers are used as the final test sets. |
Experiments | All these parameters are selected according to the averaged accuracy on the development set . |
Experiments | Experimental results under the 4 combined settings on the development sets are illustrated in Figures 2, 3 and 4, where the
Boosting an MST Parser | The relative weight λ is adjusted to maximize performance on the development set, using an algorithm similar to minimum error-rate training (Och, 2003).
Experiments | Figure 3: Performance curves of the word-pair classification model on the development sets of WSJ and CTB 5.0, with respect to a series of ratios r.
Experiments | Figure 4: The performance curve of the word-pair classification model on the development set of CTB 5.0, with respect to a series of thresholds θ.
Experiments | Then, on each instance set we train a classifier and test it on the development set of CTB 5.0.
Background | The data set used for weight training is generally called the development set or tuning set in the SMT field.
Background | We see, first of all, that all the three systems are improved during iterations on the development set . |
Background | Figure 2: BLEU scores on the development set as a function of iteration number.
Abstract | We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
Experiments | The results on the news-commentary (nc) data show that training on the development set does not benefit from adding large feature sets — BLEU result differences between tuning 12 default features |
Experiments | However, scaling all features to the full training set shows significant improvements for algorithm 3, and especially for algorithm 4, which gains 0.8 BLEU points over tuning 12 features on the development set . |
Introduction | Our resulting models are learned on large data sets, but they are small and outperform models that tune feature sets of various sizes on small development sets . |
Experiments | Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the k-best size for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores.
Experiments | Table 3: The parsing times on the development set (seconds for all the sentences) |
Experiments | Table 3 shows the parsing times of Intersect on the development set for English. |
Implementation Details | The numbers, 10% and 30%, are tuned on the development sets in the experiments. |
Introduction | There are 216 documents and 4126 original-permutation pairs in the training set, and 24 documents and 465 pairs in the development set . |
Introduction | Transition length, salience, and a regularization parameter are tuned on the development set . |
Introduction | We only report results using the setting of transition length g 4, and no salience threshold, because they give the best performance on the development set . |
Corpora and Parameters | We used part of the Russian corpus as a development set for determining the parameters. |
Corpora and Parameters | On our development set we have tested various parameter settings. |
Corpora and Parameters | In our experiments we have used the following values (again, determined using a development set) for these parameters: F0: 1,000 words per million (wpm); FH: 100 wpm; FB: 1.2 wpm; N: 500 words; W: 5 words; L: 30%; S: 2/3; α: 0.1.
Argument Mapping Model | Given these features with gold standard parses, our argument mapping model can predict entire argument mappings with an accuracy rate of 87.96% on the test set, and 87.70% on the development set . |
Identification and Labeling Models | All classifiers were trained to 500 iterations of L-BFGS training, a quasi-Newton method from the numerical optimization literature (Liu and Nocedal, 1989), using Zhang Le's maxent toolkit. To prevent overfitting we used Gaussian priors with global variances of 1 and 5 for the identifier and labeler, respectively. The Gaussian priors were determined empirically by testing on the development set.
Identification and Labeling Models | The size of the window was determined experimentally on the development set; we use the same window sizes throughout.
Experiments | We tune the parameters on a small development set of 50 questions. |
Experiments | This development set is also extracted from Yahoo! |
Experiments | For parameter K, we run an experiment on the development set to determine the optimal value among 50, 100, 150, ..., 300 in terms of MAP.
Experiments | The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank to a dependency tree representation. We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
Experiments | For the number of iterations of perceptron training, we performed up to 30 iterations and chose the iteration which optimized accuracy on the development set.
Experiments | Table 6: Parent-prediction accuracies of unlabeled Czech parsers on the PDT 1.0 development set . |
Co-training strategy for prosodic event detection | Development set: 20 utterances (1,356 words, 2,275 syllables), speakers f2b, f3b; Labeled set L: 5 utterances (347 words, 573 syllables), speakers m2b, m3b; Unlabeled set U: 1,027 utterances (77,207 words, 129,305 syllables), speaker m4b.
Conclusions | In our experiment, we used some labeled data as a development set to estimate some parameters.
Experiments and results | Among the labeled data, 102 utterances of the f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as the development set to optimize parameters such as λ and the confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as the unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for the co-training experiments).
Expected BLEU Training | We tuned λM+1 on the development set but found that λM+1 = 1 resulted in faster training and equal accuracy.
Expected BLEU Training | We fix θ and re-optimize λ in the presence of the recurrent neural network model using Minimum Error Rate Training (Och, 2003) on the development set (§5).
Experiments | We use either lattices or the unique 100-best output of the phrase-based decoder and re-estimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Experiments | The beam size was tuned on the development set , and a value of 128 was found to achieve a reasonable balance of accuracy and speed; hence this value was used for all experiments. |
Experiments | dependency length on the development set . |
Experiments | Table 1 shows the accuracies of all parsers on the development set , in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank. |
Experiments | We used the NIST MT03 evaluation test data as our development set , and the NIST MT05 as the test set. |
Experiments | Table 4: Experiment results of the sense-based translation model (STM) with lexicon and sense features extracted from a window of size varying from ±5 to ±15 words on the development set.
Experiments | Our first group of experiments was conducted to investigate the impact of the window size k on translation performance, in terms of BLEU on the development set.
Experiments | We use the standard split of the Treebank: sections 02-21 as the training data (39832 sentences), section 22 as the development set (1700 sentences), and section 23 as the test set (2416 sentences). |
Experiments | The development set and the test set are parsed with a model trained on all 39832 training sentences. |
Experiments | We use the development set to determine the optimal number of iterations for averaged perceptron, and report the F1 score on the test set. |
Experimental Setup | Training Our model needs to be provided with the number of clusters K. We set K large enough for the model to learn effectively on the development set . |
Experimental Setup | A threshold for this proportion is set for each property via the development set . |
Model Description | Properties with proportions above a set threshold (tuned on a development set ) are predicted as being supported. |
Evaluation | We sampled data from the training and development set of the Persian dependency treebank (Rasooli et al., 2013) to create a comparable seventh dataset in Persian. |
Evaluation | ∞ is the upper-bound OOV reduction for our expansion model: for each word in the development set, we ask if our model, without any vocabulary size restriction at all, could generate it.
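Computing that upper bound reduces to a coverage check over the development vocabulary; a sketch, where `can_generate` is a hypothetical membership test for the unrestricted expansion model:

```python
# Hypothetical upper-bound OOV-reduction check: what fraction of dev
# words can the unrestricted model generate at all?
def oov_reduction_upper_bound(dev_words, can_generate):
    covered = sum(1 for w in dev_words if can_generate(w))
    return covered / len(dev_words)
```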
Evaluation | Table 5: Results from running a handcrafted Turkish morphological analyzer (Oflazer, 1996) on different expansions and on the development set . |
A Generic Phrase Training Procedure | In the final step 4 (line 15), the parameters {λm, τ} are discriminatively trained on a development set using the downhill simplex method (Nelder and Mead, 1965).
Discussions | We use feature functions to decide the order and the threshold τ to locate the boundary, guided by a development set.
Introduction | A significant deviation from most other approaches is that the framework is parameterized and can be optimized jointly with the decoder to maximize translation performance on a development set . |
Results and Discussion | In feature ablation testing on the development set, the whole feature set was found to significantly outperform all other feature subsets (p < 2.2 × 10^-16).
Results and Discussion | The development set (Section 00) was used to tune the β parameter to obtain reasonable hypertag ambiguity levels; the model was not otherwise tuned to it.
Results and Discussion | limit; on the development set, this improvement eliminates more than the number of known search errors (cf.
Experimental Setup | For parameter estimation, we tune all parameters (utterance selection and path ranking) exhaustively with 0.1 intervals using our development set.
Phrasal Query Abstraction Framework | The parameters α and β are tuned on a development set and sum to 1.
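Because the two weights sum to one, tuning reduces to a one-dimensional sweep over α with β = 1 − α; a sketch, with `dev_score` a hypothetical hook and the step size our choice:

```python
# Hypothetical sweep over alpha (beta = 1 - alpha) on the dev set.
def tune_alpha(dev_score, steps=20):
    best_a, best_s = 0.0, float("-inf")
    for i in range(steps + 1):
        a = i / steps                       # alpha in [0, 1]
        s = dev_score(alpha=a, beta=1.0 - a)
        if s > best_s:
            best_a, best_s = a, s
    return best_a
```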
Phrasal Query Abstraction Framework | We estimate the percentage of the retrieved utterances based on the development set . |
Language Model Evaluation: Lexical Simplification | The data set contains a development set of 300 examples and a test set of 1710 examples. For our experiments, we evaluated the models on the test set.
Language Model Evaluation: Lexical Simplification | The best lambda was chosen based on a linear search optimized on the SemEval 2012 development set . |
Why Does Unsimplified Data Help? | For the simplification task, the optimal lambda value determined on the development set was 0.98, with a very strong bias towards the simple model. |
Data | Running on the full data set is time-consuming, so development was done on a subset of about 80,000 articles (19.9 million tokens) as a training set and 500 articles as a development set . |
Experiments | For both Wikipedia and Twitter, preliminary experiments on the development set were run to plot the prediction error for each method for each level of resolution, and the optimal resolution for each method was chosen for obtaining test results. |
Experiments | We recomputed the distributions using several values for both parameters and evaluated on the development set . |
Experiments | We evaluate our approach by comparing translation quality, as evaluated by the IBM-BLEU (Papineni et al., 2002) metric, on the NIST Chinese-to-English translation task, using MT04 as the development set to train the model parameters λ, and MT05, MT06 and MT08 as test sets.
Experiments | We therefore choose N merely based on development set performance. |
Experiments | Unfortunately, variance in development set BLEU scores tends to be higher than in test set scores, despite SAMT MERT's built-in algorithms for overcoming local optima, such as random restarts and zeroing-out.
Experiments | We used DUC-03 as our development set , and tested on DUC-04 data. |
Experiments | DUC-03 was used as development set . |
Experiments | Figure 1 illustrates how ROUGE-1 scores change when α and K vary on the development set (DUC-03).
Experiments | We measured the evolving accuracy of the models on the development set (Figure 4). |
Oracle Parsing | The reason that using the gold-standard supertags doesn’t result in 100% oracle parsing accuracy is that some of the development set parses cannot be constructed by the learned grammar. |
Oracle Parsing | variety of fixed beam settings (Figure 1), considering only the subset of our development set which could be parsed with all beam settings.
Constituent Recombination | The parameters λ and p are tuned by Powell's method (Powell, 1964) on a development set, using the F1 score of PARSEVAL (Black et al., 1991) as the objective.
Experiment | For parser combination, we follow the setting of Fossum and Knight (2009), using Section 24 instead of Section 22 of WSJ treebank as development set . |
Experiment | It is tuned on a development set using the gold sec- |
Experiments | Table 1: Intrinsic evaluation accuracy [%] (development set) for Arabic segmentation and tagging.
Experiments | Table 1 shows development set accuracy for two settings.
Experiments | We tuned the feature weights on a development set using lattice-based minimum error rate training (MERT) (Macherey et al., |
Experiment | We estimated the optimal values of the stopping probabilities s by using the development set . |
Experiment | In all our experiments, we conducted ten independent runs to train our model, and selected the one that performed best on the development set in terms of parsing accuracy. |
Experiment | development set (≤ 100).
Substructure Spaces for BTKs | The coefficients αi for the composite kernel are tuned with respect to F-measure (F) on the development set of the HIT corpus.
Substructure Spaces for BTKs | Those thresholds are also tuned on the development set of the HIT corpus with respect to F-measure.
Substructure Spaces for BTKs | We use the sentences with fewer than 50 characters from the NIST MT-2002 test set as the development set (to speed up tuning for the syntax-based system) and the NIST MT-2005 test set as our test set.
Experiments | We used as a development set ten additional documents from the Old Bailey proceedings and five additional documents from Trove that were not part of our test set. |
Results and Analysis | This slightly improves performance on our development set and can be thought of as placing a prior on the glyph shape parameters. |
Results and Analysis | We performed error analysis on our development set by randomly choosing 100 word errors from the WER alignment and manually annotating them with relevant features. |
Experiments | We used minimum error rate training (Och, 2003) to tune the feature weights to maximise the BLEU score on the development set . |
Experiments | For the development set we use both ASR devset 1 and 2 from IWSLT 2005, and
Experiments | For the development set we use the NIST 2002 test set, and evaluate performance on the test sets from NIST 2003 |
Experiment | The development sets are mainly used to tune the value of the weight factor α in Equation 5.
Experiment | We evaluated the performance (F-score) of our model on the three development sets using different α values, where α is progressively increased in steps of 0.1 (0 < α < 1.0).
Experiment | The "baseline" uses a different training configuration, so the α values in decoding also need to be tuned on the development sets.
Experimentation | In addition, we reserve 33 documents in the training set as the development set and use the ground-truth entities, times and values for our training and testing.
Experimentation | Our statistics on the development set show that almost 65% of the event mentions are involved in Coreference, Parallel and Sequence relations, which account for 63%, 50% and 9% respectively.
Inferring Inter-Sentence Arguments on Relevant Event Mentions | development set; tri and tri' are the triggers of the k-th and k'-th event mentions, whose event types are et and et', in S&lt;i,j&gt; and S&lt;i',j'&gt; respectively.
Experiments | We used the development set for initial development and tuning hyperparameters. |
Experiments | For the GUSP system, we set the hyperparameters from initial experiments on the development set, and used them in all subsequent experiments.
Grounded Unsupervised Semantic Parsing | In preliminary experiments on the development set , we found that the naive model (with multinomials as conditional probabilities) did not perform well in EM. |
Experimental Assessment | We use sections 2-21 for training, 22 as development set , and 23 as test set. |
Experimental Assessment | We train all parsers for up to 30 iterations, and for each parser we select the weight vector w from the iteration with the best accuracy on the development set.
Experimental Assessment | We have computed the average value of γ on our English data set, resulting in 2.98 (variance 2.15) for the training set, and 2.95 (variance 1.96) for the development set.
Experiments | The 18 conversations annotated by all 3 annotators are used as the test set, and the remaining 70 conversations are used as the development set to tune the parameters (determining the best combination weights).
Experiments | On the development set, we used grid search to obtain the best combination weights for the two summarization methods.
Experiments | In the sentence-ranking method, the best parameters found on the development set are λsim = 0, λrel = 0.3, λsent = 0.3, λlen = 0.4.
RSP: A Random Walk Model for SP | We split the test set equally into two parts: one as the development set and the other as the final test set. |
RSP: A Random Walk Model for SP | Parameters Tuning: The parameters are tuned on the PTB development set , using AFP as the generalization data. |
RSP: A Random Walk Model for SP | This experiment is conducted on the PTB development set with RND confounders. |
Experiment | We use the first 1,419 queries together with their annotated documents as the development set to tune paraphrasing parameters (as we discussed in Section 2.3), and use the rest as the test set. |
Experiment | The ranking model is trained based on the development set . |
Paraphrasing for Web Search | {Q_i, D_i^{Label}}_{i=1}^N is a human-labeled development set.
Experiments | We randomly selected a development set and a test set, and the remaining sentence pairs were used as the training set.
Experiments | Furthermore, the development set and test set are divided into various intervals according to their best fuzzy match scores.
Experiments | All the feature weights and the weight for each probability factor (3 factors for Model-III) are tuned on the development set with minimum-error-rate training (MERT) (Och, 2003). |
Experiments | We set aside 132 documents as a development set and use 350 documents as the evaluation set. |
Experiments | We used L2-regularization; the regularization parameter was tuned using the development set.
Experiments | The parameter λ was tuned using the development set.
Experiments | The number of iterations was determined by experiments on the development set . |
Experiments | Tuning the parameter settings on the development set , we found that parameterized categories, binarization, and including punctuation gave the best F1 performance. |
Experiments | While experimenting with the development set of TuBa-D/Z, we noticed that the parser sometimes returns parses, in which paired punctuation (e.g. |