Index of papers in Proc. ACL 2014 that mention
  • development set
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
The training and development sets were made available in full to task participants.
Related Work
However, we were unable to download all the training and development sets because some tweets were deleted or not available due to modified authorization status.
Related Work
The tradeoff parameter of ReEmb (Labutov and Lipson, 2013) is tuned on the development set of SemEval 2013.
development set is mentioned in 9 sentences in this paper.
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Experimental Setup
For efficiency, we limit the sentence length to 70 tokens in training and development sets.
Experimental Setup
This gives a 99% pruning recall on the CATiB development set.
Experimental Setup
After pruning, we tune the regularization parameter θ ∈ {0.1, 0.01, 0.001} on the development sets for different languages.
Sampling-Based Dependency Parsing with Global Features
In our work we choose α = 0.003, which gives a 98.9% oracle POS tagging accuracy on the CATiB development set.
development set is mentioned in 6 sentences in this paper.
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Experiments
For this experiment, all models were estimated from the training set and evaluated on the development set.
Experiments
For test set evaluations, we trained on the combination of the training and development sets (§2), to maximize the amount of training data for the final experiments.
Experiments
We selected a threshold for binarization from a grid of 1001 points from 1 to 4 that maximized the accuracy of binarized predictions from a model trained on the training set and evaluated on the binarized development set.
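The threshold search described in the snippet above can be sketched as a one-dimensional grid scan (a hypothetical Python sketch, not the authors' code; `dev_scores` and `dev_labels` are assumed names):

```python
def pick_threshold(dev_scores, dev_labels, lo=1.0, hi=4.0, n=1001):
    """Scan a grid of n points in [lo, hi] and return the threshold that
    maximizes accuracy of binarized predictions on the development set."""
    best_t, best_acc = lo, -1.0
    for i in range(n):
        t = lo + (hi - lo) * i / (n - 1)
        preds = [s >= t for s in dev_scores]        # binarize predictions
        acc = sum(p == y for p, y in zip(preds, dev_labels)) / len(dev_labels)
        if acc > best_acc:                          # keep the best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc
```

The selected threshold would then be frozen and applied unchanged to test-set predictions.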
System Description
On 10 preliminary runs with the development set, this variance
System Description
Table 1: Pearson's r on the development set, for our full system and variations excluding each feature type.
development set is mentioned in 6 sentences in this paper.
Wang, Zhiguo and Xue, Nianwen
Experiment
We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used as the test set.
Experiment
We tuned the optimal number of iterations of the perceptron training algorithm on the development set.
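Selecting the number of perceptron iterations on a development set is a form of early stopping: train for a fixed maximum number of epochs and keep the weights from the epoch that scores best on held-out data. A minimal binary-perceptron sketch (a toy illustration, not the authors' structured parser):

```python
def train_perceptron(train, dev, max_epochs=10):
    """Mistake-driven perceptron loop (binary toy version): after each
    epoch, evaluate on the development set and keep the best weights."""
    dim = len(train[0][0])
    w = [0.0] * dim
    best_w, best_epoch, best_acc = list(w), 0, -1.0
    for epoch in range(1, max_epochs + 1):
        for x, y in train:                          # y in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:                      # mistake-driven update
                w = [wi + y * xi for wi, xi in zip(w, x)]
        acc = sum(
            (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0)
            for x, y in dev
        ) / len(dev)
        if acc > best_acc:                          # keep best-so-far weights
            best_w, best_epoch, best_acc = list(w), epoch, acc
    return best_w, best_epoch, best_acc
```

The epoch count chosen this way is then reused when retraining or when decoding the test set.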
Experiment
We trained these three systems on the training set and evaluated them on the development set.
development set is mentioned in 6 sentences in this paper.
Berant, Jonathan and Liang, Percy
Empirical evaluation
We tuned the L1 regularization strength, developed features, and ran analysis experiments on the development set (averaging across random splits).
Empirical evaluation
To further examine this, we ran BCFL13 on the development set, allowing it to use only predicates from logical forms suggested by our logical form construction step.
Empirical evaluation
This improved oracle accuracy on the development set to 64.5%, but accuracy was 32.2%.
development set is mentioned in 5 sentences in this paper.
Björkelund, Anders and Kuhn, Jonas
Results
the English development set as a function of number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Results
In Figure 4 we compare early update with LaSO and delayed LaSO on the English development set.
Results
Table 1 displays the differences in F-measures and CoNLL average between the local and nonlocal systems when applied to the development sets for each language.
development set is mentioned in 5 sentences in this paper.
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Dependency-based Pre-ordering Rule Set
4. Conduct primary experiments, which used the same training set and development set as the experiments described in Section 3.
Dependency-based Pre-ordering Rule Set
In the primary experiments, we tested the effectiveness of the candidate rules and filtered out the ones that did not work, based on the BLEU scores on the development set.
Experiments
Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs.
Experiments
selected from the development set.
Experiments
The evaluation set contained 200 sentences randomly selected from the development set.
development set is mentioned in 5 sentences in this paper.
Monroe, Will and Green, Spence and Manning, Christopher D.
Error Analysis
We sampled 100 errors randomly from all errors made by our final model (trained on all three datasets with domain adaptation and additional features) on the ARZ development set; see Table 4.
Error Analysis
Table 4: Counts of error categories (out of 100 randomly sampled ARZ development set errors).
Error Analysis
One example of this distinction that appeared in the development set is the pair mawḍūʿī 'my topic' (mawḍūʿ + ī).
Experiments
F1 scores provide a more informative assessment of performance than word-level or character-level accuracy scores, as over 80% of tokens in the development sets consist of only one segment, with an average of one segmentation every 4.7 tokens (or one every 20.4 characters).
Experiments
Table 1 contains results on the development set for the model of Green and DeNero and our improvements.
development set is mentioned in 5 sentences in this paper.
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan
Regularization Improves Topic Models
We split each dataset into a training fold (70%), a development fold (15%), and a test fold (15%): the training data are used to fit models; the development set is used to select parameters (anchor threshold M, document prior α, regularization weight λ); and final results are reported on the test fold.
Regularization Improves Topic Models
We select α using grid search on the development set.
Regularization Improves Topic Models
4.1 Grid Search for Parameters on Development Set
development set is mentioned in 5 sentences in this paper.
Hall, David and Durrett, Greg and Klein, Dan
Annotations
Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Features
Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set .
Other Languages
(2013) only report results on the development set for the Berkeley-Rep model; however, the task organizers also use a version of the Berkeley parser provided with parts of speech from high-quality POS taggers for each language (Berkeley-Tags).
Other Languages
On the development set, we outperform the Berkeley parser and match the performance of the Berkeley-Rep parser.
development set is mentioned in 4 sentences in this paper.
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
While emails and weblogs are used as the development sets, reviews, news groups and Yahoo!Answers are used as the final test sets.
Experiments
All these parameters are selected according to the averaged accuracy on the development set.
Experiments
Experimental results under the 4 combined settings on the development sets are illustrated in Figures 2, 3 and 4, where the
development set is mentioned in 4 sentences in this paper.
Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew
Citation Extraction Data
There are 660 citations in the development set and 367 citations in the test set.
Citation Extraction Data
We then use the development set to learn the penalties for the soft constraints, using the perceptron algorithm described in section 3.1.
Citation Extraction Data
We instantiate constraints from each template in section 5.1, iterating over all possible labels that contain a B prefix at any level in the hierarchy and pruning all constraints with imp(c) < 2.75 calculated on the development set.
Soft Constraints in Dual Decomposition
We found it beneficial, though it is not theoretically necessary, to learn the constraints on a held-out development set, separately from the other model parameters, as during training most constraints are satisfied due to overfitting, which leads to an underestimation of the relevant penalties.
development set is mentioned in 4 sentences in this paper.
Mehdad, Yashar and Carenini, Giuseppe and Ng, Raymond T.
Experimental Setup
For parameter estimation, we tune all parameters (utterance selection and path ranking) exhaustively with 0.1 intervals using our development set.
Phrasal Query Abstraction Framework
The parameters α and β are tuned on a development set and sum to 1.
Phrasal Query Abstraction Framework
We estimate the percentage of the retrieved utterances based on the development set.
development set is mentioned in 3 sentences in this paper.
Xu, Wenduan and Clark, Stephen and Zhang, Yue
Experiments
The beam size was tuned on the development set, and a value of 128 was found to achieve a reasonable balance of accuracy and speed; hence this value was used for all experiments.
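Picking a beam size that balances accuracy and speed, as in the snippet above, can be expressed as maximizing development-set accuracy subject to a decoding-time budget (hypothetical measurements and names; a sketch only, not the authors' procedure):

```python
def pick_beam_size(candidates, dev_accuracy, dev_seconds, budget):
    """Among beam sizes whose development-set decoding time fits the
    budget, pick the most accurate one (break ties toward the faster)."""
    feasible = [b for b in candidates if dev_seconds[b] <= budget]
    if not feasible:
        raise ValueError("no beam size fits the time budget")
    return max(feasible, key=lambda b: (dev_accuracy[b], -dev_seconds[b]))
```

Doubling the beam typically roughly doubles decoding time while accuracy saturates, so the tie-breaking toward smaller times matters once gains flatten out.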
Experiments
dependency length on the development set.
Experiments
Table 1 shows the accuracies of all parsers on the development set, in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank.
development set is mentioned in 3 sentences in this paper.
Xiong, Deyi and Zhang, Min
Experiments
We used the NIST MT03 evaluation test data as our development set, and the NIST MT05 as the test set.
Experiments
Table 4: Experiment results of the sense-based translation model (STM) with lexicon and sense features extracted from a window of size varying from ±5 to ±15 words on the development set.
Experiments
Our first group of experiments was conducted to investigate the impact of the window size k on translation performance in terms of BLEU on the development set.
development set is mentioned in 3 sentences in this paper.
Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen
Evaluation
We sampled data from the training and development set of the Persian dependency treebank (Rasooli et al., 2013) to create a comparable seventh dataset in Persian.
Evaluation
∞ is the upper-bound OOV reduction for our expansion model: for each word in the development set, we ask if our model, without any vocabulary size restriction at all, could generate it.
Evaluation
Table 5: Results from running a handcrafted Turkish morphological analyzer (Oflazer, 1996) on different expansions and on the development set.
development set is mentioned in 3 sentences in this paper.
Auli, Michael and Gao, Jianfeng
Expected BLEU Training
We tuned λM+1 on the development set but found that λM+1 = 1 resulted in faster training and equal accuracy.
Expected BLEU Training
We fix θ and re-optimize λ in the presence of the recurrent neural network model using Minimum Error Rate Training (Och, 2003) on the development set (§5).
Experiments
either lattices or the unique 100-best output of the phrase-based decoder and re-estimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
development set is mentioned in 3 sentences in this paper.