Index of papers in Proc. ACL 2013 that mention
  • development set
Choi, Jinho D. and McCallum, Andrew
Experiments
These languages are selected because they contain non-projective trees and are publicly available from the CoNLL-X webpage. Since the CoNLL-X data we have does not come with development sets, the last 10% of each training set is used for development.
Experiments
Feature selection is done on the English development set.
Experiments
First, all parameters are tuned on the English development set by using grid search on T = [1, .
Selectional branching
Input: Dt: training set, Dd: development set.
Selectional branching
First, an initial model M0 is trained on all data by taking the one-best sequences, and its score is measured by testing on a development set (lines 2-4).
development set is mentioned in 13 sentences in this paper.
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Introduction
If there is a mismatch between the domain of the development set and the test set, domain adaptation can potentially harm performance compared to an unadapted baseline.
Translation Model Architecture
As a way of optimizing instance weights, Sennrich (2012b) minimizes translation model perplexity on a set of phrase pairs automatically extracted from a parallel development set.
Translation Model Architecture
Cluster a development set into k clusters.
Translation Model Architecture
4.1 Clustering the Development Set
development set is mentioned in 18 sentences in this paper.
Silberer, Carina and Ferrari, Vittorio and Lapata, Mirella
Experimental Setup
We optimized this value on the development set and obtained best results with 5 = 0.
Results
We optimized the model parameters on a development set consisting of cue-associate pairs from Nelson et al.
Results
The best performing model on the development set used 500 visual terms and 750 topics and the association measure proposed in Griffiths et al.
The Attribute Dataset
For most concepts the development set contained a maximum of 100 images and the test set a maximum of 200 images.
The Attribute Dataset
Concepts with fewer than 800 images in total were split into 1/8 each for the test and development sets, and 3/4 for the training set.
The Attribute Dataset
The development set was used for devising and refining our attribute annotation scheme.
development set is mentioned in 7 sentences in this paper.
Radziszewski, Adam
CRF and features
For this purpose the development set was split into training and testing part.
Evaluation
The performed evaluation assumed training of the CRF on the whole development set annotated with the induced transformations and then applying the trained model to tag the evaluation part with transformations.
Evaluation
Observation of the development set suggests that returning the original inflected NPs may be a better baseline.
Preparation of training data
The whole set was divided randomly into the development set (1105 NPs) and evaluation set (564 NPs).
Preparation of training data
The development set was enhanced with word-level transformations that were induced automatically in the following manner.
Preparation of training data
The frequencies of all transformations induced from the development set are given in Tab.
Related works
For development and evaluation, two subsets of NCP were chosen and manually annotated with NP lemmas: development set (112 phrases) and evaluation set (224 phrases).
development set is mentioned in 7 sentences in this paper.
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Experiments
Table 5: Experimental results on the English and Chinese development sets with the padding technique and new supervised features added incrementally.
Experiments
Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser.
Experiments
on the development sets.
development set is mentioned in 5 sentences in this paper.
Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing
Experiment
The development set and test set come from the NIST evaluation test data (from 2003 to 2005).
Experiment
Finally, the development set includes 595 sentences from NIST MT03 and the test set contains 1,786 sentences from NIST MT04 and MT05.
Experiment
We perform SRL on the source part of the training set, development set and test set by the Chinese SRL system used in (Zhuang and Zong, 2010b).
development set is mentioned in 5 sentences in this paper.
Zeller, Britta and Šnajder, Jan and Padó, Sebastian
Building the Resource
To this end, we constructed a development set comprising a sample of 1,000 derivational families induced using our rules.
Building the Resource
We also estimated the reliability of derivational rules by analyzing the accuracy of each rule on the development set.
Evaluation
We have considered a number of string distance measures and tested them on the development set (cf.
Evaluation
This is based on preliminary experiments on the development set (cf.
Evaluation
Lemmas included in the development set (Section 4.1) were excluded from sampling.
development set is mentioned in 5 sentences in this paper.
Lu, Xiaoming and Xie, Lei and Leung, Cheung-Chi and Ma, Bin and Li, Haizhou
Experimental setup
We separated this corpus into three non-overlapping sets: a training set of 500 programs for parameter estimation in topic modeling and LE, a development set of 133 programs for empirical tuning and a test set of 400 programs for performance evaluation.
Experimental setup
A number of parameters were set through empirical tuning on the development set.
Experimental setup
Figure 1 shows the results on the development set and the test set.
development set is mentioned in 5 sentences in this paper.
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Introduction
MERT (Och, 2003), MIRA (Watanabe et al., 2007; Chiang et al., 2008), PRO (Hopkins and May, 2011) and so on, which iteratively optimize a weight such that, after re-ranking a k-best list of a given development set with this weight, the loss of the resulting 1-best list is minimal.
Introduction
where f is a source sentence in a given development set, and ((e*, d*), (e′, d′)) is a preference pair for f; N is the number of all preference pairs; λ > 0 is a regularizer.
Introduction
Given a development set, we first run pre-training to obtain an initial parameter θ1 for Algorithm 1 in line 1.
development set is mentioned in 5 sentences in this paper.
Li, Qi and Ji, Heng and Huang, Liang
Experiments
For comparison, we used the same test set with 40 newswire articles (672 sentences) as in (Ji and Grishman, 2008; Liao and Grishman, 2010) for the experiments, and randomly selected 30 other documents (863 sentences) from different genres as the development set.
Experiments
We use the harmonic mean of the trigger’s F1 measure and argument’s F1 measure to measure the performance on the development set .
Experiments
Figure 6 shows the training curves of the averaged perceptron with respect to the performance on the development set when the beam size is 4.
development set is mentioned in 5 sentences in this paper.
Liu, Yang
Introduction
Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
Introduction
We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the testsets.
Introduction
BLEU and TER scores are calculated on the development set.
development set is mentioned in 4 sentences in this paper.
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Experiments
Figure 3: Learning curve of the averaged perceptron classifier on the CTB development set.
Experiments
We train the baseline perceptron classifier for word segmentation on the training set of CTB 5.0, using the development set to determine the best number of training iterations.
Experiments
Figure 3 shows the learning curve of the averaged perceptron on the development set.
development set is mentioned in 4 sentences in this paper.
Poon, Hoifung
Experiments
We used the development set for initial development and tuning hyperparameters.
Experiments
For the GUSP system, we set the hyperparameters from initial experiments on the development set, and used them in all subsequent experiments.
Grounded Unsupervised Semantic Parsing
In preliminary experiments on the development set , we found that the naive model (with multinomials as conditional probabilities) did not perform well in EM.
development set is mentioned in 3 sentences in this paper.
Li, Peifeng and Zhu, Qiaoming and Zhou, Guodong
Experimentation
Besides, we reserve 33 documents in the training set as the development set and use the ground truth entities, times and values for our training and testing.
Experimentation
Our statistics on the development set show that almost 65% of the event mentions are involved in the Coreference, Parallel and Sequence relations, which occupy 63%, 50%, and 9% respectively.
Inferring Inter-Sentence Arguments on Relevant Event Mentions
development set; tri and tri′ are triggers of the kth and k′th event mentions, whose event types are et and et′, in S⟨i,j⟩ and S⟨i′,j′⟩ respectively.
development set is mentioned in 3 sentences in this paper.
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Experimental Assessment
We use sections 2-21 for training, 22 as development set , and 23 as test set.
Experimental Assessment
We train all parsers up to 30 iterations, and for each parser we select the weight vector β from the iteration with the best accuracy on the development set.
Experimental Assessment
We have computed the average value of γ on our English data set, resulting in 2.98 (variance 2.15) for the training set, and 2.95 (variance 1.96) for the development set.
development set is mentioned in 3 sentences in this paper.
Kauchak, David
Language Model Evaluation: Lexical Simplification
The data set contains a development set of 300 examples and a test set of 1,710 examples. For our experiments, we evaluated the models on the test set.
Language Model Evaluation: Lexical Simplification
The best lambda was chosen based on a linear search optimized on the SemEval 2012 development set.
Why Does Unsimplified Data Help?
For the simplification task, the optimal lambda value determined on the development set was 0.98, with a very strong bias towards the simple model.
development set is mentioned in 3 sentences in this paper.
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
RSP: A Random Walk Model for SP
We split the test set equally into two parts: one as the development set and the other as the final test set.
RSP: A Random Walk Model for SP
Parameter Tuning: The parameters are tuned on the PTB development set, using AFP as the generalization data.
RSP: A Random Walk Model for SP
This experiment is conducted on the PTB development set with RND confounders.
development set is mentioned in 3 sentences in this paper.
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Experiment
We use the first 1,419 queries together with their annotated documents as the development set to tune paraphrasing parameters (as we discussed in Section 2.3), and use the rest as the test set.
Experiment
The ranking model is trained based on the development set .
Paraphrasing for Web Search
{(Q_i, D_i^label)}_{i=1}^N is a human-labeled development set.
development set is mentioned in 3 sentences in this paper.
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Experiments
We randomly selected a development set and a test set, and the remaining sentence pairs are used as the training set.
Experiments
Furthermore, the development set and test set are divided into various intervals according to their best fuzzy match scores.
Experiments
All the feature weights and the weight for each probability factor (3 factors for Model-III) are tuned on the development set with minimum-error-rate training (MERT) (Och, 2003).
development set is mentioned in 3 sentences in this paper.
Yang, Bishan and Cardie, Claire
Experiments
We set aside 132 documents as a development set and use 350 documents as the evaluation set.
Experiments
We used L2-regularization; the regularization parameter was tuned using the development set.
Experiments
The parameter λ was tuned using the development set.
development set is mentioned in 3 sentences in this paper.
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Experiment
The development sets are mainly used to tune the value of the weight factor α in Equation 5.
Experiment
We evaluated the performance (F-score) of our model on the three development sets by using different α values, where α is progressively increased in steps of 0.1 (0 < α < 1.0).
Experiment
The “baseline” uses a different training configuration, so the α values used in decoding also need to be tuned on the development sets.
development set is mentioned in 3 sentences in this paper.
Feng, Yang and Cohn, Trevor
Experiments
We used minimum error rate training (Och, 2003) to tune the feature weights to maximise the BLEU score on the development set.
Experiments
For the development set we use both ASR devsets 1 and 2 from IWSLT 2005, and
Experiments
For the development set we use the NIST 2002 test set, and evaluate performance on the test sets from NIST 2003
development set is mentioned in 3 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
We tune the parameters on a small development set of 50 questions.
Experiments
This development set is also extracted from Yahoo!
Experiments
For parameter K, we do an experiment on the development set to determine the optimal value among 50, 100, 150, …, 300 in terms of MAP.
development set is mentioned in 3 sentences in this paper.
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Experiments
We used as a development set ten additional documents from the Old Bailey proceedings and five additional documents from Trove that were not part of our test set.
Results and Analysis
This slightly improves performance on our development set and can be thought of as placing a prior on the glyph shape parameters.
Results and Analysis
We performed error analysis on our development set by randomly choosing 100 word errors from the WER alignment and manually annotating them with relevant features.
development set is mentioned in 3 sentences in this paper.