Index of papers in Proc. ACL that mention
  • development set
Choi, Jinho D. and McCallum, Andrew
Experiments
These languages are selected because they contain non-projective trees and are publicly available from the CoNLL-X webpage.6 Since the CoNLL-X data we have does not come with development sets, the last 10% of each training set is used for development.
Experiments
Feature selection is done on the English development set.
Experiments
First, all parameters are tuned on the English development set by using grid search on T = [1, .
Selectional branching
Input: Dt: training set, Dd: development set.
Selectional branching
First, an initial model M0 is trained on all data by taking the one-best sequences, and its score is measured by testing on a development set (lines 2-4).
development set is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Grammar Formalism Conversion
Table 4: Results of the generative parser on the development set, when trained with various weightings of the CTB training set and CDTPS.
Experiments of Parsing
We used a standard split of CTB for performance evaluation, articles 1-270 and 400-1151 as training set, articles 301-325 as development set, and articles 271-300 as test set.
Experiments of Parsing
We tried the corpus weighting method when combining CDTPS with the CTB training set (abbreviated as CTB for simplicity) as training data, by gradually increasing the weight (including 1, 2, 5, 10, 20, 50) of CTB to optimize parsing performance on the development set.
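The corpus-weighting search described above can be sketched as a simple loop: duplicate the in-domain data w times, train on the combined data, and keep the weight with the best development-set score. The function and its arguments here are hypothetical stand-ins, not the paper's implementation.

```python
def tune_corpus_weight(in_domain, converted, train, evaluate, dev,
                       weights=(1, 2, 5, 10, 20, 50)):
    """Pick the in-domain duplication weight that maximizes the dev score."""
    best_weight, best_score = None, float("-inf")
    for w in weights:
        combined = in_domain * w + converted  # duplicate in-domain data w times
        model = train(combined)               # retrain on the weighted mix
        score = evaluate(model, dev)          # e.g. parsing F1 on the dev set
        if score > best_score:
            best_weight, best_score = w, score
    return best_weight, best_score
```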
Experiments of Parsing
Table 4 presents the results of the generative parser with various weights of CTB on the development set.
Our Two-Step Solution
The number of removed trees will be determined by cross-validation on the development set.
Our Two-Step Solution
The value of λ will be tuned by cross-validation on the development set.
Our Two-Step Solution
Corpus weighting is exactly such an approach, with the weight tuned on the development set, and it will be used for parsing on homogeneous treebanks in this paper.
development set is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Introduction
If there is a mismatch between the domain of the development set and the test set, domain adaptation can potentially harm performance compared to an unadapted baseline.
Translation Model Architecture
As a way of optimizing instance weights, (Sennrich, 2012b) minimize translation model perplexity on a set of phrase pairs, automatically extracted from a parallel development set.
Translation Model Architecture
Cluster a development set into k clusters.
Translation Model Architecture
4.1 Clustering the Development Set
development set is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
The training and development sets were released in full to task participants.
Related Work
However, we were unable to download all the training and development sets because some tweets were deleted or not available due to modified authorization status.
Related Work
The tradeoff parameter of ReEmb (Labutov and Lipson, 2013) is tuned on the development set of SemEval 2013.
development set is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Error Detection with a Maximum Entropy Model
To avoid overfitting, we optimize the Gaussian prior on the development set.
Experiments
We find that the LG parser cannot fully parse 560 sentences (63.8%) in the training set (MT-02), 731 sentences (67.6%) in the development set (MT-05) and 660 sentences (71.8%) in the test set (MT-03).
Experiments
To compare with previous work using word posterior probabilities for confidence estimation, we carried out experiments using wpp estimated from N-best lists with the classification threshold τ, which was optimized on our development set to minimize CER.
Features
1This does not mean we do not need a development set.
Features
We do validate our feature selection and other experimental settings on the development set.
Features
We optimize the discrete factor on our development set and find the optimal value is 1.
Introduction
3) Divide words into two groups (correct translations and errors) by using a classification threshold optimized on a development set.
Related Work
directly output prediction results from our discriminatively trained classifier without optimizing a classification threshold on a distinct development set beforehand.1 Most previous approaches make decisions based on a pre-tuned classification threshold τ as follows
SMT System
For minimum error rate tuning (Och, 2003), we use NIST MT-02 as the development set for the translation task.
development set is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Experiments
Note that most previous work does not report (or need) a standard development set; hence, for tuning our features and its hyper-parameters, we randomly split the original training data into a training and development set with a 70/30 ratio (and then use the full original training set during testing).
Experiments
7Note that the development set is used only for ACE04, because for ACE05, and ACE05-ALL, we directly test using the features tuned on ACE04.
Experiments
Table 2: Incremental results for the Web features on the ACE04 development set.
Semantics via Web Features
As a real example from our development set, the co-occurrence count c12 for the headword pair (leaden president) is 11383, while it is only 95 for the headword pair (voter president); after normalization and log10, the values are -10.9 and -12.0, respectively.
Semantics via Web Features
Also, we do not constrain the order of hl and hg because these patterns can hold for either direction of coreference.4 As a real example from our development set, the c12 count for the headword pair (leaden president) is 752, while for (voter president), it is 0.
Semantics via Web Features
We chose the following three context types, based on performance on a development set:
development set is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Experimental Evaluation
The graphs were randomly split into a development set (11 graphs) and a test set (12 graphs)6.
Experimental Evaluation
The left half depicts methods where the development set was needed to tune parameters, and the right half depicts methods that do not require a (manually created) development set at all.
Experimental Evaluation
Results on the left were achieved by optimizing the top-K parameter on the development set, and on the right by optimizing on the training set automatically generated from WordNet.
Learning Entailment Graph Edges
Note that this constant needs to be optimized on a development set.
Learning Entailment Graph Edges
Importantly, while the score-based formulation contains a parameter λ that requires optimization, this probabilistic formulation is parameter free and does not utilize a development set at all.
development set is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Discussion
In this paper, we have described how MERT can be employed to estimate the weights for the linear loss function to maximize BLEU on a development set.
Experiments
Our development set (dev) consists of the NIST 2005 eval set; we use this set for optimizing MBR parameters.
Experiments
MERT is then performed to optimize the BLEU score on a development set; for MERT, we use 40 random initial parameters as well as parameters computed using corpus-based statistics (Tromble et al., 2008).
Experiments
We select the MBR scaling factor (Tromble et al., 2008) based on the development set; it is set to 0.1, 0.01, 0.5, 0.2, 0.5 and 1.0 for the aren-phrase, aren-hier, aren-samt, zhen-phrase, zhen-hier and zhen-samt systems respectively.
Introduction
We employ MERT to select these weights by optimizing BLEU score on a development set.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Sun, Ang and Grishman, Ralph and Sekine, Satoshi
Cluster Feature Selection
Our main idea is to learn the best set of prefix lengths, perhaps through the validation of their effectiveness on a development set of data.
Cluster Feature Selection
Because this method does not need validation on the development set, it is the laziest but the fastest method for selecting clusters.
Cluster Feature Selection
Exhaustive Search (ES): ES works by trying every possible combination of the set I and picking the one that works the best for the development set.
Experiments
The set of prefix lengths that worked the best for the development set was chosen to select clusters.
Experiments
It was interesting that ES did not always outperform the two statistical methods which might be because of its overfitting to the development set .
Experiments
For the semi-supervised system, each test fold was the same one used in the baseline and the other 4 folds were further split into a training set and a development set in a ratio of 7:3 for selecting clusters.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bodenstab, Nathan and Dunlop, Aaron and Hall, Keith and Roark, Brian
Beam-Width Prediction
Figure 3 is a visual representation of beam-width prediction on a single sentence of the development set using the Berkeley latent-variable grammar and Boundary FOM.
Conclusion and Future Work
Table 1: Section 22 development set results for CYK and Beam-Search (Beam) parsing using the Berkeley latent-variable grammar.
Open/Closed Cell Classification
We found the value λ = 10² to give the best performance on our development set, and we use this value in all of our experiments.
Open/Closed Cell Classification
Figures 2a and 2b compare the pruned charts of Chart Constraints and Constituent Closure for a single sentence in the development set .
Open/Closed Cell Classification
We compare the effectiveness of Constituent Closure, Complete Closure, and Chart Constraints, by decreasing the percentage of chart cells closed until accuracy over all sentences in our development set starts to decline.
Results
In Table 1 we present the accuracy and parse time for three baseline parsers on the development set: exhaustive CYK parsing, beam-search parsing using only the inside score β, and beam-search parsing using the Boundary FOM.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Supervised evaluation tasks
training partition sentences, and evaluated their F1 on the development set.
Supervised evaluation tasks
After each epoch over the training set, we measured the accuracy of the model on the development set .
Supervised evaluation tasks
Training was stopped after the accuracy on the development set did not improve for 10 epochs, generally about 50–80 epochs total.
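The stopping rule described above (halt once development-set accuracy has not improved for 10 epochs) is a standard patience loop; a minimal sketch follows, with `train_epoch` and `eval_dev` as hypothetical callbacks.

```python
def train_with_early_stopping(train_epoch, eval_dev, max_epochs=200, patience=10):
    """Stop once dev accuracy has not improved for `patience` epochs."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch(epoch)          # one pass over the training set
        acc = eval_dev()            # measure accuracy on the development set
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break                   # no improvement for `patience` epochs
    return best_epoch, best_acc
```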
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Radziszewski, Adam
CRF and features
For this purpose the development set was split into a training and a testing part.
Evaluation
The performed evaluation assumed training of the CRF on the whole development set annotated with the induced transformations and then applying the trained model to tag the evaluation part with transformations.
Evaluation
Observation of the development set suggests that returning the original inflected NPs may be a better baseline.
Preparation of training data
The whole set was divided randomly into the development set (1105 NPs) and evaluation set (564 NPs).
Preparation of training data
The development set was enhanced with word-level transformations that were induced automatically in the following manner.
Preparation of training data
The frequencies of all transformations induced from the development set are given in Tab.
Related works
For development and evaluation, two subsets of NCP were chosen and manually annotated with NP lemmas: development set (112 phrases) and evaluation set (224 phrases).
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Silberer, Carina and Ferrari, Vittorio and Lapata, Mirella
Experimental Setup
We optimized this value on the development set and obtained best results with 5 = 0.
Results
We optimized the model parameters on a development set consisting of cue-associate pairs from Nelson et al.
Results
The best performing model on the development set used 500 visual terms and 750 topics and the association measure proposed in Griffiths et al.
The Attribute Dataset
For most concepts the development set contained a maximum of 100 images and the test set a maximum of 200 images.
The Attribute Dataset
Concepts with less than 800 images in total were split into 1/8 test and development set each, and 3/4 training set.
The Attribute Dataset
The development set was used for devising and refining our attribute annotation scheme.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin
Syllabification Experiments
Recall that 5K training examples were held out as a development set.
Syllabification with Structured SVMs
We use a linear kernel, and tune the SVM’s cost parameter on a development set.
Syllabification with Structured SVMs
While developing our tagging schemes and feature representation, we used a development set of 5K words held out from our CELEX training data.
Syllabification with Structured SVMs
Our development set experiments suggested that numbering ONC tags increases their performance.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Experiments
For this experiment, all models were estimated from the training set and evaluated on the development set.
Experiments
For test set evaluations, we trained on the combination of the training and development sets (§2), to maximize the amount of training data for the final experiments.
Experiments
12We selected a threshold for binarization from a grid of 1001 points from 1 to 4 that maximized the accuracy of binarized predictions from a model trained on the training set and evaluated on the binarized development set.
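The threshold search in the footnote above (a grid of 1001 points from 1 to 4, maximizing binarized accuracy) can be sketched as follows; the scores and gold labels are hypothetical inputs.

```python
import numpy as np

def tune_threshold(scores, labels, grid=np.linspace(1, 4, 1001)):
    """Pick the binarization threshold that maximizes accuracy on dev data."""
    best_t, best_acc = None, -1.0
    for t in grid:
        preds = scores >= t                    # binarize model predictions
        acc = float(np.mean(preds == labels))  # accuracy against binary gold
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc
```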
System Description
On 10 preliminary runs with the development set, this variance
System Description
Table 1: Pearson’s r on the development set, for our full system and variations excluding each feature type.
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Experimental Setup
For efficiency, we limit the sentence length to 70 tokens in training and development sets.
Experimental Setup
This gives a 99% pruning recall on the CATiB development set .
Experimental Setup
After pruning, we tune the regularization parameter over {0.1, 0.01, 0.001} on development sets for different languages.
Sampling-Based Dependency Parsing with Global Features
2In our work we choose α = 0.003, which gives a 98.9% oracle POS tagging accuracy on the CATiB development set.
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Experiment
We conducted experiments on the Penn Chinese Treebank (CTB) version 5.1 (Xue et al., 2005): Articles 001-270 and 400-1151 were used as the training set, Articles 301-325 were used as the development set, and Articles 271-300 were used
Experiment
We tuned the optimal number of iterations of perceptron training algorithm on the development set .
Experiment
We trained these three systems on the training set and evaluated them on the development set .
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Experiments on Parsing-Based SMT
The feature weights are tuned on the development set using the minimum error
Experiments on Phrase-Based SMT
We used the NIST MT-2002 set as the development set and the NIST MT-2004 test set as the test set.
Experiments on Phrase-Based SMT
And Koehn's implementation of minimum error rate training (Och, 2003) is used to tune the feature weights on the development set.
Experiments on Word Alignment
(11), we also manually labeled a development set including 100 sentence pairs, in the same manner as the test set.
Experiments on Word Alignment
By minimizing the AER on the development set, the interpolation coefficients of the collocation probabilities on CM-1 and CM-2 were set to 0.1 and 0.9.
Improving Phrase Table
For phrases including only one word, we set a fixed collocation probability, which is the average of the collocation probabilities of the sentences on a development set.
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zweig, Geoffrey and Platt, John C. and Meek, Christopher and Burges, Christopher J.C. and Yessenalina, Ainur and Liu, Qiang
Experimental Results 5.1 Data Resources
This book contains eleven practice tests, and we used all the sentence completion questions in the first five tests as a development set, and all the questions in the last six tests as the test set.
Experimental Results 5.1 Data Resources
To provide human benchmark performance, we asked six native speaking high school students and five graduate students to answer the questions on the development set.
Experimental Results 5.1 Data Resources
For the LSA-LM, an interpolation weight of 0.1 was used for the LSA score, determined through optimization on the development set.
Sentence Completion via Latent Semantic Analysis
In practice, a f of 1.2 was selected on the basis of development set results.
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhao, Shiqi and Lan, Xiang and Liu, Ting and Li, Sheng
Experimental Setup
In our experiments, the development set contains 200 sentences and the test set contains 500 sentences, both of which are randomly selected from the human translations of 2008 NIST Open Machine Translation Evaluation: Chinese to English Task.
Statistical Paraphrase Generation
= c_dev(+r)/c_dev(r), where c_dev(r) is the total number of unit replacements in the generated paraphrases on the development set.
Statistical Paraphrase Generation
Replacement rate (rr): rr measures the paraphrase degree on the development set, i.e., the percentage of words that are paraphrased.
Statistical Paraphrase Generation
We define rr as: rr = w_dev(r)/w_dev(s), where w_dev(r) is the total number of words in the replaced units on the development set, and w_dev(s) is the number of words of all sentences on the development set.
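The two development-set statistics defined above, precision of unit replacements and replacement rate rr, can be sketched as follows; the input representation here is an assumption for illustration, not the paper's.

```python
def paraphrase_dev_stats(replacements, sentences):
    """replacements: list of (unit_words, is_correct) for each replaced unit;
    sentences: list of token lists for all dev-set sentences."""
    c_dev_r = len(replacements)                         # total unit replacements
    c_dev_pos = sum(1 for _, ok in replacements if ok)  # correct replacements
    w_dev_r = sum(len(words) for words, _ in replacements)
    w_dev_s = sum(len(s) for s in sentences)
    precision = c_dev_pos / c_dev_r   # p = c_dev(+r) / c_dev(r)
    rr = w_dev_r / w_dev_s            # rr = w_dev(r) / w_dev(s)
    return precision, rr
```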
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Introduction
MERT (Och, 2003), MIRA (Watanabe et al., 2007; Chiang et al., 2008), PRO (Hopkins and May, 2011) and so on, which iteratively optimize a weight such that, after re-ranking a k-best list of a given development set with this weight, the loss of the resulting 1-best list is minimal.
Introduction
where f is a source sentence in a given development set, and ((e*, d*), (e′, d′)) is a preference pair for f; N is the number of all preference pairs; λ > 0 is a regularizer.
Introduction
Given a development set, we first run pre-training to obtain an initial parameter θ1 for Algorithm 1 in line 1.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lu, Xiaoming and Xie, Lei and Leung, Cheung-Chi and Ma, Bin and Li, Haizhou
Experimental setup
We separated this corpus into three non-overlapping sets: a training set of 500 programs for parameter estimation in topic modeling and LE, a development set of 133 programs for empirical tuning and a test set of 400 programs for performance evaluation.
Experimental setup
A number of parameters were set through empirical tuning on the development set.
Experimental setup
Figure 1 shows the results on the development set and the test set.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Dependency-based Pre-ordering Rule Set
4 Conduct primary experiments which used the same training set and development set as the experiments described in Section 3.
Dependency-based Pre-ordering Rule Set
In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set .
Experiments
Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs.
Experiments
selected from the development set.
Experiments
The evaluation set contained 200 sentences randomly selected from the development set .
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng and Huang, Liang
Experiments
For comparison, we used the same test set with 40 newswire articles (672 sentences) as in (Ji and Grishman, 2008; Liao and Grishman, 2010) for the experiments, and randomly selected 30 other documents (863 sentences) from different genres as the development set.
Experiments
We use the harmonic mean of the trigger’s F1 measure and argument’s F1 measure to measure the performance on the development set .
Experiments
Figure 6 shows the training curves of the averaged perceptron with respect to the performance on the development set when the beam size is 4.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zeller, Britta and Šnajder, Jan and Padó, Sebastian
Building the Resource
To this end, we constructed a development set comprised of a sample of 1,000 derivational families induced using our rules.
Building the Resource
We also estimated the reliability of derivational rules by analyzing the accuracy of each rule on the development set .
Evaluation
We have considered a number of string distance measures and tested them on the development set (cf.
Evaluation
This is based on preliminary experiments on the development set (cf.
Evaluation
Lemmas included in the development set (Section 4.1) were excluded from sampling.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Results
the English development set as a function of number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Results
In Figure 4 we compare early update with LaSO and delayed LaSO on the English development set.
Results
Table 1 displays the differences in F-measures and CoNLL average between the local and nonlocal systems when applied to the development sets for each language.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Bender, Emily M.
Wambaya grammar
In addition, this grammar has relatively low ambiguity, assigning on average 11.89 parses per item in the development set .
Wambaya grammar
Of the 92 sentences in this text, 20 overlapped with items in the development set , so the
Wambaya grammar
The parsed portion of the development set (732 items) constitutes a sufficiently large corpus to train a parse selection model using the Redwoods disambiguation technology (Toutanova et al., 2005).
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhao, Qiuye and Marcus, Mitch
Abstract
Following most previous work, e. g. (Collins, 2002) and (Shen et al., 2007), we divide this corpus into training set (sections 0-18), development set (sections 19-21) and the final test set (sections 22-24).
Abstract
Following (Jiang et al., 2008a), we divide this corpus into training set (chapters 1-260), development set (chapters 271-300) and the final test set (chapters 301-325).
Abstract
Experiments in this section are carried out on the development set.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Monroe, Will and Green, Spence and Manning, Christopher D.
Error Analysis
We sampled 100 errors randomly from all errors made by our final model (trained on all three datasets with domain adaptation and additional features) on the ARZ development set; see Table 4.
Error Analysis
Table 4: Counts of error categories (out of 100 randomly sampled ARZ development set errors).
Error Analysis
One example of this distinction that appeared in the development set is the pair any)» mawdm“‘my topic” (yo madeZ< + 6.
Experiments
F1 scores provide a more informative assessment of performance than word-level or character-level accuracy scores, as over 80% of tokens in the development sets consist of only one segment, with an average of one segmentation every 4.7 tokens (or one every 20.4 characters).
Experiments
Table 1 contains results on the development set for the model of Green and DeNero and our improvements.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Tang, Hao and Keshet, Joseph and Livescu, Karen
Experiments
The regularization parameter λ is tuned on the development set.
Experiments
We run all three algorithms for multiple epochs and pick the best epoch based on development set performance.
Experiments
For the first set of experiments, we use the same division of the corpus as in (Livescu and Glass, 2004; Jyothi et al., 2011) into a 2492-word training set, a 165-word development set, and a 236-word test set.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan
Regularization Improves Topic Models
We split each dataset into a training fold (70%), development fold (15%), and a test fold (15%): the training data are used to fit models; the development set is used to select parameters (anchor threshold M, document prior α, regularization weight λ); and final results are reported on the test fold.
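The 70/15/15 split described above can be sketched generically; the function name and fixed seed are hypothetical choices for reproducibility, not taken from the paper.

```python
import random

def three_way_split(docs, seed=0, train_frac=0.70, dev_frac=0.15):
    """Shuffle documents and split them into train/dev/test folds."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)   # deterministic shuffle for a fixed seed
    n = len(docs)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    train = docs[:n_train]
    dev = docs[n_train:n_train + n_dev]
    test = docs[n_train + n_dev:]       # remainder goes to the test fold
    return train, dev, test
```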
Regularization Improves Topic Models
We select α using grid search on the development set.
Regularization Improves Topic Models
4.1 Grid Search for Parameters on Development Set
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael
Datasets
In other words, the development set includes documents from July 1995, July 1996, July 1997, etc.
Datasets
The development set includes 7,300 from July of each year.
Experiments and Results
The λ factor in the joint classifier is optimized on the development set as described in Section 4.3.
Experiments and Results
The features described in this paper were selected solely by studying performance on the development set .
Learning Time Constraints
Figure 4: Development set accuracy and λ values.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Liang, Percy
Empirical evaluation
We tuned the L1 regularization strength, developed features, and ran analysis experiments on the development set (averaging across random splits).
Empirical evaluation
To further examine this, we ran BCFL13 on the development set, allowing it to use only predicates from logical forms suggested by our logical form construction step.
Empirical evaluation
This improved oracle accuracy on the development set to 64.5%, but accuracy was 32.2%.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Experiments
Table 5: Experimental results on the English and Chinese development sets with the padding technique and new supervised features added incrementally.
Experiments
Table 6: Experimental results on the English and Chinese development sets with different types of semi-supervised features added incrementally to the extended parser.
Experiments
on the development sets.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing
Experiment
The development set and test set come from the NIST evaluation test data (from 2003 to 2005).
Experiment
Finally, the development set includes 595 sentences from NIST MT03 and the test set contains 1,786 sentences from NIST MT04 and MT05.
Experiment
We perform SRL on the source part of the training set, development set and test set by the Chinese SRL system used in (Zhuang and Zong, 2010b).
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Ji, Heng and Grishman, Ralph
Experimental Results and Analysis
We used 10 newswire texts from ACE 2005 training corpora (from March to May of 2003) as our development set, and then conduct a blind test on a separate set of 40 ACE 2005 newswire texts.
Experimental Results and Analysis
We select the thresholds (d with k=1~13) for various confidence metrics by optimizing the F-measure score of each rule on the development set, as shown in Figures 2 and 3.
Experimental Results and Analysis
The labeled point on each curve shows the best F-measure that can be obtained on the development set by adjusting the threshold for that rule.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Experiments
To empirically investigate the parameter λ and the convergence of our algorithm aPLSA, we generated five more data sets as the development sets.
Experiments
The detailed description of these five development sets, namely tune1 to tune5, is listed in Table 1 as well.
Experiments
We have done some experiments on the development sets to investigate how different values of λ affect the performance of aPLSA.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael and Jurafsky, Daniel
Experiments
We randomly chose 9 documents from the year 2001 for a development set , and 41 documents for testing.
How Frequent is Unseen Data?
We then record every seen (vd, n) pair during training that is seen two or more times3 and then count the number of unseen pairs in the NYT development set (1455 tests).
How Frequent is Unseen Data?
Figure 1: Percentage of NYT development set that is unseen when trained on varying amounts of data.
How Frequent is Unseen Data?
Figure 2: Percentage of subject/object/preposition arguments in the NYT development set that is unseen when trained on varying amounts of NYT data.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Discussion and Further Work
sentences (development set)
Evaluation
A comparison on the impact of accounting for all derivations in training and decoding (development set).
Evaluation
The effect of the beam width (log-scale) on max-translation decoding (development set).
Evaluation
An informal comparison of the outputs on the development set, presented in Table 4, suggests that the
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew
Citation Extraction Data
There are 660 citations in the development set and 367 citations in the test set.
Citation Extraction Data
We then use the development set to learn the penalties for the soft constraints, using the perceptron algorithm described in section 3.1.
Citation Extraction Data
We instantiate constraints from each template in section 5.1, iterating over all possible labels that contain a B prefix at any level in the hierarchy and pruning all constraints with imp(c) < 2.75 calculated on the development set .
Soft Constraints in Dual Decomposition
We found it beneficial, though it is not theoretically necessary, to learn the constraints on a held-out development set , separately from the other model parameters, as during training most constraints are satisfied due to overfitting, which leads to an underestimation of the relevant penalties.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Huang, Liang and Liu, Qun
Experiments
The data splitting conventions of the other two corpora, such as People’s Daily, do not reserve development sets, so in the following experiments we simply choose the model after 7 iterations when training on this corpus.
Experiments
Table 4: Error analysis for Joint S&T on the development set of CTB.
Experiments
To obtain further information about what kinds of errors are alleviated by annotation adaptation, we conduct an initial error analysis for Joint S&T on the development set of CTB.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Experiments
Note that the development set was only used for evaluating the trained model to obtain the optimal values of tunable parameters.
Experiments
For the baseline policy, we varied r in the range of [1, 5] and found that setting r = 3 yielded the best performance on the development set for both the small and large training corpus experiments.
Experiments
Optimal balances were selected using the development set .
Policies for correct path selection
4 In our experiments, the optimal threshold value r is selected by evaluating the performance of joint word segmentation and POS tagging on the development set.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Phrase-based machine translation
For each alignment model and decoding type we train Moses and use MERT optimization to tune its parameters on a development set .
Phrase-based machine translation
For Hansards we randomly chose 1000 and 500 sentences from test 1 and test 2 to be testing and development sets respectively.
Phrase-based machine translation
In principle, we would like to tune the threshold by optimizing BLEU score on a development set , but that is impractical for experiments with many pairs of languages.
Word alignment results
Figure 7 shows an example of the same sentence, using the same model, where in one case Viterbi decoding was used and in the other case Posterior decoding tuned to minimize AER on a development set.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Experiments
For Chinese-English-Spanish translation, we used the development set (devset3) released for the pivot task as the test set, which contains 506 source sentences, with 7 reference translations in English and Spanish.
Experiments
To be capable of tuning parameters on our systems, we created a development set of 1,000 sentences taken from the training sets, with 3 reference translations in both English and Spanish.
Experiments
This development set is also used to train the regression learning model.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Introduction
Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set , both separately and jointly.
Introduction
We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the testsets.
Introduction
BLEU and TER scores are calculated on the development set .
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Durrett, Greg and Klein, Dan
Annotations
Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Features
Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set .
Other Languages
(2013) only report results on the development set for the Berkeley-Rep model; however, the task organizers also use a version of the Berkeley parser provided with parts of speech from high-quality POS taggers for each language (Berkeley-Tags).
Other Languages
On the development set , we outperform the Berkeley parser and match the performance of the Berkeley-Rep parser.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Feng, Yansong and Lapata, Mirella
BBC News Database
the kernel whose value is optimized on the development set .
BBC News Database
where α is a smoothing parameter tuned on the development set, sa is the annotation for the latent variable s, and sd its corresponding document.
BBC News Database
where μ is a smoothing parameter estimated on the development set, 1(w, sa) is a Boolean variable denoting whether w appears in the annotation sa, and Nw is the number of latent variables that contain w in their annotations.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Experiments
Figure 3: Learning curve of the averaged perceptron classifier on the CTB development set.
Experiments
We train the baseline perceptron classifier for word segmentation on the training set of CTB 5.0, using the development set to determine the best number of training iterations.
Experiments
Figure 3 shows the learning curve of the averaged perceptron on the development set.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
While emails and weblogs are used as the development sets , reviews, news groups and Yahoo!Answers are used as the final test sets.
Experiments
All these parameters are selected according to the averaged accuracy on the development set .
Experiments
Experimental results under the 4 combined settings on the development sets are illustrated in Figure 2, 3 and 4, where the
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Liu, Qun
Boosting an MST Parser
The relative weight A is adjusted to maximize the performance on the development set , using an algorithm similar to minimum error-rate training (Och, 2003).
Experiments
Figure 3: Performance curves of the word-pair classification model on the development sets of WSJ and CTB 5.0, with respect to a series of ratios r.
Experiments
Figure 4: The performance curve of the word-pair classification model on the development set of CTB 5.0, with respect to a series of thresholds θ.
Experiments
Then, on each instance set we train a classifier and test it on the development set of CTB 5.0.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
1 The data set used for weight training is generally called the development set or tuning set in the SMT field.
Background
We see, first of all, that all the three systems are improved during iterations on the development set .
Background
Figure 2: BLEU scores on the development set, plotted against iteration number.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Abstract
We present eXperiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets .
Experiments
The results on the news-commentary (nc) data show that training on the development set does not benefit from adding large feature sets — BLEU result differences between tuning 12 default features
Experiments
However, scaling all features to the full training set shows significant improvements for algorithm 3, and especially for algorithm 4, which gains 0.8 BLEU points over tuning 12 features on the development set .
Introduction
Our resulting models are learned on large data sets, but they are small and outperform models that tune feature sets of various sizes on small development sets .
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Zhang, Min and Li, Haizhou
Experiments
Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the K-best size for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores.
Experiments
Table 3: The parsing times on the development set (seconds for all the sentences)
Experiments
Table 3 shows the parsing times of Intersect on the development set for English.
Implementation Details
The numbers, 10% and 30%, are tuned on the development sets in the experiments.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Introduction
There are 216 documents and 4126 original-permutation pairs in the training set, and 24 documents and 465 pairs in the development set .
Introduction
Transition length, salience, and a regularization parameter are tuned on the development set .
Introduction
We only report results using the setting of transition length ≤ 4, and no salience threshold, because they give the best performance on the development set.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Corpora and Parameters
We used part of the Russian corpus as a development set for determining the parameters.
Corpora and Parameters
On our development set we have tested various parameter settings.
Corpora and Parameters
In our experiments we have used the following values (again, determined using a development set) for these parameters: F0: 1,000 words per million (wpm); FH: 100 wpm; FB: 1.2 wpm; N: 500 words; W: 5 words; L: 30%; S: 2/3; α: 0.1.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Argument Mapping Model
Given these features with gold standard parses, our argument mapping model can predict entire argument mappings with an accuracy rate of 87.96% on the test set, and 87.70% on the development set .
Identification and Labeling Models
All classifiers were trained to 500 iterations of L-BFGS training — a quasi-Newton method from the numerical optimization literature (Liu and Nocedal, 1989) — using Zhang Le’s maxent toolkit.2 To prevent overfitting we used Gaussian priors with global variances of 1 and 5 for the identifier and labeler, respectively.3 The Gaussian priors were determined empirically by testing on the development set.
Identification and Labeling Models
4The size of the window was determined experimentally on the development set — we use the same window sizes throughout.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
We tune the parameters on a small development set of 50 questions.
Experiments
This development set is also extracted from Yahoo!
Experiments
For parameter K, we do an experiment on the development set to determine the optimal values among 50, 100, 150, - - - , 300 in terms of MAP.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Experiments
The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank to a dependency tree representation.6 We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
Experiments
of iterations of perceptron training, we performed up to 30 iterations and chose the iteration which optimized accuracy on the development set .
Experiments
Table 6: Parent-prediction accuracies of unlabeled Czech parsers on the PDT 1.0 development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Co-training strategy for prosodic event detection
Development set: 20 / 1,356 / 2,275 (f2b, f3b); Labeled set L: 5 / 347 / 573 (m2b, m3b); Unlabeled set U: 1,027 / 77,207 / 129,305 (m4b)
Conclusions
In our experiment, we used some labeled data as development set to estimate some parameters.
Experiments and results
Among labeled data, 102 utterances of all f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as the development set to optimize parameters such as A and the confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as the unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for co-training experiments).
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Expected BLEU Training
1 We tuned λM+1 on the development set but found that λM+1 = 1 resulted in faster training and equal accuracy.
Expected BLEU Training
We fix θ and re-optimize λ in the presence of the recurrent neural network model using Minimum Error Rate Training (Och, 2003) on the development set (§5).
Experiments
ther lattices or the unique 100-best output of the phrase-based decoder and reestimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set , augmented by scores corresponding to the neural network models.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xu, Wenduan and Clark, Stephen and Zhang, Yue
Experiments
The beam size was tuned on the development set , and a value of 128 was found to achieve a reasonable balance of accuracy and speed; hence this value was used for all experiments.
Experiments
dependency length on the development set .
Experiments
Table 1 shows the accuracies of all parsers on the development set , in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min
Experiments
We used the NIST MT03 evaluation test data as our development set , and the NIST MT05 as the test set.
Experiments
Table 4: Experiment results of the sense-based translation model (STM) with lexicon and sense features extracted from a window of size varying from ±5 to ±15 words on the development set.
Experiments
Our first group of experiments was conducted to investigate the impact of the window size k on translation performance in terms of BLEU (NIST) on the development set.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Experiments
We use the standard split of the Treebank: sections 02-21 as the training data (39832 sentences), section 22 as the development set (1700 sentences), and section 23 as the test set (2416 sentences).
Experiments
The development set and the test set are parsed with a model trained on all 39832 training sentences.
Experiments
We use the development set to determine the optimal number of iterations for averaged perceptron, and report the F1 score on the test set.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Chen, Harr and Eisenstein, Jacob and Barzilay, Regina
Experimental Setup
Training Our model needs to be provided with the number of clusters K. We set K large enough for the model to learn effectively on the development set .
Experimental Setup
A threshold for this proportion is set for each property via the development set .
Model Description
Properties with proportions above a set threshold (tuned on a development set ) are predicted as being supported.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen
Evaluation
We sampled data from the training and development set of the Persian dependency treebank (Rasooli et al., 2013) to create a comparable seventh dataset in Persian.
Evaluation
∞ is the upper-bound OOV reduction for our expansion model: for each word in the development set, we ask if our model, without any vocabulary size restriction at all, could generate it.
Evaluation
Table 5: Results from running a handcrafted Turkish morphological analyzer (Oflazer, 1996) on different expansions and on the development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
In the final step 4 (line 15), parameters {λm, τ} are discriminatively trained on a development set using the downhill simplex method (Nelder and Mead, 1965).
Discussions
We use feature functions to decide the order and the threshold τ to locate the boundary, guided by a development set.
Introduction
A significant deviation from most other approaches is that the framework is parameterized and can be optimized jointly with the decoder to maximize translation performance on a development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Espinosa, Dominic and White, Michael and Mehay, Dennis
Results and Discussion
The whole feature set was found in feature ablation testing on the development set to outperform all other feature subsets significantly (p < 2.2 × 10^-16).
Results and Discussion
The development set (00) was used to tune the 6 parameter to obtain reasonable hypertag ambiguity levels; the model was not otherwise tuned to it.
Results and Discussion
limit; on the development set, this improvement eliminates more than the number of known search errors (cf.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mehdad, Yashar and Carenini, Giuseppe and Ng, Raymond T.
Experimental Setup
For parameter estimation, we tune all parameters (utterance selection and path ranking) exhaustively with 0.1 intervals using our development set.
Phrasal Query Abstraction Framework
The parameters α and β are tuned on a development set and sum to 1.
Phrasal Query Abstraction Framework
We estimate the percentage of the retrieved utterances based on the development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kauchak, David
Language Model Evaluation: Lexical Simplification
The data set contains a development set of 300 examples and a test set of 1710 examples.3 For our experiments, we evaluated the models on the test set.
Language Model Evaluation: Lexical Simplification
The best lambda was chosen based on a linear search optimized on the SemEval 2012 development set .
Why Does Unsimplified Data Help?
For the simplification task, the optimal lambda value determined on the development set was 0.98, with a very strong bias towards the simple model.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wing, Benjamin and Baldridge, Jason
Data
Running on the full data set is time-consuming, so development was done on a subset of about 80,000 articles (19.9 million tokens) as a training set and 500 articles as a development set .
Experiments
For both Wikipedia and Twitter, preliminary experiments on the development set were run to plot the prediction error for each method for each level of resolution, and the optimal resolution for each method was chosen for obtaining test results.
Experiments
We recomputed the distributions using several values for both parameters and evaluated on the development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Experiments
We evaluate our approach by comparing translation quality, as evaluated by the IBM-BLEU (Papineni et al., 2002) metric, on the NIST Chinese-to-English translation task using MT04 as the development set to train the model parameters λ, and MT05, MT06 and MT08 as test sets.
Experiments
We therefore choose N merely based on development set performance.
Experiments
Unfortunately, variance in development set BLEU scores tends to be higher than in test set scores, despite SAMT MERT’s inbuilt algorithms to overcome local optima, such as random restarts and zeroing-out.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lin, Hui and Bilmes, Jeff
Experiments
We used DUC-03 as our development set , and tested on DUC-04 data.
Experiments
DUC-03 was used as development set .
Experiments
Figure 1 illustrates how ROUGE-1 scores change when α and K vary on the development set (DUC-03).
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Lopez, Adam
Experiments
We measured the evolving accuracy of the models on the development set (Figure 4).
Oracle Parsing
The reason that using the gold-standard supertags doesn’t result in 100% oracle parsing accuracy is that some of the development set parses cannot be constructed by the learned grammar.
Oracle Parsing
riety of fixed beam settings (Figure 1), considering only the subset of our development set which could be parsed with all beam settings.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Xiao and Kit, Chunyu
Constituent Recombination
The parameters λi and ρ are tuned by Powell’s method (Powell, 1964) on a development set, using the F1 score of PARSEVAL (Black et al., 1991) as the objective.
Experiment
For parser combination, we follow the setting of Fossum and Knight (2009), using Section 24 instead of Section 22 of WSJ treebank as development set .
Experiment
It is tuned on a development set using the gold sec-
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and DeNero, John
Experiments
Table 1: Intrinsic evaluation accuracy [%] (development set) for Arabic segmentation and tagging.
Experiments
Table 1 shows development set accuracy for two settings.
Experiments
We tuned the feature weights on a development set using lattice-based minimum error rate training (MERT) (Macherey et al.,
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Shindo, Hiroyuki and Miyao, Yusuke and Fujino, Akinori and Nagata, Masaaki
Experiment
We estimated the optimal values of the stopping probabilities s by using the development set .
Experiment
In all our experiments, we conducted ten independent runs to train our model, and selected the one that performed best on the development set in terms of parsing accuracy.
Experiment
development set (≤ 100).
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Substructure Spaces for BTKs
The coefficient αi for the composite kernel is tuned with respect to F-measure (F) on the development set of the HIT corpus.
Substructure Spaces for BTKs
Those thresholds are also tuned on the development set of HIT corpus with respect to F-measure.
Substructure Spaces for BTKs
We use these sentences with less than 50 characters from the NIST MT-2002 test set as the development set (to speed up tuning for syntax based system) and the NIST MT-2005 test set as our test set.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Experiments
We used as a development set ten additional documents from the Old Bailey proceedings and five additional documents from Trove that were not part of our test set.
Results and Analysis
This slightly improves performance on our development set and can be thought of as placing a prior on the glyph shape parameters.
Results and Analysis
We performed error analysis on our development set by randomly choosing 100 word errors from the WER alignment and manually annotating them with relevant features.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Yang and Cohn, Trevor
Experiments
We used minimum error rate training (Och, 2003) to tune the feature weights to maximise the BLEU score on the development set .
Experiments
For the development set we use both ASR devset l and 2 from IWSLT 2005, and
Experiments
For the development set we use the NIST 2002 test set, and evaluate performance on the test sets from NIST 2003
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Experiment
The development sets are mainly used to tune the values of the weight factor α in Equation 5.
Experiment
We evaluated the performance (F-score) of our model on the three development sets by using different α values, where α is progressively increased in steps of 0.1 (0 < α < 1.0).
Experiment
1 The “baseline” uses a different training configuration, so the α values in the decoding also need to be tuned on the development sets.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Peifeng and Zhu, Qiaoming and Zhou, Guodong
Experimentation
Besides, we reserve 33 documents in the training set as the development set and use the ground truth entities, times and values for our training and testing.
Experimentation
Our statistics on the development set show that almost 65% of the event mentions are involved in the Coreference, Parallel and Sequence relations, which account for 63%, 50% and 9% respectively.
Inferring Inter-Sentence Arguments on Relevant Event Mentions
development set; tri and tri′ are triggers of the kth and k′th event mentions whose event types are et and et′ in S<i,j> and S<i,j′> respectively.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Poon, Hoifung
Experiments
We used the development set for initial development and tuning hyperparameters.
Experiments
For the GUSP system, we set the hyperparame-ters from initial experiments on the development set , and used them in all subsequent experiments.
Grounded Unsupervised Semantic Parsing
In preliminary experiments on the development set , we found that the naive model (with multinomials as conditional probabilities) did not perform well in EM.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Experimental Assessment
We use sections 2-21 for training, 22 as development set , and 23 as test set.
Experimental Assessment
We train all parsers up to 30 iterations, and for each parser we select the weight vector β from the iteration with the best accuracy on the development set.
Experimental Assessment
We have computed the average value of γ on our English data set, resulting in 2.98 (variance 2.15) for the training set, and 2.95 (variance 1.96) for the development set.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Dong and Liu, Yang
Experiments
The 18 conversations annotated by all 3 annotators are used as the test set, and the remaining 70 conversations are used as the development set to tune the parameters (determining the best combination weights).
Experiments
From the development set , we used the grid search method to obtain the best combination weights for the two summarization methods.
Experiments
In the sentence-ranking method, the best parameters found on the development set are λsim = 0, λrel = 0.3, λsent = 0.3, λlen = 0.4.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
RSP: A Random Walk Model for SP
We split the test set equally into two parts: one as the development set and the other as the final test set.
RSP: A Random Walk Model for SP
Parameters Tuning: The parameters are tuned on the PTB development set , using AFP as the generalization data.
RSP: A Random Walk Model for SP
This experiment is conducted on the PTB development set with RND confounders.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Experiment
We use the first 1,419 queries together with their annotated documents as the development set to tune paraphrasing parameters (as we discussed in Section 2.3), and use the rest as the test set.
Experiment
The ranking model is trained based on the development set .
Paraphrasing for Web Search
{(Qi, Di^label)}_{i=1}^N is a human-labeled development set.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Experiments
We randomly selected a development set and a test set, and the remaining sentence pairs are used as the training set.
Experiments
Furthermore, the development set and test set are divided into various intervals according to their best fuzzy match scores.
Experiments
All the feature weights and the weight for each probability factor (3 factors for Model-III) are tuned on the development set with minimum-error-rate training (MERT) (Och, 2003).
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Bishan and Cardie, Claire
Experiments
We set aside 132 documents as a development set and use 350 documents as the evaluation set.
Experiments
We used L2-regu1arization; the regularization parameter was tuned using the development set .
Experiments
The parameter A was tuned using the development set .
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Experiments
The number of iterations was determined by experiments on the development set .
Experiments
Tuning the parameter settings on the development set , we found that parameterized categories, binarization, and including punctuation gave the best F1 performance.
Experiments
While experimenting with the development set of TuBa-D/Z, we noticed that the parser sometimes returns parses, in which paired punctuation (e.g.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: