Index of papers in Proc. ACL 2009 that mention
  • development set
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Grammar Formalism Conversion
Table 4: Results of the generative parser on the development set, when trained with various weightings of the CTB training set and CDTPS.
Experiments of Parsing
We used a standard split of CTB for performance evaluation, articles 1-270 and 400-1151 as training set, articles 301-325 as development set, and articles 271-300 as test set.
Experiments of Parsing
We tried the corpus weighting method when combining CDTPS with the CTB training set (abbreviated as CTB for simplicity) as training data, by gradually increasing the weight of CTB (1, 2, 5, 10, 20, or 50) to optimize parsing performance on the development set.
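The weight search described in this sentence can be sketched as a simple grid search; this is a minimal sketch, not the authors' code, and the `train` and `evaluate_f1` helpers are hypothetical stand-ins for parser training and development-set evaluation:

```python
def corpus_weighting_search(ctb_trees, cdtps_trees, train, evaluate_f1):
    """Duplicate the CTB portion w times, train, and keep the weight
    that gives the best development-set F1 (weights from the paper)."""
    best_weight, best_f1 = None, float("-inf")
    for w in (1, 2, 5, 10, 20, 50):
        training_data = ctb_trees * w + cdtps_trees  # CTB counted w times
        parser = train(training_data)
        f1 = evaluate_f1(parser, split="dev")  # tune on the development set
        if f1 > best_f1:
            best_weight, best_f1 = w, f1
    return best_weight, best_f1
```

Duplicating the smaller, in-domain portion of the training data is the standard way to realize corpus weighting with an off-the-shelf trainer.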
Experiments of Parsing
Table 4 presents the results of the generative parser with various weights of CTB on the development set.
Our Two-Step Solution
The number of removed trees will be determined by cross validation on the development set.
Our Two-Step Solution
The value of A will be tuned by cross validation on the development set.
Our Two-Step Solution
Corpus weighting is exactly such an approach, with the weight tuned on the development set, and it is the approach that will be used for parsing on homogeneous treebanks in this paper.
development set is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Discussion
In this paper, we have described how MERT can be employed to estimate the weights for the linear loss function to maximize BLEU on a development set.
Experiments
Our development set (dev) consists of the NIST 2005 eval set; we use this set for optimizing MBR parameters.
Experiments
MERT is then performed to optimize the BLEU score on a development set; for MERT, we use 40 random initial parameters as well as parameters computed using corpus-based statistics (Tromble et al., 2008).
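The random-restart strategy mentioned here can be sketched as follows; this is a hypothetical sketch, where `optimize` stands in for one MERT run from a given starting point and `dev_bleu` for the development-set BLEU of a weight vector (neither name comes from the paper):

```python
import random

def mert_with_restarts(optimize, dev_bleu, dim, n_random=40, seed_weights=None):
    """Run the optimizer from n_random random starting points (plus an
    optional corpus-statistics start) and keep the weight vector with
    the best development-set BLEU."""
    starts = [[random.uniform(-1.0, 1.0) for _ in range(dim)]
              for _ in range(n_random)]
    if seed_weights is not None:
        starts.append(list(seed_weights))
    candidates = [optimize(w0) for w0 in starts]
    return max(candidates, key=dev_bleu)
```

Because MERT's line search is only locally optimal, restarting from many initial points and keeping the best dev-set score is the usual guard against poor local optima.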
Experiments
We select the MBR scaling factor (Tromble et al., 2008) based on the development set; it is set to 0.1, 0.01, 0.5, 0.2, 0.5, and 1.0 for the aren-phrase, aren-hier, aren-samt, zhen-phrase, zhen-hier, and zhen-samt systems, respectively.
Introduction
We employ MERT to select these weights by optimizing the BLEU score on a development set.
development set is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhao, Shiqi and Lan, Xiang and Liu, Ting and Li, Sheng
Experimental Setup
In our experiments, the development set contains 200 sentences and the test set contains 500 sentences, both of which are randomly selected from the human translations of 2008 NIST Open Machine Translation Evaluation: Chinese to English Task.
Statistical Paraphrase Generation
= c_dev(+r)/c_dev(r), where c_dev(r) is the total number of unit replacements in the generated paraphrases on the development set.
Statistical Paraphrase Generation
Replacement rate (rr): rr measures the paraphrase degree on the development set, i.e., the percentage of words that are paraphrased.
Statistical Paraphrase Generation
We define rr as: rr = w_dev(r)/w_dev(s), where w_dev(r) is the total number of words in the replaced units on the development set, and w_dev(s) is the number of words of all sentences on the development set.
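The replacement rate above is a simple ratio of token counts; a minimal sketch follows, where the list-of-token-lists representation is an assumption for illustration, not the paper's data format:

```python
def replacement_rate(sentences, replaced_units):
    """rr = w_dev(r) / w_dev(s): words inside replaced units over all words.

    `sentences` is a list of token lists for the development set, and
    `replaced_units` is a list of token lists, one per replaced unit.
    """
    total_words = sum(len(tokens) for tokens in sentences)      # w_dev(s)
    replaced_words = sum(len(unit) for unit in replaced_units)  # w_dev(r)
    return replaced_words / total_words
```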
development set is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Experiments
To empirically investigate the parameter A and the convergence of our algorithm aPLSA, we generated five more data sets as the development sets.
Experiments
The detailed description of these five development sets, namely tune1 to tune5, is listed in Table 1 as well.
Experiments
We have done some experiments on the development sets to investigate how different values of A affect the performance of aPLSA.
development set is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Huang, Liang and Liu, Qun
Experiments
The data splitting convention of the other two corpora, such as People's Daily, does not reserve a development set, so in the following experiments we simply choose the model after 7 iterations when training on this corpus.
Experiments
Table 4: Error analysis for Joint S&T on the development set of CTB.
Experiments
To obtain further information about what kinds of errors are alleviated by annotation adaptation, we conduct an initial error analysis for Joint S&T on the development set of CTB.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Experiments
Note that the development set was only used for evaluating the trained model to obtain the optimal values of tunable parameters.
Experiments
For the baseline policy, we varied r in the range [1, 5] and found that setting r = 3 yielded the best performance on the development set for both the small and large training corpus experiments.
Experiments
Optimal balances were selected using the development set.
Policies for correct path selection
In our experiments, the optimal threshold value r is selected by evaluating the performance of joint word segmentation and POS tagging on the development set.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Experiments
For Chinese-English-Spanish translation, we used the development set (devset3) released for the pivot task as the test set, which contains 506 source sentences, with 7 reference translations in English and Spanish.
Experiments
To be capable of tuning parameters on our systems, we created a development set of 1,000 sentences taken from the training sets, with 3 reference translations in both English and Spanish.
Experiments
This development set is also used to train the regression learning model.
development set is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Argument Mapping Model
Given these features with gold standard parses, our argument mapping model can predict entire argument mappings with an accuracy rate of 87.96% on the test set, and 87.70% on the development set.
Identification and Labeling Models
All classifiers were trained to 500 iterations of L-BFGS training, a quasi-Newton method from the numerical optimization literature (Liu and Nocedal, 1989), using Zhang Le's maxent toolkit. To prevent overfitting we used Gaussian priors with global variances of 1 and 5 for the identifier and labeler, respectively. The Gaussian priors were determined empirically by testing on the development set.
Identification and Labeling Models
The size of the window was determined experimentally on the development set; we use the same window sizes throughout.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cheung, Jackie Chi Kit and Penn, Gerald
Experiments
The number of iterations was determined by experiments on the development set .
Experiments
Tuning the parameter settings on the development set, we found that parameterized categories, binarization, and including punctuation gave the best F1 performance.
Experiments
While experimenting with the development set of TuBa-D/Z, we noticed that the parser sometimes returns parses in which paired punctuation (e.g.
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Co-training strategy for prosodic event detection
Development set: 20 utterances, 1,356, 2,275 (speakers f2b, f3b)
Labeled set L: 5 utterances, 347, 573 (speakers m2b, m3b)
Unlabeled set U: 1,027 utterances, 77,207, 129,305 (speaker m4b)
Conclusions
In our experiment, we used some labeled data as a development set to estimate some parameters.
Experiments and results
Among the labeled data, 102 utterances of the f1a and m1b speakers are used for testing; 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as the development set to optimize parameters such as A and the confidence level threshold; 5 utterances are used as the initial training set L; and the rest of the data is used as the unlabeled set U, which has 1,027 unlabeled utterances (we removed the human labels for co-training experiments).
development set is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: