Validating Automatic Measures |
  Cross Validation   d_TUR   d_QLT   d_PAT
  Regular            0.176   0.155   0.151
  Minus-one-model    0.224   0.180   0.178
Validating Automatic Measures | Table 7: LOSS scores for Regular and Minus-one-model (during training) Cross Validations |
Validating Automatic Measures | First, we use regular 4-fold cross validation: in each of 4 rounds, we randomly hold out 25% of the data for testing and train on the remaining 75%.
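The regular k-fold procedure described above can be sketched as follows; this is a minimal illustration, where `items` stands for any list of training examples:

```python
import random

def k_fold_splits(items, k=4, seed=0):
    """Randomly partition `items` into k folds and yield (train, test) pairs.

    With k=4, each round holds out 25% of the data for testing and trains
    on the remaining 75%, matching the regular cross validation described
    above.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # assign items to k folds by striding over the shuffled list
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Each example appears in exactly one test fold across the k rounds, so the per-round scores can simply be averaged.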
Experiments | For the error-driven policy, we collected unidentified unknown words using 10-fold cross validation on the training set, as previously described in Section 3. |
Experiments | Table 9: Comparison of averaged F1 results (by 10-fold cross validation) with previous studies on CTB 3.0.
Experiments | Unfortunately, Zhang and Clark’s experimental setting did not allow us to use our error-driven policy since performing 10-fold cross validation again on each main cross validation trial is computationally too expensive. |
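The cost of that nested setting is easy to see by counting training runs: an inner 10-fold cross validation inside each of the 10 outer trials multiplies the work roughly tenfold. A back-of-the-envelope sketch (an illustrative count only; the exact accounting depends on the training setup):

```python
def nested_cv_trainings(outer_k=10, inner_k=10):
    """Rough count of full training runs needed when a k-fold cross
    validation is repeated inside every trial of an outer k-fold cross
    validation."""
    # inner_k trainings for the inner CV of each outer trial,
    # plus one training of the outer model per trial
    return outer_k * (inner_k + 1)
```

With 10 outer and 10 inner folds this already amounts to 110 full training runs, versus 10 for plain cross validation.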
Policies for correct path selection | Divide the training corpus into ten equal sets and perform 10-fold cross validation to find the errors.
Policies for correct path selection | After ten cross validation runs, we get a list of the unidentified unknown words derived from the whole training corpus. |
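The error-collection procedure described above can be sketched as follows; `train_fn` and `segment_fn` are hypothetical stand-ins for the actual trainer and word segmenter, and `corpus` is a list of (sentence, gold_words) pairs:

```python
def collect_unknown_words(corpus, train_fn, segment_fn, k=10):
    """Run k-fold cross validation over the training corpus and pool the
    gold words the model fails to identify in each held-out fold.

    Returns the set of unidentified unknown words derived from the whole
    training corpus, as in the error-driven policy sketched above.
    """
    unidentified = set()
    folds = [corpus[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_fn(train)
        for sentence, gold_words in held_out:
            predicted = set(segment_fn(model, sentence))
            # words in the gold segmentation that the model missed
            unidentified |= set(gold_words) - predicted
    return unidentified
```

Because every sentence is held out exactly once, the pooled set covers errors over the entire training corpus.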
Policies for correct path selection | Note that the unidentified unknown words in the cross validation are not necessarily infrequent words, though some overlap may exist.
Experiments | We used 10-fold cross validation for all tests. |
Experiments | Table 2 compares the performance using 10-fold cross validation.
Experiments | Table 2: Accuracy for SO-PMI with different dataset sizes, the spin model, and the random walks model for 10-fold cross validation and 14 seeds. |
Baseline Approaches | This amounts to using three folds for training and one fold for development in each cross validation experiment. |
Dataset | Since we will perform 5-fold cross validation in our experiments, we also show the number of reports labeled with each shaper under the “F” columns for each fold. |
Evaluation | Micro-averaged 5-fold cross validation results of this baseline are expressed as percentages in terms of precision (P), recall (R), and F-measure (F) in the first row of Table 4, both for all 14 shapers and for just the 10 minority classes (given our focus on improving minority class prediction).
Evaluation | Table 4: 5-fold cross validation results. |
Our Bootstrapping Algorithm | Whichever baseline is used, we need to reserve one of the five folds to tune the parameter k in our cross validation experiments. |
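Reserving one fold for tuning can be sketched as below; `train_fn(data, k)` and `score_fn(model, data)` are hypothetical stand-ins for the actual trainer and scoring function:

```python
def tune_then_evaluate(folds, train_fn, score_fn, k_values):
    """Reserve the first fold as a development set for choosing the
    parameter k, then cross-validate on the remaining folds with the
    chosen value (a minimal sketch of the tuning scheme above)."""
    dev, rest = folds[0], folds[1:]
    pool = [ex for fold in rest for ex in fold]
    # pick the k that scores best on the reserved development fold
    best_k = max(k_values, key=lambda k: score_fn(train_fn(pool, k), dev))
    scores = []
    for i, test in enumerate(rest):
        train = [ex for j, fold in enumerate(rest) if j != i for ex in fold]
        scores.append(score_fn(train_fn(train, best_k), test))
    return best_k, sum(scores) / len(scores)
```

Keeping the tuning fold out of the final cross validation avoids reporting results on data that influenced the choice of k.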
Experiments | This is done by computing accuracy in the 10-fold cross validation.
Experiments | Table 2: 10-fold cross validation results for three feature categories and their combination, for classifiers trained on English SVO and AN training sets. |
Experiments | The ACC column reports the accuracy score in the 10-fold cross validation.
Experiments | Figure 7 shows the results of a 10-fold cross validation on the 200-review dataset (light grey bars show the accuracy of the model trained without using transition cue features). |
Experiments | Table 4 shows the results obtained by 10-fold cross validation.
Experiments | Each pairwise comparison is evaluated on a testing dataset with 10-fold cross validation . |
Results | Figure 7: Value precision vs. recall for 10-fold cross validation on TempEval-3 Dev and WikiWars Dev. |
Results | Figure 7 shows the precision vs. recall of the resolved values from 10-fold cross validation of TempEval-3 Dev and WikiWars Dev. |
Results | We also manually categorized all resolution errors for end-to-end performance with 10-fold cross validation of the TempEval-3 Dev dataset, |
Empirical Evaluation | The values of various parameters were selected using 10-fold Cross Validation on |
Empirical Evaluation | 5, plots the 10-fold cross validation performance of the models with increasing values of H for the two datasets. |
Empirical Evaluation | Figure 5: Cross validation performances of the two models with increasing number of categories. |
Automated Classification | 5.2 Cross Validation Experiments |
Automated Classification | We performed 10-fold cross validation on our dataset, and, for the purpose of comparison, we also performed 5-fold cross validation on Ó Séaghdha’s (2007) dataset using his folds.
Automated Classification | To assess the impact of the various features, we ran the cross validation experiments for each feature type, alternating between including only one |
Additional experiments | Because cross validation is applied, errors are always measured on testing subsets that are disjoint from the corresponding training subsets. |
Experimental design | We use tenfold cross validation for the experiments. |
Experimental design | These are learning experiments so we also use tenfold cross validation in the same way as with CRF++. |
Conclusions | When training and testing on the same corpus, we run a 10-fold cross validation.
Experimental Setup | One part is used as the development dataset; the rest are used for 10-fold cross validation . |
Experimental Setup | The reported figures come from 10-fold cross validations on different corpora. |
Experiments | Each value is the average over different test sets of fivefold cross validation.
Perplexity on Reduced Corpora | This assumption is natural, considering the situation of an in-domain test or cross validation |
Conclusions | Table 6: F1 scores for the 10-fold cross validation of the SVMs with RBF kernel on all datasets using NGRAM features |
Evaluation and Discussion | The results have been obtained by 10-fold cross validation on 2,000 documents per flaw. |
Experiments | The performance has been evaluated with 10-fold cross validation on 2,000 documents split equally into positive and negative instances. |
Approach to Sentence-Level Dialect Identification | For both sets of experiments, we apply 10-fold cross validation on the training data. |
Experiments | Table 2: Performance Accuracies of the different configurations of the 8M LM (best-performing LM size) using 10-fold cross validation against the different baselines. |
Introduction | The presented system outperforms the approach presented by Zaidan and Callison-Burch (2011) on the same dataset using 10-fold cross validation.
Seminar Extraction Task | Experiments We conducted 5-fold cross validation experiments using the seminar extraction dataset. |
Seminar Extraction Task | For comparison, we used 5-fold cross validation, where only a subset of each training fold, corresponding to 50% of the corpus, was used for training.
Seminar Extraction Task | We compare our approach to their work, having obtained and used the same 5-fold cross validation splits as both works. |
Experiment | We set the hyper-parameters by conducting cross validations on the labeled data. |
Experiment | We conduct 5-fold cross validations on Chinese labeled data. |
Experiment | Another reason is that we use 5-fold cross validations in this setting, while the previous setting is an open test setting. |
Evaluation | We performed 10-fold cross validation on the labeled sentences (unsuitable vs. all other categories) in the dataset.
Evaluation | We also performed 10-fold cross validation on the labeled sentences (the five functional categories). |
Evaluation | We performed 10-fold cross validation on the labeled sentences of the dataset.
Experiments | We estimated the ROUGE metric using 10-fold cross validation.
Experiments | Each corpus was then subjected to 10-fold cross validation, and the average results for training and testing were calculated.
Experiments | Table 3: Results of 10-fold cross validation
          ENG      HEB      MULT
  Train   0.4483   0.5993   0.5205
  Test    0.4461   0.5936   0.5027
Unsupervised Mining of Personal and Impersonal Views | x_m = ⟨ P(l=1|x), P(l=2|x), P(l=3|x) ⟩. In our experiments, we perform stacking with 4-fold cross validation to generate meta-training data, where each fold is used as the development data and the other three folds are used to train the base classifiers in the training phase.
Unsupervised Mining of Personal and Impersonal Views | 4-fold cross validation is performed for supervised sentiment classification. |
Unsupervised Mining of Personal and Impersonal Views | Also, we find that our performances are similar to the ones (described as fully supervised results) reported in Dasgupta and Ng (2009) where the same data in the four domains are used and 10-fold cross validation is performed. |
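The stacking step described above can be sketched as follows; `train_fn` and `predict_proba_fn` are hypothetical stand-ins for the base-classifier trainer and its per-class probability output:

```python
def stacked_meta_features(data, train_fn, predict_proba_fn, k=4):
    """Generate meta-training data via k-fold cross validation: each fold
    in turn acts as the development data, and base classifiers trained on
    the other folds supply its meta-feature vectors (e.g. per-class
    probabilities). Returns a list of (meta_feature_vector, example) pairs.
    """
    folds = [data[i::k] for i in range(k)]
    meta = []
    for i in range(k):
        dev = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        base = train_fn(train)
        for ex in dev:
            # out-of-fold prediction: the base classifier never saw `ex`
            meta.append((predict_proba_fn(base, ex), ex))
    return meta
```

Because every example receives its meta-features from classifiers that never saw it, the meta-training data is free of the optimistic bias that in-fold predictions would introduce.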
Evaluation | This comprises 108K sentences from the data made available by the University of Leipzig, plus 5,600 sentences from the training data of each fold during cross validation.
Evaluation | We perform a 5-fold cross validation taking 4/5 of the data as training and 1/5 as test data. |
Evaluation | Baseline P190: We ran Moses (Koehn et al., 2007) using Koehn’s training scripts, doing a 5-fold cross validation with no reordering.
Recognition as a Generation Task | (2008), we perform 10-fold cross validation.
Results and Discussion | Table 5: Results of English abbreviation generation with fivefold cross validation.
Results and Discussion | Concerning the training time in the cross validation , we simply chose the DPLVM for comparison. |
Experiments of Parsing | Here we tried the corpus weighting technique for an optimal combination of CTB, CDTfs and parsed PDC, and chose the relative weight of both CTB and CDTfs as 10 by cross validation on the development set. |
Our Two-Step Solution | The number of removed trees will be determined by cross validation on development set. |
Our Two-Step Solution | The value of A will be tuned by cross validation on development set. |
Experiments | The accuracy is measured by abstract-wise 10-fold cross validation and the one-answer-per-occurrence criterion (Giuliano et al., 2006). |
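Abstract-wise cross validation differs from instance-wise splitting in that all instances from one abstract stay in the same fold, so no abstract contributes to both training and testing. A minimal sketch, with a hypothetical `group_of` function returning an example's abstract identifier:

```python
def group_k_fold(examples, group_of, k=10):
    """Yield (train, test) pairs for group-wise k-fold cross validation:
    whole groups (e.g. all instances from one abstract) are assigned to
    a single fold, never split across train and test."""
    groups = sorted({group_of(ex) for ex in examples})
    # round-robin assignment of groups to folds
    fold_of = {g: i % k for i, g in enumerate(groups)}
    for i in range(k):
        test = [ex for ex in examples if fold_of[group_of(ex)] == i]
        train = [ex for ex in examples if fold_of[group_of(ex)] != i]
        yield train, test
```

Grouped splitting prevents the inflated scores that arise when near-duplicate instances from the same abstract appear on both sides of a split.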
Experiments | Table 3 shows the time for parsing the entire AImed corpus, and Table 4 shows the time required for 10-fold cross validation with GENIA-retrained parsers. |
Experiments | Since we did not run experiments on protein-pair-wise cross validation, our system cannot be compared directly to the results reported by Erkan et al.