Index of papers in Proc. ACL 2008 that mention
  • statistically significant
Nakov, Preslav and Hearst, Marti A.
Comparison to Human Judgments
As Table 6 shows, we achieved 78.4% accuracy using all verbs (and 72.3% with the first verb from each worker), which is a statistically significant improve-
Relational Similarity Experiments
Our best model v + p + c performs a bit better, 71.3% vs. 67.4%, but the difference is not statistically significant.
Relational Similarity Experiments
Our best model achieves 40.5% accuracy, which is slightly better than LRA’s 39.8%, but the difference is not statistically significant.
Relational Similarity Experiments
However, this time coordinating conjunctions (with prepositions) do help a bit (the difference is not statistically significant) since SAT verbal analogy questions ask for a broader range of relations, e.g., antonymy, for which coordinating conjunctions like but are helpful.
statistically significant is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Discussion
The improvement in PP attachment was larger (20.5% ERR), and also statistically significant.
Experimental setting
We use Bikel’s randomized parsing evaluation comparator3 (with p < 0.05 throughout) to test the statistical significance of the results using word sense information, relative to the respective baseline parser using only lexical features.
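For readers unfamiliar with randomized significance testing, the general idea can be sketched as a paired approximate randomization test. This is a minimal sketch and not Bikel's comparator itself: the function name `randomization_test`, the per-sentence score lists, and the use of the summed score difference as the test statistic are all assumptions made for illustration.

```python
import random


def randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test (illustrative sketch).

    scores_a / scores_b are hypothetical per-sentence scores of two systems
    on the same test sentences. Returns an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equally long score lists")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two systems are interchangeable,
        # so swapping a pair (flipping the sign of its difference) is
        # equally likely as keeping it.
        shuffled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Under this sketch, a difference would be called significant at the threshold quoted above when the returned value is below 0.05.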
Experimental setting
Statistical significance was calculated based on
Results
These results are statistically significant in some cases (as indicated by *).
Results
As in full-parsing, Bikel outperforms Charniak, but in this case the difference in the baselines is not statistically significant.
Results
As was the case for parsing, the performance with IST reaches and in many instances surpasses gold-standard levels, achieving statistical significance over the baseline in places.
statistically significant is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Andreevskaia, Alina and Bergler, Sabine
Experiments
2All results are statistically significant at α = 0.01 with two exceptions: the difference between trigrams and bigrams for the system trained and tested on texts is statistically significant at α = 0.1 and for the system trained on sentences and tested on texts is not statistically significant at α = 0.01.
Experiments
The statistical significance of the
Experiments
results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
Integrating the Corpus-based and Dictionary-based Approaches
Using then an SVM meta-classifier trained on a small number of target domain examples to combine the nine base classifiers, they obtained a statistically significant improvement on out-of-domain texts from book reviews, knowledge-base feedback, and product support services survey data.
Integrating the Corpus-based and Dictionary-based Approaches
The results reported in Table 6 are statistically significant at α = 0.01.
Integrating the Corpus-based and Dictionary-based Approaches
are statistically significant at α = 0.01, except the runs on movie reviews where the difference between the LBS and Ensemble classifiers was significant at α = 0.05.
statistically significant is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Olsson, J. Scott and Oard, Douglas W.
Experiments
We apply the same reasoning to test for statistical significance in GMAP improvements.
Results
combination is a statistically significant improvement (α = 0.05) over our new transcript set (that is, over the best single transcript result).
Results
Tests for statistically significant improvements in GMAP are computed using our paired log AP test, as discussed in Section 4.2.2.
Results
Secondly, it is the only combination approach able to produce statistically significant relative improvements on both measures for both conditions.
statistically significant is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zapirain, Beñat and Agirre, Eneko and Màrquez, Lluís
Conclusion and Future work
While results are similar and not statistically significant in the WSJ test set, when testing on the Brown out-of-domain test set the difference in favor of PropBank plus mapping step is statistically significant.
Mapping into VerbNet Thematic Roles
If we compare these results to those obtained by VerbNet in the SemEval setting (second row of Table 5), they are 0.5 points better, but the difference is not statistically significant.
Mapping into VerbNet Thematic Roles
The performance drop compared to the use of the hand-annotated VerbNet class is of 2 points and statistically significant, and 0.2 points above the results obtained using VerbNet directly on the same conditions (fourth row of the same Table).
Mapping into VerbNet Thematic Roles
In this case, the difference is larger, 1.9 points, and statistically significant in favor of the mapping approach.
On the Generalization of Role Sets
In the second setting (‘CoNLL setting’ row in the same table) the PropBank classifier degrades slightly, but the difference is not statistically significant.
On the Generalization of Role Sets
The results in the ‘CoNLL setting (no 5th)’ rows of Table 1 show that the drop for PropBank is negligible and not significant, while the drop for VerbNet is more important, and statistically significant.
statistically significant is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kennedy, Alistair and Szpakowicz, Stan
Abstract
Although the 1987 version of the Thesaurus is better, we show that the 1911 version performs surprisingly well and that often the differences between the versions of Roget’s and WordNet are not statistically significant.
Comparison on applications
Even on the largest set (Finkelstein et al., 2001), however, the differences between Roget’s Thesaurus and the Vector method are not statistically significant at the p < 0.05 level for either thesaurus on a two-tailed test4.
Comparison on applications
On the (Miller and Charles, 1991) and (Rubenstein and Goodenough, 1965) data sets the best system did not show a statistically significant improvement over the 1911 or 1987 Roget’s Thesauri, even at p < 0.1 for a two-tailed test.
Comparison on applications
Much like (Miller and Charles, 1991), the data set used here is not large enough to determine if any system’s improvement is statistically significant.
Conclusion and future work
The 1987 version of Roget’s Thesaurus performed better than the 1911 version on all our tests, but we did not find the differences to be statistically significant.
statistically significant is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Szpektor, Idan and Dagan, Ido and Bar-Haim, Roy and Goldberger, Jacob
Results and Analysis
Our main result is that the allCP and allCP+pr methods rank matches statistically significantly better than the baselines in all setups (according to the Wilcoxon double-sided signed-ranks test at the level of 0.01 (Wilcoxon, 1945)).
Results and Analysis
Furthermore, relative to this cpv(7“, 25) model from (Pantel et al., 2007), our combined allCP model, with or without the prior (first row of Table 2), obtains statistically significantly better ranking (at the level of 0.01).
Results and Analysis
Comparing between the algorithms for matching 0pm (Section 3.2.2) we found that while rankedCBC is statistically significantly better than binaryCBC, rankedCBC and LIN generally achieve the same results.
statistically significant is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Results
The difference between COMBO and DISTRIB is not statistically significant, while both are significantly better than the rule-based approaches.8 This provides strong motivation for a “lightweight” approach to non-referential it detection, one that does not require parsing or handcrafted rules and is easily ported to new languages and text domains.
Results
Using no truncation (Unaltered) drops the F-Score by 4.3%, while truncating the patterns to a length of four only drops the F-Score by 1.4%, a difference which is not statistically significant.
Results
Neither are statistically significant; however there seems to be diminishing returns from longer context patterns.
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cao, Guihong and Robertson, Stephen and Nie, Jian-Yun
Experiments 6.1 Experiment Settings
We also conducted t-tests to determine whether the improvement is statistically significant.
Experiments 6.1 Experiment Settings
model is statistically significant with p-value<0.05,
Experiments 6.1 Experiment Settings
Moreover, the improvements on five collections are statistically significant.
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Carenini, Giuseppe and Ng, Raymond T. and Zhou, Xiaodong
Empirical Evaluation
As to the statistical significance, we use the 2-tail pairwise student t-test in all the experiments to compare two specific methods.
Empirical Evaluation
Meanwhile, both semantic similarities have lower accuracy than CWS, and the differences are also statistically significant even with the conservative Bonferroni adjustment (i.e., the p-values in Table 1 are multiplied by three).
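The Bonferroni adjustment mentioned in this sentence multiplies each p-value by the number of comparisons, capping the result at 1. A minimal sketch follows; the function name `bonferroni` and the example p-values are hypothetical and are not taken from the paper's Table 1.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (each multiplied by the number
    of comparisons and capped at 1.0)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.004, 0.02, 0.5]
adjusted = bonferroni(raw)
# A comparison counts as significant at level alpha only if its
# adjusted p-value is still at or below alpha.
alpha = 0.05
significant = [p <= alpha for p in adjusted]
```

With three comparisons this is the same as multiplying every p-value by three, which is why the adjustment is described as conservative: a raw p-value of 0.02 no longer passes a 0.05 threshold after adjustment.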
Empirical Evaluation
The increase in precision is at least 0.04 with statistical significance.
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mitchell, Jeff and Lapata, Mirella
Evaluation Setup
A Wilcoxon rank sum test confirmed that the difference is statistically significant (p < 0.01).
Results
The difference between High and Low similarity values estimated by these models are statistically significant (p < 0.01 using the Wilcoxon rank sum test).
Results
However, the difference between the two models is not statistically significant.
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Penn, Gerald and Zhu, Xiaodan
Future Work
It is entirely possible that, within this protocol, the baselines that have performed so well in our experiments, such as length or, in read news, position, will utterly fail, and that less traditional acoustic or spoken language features will genuinely, and with statistical significance, add value to a purely transcript-based text summarization system.
Results and analysis
The best performance is achieved by using all of the features together, but the length baseline, which uses only those features in bold type from Figure 3, is very close (no statistically significant difference), as is MMR.6
Results and analysis
The difference with respect to either of these baselines is statistically significant within the popular 10–30% compression range, as is the classifier trained on all features but acoustic
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Uszkoreit, Jakob and Brants, Thorsten
Conclusion
When using predictive class-based models in combination with a word-based language model trained on very large amounts of data, the improvements continue to be statistically significant on the test and nist06 sets.
Experiments
Adding the class-based models leads to small improvements in BLEU score, with the highest improvements for both dev and nist06 being statistically significant 2.
Experiments
2Differences of more than 0.0051 are statistically significant at the 0.05 level using bootstrap resampling (Noreen, 1989; Koehn, 2004)
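Bootstrap resampling for comparing two systems can be sketched roughly as below. This is an illustrative simplification, not the paper's procedure: BLEU is a corpus-level metric, whereas this sketch assumes hypothetical per-sentence scores that can simply be summed, and the name `paired_bootstrap` is invented for the example.

```python
import random


def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Fraction of resampled test sets on which system A outscores system B.

    scores_a / scores_b are hypothetical per-sentence scores for the same
    test sentences, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement to form a new test set,
        # and score both systems on that same resampled set.
        idx = [rng.randrange(n) for _ in range(n)]
        total_a = sum(scores_a[i] for i in idx)
        total_b = sum(scores_b[i] for i in idx)
        if total_a > total_b:
            wins += 1
    return wins / samples
```

In this style of test, system A is usually taken to be better at the 0.05 level when it wins on at least 95% of the resampled sets. A real BLEU comparison would recompute the corpus-level score from the resampled sentences rather than summing per-sentence values.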
statistically significant is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: