Index of papers in Proc. ACL 2013 that mention
  • statistically significant
Yancheva, Maria and Rudzicz, Frank
Related Work
Despite statistically significant predictors of deception such as shorter talking time, fewer semantic details, and less coherent statements, DePaulo et al.
Related Work
and revealed that verbal cues based on lexical categories extracted using the LIWC tool show statistically significant, though small, differences between truth- and lie-tellers.
Results
Statistically significant results are in bold.
Results
performs best, with 59.5% cross-validation accuracy, which is a statistically significant improvement over the baselines of LR (t(4) = 22.25, p < .0001), and NB (t(4) = 16.19,
Results
In comparison with classification accuracy on pooled data, a paired t-test shows statistically significant improvement across all age groups using RF, t(3) = 1037,
statistically significant is mentioned in 8 sentences in this paper.
Kondadadi, Ravi and Howald, Blake and Schilder, Frank
Evaluation and Discussion
There is no statistically significant difference between DocSys and DocBase generations for METEOR and BLEU-4. However, there is a statistically significant difference in the syntactic variability metric for both domains (weather - χ²=137.16, d.f.=1, p<.0001; biography - χ²=96.641, d.f.=1, p<.
Evaluation and Discussion
In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. DocSys - χ²=.347, d.f.=1, p=.555; DocOrig vs. DocBase - χ²=.090, d.f.=1, p=.764; DocSys vs. DocBase - χ²=.790, d.f.=1, p=.373).
Evaluation and Discussion
For biography, the trend fits nicely both numerically and in terms of statistical significance (DocOrig vs. DocSys - χ²=5.094, d.f.=1, p=.
statistically significant is mentioned in 7 sentences in this paper.
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Abstract
Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task.
Introduction
We evaluate the summarization models on the standard Document Understanding Conference (DUC) 2006 and 2007 corpora for query-focused MDS and find that all of our compression-based summarization models achieve statistically significantly better performance than the best DUC 2006 systems.
Introduction
With these results we believe we are the first to successfully show that sentence compression can provide statistically significant improvements over pure extraction-based approaches for query-focused MDS.
Results
Our sentence-compression-based systems (marked with †) show statistically significant improvements over pure extractive summarization for both R-2 and R-SU4 (paired t-test, p < 0.01).
Results
In Table 7, our context-aware and head-driven tree-based compression systems show statistically significantly (p < 0.01) higher precisions (Uni-
Results
For grammatical relation evaluation, our head-driven tree-based system obtains a statistically significantly (p < 0.01) better F1 score (Rel-F1) than all the other systems except the rule-based system.
statistically significant is mentioned in 6 sentences in this paper.
Wang, Aobo and Kan, Min-Yen
Experiment
To determine statistical significance of the improvements, we also compute paired, one-tailed t tests.
Experiment
‘i’(‘*’) in the top four lines indicates statistical significance at p < 0.001 (0.05) when compared with the previous row.
Experiment
‘i’ or ‘*’ in the top four rows indicates statistical significance at p < 0.001 or < 0.05 compared with the previous row.
statistically significant is mentioned in 6 sentences in this paper.
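The paired t tests mentioned by Wang and Kan compare two systems on the same items. The sketch below is an illustration only, not the authors' code: the helper name `paired_t` is hypothetical, and because it uses just the Python standard library it returns the t statistic and degrees of freedom rather than a p-value. The p-value would come from a t table or, if SciPy is available, from `scipy.stats.ttest_rel` with `alternative="greater"` for the one-tailed case.

```python
import math
from statistics import mean, stdev


def paired_t(scores_new, scores_old):
    """Paired t statistic for matched per-item scores (hypothetical helper).

    Returns (t, df). A positive t means the new system scores higher on
    average; for a one-tailed test, compare t with the critical value for
    df = n - 1. Raises ZeroDivisionError if all differences are identical.
    """
    if len(scores_new) != len(scores_old) or len(scores_new) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    # Pairing: each difference compares the two systems on the same item.
    diffs = [a - b for a, b in zip(scores_new, scores_old)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1
```

Because the test is paired, both lists must score the same items in the same order.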
Cheung, Jackie Chi Kit and Penn, Gerald
Experiments
The model average is statistically significantly different from all the other conditions, p < 10⁻⁷ (Study 1).
Experiments
The averages of the tested conditions are shown in Table 2, and are statistically significant.
Experiments
The differences in density between the human average and the non-baseline conditions are highly statistically significant, according to paired two-tailed Wilcoxon signed-rank tests for the statistic calculated for each topic cluster.
statistically significant is mentioned in 5 sentences in this paper.
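Cheung and Penn report paired Wilcoxon signed-rank tests computed over topic clusters. A minimal standard-library sketch of the test statistic follows; it is not the authors' implementation. The helper name `signed_rank_sums` is hypothetical, zero differences are discarded and tied absolute differences receive average ranks (one common convention among several), and converting W to a p-value (exact tables or a normal approximation) is omitted. `scipy.stats.wilcoxon` offers a complete test.

```python
def signed_rank_sums(a, b):
    """Return (W_plus, W_minus, n) for paired samples a and b (hypothetical helper)."""
    # Paired differences, dropping exact zeros.
    d = [x - y for x, y in zip(a, b) if x != y]
    # Indices ordered by absolute difference, smallest first.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum the ranks of positive and of negative differences separately.
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus, len(d)
```

The two-tailed statistic is usually taken as min(W_plus, W_minus); note that W_plus + W_minus always equals n(n + 1)/2.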
Guinaudeau, Camille and Strube, Michael
Experiments
Indeed, the difference between our best results and those of Elsner and Charniak are not statistically significant.
Experiments
However, this improvement is not statistically significant.
Experiments
The best results, that present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW).
statistically significant is mentioned in 5 sentences in this paper.
Silberer, Carina and Ferrari, Vittorio and Lapata, Mirella
Experimental Setup
All correlation coefficients are statistically significant (p < 0.01, N = 435).
Results
Differences in correlation coefficients between models with two versus one modality are all statistically significant (p < 0.01 using a t-test), with the exception of Concat when compared against VisAttr.
Results
All correlation coefficients are statistically significant (p < 0.01, N = 1,716).
Results
All correlation coefficients are statistically significant (p < 0.01, N = 435).
statistically significant is mentioned in 5 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
(2) Taking advantage of potentially rich semantic information drawn from other languages via statistical machine translation, question retrieval performance can be significantly improved (row 3, row 4, row 5 and row 6 vs. row 7, row 8 and row 9, all these comparisons are statistically significant at p < 0.05).
Experiments
(2012) (row 7 vs. row 8, the comparison is statistically significant at p < 0.05).
Experiments
(1) Our proposed matrix factorization can significantly improve the performance of question retrieval (row 1 vs. row 2; row 3 vs. row 4, the improvements are statistically significant at p < 0.05).
statistically significant is mentioned in 4 sentences in this paper.
Alfonseca, Enrique and Pighin, Daniele and Garrido, Guillermo
Introduction
When compared to a state-of-the-art open-domain headline abstraction system (Filippova, 2010), the new headlines are statistically significantly better both in terms of readability and informativeness.
Results
• Amongst the automatic systems, HEADY performed better than MSC, with statistical significance at 95% for all the metrics.
Results
• The most frequent pattern baseline and HEADY have comparable performance across all the metrics (not statistically significantly different), although HEADY has slightly better scores for all metrics except for informativeness.
statistically significant is mentioned in 3 sentences in this paper.
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas
Experiments
The starred results are statistically significant improvements over the Baseline (at confidence p < 0.05).
Experiments
This improvement is statistically significant at confidence p < 0.05, which we computed using the pairwise bootstrap resampling technique of Koehn (2004).
Experiments
However, this improvement (0.34) is not statistically significant.
statistically significant is mentioned in 3 sentences in this paper.
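The pairwise bootstrap resampling technique of Koehn (2004), cited by Braune et al., can be sketched as follows. This is a simplified illustration, not the authors' code: the helper name `paired_bootstrap` is hypothetical, and it assumes the corpus score is the sum of per-sentence scores. That assumption does not hold for BLEU, where the metric must be recomputed from the n-gram statistics of each resampled test set.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A is NOT better than B.

    Hypothetical helper; assumes additive per-sentence scores.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(scores_a)
    not_better = 0
    for _ in range(n_samples):
        # Draw a test set of the same size with replacement; the same
        # sentence indices are used for both systems, which makes it paired.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / n_samples
```

Under this formulation, system A is judged better at the 95% level when it wins in at least 95% of the resampled test sets, i.e. when the returned fraction is at most 0.05.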
Faruqui, Manaal and Dyer, Chris
Abstract
To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
Experiments
For English and Turkish we observe a statistically significant improvement over the monolingual model (cf.
Experiments
English again has a statistically significant improvement over the baseline.
statistically significant is mentioned in 3 sentences in this paper.
Melamud, Oren and Berant, Jonathan and Dagan, Ido and Goldberger, Jacob and Szpektor, Idan
Results
to compute MAP values and corresponding statistical significance, we randomly split each test set into 30 subsets.
Results
This improvement is statistically significant at p < 0.01 for BInc and Lin, and p < 0.015 for Cosine, using paired t-test.
Results
On test-setivc, where context mismatches are abundant, our model outperformed all other baselines (statistically significant at p < 0.01).
statistically significant is mentioned in 3 sentences in this paper.
Valitutti, Alessandro and Toivonen, Hannu and Doucet, Antoine and Toivanen, Jukka M.
Background
The evaluations indicate statistical significance, but the test settings are relatively specific.
Conclusions
The statistical significance is particularly high, even though there were several limitations in the experimental setting.
Evaluation
According to the one-sided Wilcoxon rank-sum test, both Collective Funniness and all Upper Agreements increase from FORM to FORM+TABOO and from FORM+TABOO to FORM+TABOO+CONT statistically significantly (in all cases p < .002).
statistically significant is mentioned in 3 sentences in this paper.
Varga, István and Sano, Motoki and Torisawa, Kentaro and Hashimoto, Chikara and Ohtake, Kiyonori and Kawai, Takao and Oh, Jong-Hoon and De Saeger, Stijn
Experiments
The improvement in precision when using TR&EX is statistically significant (p < 0.05). Note that F-measure dropped
Experiments
The improvement in precision when using TR&EX is statistically significant (p < 0.01).
Experiments
statistically significant in both settings (p < 0.01).
statistically significant is mentioned in 3 sentences in this paper.