Related Work | Despite statistically significant predictors of deception such as shorter talking time, fewer semantic details, and less coherent statements, DePaulo et al. |
Related Work | and revealed that verbal cues based on lexical categories extracted using the LIWC tool show statistically significant, though small, differences between truth- and lie-tellers.
Results | Statistically significant results are in bold. |
Results | performs best, with 59.5% cross-validation accuracy, which is a statistically significant improvement over the baselines of LR (χ²(4) = 22.25, p < .0001) and NB (χ²(4) = 16.19).
Results | In comparison with classification accuracy on pooled data, a paired t-test shows statistically significant improvement across all age groups using RF (t(3) = 10.37).
Comparison to Human Judgments | As Table 6 shows, we achieved 78.4% accuracy using all verbs (and 72.3% with the first verb from each worker), which is a statistically significant improvement.
Relational Similarity Experiments | Our best model v + p + c performs a bit better, 71.3% vs. 67.4%, but the difference is not statistically significant.
Relational Similarity Experiments | Our best model achieves 40.5% accuracy, which is slightly better than LRA’s 39.8%, but the difference is not statistically significant.
Relational Similarity Experiments | However, this time coordinating conjunctions (with prepositions) do help a bit (the difference is not statistically significant), since SAT verbal analogy questions ask for a broader range of relations, e.g., antonymy, for which coordinating conjunctions like but are helpful.
Abstract | The experiment shows that the tree kernel approach is able to give statistically significant improvements over the flat syntactic path feature.
Abstract | In addition, we further propose to leverage temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit relations.
Conclusions and Future Works | The experimental results on PDTB v2.0 show that our kernel-based approach is able to give a statistically significant improvement over the flat syntactic path method.
Conclusions and Future Works | In addition, we also propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition, both explicit and implicit.
Experiments and Results | We conduct a chi-square statistical significance test on All relations between the flat path approach and the Simple-Expansion approach, which shows that the performance improvements from incorporating the tree kernel are statistically significant (p < 0.05).
Experiments and Results | We conduct a chi-square statistical significance test on All relations, which shows that the performance improvement is statistically significant (p < 0.05).
Introduction | The experiment shows that the tree kernel is able to effectively incorporate syntactic structural information and produce statistically significant improvements over the flat syntactic path feature for the recognition of both explicit and implicit relations in the Penn Discourse Treebank (PDTB; Prasad et al., 2008).
Introduction | In addition, inspired by the linguistic study on tense and discourse anaphora (Webber, 1988), we further propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition on PDTB v2.0 for both explicit and implicit relations.
Experiments | Statistical significance is measured using Approximate Randomization (Noreen, 1989), where result differences with a p-value smaller than 0.05 are considered statistically significant.
Experiments | All result differences are statistically significant.
Experiments | However, the result differences between these two systems are not statistically significant.
Evaluation and Discussion | There is no statistically significant difference between DocSys and DocBase generations for METEOR and BLEU-4. However, there is a statistically significant difference in the syntactic variability metric for both domains (weather: χ² = 137.16, d.f. = 1, p < .0001; biography: χ² = 96.641, d.f. = 1, p < .0001).
Evaluation and Discussion | In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. DocSys: χ² = .347, d.f. = 1, p = .555; DocOrig vs. DocBase: χ² = .090, d.f. = 1, p = .764; DocSys vs. DocBase: χ² = .790, d.f. = 1, p = .373).
Evaluation and Discussion | For biography, the trend fits nicely both numerically and in terms of statistical significance (DocOrig vs. DocSys: χ² = 5.094, d.f. = 1, p = .
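The χ² comparisons quoted above all come down to a one-degree-of-freedom Pearson test on a 2×2 contingency table. As a minimal sketch of that computation (the function name and the counts are invented for illustration, not taken from any of the quoted papers):

```python
def chi2_1df(table):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 contingency table.

    table = [[a, b], [c, d]], e.g. correct/incorrect counts for two systems.
    Uses the closed form n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 d.f., chi2 > 3.841 corresponds to p < .05 and chi2 > 10.828 to p < .001.
stat = chi2_1df([[60, 40], [40, 60]])
significant_05 = stat > 3.841
```

Papers then report the statistic, the degrees of freedom, and the p-value, exactly as in the quotes above.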
Abstract | However, it has no statistically significant impact in terms of F-score as incorrect multiword expression recognition has important side effects on parsing. |
Evaluation | In order to establish the statistical significance of results between two parsing experiments in terms of F1 and UAS, we used a unidirectional t-test for two independent samples.
Evaluation | The statistical significance between two MWE identification experiments was established by using McNemar’s test (Gillick and Cox, 1989).
Evaluation | The results of the two experiments are considered statistically significant when the computed value is p < 0.01.
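McNemar's test, cited above via Gillick and Cox (1989), compares two systems on the same items using only the discordant counts. A rough sketch with Yates' continuity correction (the helper name and the counts are made up for illustration):

```python
def mcnemar_chi2(b, c):
    """McNemar chi-square statistic (1 d.f.) with Yates continuity correction.

    b: items system A classified correctly and system B incorrectly;
    c: the reverse. Concordant items do not enter the statistic.
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Example: A wins on 30 items, loses on 10; compare against the 1-d.f.
# chi-square critical value 6.635 for alpha = 0.01.
stat = mcnemar_chi2(30, 10)
reject_at_01 = stat > 6.635
```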
Experiments | Underlining in our method signifies that the differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level. |
Experiments | 8 and “All” of Tables 2 and 4, the differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level. |
Experiments | The differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level for adequacy of SMT. |
Introduction | Moreover, the differences between correlation coefficients obtained using our method and other methods are statistically significant at the 5% or lower significance level for adequacy. |
Experiments | All results are statistically significant at α = 0.01 with two exceptions: the difference between trigrams and bigrams for the system trained and tested on texts is statistically significant at α = 0.1, and for the system trained on sentences and tested on texts it is not statistically significant at α = 0.01.
Experiments | The statistical significance of the results depends on the genre and the size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01, but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
Integrating the Corpus-based and Dictionary-based Approaches | Using then an SVM meta-classifier trained on a small number of target domain examples to combine the nine base classifiers, they obtained a statistically significant improvement on out-of-domain texts from book reviews, knowledge-base feedback, and product support services survey data. |
Integrating the Corpus-based and Dictionary-based Approaches | The results reported in Table 6 are statistically significant at α = 0.01.
Integrating the Corpus-based and Dictionary-based Approaches | are statistically significant at α = 0.01, except the runs on movie reviews, where the difference between the LBS and Ensemble classifiers was significant at α = 0.05.
Conclusions | performance statistically significantly.
Experimental results | Models marked with an asterisk (*) are statistically significantly better than the random baseline. |
Experimental results | A double asterisk (**) indicates a statistically significant difference from model (6), and the model marked with a double dagger (‡) is significantly better than model (7).
Experimental results | We can observe statistically significant differences in shifting abilities between many negator pairs, such as between “is_never” and “do_not”, as well as between “does_not” and “can_not”.
Negation models based on heuristics | We will show that this simple modification improves the fitting performance statistically significantly.
Negation models based on heuristics | We will show that this model also statistically significantly outperforms the basic shifting without overfitting, although the number of parameters has increased.
Discussion and Error Analysis | However, our best system selection approach improves over MSA-Pivot by only a small margin of 0.2% absolute BLEU, albeit a statistically significant one.
MT System Selection | It improves over the best single system baseline (MSA-Pivot) by a statistically significant 0.5% BLEU. |
MT System Selection | Improvements are statistically significant.
MT System Selection | The differences in BLEU are statistically significant.
Machine Translation Experiments | All differences in BLEU scores between the four systems are statistically significant above the 95% level. |
Machine Translation Experiments | Statistical significance is computed using paired bootstrap re-sampling (Koehn, 2004). |
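Paired bootstrap resampling in the style of Koehn (2004), referenced above, resamples the test sentences with replacement and counts how often one system beats the other on the resampled sets. A simplified sketch (real BLEU aggregates per-sentence n-gram statistics rather than averaging per-sentence scores; the score lists below are invented):

```python
import random

def bootstrap_win_rate(metric, sys_a, sys_b, trials=1000, seed=0):
    """Fraction of paired bootstrap resamples on which system A beats system B."""
    rng = random.Random(seed)
    n = len(sys_a)
    wins = 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if metric([sys_a[i] for i in idx]) > metric([sys_b[i] for i in idx]):
            wins += 1
    return wins / trials

mean = lambda xs: sum(xs) / len(xs)
a = [0.30, 0.35, 0.28, 0.40, 0.33, 0.37, 0.31, 0.36]  # per-sentence scores, system A
b = [0.25, 0.30, 0.27, 0.32, 0.29, 0.31, 0.26, 0.30]  # per-sentence scores, system B
win_rate = bootstrap_win_rate(mean, a, b)
# A win rate of at least 0.95 is conventionally read as significance at the 95% level.
```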
Experiment | To determine statistical significance of the improvements, we also compute paired, one-tailed t tests. |
Experiment | ‘†’ (‘*’) in the top four lines indicates statistical significance at p < 0.001 (0.05) when compared with the previous row.
Experiment | ‘†’ or ‘*’ in the top four rows indicates statistical significance at p < 0.001 or < 0.05 compared with the previous row.
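Several quotes in this collection report paired, one-tailed t-tests over per-item scores. A minimal sketch of the statistic (the score lists are invented; 2.015 is the one-tailed t critical value at p < 0.05 for 5 degrees of freedom):

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-item scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1

a = [0.82, 0.79, 0.85, 0.81, 0.84, 0.80]
b = [0.78, 0.75, 0.80, 0.79, 0.80, 0.77]
t, df = paired_t(a, b)
one_tailed_05 = t > 2.015  # t critical value for df = 5
```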
Abstract | Our best model achieves statistically significant improvements over the state-of-the-art systems on several metrics (e.g., 8.0% and 5.4% improvements in ROUGE-2 for the DUC 2006 and 2007 summarization tasks, respectively).
Introduction | We evaluate the summarization models on the standard Document Understanding Conference (DUC) 2006 and 2007 corpora for query-focused MDS and find that all of our compression-based summarization models achieve statistically significantly better performance than the best DUC 2006 systems.
Introduction | With these results we believe we are the first to successfully show that sentence compression can provide statistically significant improvements over pure extraction-based approaches for query-focused MDS. |
Results | Our sentence-compression-based systems (marked with T) show statistically significant improvements over pure extractive summarization for both R-2 and R-SU4 (paired t-test, p < 0.01). |
Results | In Table 7, our context-aware and head-driven tree-based compression systems show statistically significantly (p < 0.01) higher precisions (Uni- |
Results | For grammatical relation evaluation, our head-driven tree-based system obtains a statistically significantly (p < 0.01) better F1 score (Rel-F1) than all the other systems except the rule-based system.
Results | However, the differences between this model and the other models using lem in Table 4 are not statistically significant.
Results | The differences between this model and the other lower-performing models are statistically significant (p < 0.05).
Results | The differences among the last four models (all including lem) in Table 5 are not statistically significant.
Abstract | Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly . |
Introduction | We add contextual features using topological field transitions to the model of Filippova and Strube (2007b) and achieve a slight improvement over their model in a constituent ordering task, though not statistically significantly . |
Introduction | Two-tailed sign tests were calculated for each result against the best-performing model in each column (1: p = 0.101; 2: p = 0.053; +: statistically significant, p < 0.05; ++: very statistically significant, p < 0.01).
Introduction | We embed entity topological field transitions into their probabilistic model, and show that the added coherence component slightly improves the performance of the baseline NLG system in generating constituent orderings in a German corpus, though not to a statistically significant degree. |
Discussion | The improvement in PP attachment was larger (20.5% ERR), and also statistically significant.
Experimental setting | We use Bikel’s randomized parsing evaluation comparator (with p < 0.05 throughout) to test the statistical significance of the results using word sense information, relative to the respective baseline parser using only lexical features.
Experimental setting | Statistical significance was calculated based on |
Results | These results are statistically significant in some cases (as indicated by *). |
Results | As in full-parsing, Bikel outperforms Charniak, but in this case the difference in the baselines is not statistically significant.
Results | As was the case for parsing, the performance with IST reaches and in many instances surpasses gold-standard levels, achieving statistical significance over the baseline in places. |
Experiments | We apply the same reasoning to test for statistical significance in GMAP improvements. |
Results | combination is a statistically significant improvement (α = 0.05) over our new transcript set (that is, over the best single transcript result).
Results | Tests for statistically significant improvements in GMAP are computed using our paired log AP test, as discussed in Section 4.2.2. |
Results | Secondly, it is the only combination approach able to produce statistically significant relative improvements on both measures for both conditions. |
Conclusion and Future work | While results are similar and not statistically significant on the WSJ test set, when testing on the Brown out-of-domain test set the difference in favor of PropBank plus the mapping step is statistically significant.
Mapping into VerbNet Thematic Roles | If we compare these results to those obtained by VerbNet in the SemEval setting (second row of Table 5), they are 0.5 points better, but the difference is not statistically significant.
Mapping into VerbNet Thematic Roles | The performance drop compared to the use of the hand-annotated VerbNet class is 2 points and statistically significant, and 0.2 points above the results obtained using VerbNet directly under the same conditions (fourth row of the same table).
Mapping into VerbNet Thematic Roles | In this case, the difference is larger, 1.9 points, and statistically significant in favor of the mapping approach. |
On the Generalization of Role Sets | In the second setting (‘CoNLL setting’ row in the same table) the PropBank classifier degrades slightly, but the difference is not statistically significant.
On the Generalization of Role Sets | The results in the ‘CoNLL setting (no 5th)’ rows of Table 1 show that the drop for PropBank is negligible and not significant, while the drop for VerbNet is larger, and statistically significant.
Experimental Setup | All correlation coefficients are statistically significant (p < 0.01, N = 435). |
Results | Differences in correlation coefficients between models with two versus one modality are all statistically significant (p < 0.01 using a t-test), with the exception of Concat when compared against VisAttr. |
Results | All correlation coefficients are statistically significant (p < 0.01, N = 1,716). |
Results | All correlation coefficients are statistically significant (p < 0.01, N = 435). |
Experiments | The differences between scores marked with † are not statistically significant.
Experiments | Our model outperforms all other generative models, though the improvement over the n-gram model is not statistically significant.
Experiments | We did not find that the use of our syntactic language model made any statistically significant increases in BLEU score. |
Experiments | The model average is statistically significantly different from all the other conditions at p < 10⁻⁷ (Study 1).
Experiments | The averages of the tested conditions are shown in Table 2, and are statistically significant.
Experiments | The differences in density between the human average and the non-baseline conditions are highly statistically significant, according to paired two-tailed Wilcoxon signed-rank tests for the statistic calculated for each topic cluster.
Experiments | Indeed, the difference between our best results and those of Elsner and Charniak is not statistically significant.
Experiments | However, this improvement is not statistically significant.
Experiments | The best results, that present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW). |
Abstract | In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly . |
Conclusion | We find that all models outperform comparable state-of-the-art systems either quantitatively or statistically significantly . |
Experiments | We find that Model I performs better than both the best unsupervised system, RACAI (Ion and Tufis, 2007) and the most frequent sense baseline (BmeS), although these differences are not statistically significant due to the small size of the available test data (465). |
Experiments | For both tasks, our models outperform the state-of-the-art systems of the same type either quantitatively or statistically significantly . |
Experiments | system by Li and Sporleder (2009), although not statistically significantly . |
Abstract | Although the 1987 version of the Thesaurus is better, we show that the 1911 version performs surprisingly well and that often the differences between the versions of Roget’s and WordNet are not statistically significant.
Comparison on applications | Even on the largest set (Finkelstein et al., 2001), however, the differences between Roget’s Thesaurus and the Vector method are not statistically significant at the p < 0.05 level for either thesaurus on a two-tailed test.
Comparison on applications | On the (Miller and Charles, 1991) and (Rubenstein and Goodenough, 1965) data sets the best system did not show a statistically significant improvement over the 1911 or 1987 Roget’s Thesauri, even at p < 0.1 for a two-tailed test. |
Comparison on applications | Much like (Miller and Charles, 1991), the data set used here is not large enough to determine whether any system’s improvement is statistically significant.
Conclusion and future work | The 1987 version of Roget’s Thesaurus performed better than the 1911 version on all our tests, but we did not find the differences to be statistically significant . |
Conclusions | Including the measure of syntactic complexity in an automatic scoring model resulted in statistically significant performance gains over the state-of-the-art. |
Experimental Results | The correlation was approximately 0.1 higher in absolute value than that of cos4, which was the best-performing feature in the VSM-based model, and the difference is statistically significant.
Experimental Results | We note that the performance gain of Base+mescore over Base, as well as over Base+cos4, is statistically significant at the 0.01 level.
Experimental Results | The performance gain of Base+cos4 over Base, however, is not statistically significant at the 0.01 level.
Introduction | In addition, including our proposed measure of syntactic complexity in an automatic scoring model results in a statistically significant performance gain over the state-of-the-art. |
Experiments | To evaluate the statistical significance of parsing performance differences, we use eval07.pl with the -b option, and then Dan Bikel’s comparator. For MWEs, we use the F-measure for recognition of untagged MWEs (hereafter FUM) and for recognition of tagged MWEs (hereafter FTM).
Experiments | For each architecture except the PIPELINE one, differences between the baseline and the best setting are statistically significant (p < 0.01). |
Experiments | Best JOINT has statistically significant difference (p < 0.01) over both best JOINT-REG and best PIPELINE. |
Experimental Results | Doubling the number of words (N = 64) produces a small gain, and defining the pairwise dominance model using N = 128 most frequent words produces a statistically significant 1-point gain over the baseline (p < 0.01). |
Experimental Results | Larger values of N yield statistically significant performance above the baseline, but without further improvements over N = 128. |
Experimental Results | Statistically significant results (p < 0.01) over the baseline are in bold. |
Experimental Setup | In all experiments, we report performance using the BLEU score (Papineni et al., 2002), and we assess statistical significance using the standard bootstrapping approach introduced by Koehn (2004).
Results | Statistically significant improvements were realized on Dutch, French, and German. |
Results | The only case where it had no statistically significant effect was on English. |
Results | From this perspective, to achieve statistically significant improvements on five of six L2P datasets (without ever being beaten by random) is an excellent result for QBB. |
Experiments | (2) Taking advantage of potentially rich semantic information drawn from other languages via statistical machine translation, question retrieval performance can be significantly improved (row 3, row 4, row 5 and row 6 vs. row 7, row 8 and row 9, all these comparisons are statistically significant at p < 0.05). |
Experiments | (2012) (row 7 vs. row 8, the comparison is statistically significant at p < 0.05). |
Experiments | (1) Our proposed matrix factorization can significantly improve the performance of question retrieval (row 1 vs. row 2; row 3 vs. row 4; the improvements are statistically significant at p < 0.05).
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | The transferred models always outperform the baselines, but only the ensemble model’s improvement is statistically significant.
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | The results of the transferred models that include LS features are slightly lower, but still approach statistical significance for P@1 and are significant for MRR. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | † indicates approaching statistical significance with p = 0.07 or 0.06.
Introduction | Our results show statistically significant improvements of up to 24% on top of state-of-the-art LS models (Yih et al., 2013). |
Experiments | Statistical significance in BLEU differences |
Experiments | Such an improvement is statistically significant (p < 0.01). |
Experiments | Such a gain, which is statistically significant , confirms the effectiveness of semantic features. |
Discussion of Translation Results | We realized smaller, yet statistically significant, gains on the mixed genre data sets.
Discussion of Translation Results | The baseline contained 78 errors, while our system produced 66 errors, a statistically significant 15.4% error reduction at p ≤ 0.01 according to a paired t-test.
Experiments | The MT06 result is statistically significant at p ≤ 0.01; MT08 is significant at p ≤ 0.02.
Experiments | We evaluated translation quality with BLEU-4 (Papineni et al., 2002) and computed statistical significance with the approximate randomization method of Riezler and Maxwell (2005).
Experiments | They provide statistically significant improvements in review and segment F1, as well as accuracy, over the baseline models. |
Experiments | Augmenting RS-MiXMiX with sequential dependencies, yielding RS-MiXHMM, provides a moderate (though not statistically significant) improvement in segment F1.
Experiments | As a result, in addition to yielding alignments, RSA-MiXHMM provides small improvements over RS-MiXHMM (though they are not statistically significant).
Brain Imaging Experiments on Adjective-Noun Comprehension | However, the difference between the multiplicative model and the noun model is not statistically significant in this case.
Brain Imaging Experiments on Adjective-Noun Comprehension | The difference is statistically significant at p < 0.05.
Brain Imaging Experiments on Adjective-Noun Comprehension | Although neither difference is statistically significant, this clearly shows a pattern different from the attribute-specifying adjectives.
Introduction | They compared the composition models to human similarity ratings and found that all models were statistically significantly correlated with human judgements. |
A Latent Variable CCG Parser | To determine statistical significance, we obtain p-values from Bikel’s randomized parsing evaluation comparator, modified for use with tagging accuracy, F-score and dependency accuracy.
A Latent Variable CCG Parser | The difference in accuracy is only statistically significant between Clark and Curran’s Normal Form model ignoring features and the Petrov parser trained on CCGbank without features (p-value = 0.013). |
A Latent Variable CCG Parser | These results show that the features in CCGbank actually inhibit accuracy (to a statistically significant degree in the case of unlabeled accuracy on section 00) when used as training data for the Petrov parser.
Abstract | We show that it achieves a statistically significantly higher BLEU score than the baseline system without these features. |
Conclusions | In comparison to a baseline model, we achieve statistically significant improvement in BLEU score. |
Generation Ranking Experiments | The improvement in BLEU is statistically significant (p < 0.01) using the paired bootstrap resampling significance test (Koehn, 2004). |
Generation Ranking Experiments | (2007) and the model that only takes syntactic-based asymmetries into account is not statistically significant, while the difference between Model 1 and this model is statistically significant (p < 0.05). |
Evaluation | To check whether changes were statistically significant we applied the test described by Chinchor (1995). |
Results | The BFGS, GIS and MIRA models produced mixed results, but no statistically significant decrease in accuracy, and as the amount of parser-annotated data was increased, parsing speed increased by up to 85%. |
Results | All changes in F-score are statistically significant.
Results | All of the new models in the table make a statistically significant improvement over the baseline. |
Experiments and Discussions | Results in bold show statistical significance over the baseline in the corresponding metric.
Experiments and Discussions | When stop words are used, HybHSum2 outperforms the state-of-the-art by 2.5-7%, except on R-2 (with statistical significance).
Experiments and Discussions | Results are statistically significant based on a t-test.
Evaluation | We report factors as statistically significant contributors to reading time if the absolute value of the t-value is greater than 2. |
Results | The first data column shows the regression on all data; the second and third columns divide the data into open and closed classes, because an evaluation (not reported in detail here) showed statistically significant interactions between word class and 3 of the predictors. |
Results | Out of the non-parser-based metrics, word order and bigram probability are statistically significant regardless of the data subset; though reciprocal length and unigram frequency do not reach significance here, likelihood ratio tests (not shown) confirm that they contribute to the model as a whole. |
Results | It can be seen that nearly all the slopes have been estimated with signs as expected, with the exception of reciprocal length (which is not statistically significant).
Experiments | * and † denote statistically significant differences with i-QRY and i-PRF, respectively.
Experiments | In order to test the statistical significance of improvements attained by the proposed methods we use a two-sided Fisher’s randomization test with 20,000 permutations. |
Experiments | Results with p-value < 0.05 are considered statistically significant . |
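The Fisher randomization test quoted above can be sketched for paired per-item scores by randomly flipping the sign of each difference; approximate randomization, cited elsewhere in this collection, follows the same scheme with a fixed number of random permutations. The function name and scores below are invented for illustration:

```python
import random

def randomization_test(a, b, trials=20000, seed=0):
    """Two-sided paired randomization test; returns an approximate p-value.

    Under the null hypothesis the two systems are exchangeable, so each
    paired difference keeps or flips its sign with equal probability.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs)) / len(diffs)
    hits = 0
    for _ in range(trials):
        shuffled = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(shuffled)) / len(diffs) >= observed:
            hits += 1
    return hits / trials

a = [0.61, 0.58, 0.66, 0.63, 0.59, 0.64, 0.62, 0.60, 0.65, 0.63]
b = [0.52, 0.50, 0.55, 0.51, 0.49, 0.54, 0.53, 0.50, 0.56, 0.52]
p_value = randomization_test(a, b)
```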
Results and Analysis | Our main result is that the allCP and allCP+pr methods rank matches statistically significantly better than the baselines in all setups (according to the Wilcoxon double-sided signed-ranks test at the level of 0.01 (Wilcoxon, 1945)). |
Results and Analysis | Furthermore, relative to this cpv(r, t) model from (Pantel et al., 2007), our combined allCP model, with or without the prior (first row of Table 2), obtains statistically significantly better ranking (at the level of 0.01).
Results and Analysis | Comparing the algorithms for matching CPs (Section 3.2.2), we found that while rankedCBC is statistically significantly better than binaryCBC, rankedCBC and LIN generally achieve the same results.
Experiments and Results | the averages of the accuracies from the 10 cross-validation runs, and all results were compared for statistical significance using the t-test where applicable.
Experiments and Results | Unless otherwise marked, all accuracies are statistically significant at p ≤ .0005 for both baselines.
Experiments and Results | †: not statistically significant over Online-Behavior and Interests.
Experiments | Pairwise comparisons between all models and their statistical significance were carried out using a one-way ANOVA with post-hoc Tukey HSD tests and are shown in Table 6. |
Experiments | With regard to simplification, our system ranks first and is very close to the manually simplified input (the difference is not statistically significant).
Related Work | No human evaluation is provided but the approach is shown to result in statistically significant improvements over a traditional phrase based approach. |
Experimental Setup | Abstractive vs. Extractive: our full query-based abstractive summarization system shows statistically significant improvements over baselines.
Experimental Setup | The statistical significance tests were calculated by approximate randomization, as described in (Yeh, 2000).
Introduction | Automatic evaluation on the chat dataset and manual evaluation over the meetings and emails show that our system uniformly and statistically significantly outperforms baseline systems, as well as a state-of-the-art query-based extractive summarization system. |
WordNet Experiments | Based on McNemar’s Test with Yates’ correction for continuity, MKWC is significantly better over BNC and HDP-WSI is significantly better over FINANCE (p < 0.0001 in both cases), but the difference over SPORTS is not statistically significant (p > 0.1).
WordNet Experiments | Testing for statistical significance over the paired JS divergence values for each lemma using the Wilcoxon signed-rank test, the result for FINANCE is significant (p < 0.05) but the results for the other two datasets are not (p > 0.1 in each case).
WordNet Experiments | To summarise, the results for MKWC and HDP-WSI are fairly even for predominant sense learning (each outperforms the other at a level of statistical significance over one dataset), but HDP-WSI is better at inducing the overall sense distribution.
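The Wilcoxon signed-rank test used in the quote above (and in several other quotes in this collection) ranks the absolute paired differences and sums the ranks of the positive ones. A sketch using the normal approximation, with invented scores; production implementations (e.g. scipy.stats.wilcoxon) handle ties and zeros more carefully:

```python
import math

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank test via the normal approximation.

    Drops zero differences and average-ranks ties; returns (z, two-sided p).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1  # extend the group of tied absolute differences
        avg = (i + j) / 2 + 1  # average rank of the tied group (1-based)
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p

b = [0.50] * 12
a = [0.50 + 0.01 * (i + 1) for i in range(12)]  # system A wins every pair
z, p = wilcoxon_signed_rank(a, b)
```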
Experiments 6.1 Experiment Settings | We also conducted t-tests to determine whether the improvement is statistically significant.
Experiments 6.1 Experiment Settings | model is statistically significant with p-value < 0.05,
Experiments 6.1 Experiment Settings | Moreover, the improvements on five collections are statistically significant . |
Experiments | Rows marked with an asterisk (*) are statistically significant compared to CEC (for the first half of the table) or CEC+MLE (for the second half of the table), with p < 0.05. |
Experiments | In fact, CEC+MLE and GSEC+MLE perform similarly (p = 0.36, not statistically significant).
Experiments | The two results are statistically significant (p < 0.0001) with respect to CBC and CEC+MLE respectively. |
Results | The difference between COMBO and DISTRIB is not statistically significant, while both are significantly better than the rule-based approaches.8 This provides strong motivation for a “lightweight” approach to non-referential it detection — one that does not require parsing or handcrafted rules — and is easily ported to new languages and text domains.
Results | Using no truncation (Unaltered) drops the F-Score by 4.3%, while truncating the patterns to a length of four only drops the F-Score by 1.4%, a difference which is not statistically significant.
Results | Neither is statistically significant; however, there seem to be diminishing returns from longer context patterns.
Results | Statistical significance tests were calculated using the approximate randomization test (Yeh, 2000) with 1000 iterations. |
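The approximate randomization test of Yeh (2000), cited repeatedly in these excerpts, shuffles system labels within each paired example and counts how often the shuffled score difference reaches the observed one. A minimal sketch; the function name, metric, and iteration count are illustrative:

```python
import random

def approx_randomization(scores_a, scores_b, metric, iters=1000, seed=0):
    """Approximate randomization test for paired per-example scores.

    For each iteration, swap each example's pair of scores with
    probability 0.5 and check whether the shuffled metric difference
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(metric(scores_a) - metric(scores_b))
    count = 0
    for _ in range(iters):
        a, b = [], []
        for x, y in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a.append(y); b.append(x)
            else:
                a.append(x); b.append(y)
        if abs(metric(a) - metric(b)) >= observed:
            count += 1
    # Add-one smoothing keeps the estimated p-value strictly positive
    return (count + 1) / (iters + 1)

mean = lambda s: sum(s) / len(s)
p = approx_randomization([1] * 30, [0] * 30, mean)
```

Unlike the bootstrap, this test holds the set of examples fixed and randomizes only the assignment of scores to systems, so it directly tests the null hypothesis that the two systems are interchangeable.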
Results | * indicates statistical significance with respect to the column’s Baseline at the p < 0.01 level, † at p < 0.05.
Results | All numbers are statistically significant* with p-value < 0.01 compared to the number to their left.
Experiments with CAT data | These results (MAE reductions are always statistically significant) suggest that, when dealing with datasets with very different label distributions, the evident limitations of batch methods are more easily overcome by learning from scratch from the feedback of a new post-editor.
Experiments with WMT12 data | 11Results marked with the “*” symbol are NOT statistically significant compared to the corresponding batch model.
Experiments with WMT12 data | The others are always statistically significant at p ≤ 0.005, calculated with approximate randomization (Yeh, 2000).
Experiments | The improvement in precision when using TR&EX is statistically significant (p < 0.05).9 Note that F-measure dropped |
Experiments | The improvement in precision when using TR&EX is statistically significant (p < 0.01). |
Experiments | statistically significant in both settings (p < 0.01). |
Empirical Evaluation | As to statistical significance, we use the two-tailed paired Student’s t-test in all the experiments to compare two specific methods.
Empirical Evaluation | Meanwhile, both semantic similarities have lower accuracy than CWS, and the differences are also statistically significant even with the conservative Bonferroni adjustment (i.e., the p-values in Table 1 are multiplied by three). |
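The Bonferroni adjustment mentioned above guards against inflated false-positive rates when several comparisons are made against the same baseline: each p-value is multiplied by the number of tests. A minimal sketch with SciPy (function name and scores are illustrative, not from the paper):

```python
from scipy.stats import ttest_rel

def bonferroni_paired(baseline, systems):
    """Paired t-test of each system against the baseline; the conservative
    Bonferroni adjustment multiplies each p-value by the number of tests
    (capped at 1.0)."""
    m = len(systems)
    return {name: min(ttest_rel(scores, baseline).pvalue * m, 1.0)
            for name, scores in systems.items()}

# Hypothetical per-fold accuracies for a baseline and one competing system
base = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50, 0.52, 0.48]
sys_a = [0.61, 0.61, 0.59, 0.60, 0.60, 0.62, 0.58, 0.61, 0.61, 0.59]
adjusted = bonferroni_paired(base, {"sys_a": sys_a})
```

With three comparisons, as in the excerpt's Table 1, `m` would be 3 and each raw p-value would be tripled before being compared to the significance threshold.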
Empirical Evaluation | The increase in precision is at least 0.04 with statistical significance . |
Introduction | Differences in both precision and recall between the baseline and the other systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test. |
Introduction | Differences in both precision and recall between the baseline and the Span-HMM systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test. |
Introduction | were not statistically significant, except that the difference in precision between the Multi-Span-HMM and the Span-HMM-Base10 is significant at p < .1.
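The two-tailed Fisher's exact test used in the excerpts above compares correct/incorrect counts for two systems via a 2x2 contingency table. A minimal sketch with SciPy; the counts are illustrative, not taken from the paper:

```python
from scipy.stats import fisher_exact

# Hypothetical correct/incorrect counts on 100 test items each
table = [[70, 30],   # baseline: 70 correct, 30 incorrect
         [90, 10]]   # system:   90 correct, 10 incorrect
odds_ratio, p = fisher_exact(table, alternative='two-sided')
```

Fisher's exact test computes the tail probability of the hypergeometric distribution directly, so unlike a chi-squared test it remains valid even when some cell counts are small.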
Evaluation Setup | A Wilcoxon rank sum test confirmed that the difference is statistically significant (p < 0.01). |
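The Wilcoxon rank-sum test cited here compares two independent samples without assuming normality. A minimal sketch with SciPy; the synthetic score distributions are purely illustrative:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# Hypothetical similarity scores for two independent conditions
high = rng.normal(0.8, 0.1, size=100)
low = rng.normal(0.5, 0.1, size=100)

# Two-sided rank-sum test on the pooled ranks of the two samples
stat, p = ranksums(high, low)
```

Because the test ranks the pooled observations rather than using their raw values, it is robust to outliers and to monotone rescalings of the similarity scores.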
Results | The difference between the High and Low similarity values estimated by these models is statistically significant (p < 0.01 using the Wilcoxon rank-sum test).
Results | However, the difference between the two models is not statistically significant.
Experiments | We also derive statistical significance of the results by using the model described in (Yeh, 2000) and implemented in (Pado, 2006). |
Experiments | Third, PTK, which produces more general structures, improves over BR by almost 1.5 (a statistically significant result) when using our dependency structures GRCT and LCT.
Experiments | Finally, the best model of SPTK (i.e., using LCT) improves over the best PTK (i.e., using LCT) by almost 1 point (a statistically significant result): this difference is only given by lexical similarity.
Background | The evaluations indicate statistical significance , but the test settings are relatively specific. |
Conclusions | The statistical significance is particularly high, even though there were several limitations in the experimental setting. |
Evaluation | According to the one-sided Wilcoxon rank-sum test, both Collective Funniness and all Upper Agreements increase from FORM to FORM+TABOO and from FORM+TABOO to FORM+TABOO+CONT statistically significantly (in all cases p < .002). |
Results | to compute MAP values and corresponding statistical significance, we randomly split each test set into 30 subsets.
Results | This improvement is statistically significant at p < 0.01 for BInc and Lin, and p < 0.015 for Cosine, using paired t-test. |
Results | On test-setivc, where context mismatches are abundant, our model outperformed all other baselines (statistically significant at p < 0.01).
Conclusion | When using predictive class-based models in combination with a word-based language model trained on very large amounts of data, the improvements continue to be statistically significant on the test and nist06 sets. |
Experiments | Adding the class-based models leads to small improvements in BLEU score, with the highest improvements for both dev and nist06 being statistically significant.2
Experiments | 2Differences of more than 0.0051 are statistically significant at the 0.05 level using bootstrap resampling (Noreen, 1989; Koehn, 2004) |
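The paired bootstrap resampling of Koehn (2004), referenced above, resamples test items with replacement and asks how often one system beats the other across resamples. A minimal sketch over per-item scores; the function name and the use of per-item additive scores are illustrative simplifications (corpus BLEU is not additive, so a full implementation would recompute BLEU per resample):

```python
import random

def paired_bootstrap_wins(scores_a, scores_b, iters=1000, seed=0):
    """Paired bootstrap: resample test items with replacement and return
    the fraction of resamples in which system A's total score beats
    system B's on the same resampled items."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / iters

frac = paired_bootstrap_wins([1] * 50, [0] * 50)
```

A win fraction of, say, 0.95 or higher is conventionally read as significance at the 0.05 level; crucially, both systems are scored on the *same* resampled item set in every iteration.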
Abstract | To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters. |
Experiments | For English and Turkish we observe a statistically significant improvement over the monolingual model (cf. |
Experiments | English again has a statistically significant improvement over the baseline. |
Experiments | The starred results are statistically significant improvements over the Baseline (at confidence p < 0.05). |
Experiments | This improvement is statistically significant at confidence p < 0.05, which we computed using the pairwise bootstrap resampling technique of Koehn (2004). |
Experiments | However, this improvement (0.34) is not statistically significant.
Introduction | When compared to a state-of-the-art open-domain headline abstraction system (Filippova, 2010), the new headlines are statistically significantly better both in terms of readability and informativeness. |
Results | • Amongst the automatic systems, HEADY performed better than MSC, with statistical significance at 95% for all the metrics.
Results | • The most frequent pattern baseline and HEADY have comparable performance across all the metrics (not statistically significantly different), although HEADY has slightly better scores for all metrics except for informativeness.
Experiments and Analysis | The p-values in parentheses present the statistical significance of the improvements.
Experiments and Analysis | The improvements shown in parentheses are all statistically significant (p < 10⁻⁵).
Experimental Results | The up-arrow denotes a performance improvement over the previous method (above) with statistical significance at a p-value of 0.05; the short line ‘-’ denotes no statistically significant difference.
Experimental Results | which just utilize the independent sentence-level features, do not perform very well here, and there is no statistically significant performance difference between them.
Experimental Results | We also find that LCRF, which utilizes the local context information between sentences, performs better than the LR method in precision and F1 with statistical significance.
Future Work | It is entirely possible that, within this protocol, the baselines that have performed so well in our experiments, such as length or, in read news, position, will utterly fail, and that less traditional acoustic or spoken-language features will genuinely, and with statistical significance, add value to a purely transcript-based text summarization system.
Results and analysis | The best performance is achieved by using all of the features together, but the length baseline, which uses only those features in bold type from Figure 3, is very close (no statistically significant difference), as is MMR.6 |
Results and analysis | The difference with respect to either of these baselines is statistically significant within the popular 10–30% compression range, as is the classifier trained on all features but acoustic
I’m ready for my closeup. | For the null hypothesis of random guessing, these results are statistically significant, p < 2⁻⁶ ≈ .016.
I’m ready for my closeup. | Table 2 shows that all the subjects performed (sometimes much) better than chance, and against the null hypothesis that all subjects are guessing randomly, the results are statistically significant, p < 2⁻⁶ ≈ .016.
Never send a human to do a machine’s job. | Accuracies statistically significantly greater than bag-of-words according to a two-tailed t-test are indicated with *(p<.05) and **(p<.01). |