Conclusions and Future Work | Experiments conducted on a real CQA data set show some promising findings: (1) the proposed method significantly outperforms the previous work for question retrieval; (2) the proposed matrix factorization can significantly improve the performance of question retrieval, whether or not the translation languages are considered; (3) considering more languages can further improve the performance but does not seem to produce significantly better performance; (4) different languages contribute unevenly to question retrieval; (5) our proposed method can be easily adapted to the large-scale information retrieval task. |
Experiments | (2) Taking advantage of potentially rich semantic information drawn from other languages via statistical machine translation, question retrieval performance can be significantly improved (row 3, row 4, row 5 and row 6 vs. row 7, row 8 and row 9, all these comparisons are statistically significant at p < 0.05). |
Experiments | (1) Our proposed matrix factorization can significantly improve the performance of question retrieval (row 1 vs. row 2; row 3 vs. row 4, the improvements are statistically significant at p < 0.05). |
Experiments | (3) Compared to VSM, the performance of SMT + IBM is significantly improved (row 1 vs. row 3), which supports the motivation that the word ambiguity and word mismatch problems could be partially addressed by Google Translate. |
Conclusion | Our experiments show that this model significantly improves machine learning performance. |
Discussion | Significant improvement (p < 0.05) in- |
Experimental Results | For self-empowerment recognition, all methods that we introduce yield significant improvements in H, the |
Experimental Results | Statistically significant improvements over baseline are marked (p < .01, †; p < .05, *; p < 0.1, +). |
Introduction | Using these techniques, we demonstrate a significant improvement in classifier performance when recognizing the language of empowerment in support group chatrooms, a critical application area for researchers studying conversational interactions in healthcare (Uden-Kraan et al., 2009). |
Abstract | Empirical results demonstrate significant improvements on prediction performance and time efficiency. |
Conclusions and Discussions | Empirical results for both binary and multi-class classification demonstrate significant improvements over the existing logistic supervised topic models. |
Introduction | Finally, our empirical results on real data sets demonstrate significant improvements on time efficiency. |
Introduction | The classification performance is also significantly improved by using appropriate regularization parameters. |
Application to Essay Scoring | Finally, we report on an experiment where we significantly improve the performance of a very competitive, state-of-the-art system for automated scoring of essays, using a feature derived from WAP. |
Application to Essay Scoring | Significant improvements are underlined. |
Conclusion | Finally, we demonstrated that the information provided by word association profiles leads to a significant improvement in a highly competitive, state-of-the-art essay scoring system that already measures various aspects of writing quality. |
Related Work | The results were similar to those of the blind test presented here, with e-rater+HAT significantly improving upon e-rater alone, using Wilcoxon test, W=374, n=29, p<0.05. |
Abstract | Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization tasks. |
Introduction | Our tree-based methods rely on a scoring function that allows for easy and flexible tailoring of sentence compression to the summarization task, ultimately resulting in significant improvements for MDS, while at the same time remaining competitive with existing methods in terms of sentence compression, as discussed next. |
Introduction | With these results we believe we are the first to successfully show that sentence compression can provide statistically significant improvements over pure extraction-based approaches for query-focused MDS. |
Results | Our sentence-compression-based systems (marked with †) show statistically significant improvements over pure extractive summarization for both R-2 and R-SU4 (paired t-test, p < 0.01). |
Abstract | We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation. |
Experiment Results | Phrase-based with lexicalized reordering features (PB+lex) shows significant improvement on all test sets over the simple phrase-based system without lexicalized reordering (PB+nolex). |
Experiment Results | distance-based reordering feature (PH+dist) to the Arabic-English experiment but get significant improvements when adding the six features of the lexicalized reordering (PH+lex). |
Experiment Results | And our best Phrasal-Hiero significantly improves over the best phrase-based baseline by 0.54 BLEU points. |
Abstract | Our experiments show significant improvement of our new model over baselines on three evaluation metrics in the new task. |
Conclusion | We achieve significant improvement over baselines. |
Experiments | With WTMF being a very challenging baseline, WTMF-G can still significantly improve all 3 metrics. |
Experiments | In the case k = 4, δ = 3 compared to WTMF, WTMF-G receives +1.371% TOP10, +2.727% RR, and +0.518% ATOP value (this is a significant improvement of the ATOP value considering that it is averaged over 30,000 data points; at an already high level of 96%, it reduces the error rate by 13%). |
Experiments | The best results, that present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW). |
Experiments | A statistically significant improvement is provided by including syntactic information. |
Experiments | Finally, when distance is accounted for together with syntactic information, the accuracy is significantly improved (p < 0.01) with regard to the results obtained with syntactic information only. |
Abstract | This paper proposes a new method for significantly improving the performance of pairwise coreference models. |
Conclusion and perspectives | Using different kinds of greedy decoders, we showed a significant improvement of the system. |
Modeling pairs | Instead of imposing heuristic product of features, we will show that a clever separation of instances leads to significant improvements of the pairwise model. |
Abstract | As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets. |
Introduction | Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly. |
Introduction | We find that adding dependency language and maximum entropy shift-reduce models consistently brings significant improvements, both separately and jointly. |
Abstract | With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features. |
Conclusion and Future Work | Experiments on Chinese word segmentation show that the enhanced word segmenter achieves significant improvement on testing sets of different domains, although using a single classifier with only local features. |
Introduction | Experimental results show that the knowledge implied in the natural annotations can significantly improve the performance of a baseline segmenter trained on CTB 5.0, an F-measure increment of 0.93 points on the CTB test set, and an average increment of 1.53 points on 7 other domains. |
Abstract | The experiments on Japanese and Chinese WS have shown that the proposed models achieve significant improvements over the state of the art, reducing errors by 16% in Japanese. |
Conclusion and Future Work | The experimental results show that the model achieves a significant improvement over the baseline and LM augmentation, achieving 16% WER reduction in the EC domain. |
Introduction | The results show that we achieved a significant improvement in WS accuracy in both languages. |
Abstract | Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines. |
Conclusion | Results from NER and word alignment experiments suggest that our method gives significant improvements in both NER and word alignment. |
Joint NER and Alignment Results | By jointly decoding NER with word alignment, our model not only maintains significant improvements in NER performance, but also yields significant improvements to alignment performance. |
Abstract | We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system. |
Conclusions and Future Work | We also applied the predicted ECs to a large-scale Chinese-to-English machine translation task and achieved significant improvement over two strong MT baselines. |
Introduction | • Show significant improvement on top of the state-of-the-art large-scale hierarchical and syntactic machine translation systems. |
Discussion and future work | While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results. |
Results | performs best, with 59.5% cross-validation accuracy, which is a statistically significant improvement over the baselines of LR (χ²(4) = 22.25, p < .0001) and NB (χ²(4) = 16.19, p |
Results | In comparison with classification accuracy on pooled data, a paired t-test shows statistically significant improvement across all age groups using RF, t(3) = 10.37, p |
Abstract | Experiments show that our approach helps to achieve significant improvements on translation quality. |
Experiment | Moreover, after we import the MEPD model into system PASTR, we get a significant improvement over PASTR (by 0.54 BLEU points). |
Introduction | Experiments show that the two PAS disambiguation methods significantly improve the baseline translation system. |
Abstract | To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters. |
Experiments | For English and Turkish we observe a statistically significant improvement over the monolingual model (cf. |
Experiments | English again has a statistically significant improvement over the baseline. |