Index of papers in Proc. ACL that mention
  • significant improvements
Amigó, Enrique and Giménez, Jesús and Gonzalo, Julio and Verdejo, Felisa
Alternatives to Correlation-based Meta-evaluation
Precision of Significant Improvement Prediction
Alternatives to Correlation-based Meta-evaluation
Recall of Significant Improvement
Alternatives to Correlation-based Meta-evaluation
We now investigate to what extent a significant system improvement according to the metric implies a significant improvement according to human assessors, and vice versa.
significant improvements is mentioned in 12 sentences in this paper.
Gao, Wei and Blitzer, John and Zhou, Ming and Wong, Kam-Fai
Abstract
Using publicly available Chinese and English query logs, we demonstrate for both languages that our ranking technique exploiting bilingual data leads to significant improvements over a state-of-the-art monolingual ranking algorithm.
Conclusions and Future Work
Our pairwise ranking scheme based on bilingual document pairs can easily integrate all kinds of similarities into the existing framework and significantly improves both English and Chinese ranking performance.
Experiments and Results
The significant improvements over baseline (99% confidence) are bolded with the p-values given in parenthesis.
Experiments and Results
* indicates significant improvement over IR (no similarity).
Experiments and Results
Relatively fewer significant improvements can be made by heuristic H-1 (max score).
Introduction
For both languages, we achieve significant improvements over monolingual Ranking SVM (RSVM) baselines (Herbrich et al., 2000; Joachims, 2002), which exploit a variety of monolingual features.
significant improvements is mentioned in 8 sentences in this paper.
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Abstract
This paper shows that semantic classes help to obtain significant improvement in both parsing and PP attachment tasks.
Background
The combination of word sense and first-level hypernyms produced a significant improvement over their basic model.
Conclusions
This paper shows that semantic classes achieve significant improvement both on full parsing and PP attachment tasks relative to the baseline parsers.
Conclusions
The results are highly significant in demonstrating that a simplistic approach to incorporating lexical semantics into a parser significantly improves parser performance.
Introduction
This paper shows that semantic classes help to obtain significant improvements for both PP attachment and parsing.
Introduction
The results are notable in demonstrating that very simple preprocessing of the parser input facilitates significant improvements in parser performance.
Results
The results are similar to IST, with significant improvements for verbs.
significant improvements is mentioned in 7 sentences in this paper.
Fang, Hui
Abstract
In this paper, we reexamine this problem using recently proposed axiomatic approaches and find that, with appropriate term weighting strategy, we are able to exploit the information from lexical resources to significantly improve the retrieval performance.
Conclusions
In this paper, we reexamine the problem of query expansion using lexical resources in recently proposed axiomatic framework and find that we are able to significantly improve retrieval performance through query expansion using only handcrafted lexical resources.
Experiments
By incorporating this similarity function into the axiomatic retrieval models, we show that query expansion using the information from only WordNet can lead to significant improvement of retrieval performance, which has not been shown in the previous studies (Voorhees, 1994; Stairmand, 1997).
Experiments
QEdef significantly improves the retrieval performance for all the data sets.
Introduction
Using this similarity function in query expansion can significantly improve the retrieval performance.
Introduction
Unlike previous studies, we are able to show that query expansion using only manually created lexical resources can significantly improve the retrieval performance.
significant improvements is mentioned in 6 sentences in this paper.
Zhong, Zhi and Ng, Hwee Tou
Abstract
Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.
Experiments
Statistically significant improvements over Stemprf on TREC7, TREC8, and RBO3.
Introduction
In the application of WSD to MT, research has shown that integrating WSD in appropriate ways significantly improves the performance of MT systems (Chan et al., 2007; Carpuat and Wu, 2007).
Introduction
Our evaluation on standard TREC data sets shows that supervised WSD outperforms two other WSD baselines and significantly improves IR.
Related Work
They obtained significant improvements by representing documents and queries with accurate senses as well as synsets (synonym sets).
Related Work
Their evaluation on TREC collections achieved significant improvements over a standard term based vector space model.
significant improvements is mentioned in 6 sentences in this paper.
Wang, WenTing and Su, Jian and Tan, Chew Lim
Abstract
The experiment shows the tree kernel approach is able to give statistically significant improvements over the flat syntactic path feature.
Abstract
Besides, we further propose to leverage temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit relations.
Conclusions and Future Works
The experimental results on PDTB v2.0 show that our kernel-based approach is able to give a statistically significant improvement over the flat syntactic path method.
Conclusions and Future Works
In addition, we also propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition, both explicit and implicit.
Introduction
The experiment shows that the tree kernel is able to effectively incorporate syntactic structural information and produce statistically significant improvements over the flat syntactic path feature for the recognition of both explicit and implicit relations in the Penn Discourse Treebank (PDTB; Prasad et al., 2008).
Introduction
Besides, inspired by the linguistic study on tense and discourse anaphor (Webber, 1988), we further propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrates statistically significant improvements for discourse relation recognition on PDTB v2.0 for both explicit and implicit relations.
significant improvements is mentioned in 6 sentences in this paper.
Mi, Haitao and Liu, Qun
Abstract
Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules.
Experiments
With the help of the dependency language model, our new model achieves a significant improvement of +0.7 BLEU points over the forest baseline system (p < 0.05, using the sign-test suggested by
Introduction
Both string-to-constituency system (e.g., (Galley et al., 2006; Marcu et al., 2006)) and string-to-dependency model (Shen et al., 2008) have achieved significant improvements over the state-of-the-art formally syntax-based system Hiero (Chiang, 2007).
Introduction
Medium data experiments (Section 5) show a statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer translation rules; this is also the first time that a tree-to-tree model has surpassed tree-to-string counterparts.
Model
(2009), their forest-based constituency-to-constituency system achieves a comparable performance against Moses (Koehn et al., 2007), but a significant improvement of +3.6 BLEU points over the 1-best tree-based constituency-to-constituency system.
Related Work
This model shows a significant improvement over the state-of-the-art hierarchical phrase-based system (Chiang, 2005).
significant improvements is mentioned in 6 sentences in this paper.
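The sign-test mentioned in the excerpt above (the quoted sentence is cut off before its citation) is typically run over per-sentence win/loss counts between two MT systems. A minimal exact-binomial sketch, with hypothetical names and data, might look like this; it is an illustration of the general test, not the authors' implementation:

```python
import math

def sign_test(wins, losses):
    """Two-sided exact sign test: given per-sentence win/loss counts
    between two systems (ties dropped), return the p-value under the
    null hypothesis that wins and losses are equally likely."""
    n = wins + losses
    k = min(wins, losses)
    # P(at most k successes in n fair coin flips), doubled for two sides.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 15 wins against 3 losses yields a p-value below 0.01, whereas 10 wins against 9 losses is nowhere near significance.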
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Abstract
On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
Conclusions
The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
Experiments and Results
Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
Experiments and Results
Also, adding more input features (X vs. X1) not only significantly improves the performance of DAE feature learning, but also slightly improves the performance of DBN feature learning.
Introduction
To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning; these features have been shown to yield significant improvement for SMT, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), which also show further improvement for new phrase feature learning in our experiments.
Introduction
Our semi-supervised DAE features significantly outperform the unsupervised DBN features and the baseline features, and our introduced input phrase features significantly improve the performance of DAE feature learning.
significant improvements is mentioned in 6 sentences in this paper.
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Conclusion
Extensive experiments on large-scale bidirectional Japanese-English translation confirmed the significant improvements in BLEU score.
Experiments
Comparing the BLEU-4 scores of PTT+C'3;g and PTT+03, we gained 0.56 (t2s) and 0.57 (s2t) BLEU-4 points which are significant improvements (p < 0.05).
Experiments
Furthermore, we gained 0.50 (t2s) and 0.62 (s2t) BLEU-4 points from PTT+FS to PTT+F, which are also significant improvements (p < 0.05).
Related Work
By introducing supertags into the target language side, i.e., the target language model and the target side of the phrase table, significant improvement was achieved for Arabic-to-English translation.
Related Work
(2007) also reported a significant improvement for Dutch-English translation by applying CCG supertags at a word level to a factorized SMT system (Koehn et al., 2007).
significant improvements is mentioned in 5 sentences in this paper.
Mayfield, Elijah and Adamson, David and Penstein Rosé, Carolyn
Conclusion
Our experiments show that this model significantly improves machine learning performance.
Discussion
Significant improvement (p < 0.05) in-
Experimental Results
For self-empowerment recognition, all methods that we introduce are significant improvements in H, the
Experimental Results
Statistically significant improvements over baseline are marked (p < .01, †; p < .05, *; p < .1, +).
Introduction
Using these techniques, we demonstrate a significant improvement in classifier performance when recognizing the language of empowerment in support group chatrooms, a critical application area for researchers studying conversational interactions in healthcare (Uden-Kraan et al., 2009).
significant improvements is mentioned in 5 sentences in this paper.
Vaswani, Ashish and Huang, Liang and Chiang, David
Abstract
We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
Conclusion
The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs.
Experiments
All of the tests showed significant improvements (p < 0.01), ranging from +0.4 BLEU to +1.4 BLEU. For Urdu, even though we didn’t have manual alignments to tune hyperparameters, we got significant gains over a good baseline.
Introduction
In this paper, we propose a simple extension to the IBM/HMM models that is unsupervised like the IBM models, is as scalable as GIZA++ because it is implemented on top of GIZA++, and provides significant improvements in both alignment and translation quality.
Introduction
Experiments on Czech-, Arabic-, Chinese- and Urdu-English translation (Section 3) demonstrate consistent significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU).
significant improvements is mentioned in 5 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Conclusions and Future Work
Experiments conducted on a real CQA data show some promising findings: (1) the proposed method significantly outperforms the previous work for question retrieval; (2) the proposed matrix factorization can significantly improve the performance of question retrieval, no matter whether considering the translation languages or not; (3) considering more languages can further improve the performance but it does not seem to produce significantly better performance; (4) different languages contribute unevenly for question retrieval; (5) our proposed method can be easily adapted to the large-scale information retrieval task.
Experiments
(2) Taking advantage of potentially rich semantic information drawn from other languages via statistical machine translation, question retrieval performance can be significantly improved (row 3, row 4, row 5 and row 6 vs. row 7, row 8 and row 9, all these comparisons are statistically significant at p < 0.05).
Experiments
(1) Our proposed matrix factorization can significantly improve the performance of question retrieval (row 1 vs. row 2; row 3 vs. row 4, the improvements are statistically significant at p < 0.05).
Experiments
(3) Compared to VSM, the performance of SMT + IBM is significantly improved (row 1 vs. row 3), which supports the motivation that the word ambiguity and word mismatch problems could be partially addressed by Google Translate.
significant improvements is mentioned in 5 sentences in this paper.
Duan, Manjuan and White, Michael
Background
To improve word ordering decisions, White & Rajkumar (2012) demonstrated that incorporating a feature into the ranker inspired by Gibson’s (2000) dependency locality theory can deliver statistically significant improvements in automatic evaluation scores, better match the distributional characteristics of sentence orderings, and significantly reduce the number of serious ordering errors (some involving vicious ambiguities) as confirmed by a targeted human evaluation.
Introduction
With the SVM reranker, we obtain a significant improvement in BLEU scores over
Reranking with SVMs 4.1 Methods
tures and the n-best parse features contributed to achieving a significant improvement compared to the perceptron model.
Reranking with SVMs 4.1 Methods
The complete model, BBS+dep+nbest, achieved a BLEU score of 88.73, significantly improving upon the perceptron model (p < 0.02).
Simple Reranking
However, as shown in Table 2, none of the parsers yielded significant improvements on top of the perceptron model.
significant improvements is mentioned in 5 sentences in this paper.
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Method
If the likelihood ratio is significant, then this indicates that the new factor significantly improves model fit.
Results
The addition of the semantic factor significantly improves model fit for both the simple semantic space and LDA.
Results
Considering the trigram model first, we find that adding this factor to the model gives a significant improvement in fit.
Results
As far as LDA is concerned, the additive model significantly improves model fit, whereas the multiplicative one does not.
significant improvements is mentioned in 5 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Abstract
We investigate this technique in the context of English-to-Arabic and English-to-Finnish translation, showing significant improvements in translation quality over desegmentation of l-best decoder outputs.
Conclusion
We have also applied our approach to English-to-Finnish translation, and although segmentation in general does not currently help, we are able to show significant improvements over a 1-best desegmentation baseline.
Introduction
We demonstrate that significant improvements in translation quality can be achieved by training a linear model to re-rank this transformed translation space.
Results
In fact, even with our lattice desegmenter providing a boost, we are unable to see a significant improvement over the unsegmented model.
Results
Nonetheless, the 1000-best and lattice desegmenters both produce significant improvements over the 1-best desegmentation baseline, with Lattice Deseg achieving a 1-point improvement in TER.
significant improvements is mentioned in 5 sentences in this paper.
Olsson, J. Scott and Oard, Douglas W.
Experiments
That is, we test for significant improvements in GMAP by applying the Wilcoxon signed rank test to the paired, transformed average precisions, log AP.
Experiments
This represents significant improvements over IBM transcripts used in earlier CL-SR evaluations, which had a best reported WER of 39.6% (Byrne et al., 2004).
Results
combination is a statistically significant improvement (α = 0.05) over our new transcript set (that is, over the best single transcript result).
Results
Tests for statistically significant improvements in GMAP are computed using our paired log AP test, as discussed in Section 4.2.2.
Results
One surprising observation from Table 1 is that the mean improvement in log AP for interleaving is fairly large and yet not statistically significant (it is in fact a larger mean improvement than several other baseline combination approaches which are significant improvements).
significant improvements is mentioned in 5 sentences in this paper.
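The paired log-AP procedure quoted above (a Wilcoxon signed rank test applied to log-transformed average precisions) can be sketched in pure Python. This is a hedged illustration under the usual large-sample normal approximation, not the authors' code; the function name and data are hypothetical:

```python
import math

def wilcoxon_log_ap(ap_a, ap_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation) on
    paired, log-transformed per-topic average precision values."""
    diffs = [math.log(b) - math.log(a) for a, b in zip(ap_a, ap_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no nonzero differences: nothing to test
    # Rank the absolute differences, averaging ranks over exact ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Test statistic: the smaller of the positive- and negative-rank sums.
    w = min(sum(r for r, d in zip(ranks, diffs) if d > 0),
            sum(r for r, d in zip(ranks, diffs) if d < 0))
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd  # z <= 0 by construction
    return math.erfc(-z / math.sqrt(2))  # two-sided p-value
```

With uniformly improved per-topic APs the p-value is small; with balanced gains and losses it stays near 1. For the handful of topics typical of some CL-SR test sets, an exact-distribution variant would be more appropriate than this approximation.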
Mylonakis, Markos and Sima'an, Khalil
Abstract
We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
Experiments
Even for Dutch and German, which pose additional challenges such as compound words and morphology which we do not explicitly treat in the current system, LTS still delivers significant improvements in performance.
Experiments
Notably, as can be seen in Table 2(b), switching to a 4-gram LM results in performance gains for both the baseline and our system, and while the margin between the two systems decreases, our system continues to deliver a considerable and significant improvement in translation BLEU scores.
Introduction
By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline.
significant improvements is mentioned in 4 sentences in this paper.
Rieser, Verena and Lemon, Oliver
Comparison of Results
The results show that only the RL strategy leads to significantly improved user ratings (increasing average Task Ease by 49% and Future Use by 19%), whereas the ratings for the SL policy are not significantly better than those for the WOZ data; see Table 3.
Simulated Learning Environment
Table 1: Predicted accuracy for presentation timing and modality (with standard deviation ±), * denotes statistically significant improvement at p < .05
Simulated Learning Environment
For presentation timing, none of the classifiers produces significantly improved results.
User Tests
data show significantly improved Task Ease, better presentation timing, more agreeable verbal and multimodal presentation, and that more users would use the RL-based system in the future (Future Use).
significant improvements is mentioned in 4 sentences in this paper.
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Abstract
Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g., 8.0% and 5.4% improvements in ROUGE-2, respectively) for the DUC 2006 and 2007 summarization tasks.
Introduction
Our tree-based methods rely on a scoring function that allows for easy and flexible tailoring of sentence compression to the summarization task, ultimately resulting in significant improvements for MDS, while at the same time remaining competitive with existing methods in terms of sentence compression, as discussed next.
Introduction
With these results we believe we are the first to successfully show that sentence compression can provide statistically significant improvements over pure extraction-based approaches for query-focused MDS.
Results
Our sentence-compression-based systems (marked with T) show statistically significant improvements over pure extractive summarization for both R-2 and R-SU4 (paired t-test, p < 0.01).
significant improvements is mentioned in 4 sentences in this paper.
Sun, Weiwei and Uszkoreit, Hans
Combining Both
The significant improvement of POS tagging also helps successive language processing.
Conclusion
Both enhancements significantly improve the state-of-the-art of Chinese POS tagging.
Introduction
Experiments show that this model is significantly improved by word cluster features in accuracy across a wide range of conditions.
Introduction
We then present a comparative study of our tagger and the Berkeley parser, and show that the combination of the two models can significantly improve tagging accuracy.
significant improvements is mentioned in 4 sentences in this paper.
Nguyen, ThuyLinh and Vogel, Stephan
Abstract
We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation.
Experiment Results
Phrase-based with lexicalized reordering features (PB+lex) shows significant improvement on all test sets over the simple phrase-based system without lexicalized reordering (PB+nolex).
Experiment Results
distance-based reordering feature (PH+dist) to the Arabic-English experiment but get significant improvements when adding the six features of the lexicalized reordering (PH+lex).
Experiment Results
And our best Phrasal-Hiero significantly improves over the best phrase-based baseline by 0.54 BLEU points.
significant improvements is mentioned in 4 sentences in this paper.
Chen, Boxing and Foster, George and Kuhn, Roland
Abstract
Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
Conclusions and Future Work
We saw that the bilingual sense similarity computed by our algorithm led to significant improvements.
Experiments
Alg2 significantly improved the performance over the baseline.
Experiments
We can see that IBM model 1 and cosine distance similarity function both obtained significant improvement on all test sets of the two tasks.
significant improvements is mentioned in 4 sentences in this paper.
Bernhard, Delphine and Gurevych, Iryna
Abstract
We also show that the monolingual translation probabilities obtained (i) are comparable to traditional semantic relatedness measures and (ii) significantly improve the results over the query likelihood and the vector-space model for answer finding.
Conclusion and Future Work
Moreover, models based on translation probabilities yield significant improvement over baseline approaches for answer finding, especially when different types of training data are combined.
Introduction
This extrinsic evaluation shows that our translation models significantly improve the results over the query likelihood and the vector-space model.
Related Work
All in all, translation models have been shown to significantly improve the retrieval results over traditional baselines for document retrieval (Berger and Lafferty, 1999), question retrieval in Question & Answer archives (Jeon et al., 2005; Lee et al., 2008; Xue et al., 2008) and for sentence retrieval (Murdock and Croft, 2005).
significant improvements is mentioned in 4 sentences in this paper.
Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick
Conclusions and Future Work
We showed that MWE pre-grouping significantly improves compound recognition and unlabeled dependency annotation, which implies that this strategy could be useful for dependency parsing.
Evaluation
Furthermore, pre-grouping has no statistically significant impact on the F-score, whereas reranking leads to a statistically significant improvement (except for collocations).
Introduction
Although experiments always relied on a corpus where the MWEs were perfectly pre-identified, they showed that pre-grouping such expressions could significantly improve parsing accuracy.
Two strategies, two discriminative models
Charniak and Johnson (2005) introduced different features that showed significant improvement in general parsing accuracy (e.g.
significant improvements is mentioned in 4 sentences in this paper.
Zhu, Jun and Zheng, Xun and Zhang, Bo
Abstract
Empirical results demonstrate significant improvements on prediction performance and time efficiency.
Conclusions and Discussions
Empirical results for both binary and multi-class classification demonstrate significant improvements over the existing logistic supervised topic models.
Introduction
Finally, our empirical results on real data sets demonstrate significant improvements on time efficiency.
Introduction
The classification performance is also significantly improved by using appropriate regularization parameters.
significant improvements is mentioned in 4 sentences in this paper.
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
Abstract
Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Conclusion
Our experiments show that the proposed approach can significantly improve sentiment classification for both languages.
Introduction
Accuracy is significantly improved for both languages, by 3.44%-8.12%.
Results and Analysis
Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1).
significant improvements is mentioned in 4 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Abstract
Experimental results show that our method significantly improves translation accuracy in the NIST Chinese-to-English translation task compared to a state-of-the-art baseline.
Conclusion and Future Work
It is a significant improvement over the state-of-the-art Hiero system, as well as a conventional LDA-based method.
Introduction
Experimental results demonstrate that our model significantly improves translation
Related Work
They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
significant improvements is mentioned in 4 sentences in this paper.
Beigman Klebanov, Beata and Flor, Michael
Application to Essay Scoring
Finally, we report on an experiment where we significantly improve the performance of a very competitive, state-of-the-art system for automated scoring of essays, using a feature derived from WAP.
Application to Essay Scoring
Significant improvements are underlined.
Conclusion
Finally, we demonstrated that the information provided by word association profiles leads to a significant improvement in a highly competitive, state-of-the-art essay scoring system that already measures various aspects of writing quality.
Related Work
The results were similar to those of the blind test presented here, with e-rater+HAT significantly improving upon e-rater alone, using Wilcoxon test, W=374, n=29, p<0.05.
significant improvements is mentioned in 4 sentences in this paper.
Guinaudeau, Camille and Strube, Michael
Experiments
The best results, which present a statistically significant improvement when compared to the random baseline, are obtained when distance information and the number of entities “shared” by two sentences are taken into account (PW).
Experiments
A statistically significant improvement is provided by including syntactic information.
Experiments
Finally, when distance is accounted for together with syntactic information, the accuracy is significantly improved (p < 0.01) with regard to the results obtained with syntactic information only.
significant improvements is mentioned in 4 sentences in this paper.
Guo, Weiwei and Li, Hao and Ji, Heng and Diab, Mona
Abstract
Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.
Conclusion
We achieve significant improvement over baselines.
Experiments
With WTMF being a very challenging baseline, WTMF-G can still significantly improve all 3 metrics.
Experiments
In the case k = 4, δ = 3, compared to WTMF, WTMF-G receives +1.371% TOP10, +2.727% RR, and +0.518% ATOP value (this is a significant improvement in ATOP value considering that it is averaged over 30,000 data points at an already high level of 96%, reducing the error rate by 13%).
significant improvements is mentioned in 4 sentences in this paper.
Wu, Hua and Wang, Haifeng
Abstract
Experimental results on spoken language translation show that this hybrid method significantly improves the translation quality, outperforming the method that uses a source-target corpus of the same size.
Abstract
Experimental results indicate that our method achieves consistent and significant improvement over individual translation outputs.
Conclusion
Experimental results indicate that this method can consistently and significantly improve translation quality over individual translation outputs.
Introduction
And this translation quality is higher than that of the translations produced by the system trained with a real Chinese-Spanish corpus; (3) Our sentence-level translation selection method consistently and significantly improves the translation quality over individual translation outputs in all of our experiments.
significant improvements is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing
Abstract
Experiments show that our approach helps to achieve significant improvements on translation quality.
Experiment
Moreover, after we import the MEPD model into system PASTR, we get a significant improvement over PASTR (by 0.54 BLEU points).
Introduction
Experiments show that the two PAS disambiguation methods significantly improve the baseline translation system.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
Abstract
The experimental results show that significant improvements are achieved on various test data; meanwhile, the translations are more cohesive and smooth.
Conclusion
The experimental results show that significant improvements have been achieved on various test data; meanwhile, the translations are more cohesive and smooth, which together demonstrate the effectiveness of our proposed models.
Related Work
To the best of our knowledge, our work is the first attempt to exploit the source functional relationship to generate the target transitional expressions for grammatical cohesion, and we have successfully incorporated the proposed models into an SMT system with significant improvement of BLEU metrics.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Abstract
As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
Introduction
Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
Introduction
We find that adding dependency language and maximum entropy shift-reduce models consistently brings significant improvements , both separately and jointly.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lassalle, Emmanuel and Denis, Pascal
Abstract
This paper proposes a new method for significantly improving the performance of pairwise coreference models.
Conclusion and perspectives
Using different kinds of greedy decoders, we showed a significant improvement of the system.
Modeling pairs
Instead of imposing heuristic product of features, we will show that a clever separation of instances leads to significant improvements of the pairwise model.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Abstract
Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
Conclusion
Results from NER and word alignment experiments suggest that our method gives significant improvements in both NER and word alignment.
Joint NER and Alignment Results
By jointly decoding NER with word alignment, our model not only maintains significant improvements in NER performance, but also yields significant improvements to alignment performance.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Abstract
We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system.
Conclusions and Future Work
We also applied the predicted ECs to a large-scale Chinese-to-English machine translation task and achieved significant improvement over two strong MT baselines.
Introduction
Show significant improvement on top of the state-of-the-art large-scale hierarchical and syntactic machine translation systems.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yancheva, Maria and Rudzicz, Frank
Discussion and future work
While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results.
Results
performs best, with 59.5% cross-validation accuracy, which is a statistically significant improvement over the baselines of LR (χ²(4) = 22.25, p < .0001) and NB (χ²(4) = 16.19, p < …).
Results
In comparison with classification accuracy on pooled data, a paired t-test shows statistically significant improvement across all age groups using RF (t(3) = 10.37, p < …).
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
van Schijndel, Marten and Elsner, Micha
Discussion
Training significantly improves role labelling in the case of object-extractions, which improves the overall accuracy of the model.
Evaluation
This slight, though significant in Eve, deficit is counterbalanced by a very substantial and significant improvement in object-extraction labelling accuracy.
Evaluation
Similarly, training confers a large and significant improvement for role assignment in wh-relative constructions, but it yields less of an improvement for that-relative constructions.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Shen, Mo and Liu, Hongxiao and Kawahara, Daisuke and Kurohashi, Sadao
Abstract
Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
Evaluation
The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracy are small, the proposed model achieves significant improvement in the experiment of joint segmentation.
Introduction
Through experiments, we demonstrate that by introducing character-level POS information, the performance of a baseline morphological analyzer can be significantly improved.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Abstract
Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system.
Conclusion and Future Work
Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system.
Discussion
We clearly see that using gold syntactic reordering types significantly improves the performance (e.g., 34.9 vs. 33.4 on average), and there is still some room for improvement by building a better maximum entropy classifier (e.g., 34.9 vs. 34.3).
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Experiments and Analysis
Using unlabeled data with the results of Berkeley Parser (“Unlabeled <— B”) significantly improves parsing accuracy by 0.55% (93.40 vs. 92.85) on English and 1.06% (83.34 vs. 82.28) on Chinese.
Experiments and Analysis
However, we find that although the parser significantly outperforms the supervised GParser on English, it does not gain significant improvement over co-training with ZPar (“Unlabeled <— Z”) on both English and Chinese.
Introduction
All above work leads to significant improvement on parsing accuracy.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ng, Jun-Ping and Chen, Yan and Kan, Min-Yen and Li, Zhoujun
Experiments and Results
A statistically significant improvement of 4.1% is obtained with the use of all three features over SWING.
Experiments and Results
to guide the use of timelines such that significant improvements in R-2 over SWING are obtained.
Introduction
Compared to a competitive baseline, significant improvements of up to 4.1% are obtained.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Abstract
Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
Conclusion
Extensive experiments on large-scale English-to-Japanese translation resulted in a significant improvement in BLEU score of 1.8 points (p < 0.01), as compared with our implementation of a strong forest-to-string baseline system (Mi et al., 2008; Mi and Huang, 2008).
Experiments
Taking M&H-F as the baseline translation rule set, we achieved a significant improvement (p < 0.01) of 1.81 points.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Carenini, Giuseppe and Ng, Raymond T. and Zhou, Xiaodong
Abstract
Moreover, subjective words can significantly improve accuracy.
Introduction
Our empirical evaluations show that subjective words and phrases can significantly improve email summarization.
Related Work
Their experiments showed that features about emails and the email thread could significantly improve the accuracy of summarization.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nakov, Preslav and Hearst, Marti A.
Relational Similarity Experiments
Turney (2006b) achieves 56% accuracy on this dataset, which matches the average human performance of 57%, and represents a significant improvement over the 20% random-guessing baseline.
Relational Similarity Experiments
Both results represent a statistically significant improvement over the majority-class baseline and over using sentence words only, and a slight improvement over the best type A and type C systems at SemEval ’07, which achieved 66% and 67% accuracy, respectively.
Relational Similarity Experiments
As we can see, using prepositions alone yields about 33% accuracy, which is a statistically significant improvement over the majority-class baseline.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Abstract
Experimental results on data sets for the NIST Chinese-to-English machine translation task show that the co-decoding method can bring significant improvements to all baseline decoders, and the outputs from co-decoding can be used to further improve the result of system combination.
Experiments
However, we did not observe any significant improvements for both combination schemes when n-best size is larger than 20.
Introduction
We will present experimental results on the data sets of NIST Chinese-to-English machine translation task, and demonstrate that co-decoding can bring significant improvements to baseline systems.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Okazaki, Naoaki and Tsujii, Jun'ichi
Recognition as a Generation Task
As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF.
Results and Discussion
The results revealed that the latent variable model significantly improved the performance over the CRF model.
Results and Discussion
Whereas the use of the latent variables still significantly improves the generation performance, using the GI encoding undermined the performance in this task.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Aw, Aiti and Li, Haizhou
Analysis
The constituent boundary matching feature (CBMF) is a very important feature, which by itself achieves significant improvement over the baseline (up to 1.13 BLEU).
Experiments
Like (Marton and Resnik, 2008), we find that the XP+ feature obtains a significant improvement of 1.08 BLEU over the baseline.
Introduction
Although experiments show that this constituent matching/violation counting feature achieves significant improvements on various language pairs, one issue is that matching syntactic analysis cannot always guarantee a good translation, and violating syntactic structure does not always induce a bad translation.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Liu, Qun
Conclusion and Future Works
In addition, when integrated into a 2nd-ordered MST parser, the projected parser brings significant improvement to the baseline, especially for the baseline trained on smaller treebanks.
Experiments
It indicates that the smaller the human-annotated treebank we have, the more significant improvement we can achieve by integrating the projecting classifier.
Introduction
More importantly, when this classifier is integrated into a 2nd-ordered maximum spanning tree (MST) dependency parser (McDonald and Pereira, 2006) in a weighted average manner, significant improvement is obtained over the MST baselines.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
Clickthrough Data and Spelling Correction
Unfortunately, we found in our experiments that the pairs extracted using the method are too noisy for reliable error model training, even with a very tight threshold, and we did not see any significant improvement.
Experiments
a new phrase-based error model, which leads to significant improvement in our spelling correction experiments.
Introduction
Results show that the error models learned from clickthrough data lead to significant improvements on the task of query spelling correction.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Abstract
The experimental results show that our approach achieves a significant improvement on both gold-standard treebank and automatically parsed tree pairs against a heuristic similarity-based method.
Substructure Spaces for BTKs
By introducing BTKs to construct a composite kernel, the performance in both corpora is significantly improved against only using the polynomial kernel for plain features.
Substructure Spaces for BTKs
Recent research on tree based systems shows that relaxing the restriction from tree structure to tree sequence structure (Synchronous Tree Sequence Substitution Grammar: STSSG) significantly improves the translation performance (Zhang et al., 2008).
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Vickrey, David and Kipersztok, Oscar and Koller, Daphne
Set Expansion
Using logistic regression instead of the untrained weights significantly improves performance.
Set Expansion
Using active learning also significantly improves performance: L(MWD,+) outscores L(MWD,—) by 13%.
Set Expansion
We expect that adding these types of data would significantly improve our system.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Abstract
The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.
Background
It also gives us a rational explanation for the significant improvements achieved by our method as shown in Section 5.3.
Introduction
Experimental results show that our method leads to significant improvements in translation accuracy over the baseline systems.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Abstract
With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese Wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features.
Conclusion and Future Work
Experiments on Chinese word segmentation show that, the enhanced word segmenter achieves significant improvement on testing sets of different domains, although using a single classifier with only local features.
Introduction
Experimental results show that the knowledge implied in the natural annotations can significantly improve the performance of a baseline segmenter trained on CTB 5.0: an F-measure increment of 0.93 points on the CTB test set, and an average increment of 1.53 points on 7 other domains.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Vanessa Wei and Hirst, Graeme
Abstract
We significantly improve its tree-building step by incorporating our own rich linguistic features.
Conclusions
We chose the HILDA discourse parser (Hernault et al., 2010b) as the basis of our work, and significantly improved its tree-building step by incorporating our own rich linguistic features, together with features suggested by Lin et al.
Introduction
We significantly improve the performance of HILDA’s tree-building step (introduced in Section 5.1 below) by incorporating rich linguistic features (Section 5.3).
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Abstract
The experimental results show that our proposed approach achieves significant improvements of 1.6~3.6 points of BLEU in the oral domain and 0.5~1 points in the news domain.
Experiments
Our system gains significant improvements of 1.6~3.6 points of BLEU in the oral domain, and 0.5~1 points of BLEU in the news domain.
Introduction
The experimental results show that our proposed approach achieves significant improvements of 1.6~3.6 points of BLEU in the oral domain and 0.5~1 points in the news domain.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Fangtao and Pan, Sinno Jialin and Jin, Ou and Yang, Qiang and Zhu, Xiaoyan
Introduction
Numbers in boldface denote significant improvement .
Introduction
Numbers in boldface denote significant improvement .
Introduction
Numbers in boldface denote significant improvement .
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Liu, Ting and Che, Wanxiang
Conclusions
Extensive experiments show that our approach can effectively utilize the syntactic knowledge from another treebank and significantly improve the state-of-the-art parsing accuracy.
Experiments and Analysis
(2011) show that a joint POS tagging and dependency parsing model can significantly improve parsing accuracy over a pipeline model.
Related Work
Their experiments show that the combined treebank can significantly improve the performance of constituency parsers.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Abstract
We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
Experiments
However, scaling all features to the full training set shows significant improvements for algorithm 3, and especially for algorithm 4, which gains 0.8 BLEU points over tuning 12 features on the development set.
Experiments
Here tuning large feature sets on the respective dev sets yields significant improvements of around 2 BLEU points over tuning the 12 default features on the dev sets.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Wang, Houfeng and Li, Wenjie
Introduction
We will show that this can significantly improve the convergence speed of online learning.
System Architecture
We found adding new edge features significantly improves the disambiguation power of our model.
System Architecture
has its own learning rate, and we will show that this can significantly improve the convergence speed of online learning.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Abstract
Experimental results demonstrate that the two models significantly improve translation accuracy.
Conclusions and Future Work
Experimental results show that both models are able to significantly improve translation accuracy in terms of BLEU score.
Introduction
Experimental results on large-scale Chinese-to-English translation show that both models are able to obtain significant improvements over the baseline.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
All settings significantly improve over the baseline at 95% confidence level.
Experiments
From Table 2, we can see our ranking reordering model significantly improves the performance for both English-to-Japanese and Japanese-to-English experiments over the BTG baseline system.
Introduction
We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and experimental results show that our approach can bring significant improvements to the baseline phrase-based SMT system in both pre-ordering and integrated decoding settings.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Faruqui, Manaal and Dyer, Chris
Abstract
To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
Experiments
For English and Turkish we observe a statistically significant improvement over the monolingual model (cf.
Experiments
English again has a statistically significant improvement over the baseline.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hagiwara, Masato and Sekine, Satoshi
Abstract
The experiments on Japanese and Chinese WS have shown that the proposed models achieve significant improvement over the state-of-the-art, reducing errors by 16% in Japanese.
Conclusion and Future Works
The experimental results show that the model achieves a significant improvement over the baseline and LM augmentation, achieving 16% WER reduction in the EC domain.
Introduction
The results show that we achieved a significant improvement in WS accuracy in both languages.
significant improvements is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: