Abstract | Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system. |
Experiments | statistically significant improvements over Stemprf on TREC7, TREC8, and RB03. |
Introduction | In the application of WSD to MT, research has shown that integrating WSD in appropriate ways significantly improves the performance of MT systems (Chan et al., 2007; Carpuat and Wu, 2007). |
Introduction | Our evaluation on standard TREC data sets shows that supervised WSD outperforms two other WSD baselines and significantly improves IR. |
Related Work | They obtained significant improvements by representing documents and queries with accurate senses as well as synsets (synonym sets). |
Related Work | Their evaluation on TREC collections achieved significant improvements over a standard term-based vector space model. |
Abstract | We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU). |
Conclusion | The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs. |
Experiments | All of the tests showed significant improvements (p < 0.01), ranging from +0.4 BLEU to +1.4 BLEU. For Urdu, even though we didn't have manual alignments to tune hyperparameters, we got significant gains over a good baseline. |
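The entry above quotes p-values and BLEU deltas without naming a test; in MT evaluation, such p-values are commonly obtained with paired bootstrap resampling (Koehn, 2004). Below is a minimal sketch of that scheme, with hypothetical input names; it assumes per-sentence scores are available and sums them per resample, whereas real implementations typically recompute corpus-level BLEU from the resampled sentences' n-gram counts.

```python
import random

def paired_bootstrap_p(baseline, system, trials=1000, seed=0):
    """Paired bootstrap resampling (Koehn, 2004), simplified.

    `baseline` and `system` are hypothetical lists of per-sentence
    metric scores aligned by test-set index. Returns a one-sided
    p-value for the claim "system beats baseline"."""
    rng = random.Random(seed)
    n = len(baseline)
    wins = 0
    for _ in range(trials):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(system[i] for i in idx) > sum(baseline[i] for i in idx):
            wins += 1
    # p < 0.01 means the system won in more than 99% of resamples.
    return 1.0 - wins / trials
```

Under the same scheme, the "95% confidence level" quoted in a later entry corresponds to p < 0.05.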
Introduction | In this paper, we propose a simple extension to the IBM/HMM models that is unsupervised like the IBM models, is as scalable as GIZA++ because it is implemented on top of GIZA++, and provides significant improvements in both alignment and translation quality. |
Introduction | Experiments on Czech-, Arabic-, Chinese- and Urdu-English translation (Section 3) demonstrate consistent significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU). |
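Several entries above report word-alignment quality as F1 (e.g. the +6.7 F1 gains). As a reference point, here is a minimal sketch of alignment F1 over link sets; it assumes the gold standard is a single set of (source, target) links, whereas the cited work may instead use the sure/possible link distinction from AER.

```python
def alignment_f1(predicted, gold):
    """F1 between predicted and gold word alignments, each given as a
    set of (source_index, target_index) links. Scores are in [0, 1];
    multiply by 100 to express gains such as "+6.7 F1" in points."""
    tp = len(predicted & gold)  # links present in both alignments
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```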
Conclusions and Future Work | We showed that MWE pre-grouping significantly improves compound recognition and unlabeled dependency annotation, which implies that this strategy could be useful for dependency parsing. |
Evaluation | Furthermore, pre-grouping has no statistically significant impact on the F-score, whereas reranking leads to a statistically significant improvement (except for collocations). |
Introduction | Although experiments always relied on a corpus where the MWEs were perfectly pre-identified, they showed that pre-grouping such expressions could significantly improve parsing accuracy. |
Two strategies, two discriminative models | Charniak and Johnson (2005) introduced different features that showed significant improvement in general parsing accuracy (e.g. |
Combining Both | The significant improvement in POS tagging also helps successive language processing. |
Conclusion | Both enhancements significantly improve the state-of-the-art of Chinese POS tagging. |
Introduction | Experiments show that this model is significantly improved in accuracy by word cluster features across a wide range of conditions. |
Introduction | We then present a comparative study of our tagger and the Berkeley parser, and show that the combination of the two models can significantly improve tagging accuracy. |
Abstract | We significantly improve its tree-building step by incorporating our own rich linguistic features. |
Conclusions | We chose the HILDA discourse parser (Hernault et al., 2010b) as the basis of our work, and significantly improved its tree-building step by incorporating our own rich linguistic features, together with features suggested by Lin et al. |
Introduction | We significantly improve the performance of HILDA’s tree-building step (introduced in Section 5.1 below) by incorporating rich linguistic features (Section 5.3). |
Abstract | The experimental results show that our proposed approach achieves significant improvements of 1.6~3.6 points of BLEU in the oral domain and 0.5~1 points in the news domain. |
Experiments | Our system gains significant improvements of 1.6~3.6 points of BLEU in the oral domain, and 0.5~1 points of BLEU in the news domain. |
Introduction | The experimental results show that our proposed approach achieves significant improvements of 1.6~3.6 points of BLEU in the oral domain and 0.5~1 points in the news domain. |
Introduction | Numbers in boldface denote significant improvement. |
Introduction | Numbers in boldface denote significant improvement. |
Introduction | Numbers in boldface denote significant improvement. |
Conclusions | Extensive experiments show that our approach can effectively utilize the syntactic knowledge from another treebank and significantly improve the state-of-the-art parsing accuracy. |
Experiments and Analysis | (2011) show that a joint POS tagging and dependency parsing model can significantly improve parsing accuracy over a pipeline model. |
Related Work | Their experiments show that the combined treebank can significantly improve the performance of constituency parsers. |
Abstract | We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets. |
Experiments | However, scaling all features to the full training set shows significant improvements for algorithm 3, and especially for algorithm 4, which gains 0.8 BLEU points over tuning 12 features on the development set. |
Experiments | Here tuning large feature sets on the respective dev sets yields significant improvements of around 2 BLEU points over tuning the 12 default features on the dev sets. |
Introduction | We will show that this can significantly improve the convergence speed of online learning. |
System Architecture | We found adding new edge features significantly improves the disambiguation power of our model. |
System Architecture | has its own learning rate, and we will show that this can significantly improve the convergence speed of online learning. |
Abstract | Experimental results demonstrate that the two models significantly improve translation accuracy. |
Conclusions and Future Work | Experimental results show that both models are able to significantly improve translation accuracy in terms of BLEU score. |
Introduction | Experimental results on large-scale Chinese-to-English translation show that both models are able to obtain significant improvements over the baseline. |
Experiments | All settings significantly improve over the baseline at the 95% confidence level. |
Experiments | From Table 2, we can see our ranking reordering model significantly improves the performance for both English-to-Japanese and Japanese-to-English experiments over the BTG baseline system. |
Introduction | We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and experimental results show that our approach can bring significant improvements to the baseline phrase-based SMT system in both pre-ordering and integrated decoding settings. |