Abstract | Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems. |
Experiment | 1) FTS2S significantly outperforms (p<0.05) FT2S. |
Experiment | 3) Our model statistically significantly outperforms all the baseline systems.
Experiment | 4) All four syntax-based systems show better performance than Moses, and three of them significantly outperform (p<0.05) Moses.
Introduction | Experimental results show that our method significantly outperforms the two individual methods and other baseline methods. |
Abstract | The experimental results demonstrate that our model is able to significantly outperform the state-of-the-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upper bounds. |
Analysis and Discussion | From the curves, our model consistently performs better than the baseline with a significant gap, and the combined model also consistently and significantly outperforms the other two. |
Conclusion | When applied to distinguish a source text from a sentence-reordered permutation, our model significantly outperforms the previous state-of-the-art, |
Experiments | Double (**) and single (*) asterisks indicate that the respective model significantly outperforms the baseline at p < 0.01 and p < 0.05, respectively. |
Experiments | Comparing these accuracies to the baseline, our model significantly outperforms the baseline with p < 0.01 in the WSJ and Earthquakes data sets with accuracy increments of 2.35% and 2.91%, respectively. |
Experiments | The combined model in all three data sets gives the highest performance in comparison to all single models, and it significantly outperforms the baseline model with p < 0.01. |
Abstract | In experiments, we show that our model significantly outperforms strong linear and nonlinear discriminative baselines on three datasets under various settings. |
Conclusion | Focusing on the three financial crisis related datasets, the proposed model significantly outperforms the standard linear regression method in statistics and strong discriminative support vector regression baselines.
Introduction | By varying different experimental settings on three datasets concerning different periods of the Great Recession from 2006-2013, we empirically show that our approach significantly outperforms the baselines by a wide margin. |
Introduction | Our results significantly outperform standard linear regression and strong SVM baselines.
Abstract | Our empirical results have shown the probabilistic labeling approach significantly outperforms a previous graph-matching approach for referential grounding. |
Evaluation and Discussion | significantly outperforms state-space search (S.S.S. |
Evaluation and Discussion | Although probabilistic labeling significantly outperforms the state-space search, the grounding performance is still rather poor (less than 50%) |
Introduction | Our empirical results have shown that the probabilistic labeling approach significantly outperforms the state-space search approach in both grounding accuracy and efficiency. |
Abstract | Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training and tri-training. |
Experiments and Analysis | Using unlabeled data with the results of ZPar (“Unlabeled ← Z”) significantly outperforms the baseline GParser by 0.30% (93.15−92.85) on English.
Experiments and Analysis | However, we find that although the parser significantly outperforms the supervised GParser on English, it does not gain significant improvement over co-training with ZPar (“Unlabeled ← Z”) on both English and Chinese.
Experiments and Analysis | (2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts. |
Abstract | Our results show that RL significantly outperforms Supervised Learning when interacting in simulation as well as for interactions with real users. |
Conclusion | Our results show that RL significantly outperforms SL in simulation as well as in interactions with real users. |
Simulated Learning Environment | For learning presentation modality, both classifiers significantly outperform the baseline. |
Simulated Learning Environment | The results show that simulation-based RL with an environment bootstrapped from WOZ data allows learning of robust strategies which significantly outperform the strategies contained in the initial data set. |
Experimental setup | fulladd and lexfunc significantly outperform stem also in the HR subset (p<.001). |
Experimental setup | However, the stemploitation hypothesis is dispelled by the observation that both models significantly outperform the stem baseline (p<.001), despite the fact that the latter, again, has good performance, significantly outperforming the corpus-derived vectors (p < .001). |
Experimental setup | Indeed, if we focus on the third row of Table 5, reporting performance on low stem-derived relatedness (LR) items (annotated as described in Section 4.1), fulladd and wadd still significantly outperform the corpus representations (p< .001), whereas the quality of the stem representations of LR items is not significantly different from that of the corpus-derived ones.
Experimental Results | On head queries, the addition of the empty context parameter and the click signal together (Model M1) significantly outperforms both the baseline and the state-of-the-art model Guo '09.
Experimental Results | We observe a different behavior on tail queries where all models significantly outperform the baseline BFB, but are not significantly different from each other. |
Introduction | We show that jointly modeling user intent and entity type significantly outperforms the current state of the art on the task of entity type resolution in queries. |
Experiments | that PR significantly outperforms all other baselines in both the CR dataset and the MD dataset (average accuracy across domains is reported). |
Experiments | In contrast, both PR1 and PR significantly outperform CRF, which implies that incorporating lexical and discourse constraints as posterior constraints is much more effective.
Experiments | We can see that both PR and Ple significantly outperform all other baselines in all domains. |
Experiments and Results | We can see that the Unbalanced approach significantly outperforms the baseline (Balanced). |
Experiments and Results | We can also notice that the prediction model applied to the balanced corpus (Balanced + Prediction) slightly outperforms the baseline, while the Unbalanced + Prediction approach significantly outperforms the three other approaches (moreover, the variations observed with the Unbalanced approach are lower than those of the Unbalanced + Prediction approach).
Experiments and Results | As for the previous experiment, we can see that the Unbalanced approach significantly outperforms the Balanced approach. |
Abstract | Automatic and manual evaluation results over meeting, chat and email conversations show that our approach significantly outperforms baselines and previous extractive models. |
Experimental Setup | Results indicate that our system significantly outperforms baselines in overall quality and responsiveness, for both meeting and email datasets. |
Introduction | Automatic evaluation on the chat dataset and manual evaluation over the meetings and emails show that our system uniformly and statistically significantly outperforms baseline systems, as well as a state-of-the-art query-based extractive summarization system. |
Abstract | Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
Conclusions and Future Work | Experiments demonstrated that our approach significantly outperformed pipelined approaches for both tasks and dramatically advanced the state of the art.
Experiments | We can see that our approach significantly outperforms the pipelined approach for both tasks. |
Abstract | In both tasks, our method significantly outperforms competitive baselines and returns results that are statistically comparable to current state-of-the-art methods, while requiring no task-specific customisations. |
Experiments and Results | Except for the DE setting, in which the Proposed method significantly outperforms both SFA and SCL, the performance of the Proposed method is not statistically significantly different from that of SFA or SCL.
Introduction | Without requiring any task-specific customisations, systems based on our distribution prediction method significantly outperform competitive baselines in both tasks.
Conclusions and Future Work | Experiments conducted on real CQA data show some promising findings: (1) the proposed method significantly outperforms the previous work for question retrieval; (2) the proposed matrix factorization can significantly improve the performance of question retrieval, no matter whether considering the translation languages or not; (3) considering more languages can further improve the performance, but it does not seem to produce significantly better performance; (4) different languages contribute unevenly to question retrieval; (5) our proposed method can be easily adapted to the large-scale information retrieval task.
Experiments | (1) Monolingual translation models significantly outperform the VSM and LM (row 1 and
Experiments | (3) Our proposed method (leveraging statistical machine translation via matrix factorization, SMT + MF) significantly outperforms the bilingual translation model of Zhou et al. |
Introduction | Automatic evaluation (using ROUGE (Lin and Hovy, 2003) and BLEU (Papineni et al., 2002)) against manually generated focused summaries shows that our summarizers uniformly and statistically significantly outperform two baseline systems as well as a state-of-the-art supervised extraction-based system.
Introduction | The resulting systems yield results comparable to those from the same system trained on in-domain data, and statistically significantly outperform supervised extractive summarization approaches trained on in-domain data. |
Results | In most experiments, it also significantly outperforms the baselines and the extract-based approaches (p < 0.05). |
Introduction | We have conducted an in-depth experimental evaluation and shown that the developed methods significantly outperform baseline methods.
Task A: Polarity Classification | As we can see from Figure 5, all classifiers significantly outperform the majority baseline.
Task A: Polarity Classification | The lessons learned from this study are: (1) for n-gram usage, the larger the context of the metaphor, the better the classification accuracy becomes; (2) if present, source and target information can further boost the performance of the classifiers; (3) LIWC is a useful resource for polarity identification in metaphor-rich texts; (4) usages of tense (past vs. present) and of pronouns are important triggers for positive and negative polarity of metaphors; (5) some categories, like family and social presence, indicate positive polarity, while others, like inhibition, anger and swear words, are indicative of negative affect; (6) the built models significantly outperform majority baselines.
Abstract | We show empirically that TESLA-CELAB significantly outperforms character-level BLEU in the English-Chinese translation evaluation tasks.
Conclusion | We show empirically that TESLA-CELAB significantly outperforms the strong baseline of character-level BLEU in two well known English-Chinese MT evaluation data sets. |
Experiments | The results indicate that TESLA-CELAB significantly outperforms BLEU. |
Abstract | Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang's model with average gains of 1.91 points absolute in BLEU.
Experiments | Table 3 shows that our HD-HPB model significantly outperforms Chiang’s HPB model with an average improvement of 1.91 in BLEU (and similar improvements over Moses HPB). |
Introduction | Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang's HPB as well as a SAMT-style refined version of HPB.
Abstract | Additionally, we show that a high-level planner utilizing these extracted relations significantly outperforms a strong, text-unaware baseline, successfully completing 80% of planning tasks as compared to 69% for the baseline.
Conclusions | We show that building high-level plans in this manner significantly outperforms traditional techniques in terms of task completion. |
Introduction | Our results show that our text-driven high-level planner significantly outperforms all baselines in terms of completed planning tasks: it successfully solves 80% as compared to 41% for the Metric-FF planner and 69% for the text-unaware variant of our model.
Abstract | Adaptation experiments on LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods. |
Conclusion | We evaluated our approaches on the LETOR3.0 dataset for ranking adaptation and found that: (1) the first method efficiently estimates query weights, and can outperform the document instance weighting, but some information is lost during the aggregation; (2) the second method consistently and significantly outperforms document instance weighting.
Introduction | The listwise approach significantly outperformed the pointwise approach, which takes each document instance as an independent learning object, as well as the pairwise approach, which concentrates learning on the order of a pair of documents (Liu, 2009).
Abstract | Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers.
Experiments | The projected classifier significantly outperforms previous works on both test sets, which demonstrates that the word-pair classification model, although falling behind the state of the art on human-annotated treebanks, performs well in projected dependency parsing.
Introduction | Experimental results show that the classifier trained on the projected classification instances significantly outperforms the projected dependency parsers in previous works.
Conclusion | Experiments show that our model achieves substantial improvements over the baseline and significantly outperforms Marton and Resnik (2008)'s XP+.
Experiments | The binary SDB (BiSDB) model statistically significantly outperforms Marton and Resnik’s XP+ by an absolute improvement of 0.59 (relatively 2%). |
Introduction | Our experimental results show that our SDB model achieves a substantial improvement over the baseline and significantly outperforms XP+ according to the BLEU metric (Papineni et al., 2002).
Experiments | As shown in the table, the multi-parameter model improves by approximately 18% and 12% on the TREC-6 and 7 partial query sets, and it also significantly outperforms both the word model and the one-parameter model on the TREC-8 query set. |
Experiments | For both opposite cases, the multi-parameter model significantly outperforms the one-parameter model.
Experiments | Note that the multi-parameter model significantly outperforms the one-parameter model and all manually-set λs for the queries ‘declining birth rate’ and ‘Amazon rain forest’, which also have one effective phrase, ‘rain forest’, and one noneffective phrase, ‘Amazon forest’.
Abstract | The evaluation results show that: (1) The pivot approach is effective in extracting paraphrase patterns, which significantly outperforms the conventional method DIRT.
Conclusion | In addition, the log-linear model with the proposed feature functions significantly outperforms the conventional models. |
Introduction | Our experiments show that the pivot approach significantly outperforms conventional methods. |
Abstract | Experimental results on the NIST MT-2005 Chinese-English translation task show that our method statistically significantly outperforms the baseline systems. |
Experiments | 1) Our tree sequence-based model significantly outperforms (p < 0.01) previous phrase-based and linguistically syntax-based methods. |
Introduction | Experimental results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007).