Abstract | Furthermore, our system improves significantly over a baseline system when applied to text from a different domain, and it reduces the sample complexity of sequence labeling. |
Experiments | As expected, the drop-off in the baseline system’s performance from all words to rare words is impressive for both tasks. |
Experiments | in F1 over the baseline system on all words, it in fact outperforms our baseline NP chunker on the WSJ data. |
Experiments | This chunker achieves 0.91 F1 on OANC data, and 0.93 F1 on WSJ data, outperforming the baseline system in both cases. |
Abstract | Experimental results on the NIST MT-2003 Chinese-English translation task show that our method statistically significantly outperforms the four baseline systems. |
Conclusion | Experimental results show that our model greatly outperforms the four baseline systems. |
Experiment | We use the first three syntax-based systems (TT2S, TTS2S, FT2S) and Moses (Koehn et al., 2007), the state-of-the-art phrase-based system, as our baseline systems. |
Experiment | 3) Our model statistically significantly outperforms all the baseline systems. |
Conclusion | Since we used the latest release of FrameNet in order to use a greater number of hierarchical role-to-role relations, we could not make a direct comparison of performance with that of existing systems; however, we may say that the 89.00% F1 micro-average of our baseline system is roughly comparable to the 88.93% value of Bejan and Hathaway (2007) for SemEval-2007 (Baker et al., 2007). |
Experiment and Discussion | The baseline system achieved 89.00% with respect to the micro-averaged F1. |
Experiment and Discussion | Table 6 reports the precision, recall, and micro-averaged F1 scores of semantic roles with respect to each coreness type. In general, semantic roles of the core coreness were easily identified by all of the grouping criteria; even the baseline system obtained an F1 score of 91.93. |
Coreference Subtask Analysis | 3.2 Baseline System Results |
Coreference Subtask Analysis | In all remaining experiments, we learn the threshold from the training set as in the BASELINE system. |
Coreference Subtask Analysis | Comparison to the BASELINE system (box 2) shows that using gold standard NEs leads to improvements on all data sets with the exception of ACE2 and ACEOS, on which performance is virtually unchanged. |