Conditional Random Fields | Based on a study of three NLP benchmarks, Tsuruoka et al. (2009) claim this approach to be much faster than the orthant-wise approach while yielding very comparable performance, although it selects slightly larger feature sets. |
Conditional Random Fields | The n-grm feature sets (n ∈ {1, 3, 5, 7}) include all features testing embedded windows of k letters, for all 0 ≤ k ≤ n; the n-grm- setting is similar, but only includes the window of length n; in the n-grm+ setting, we add features for odd-size windows; in the n-grm++ setting, we add all sequences of letters up to size n occurring in the current window. |
Conditional Random Fields | For instance, the active bigram features at position t = 2 in the sequence x = 'lemma' are as follows: the 3-grm feature set contains f_{y',y,m}, f_{y',y,em} and f_{y',y,lem}; only the latter appears in the 3-grm- setting. |
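The n-grm feature enumeration described above can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the function name `window_features`, the feature-string format, and the assumption that windows are the k letters ending at position t are all illustrative choices.

```python
def window_features(x, t, n, setting="n-grm"):
    """Enumerate letter-window features active at position t of string x.

    Assumption (hypothetical): each window of size k is the span of k
    letters ending at position t, for k = 1..n.  In the "n-grm" setting
    all such windows fire; in "n-grm-" only the window of length n does.
    """
    feats = []
    for k in range(1, n + 1):
        if t - k + 1 < 0:
            continue  # window would run off the left edge of the string
        if setting == "n-grm-" and k != n:
            continue  # n-grm-: keep only the window of length exactly n
        window = x[t - k + 1 : t + 1]
        # bigram feature: tests the label pair (y', y) and the window
        feats.append(f"f[y',y,{window}]")
    return feats
```

For example, `window_features("lemma", 2, 3)` yields three features (for the windows "m", "em", "lem"), while the `"n-grm-"` setting keeps only the length-3 window.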
Introduction | An important property of CRFs is their ability to handle large and redundant feature sets and to integrate structural dependency between output labels. |
Introduction | Limiting the feature set or the number of output labels is however frustrating for many NLP tasks, where the type and number of potentially relevant features are very large. |
Introduction | Second, the experimental demonstration that using large output label sets is doable and that very large feature sets actually help improve prediction accuracy. |
Introduction | In section 4 we motivate the choice of feature sets for the automatic identification of generic NPs in context. |
Introduction | 4.2 Feature set and feature classes |
Introduction | The feature set includes NP-local and global features. |
Results and discussion | For all four other questions, the best feature set is Continuity, which combines summarization-specific features, coreference features, and the cosine similarity of adjacent sentences. |
Empirical Analysis | Results on the Boundary Detection (BD) task are obtained by training an SVM model on the same feature set presented in (Johansson and Nugues, 2008b) and are slightly below the state-of-the-art BD accuracy reported in (Coppola et al., 2009). |
Empirical Analysis | Given the relatively simple feature set adopted here, this result is very significant, especially in view of the resulting efficiency. |
Introduction | Notice that this is also a general problem of statistical learning processes, as large, fine-grained feature sets are more exposed to the risk of overfitting. |
Related Work | While these approaches increase the expressive power of the models to capture more general linguistic properties, they rely on complex feature sets, demand more training data, and increase the overall exposure to overfitting. |
Related Work | But, importantly, our classifiers all use the same feature set so they do not represent independent views of the data. |
Related Work | The feature set for these classifiers is exactly the same as described in Section 3.2, except that we add a new lexical feature that represents the head noun of the target NP (i.e., the NP that needs to be tagged). |
Related Work | But technically this is not co-training because our feature sets are all the same. |