Experimental Setup | Instead, we use several baselines to demonstrate the usefulness of integrating multiple LCs, as well as the relative usefulness of our feature sets.
Experimental Setup | The other evaluated systems are formed by taking various subsets of our feature set.
Experimental Setup | We experiment with four feature sets.
Introduction | (2012) and compare our methods with analogous ones that select a fixed LC, using state-of-the-art feature sets.
Our Proposal: A Latent LC Approach | Section 3.1 describes our general approach, Section 3.2 presents our model, and Section 3.3 details the feature set.
Our Proposal: A Latent LC Approach | We choose this model for its generality and conceptual simplicity, and because it makes it easy to incorporate various feature sets and sets of latent variables.
Our Proposal: A Latent LC Approach | 3.3 Feature Set |
Experimental Setup | We experiment with two feature sets for each language: the optimized local feature sets (denoted local), and the optimized local feature sets extended with nonlocal features (denoted nonlocal). |
Features | The feature sets are customized for each language. |
Features | The exact definitions and feature sets that we use are available as part of the download package of our system. |
Features | Nonlocal features were selected with the same greedy forward strategy as the local features, starting from the optimized local feature sets.
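The greedy forward strategy described in these snippets can be sketched as follows. This is a minimal illustration: `evaluate` is a stand-in for the real train-and-score loop on the development set, and all names are hypothetical rather than taken from the systems above.

```python
def greedy_forward_selection(candidates, evaluate):
    """Greedy forward feature selection: repeatedly add the single candidate
    feature that most improves the development score; stop when no candidate
    improves it. `evaluate` maps a feature list to a score (here a stand-in
    for training and scoring a full model on held-out data)."""
    selected, remaining = [], list(candidates)
    best = evaluate(selected)
    while remaining:
        # Score every one-feature extension of the current set.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        top_score, top_feat = max(scored)
        if top_score <= best:
            break  # no candidate helps; stop
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best
```

With a toy `evaluate` that rewards a few informative features and mildly penalizes set size, the sketch selects exactly the informative ones and then stops.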
Introducing Nonlocal Features | In other words, it is unlikely that we can devise a feature set informative enough for the weight vector to converge toward a solution that lets the learning algorithm see entire documents during training, at least when no external knowledge sources are used.
Introduction | We show that for the task of coreference resolution the straightforward combination of beam search and early update (Collins and Roark, 2004) falls short of more limited feature sets that allow for exact search. |
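The combination of beam search and early update (Collins and Roark, 2004) referenced here can be sketched roughly as follows. This is a simplified single-step trainer under stated assumptions (a fixed action inventory, a linear model over hashable features); the helper names are illustrative, not taken from the cited systems.

```python
def early_update_step(gold_actions, all_actions, weights, features, beam_size):
    """One structured-perceptron training step with beam search and early
    update: decode with a beam and, as soon as the gold partial hypothesis
    falls off the beam, update against the current best partial hypothesis
    instead of decoding to the end. `features` maps an action prefix to a
    list of feature keys; `weights` is mutated in place."""
    def score(h):
        return sum(weights.get(f, 0.0) for f in features(h))

    beam = [()]  # partial hypotheses as tuples of actions
    for t in range(len(gold_actions)):
        expanded = [h + (a,) for h in beam for a in all_actions]
        expanded.sort(key=score, reverse=True)
        beam = expanded[:beam_size]
        gold_prefix = tuple(gold_actions[: t + 1])
        if gold_prefix not in beam:
            for f in features(gold_prefix):   # promote the gold prefix
                weights[f] = weights.get(f, 0.0) + 1.0
            for f in features(beam[0]):       # demote the best violator
                weights[f] = weights.get(f, 0.0) - 1.0
            return t                          # early update at step t
    return None                               # gold survived the whole beam
```

The step returns the position at which the gold prefix fell off the beam (or `None`), so repeated calls on the same example show the update point moving later until the gold sequence survives.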
Results | the English development set as a function of the number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Results | The left half uses the local feature set, and the right half the extended nonlocal feature set.
Results | Local vs. Nonlocal feature sets.
Discussion | In this section, we analyze the influence of the employed feature sets and constraint conditions on performance.
Discussion | Because features may interact indirectly, different constraint conditions can significantly influence the final performance even with the same feature set.
Discussion | In Section 3, we introduced five candidate feature sets.
Feature Construction | 3.1 Candidate Feature Set |
Feature Construction | To sum up, among the five candidate feature sets, the position feature is used as a singleton feature.
Feature Construction | In the following experiments, focusing on Chinese relation extraction, we analyze the performance of the candidate feature sets and study the influence of the constraint conditions.
Abstract | This regularizes the model complexity and makes the tensor model highly effective in situations where a large feature set is defined but very limited resources are available for training. |
Conclusion and Future Work | This can be regarded as a form of model regularization. Therefore, compared with traditional vector-space models, learning in the tensor space is very effective when a large feature set is defined but only a small amount of training data is available.
Introduction | This also makes training the model parameters a challenging problem: since the amount of labeled training data is usually small compared to the size of the feature set, the feature weights cannot be estimated reliably.
Introduction | Such models require learning individual feature weights directly, so that the number of parameters to be estimated is identical to the size of the feature set.
Tensor Model Construction | ways of mapping, which is an intractable number of possibilities even for modest-sized feature sets, making it impractical to carry out a brute-force search.
Tensor Space Representation | In general, if V features are defined for a learning problem, we (i) organize the feature set as a tensor in R^(n1 × n2 × ⋯ × nD) and (ii) use H component rank-1 tensors to approximate the corresponding target weight tensor.
Tensor Space Representation | Specifically, a vector space model assumes each feature weight to be a “free” parameter, and estimating them reliably could therefore be hard when training data are not sufficient or the feature set is huge. |
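The parameter-saving argument in these snippets can be made concrete with a small sketch: a full weight tensor over D modes has prod(n_d) free parameters, while a sum of H rank-1 tensors has only H * sum(n_d). The dimensions below are hypothetical, chosen only to illustrate the gap.

```python
import numpy as np

def full_param_count(dims):
    # Free parameters of a full weight tensor in R^(n1 x ... x nD).
    return int(np.prod(dims))

def rank_h_param_count(dims, H):
    # A sum of H rank-1 tensors needs one vector per mode per component.
    return H * int(np.sum(dims))

def rank_h_tensor(components):
    """Build W = sum_h u_h1 (outer) u_h2 (outer) ... (outer) u_hD from a
    list of components, each given as a list of D mode vectors."""
    shape = [len(v) for v in components[0]]
    W = np.zeros(shape)
    for comp in components:
        outer = comp[0]
        for vec in comp[1:]:
            outer = np.multiply.outer(outer, vec)
        W += outer
    return W

dims = (20, 20, 20)                    # hypothetical 3-way layout of 8000 features
print(full_param_count(dims))          # 8000 free weights in the vector-space view
print(rank_h_param_count(dims, H=3))   # 180 parameters for H = 3 rank-1 components
```

The same 8000-feature problem thus drops from 8000 free weights to 180 under the low-rank parameterization, which is the regularization effect the abstract and conclusion describe.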
Experiments | Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1). |
Experiments | We compare against the language-specific feature sets detailed in the literature on high-resource top-performing SRL systems: from Björkelund et al.
Experiments | (2009), these are feature sets for German, English, Spanish and Chinese, obtained by weeks of forward selection (Bdeflmemh); and from Zhao et al. |
Experiments | We evaluate word cluster and embedding (denoted by ED) features by adding them individually as well as simultaneously into the baseline feature set.
Experiments | This might be explained by the difference between our baseline feature set and the feature set underlying their kernel-based system. |
Feature Set | 5.1 Baseline Feature Set |
Feature Set | (2011) utilize the full feature set from Zhou et al. (2005) plus some additional features and achieve a state-of-the-art feature-based RE system.
Feature Set | Unfortunately, this feature set includes the human-annotated (gold-standard) information on entity and mention types which is often missing or noisy in reality (Plank and Moschitti, 2013). |
Introduction | Recent research in this area, whether feature-based (Kambhatla, 2004; Boschee et al., 2005; Zhou et al., 2005; Grishman et al., 2005; Jiang and Zhai, 2007a; Chan and Roth, 2010; Sun et al., 2011) or kernel-based (Zelenko et al., 2003; Bunescu and Mooney, 2005a; Bunescu and Mooney, 2005b; Zhang et al., 2006; Qian et al., 2008; Nguyen et al., 2009), attempts to improve the RE performance by enriching the feature sets from multiple sentence analyses and knowledge resources. |
Conclusion | Using this feature set, we obtain an accuracy of 73.0% on a blind test.
Introduction | This best-performing system uses our new feature set . |
Predicting Direction of Power | We use another feature set, LEX, to capture word ngrams, POS (part-of-speech) ngrams, and mixed ngrams.
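Word, POS, and mixed ngrams of the kind LEX captures can be enumerated with a short sketch. The construction here (each position in an n-token window contributes either the word or its tag) is an assumption for illustration, not the paper's exact template.

```python
from itertools import product

def lex_ngrams(tokens, pos_tags, n=2):
    """Enumerate word n-grams, POS n-grams, and mixed n-grams: for every
    n-token window, each slot contributes either the surface word or its
    POS tag, so the pure word and pure POS n-grams appear as special cases."""
    feats = set()
    for i in range(len(tokens) - n + 1):
        slots = list(zip(tokens[i:i + n], pos_tags[i:i + n]))
        for choice in product(*slots):  # every word/tag combination per slot
            feats.add(choice)
    return feats
```

For the tagged bigram "the cat" / "DT NN" this yields the four variants ("the", "cat"), ("the", "NN"), ("DT", "cat"), and ("DT", "NN").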
Predicting Direction of Power | We also performed an ablation study to understand the importance of different slices of our feature sets.
Structural Analysis | THRPR: This feature set includes two metadata-based feature sets: positional and verbosity.
Abstract | Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with handcrafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with an existing feature set.
Introduction | After concatenating the SSWE feature with the existing feature set, we push the state-of-the-art to 86.58% in macro-F1.
Related Work | NRC-ngram refers to the feature set of NRC leaving out ngram features. |
Related Work | After concatenating SSWEu with the feature set of NRC, the performance is further improved to 86.58%.
Related Work | The concatenated features SSWEu+NRC-ngram (86.48%) outperform the original feature set of NRC (84.73%).
Arabic Word Segmentation Model | This feature set also allows the model to take into account other interactions between the beginning and end of a word, particularly those involving the definite article ال al-.
Arabic Word Segmentation Model | A notable property of this feature set is that it remains highly dialect-agnostic, even though our additional features were chosen in response to errors made on text in Egyptian dialect. |
Error Analysis | • errors that can be fixed with a fuller analysis of just the problematic token, and therefore represent a deficiency in the feature set; and
Error Analysis | In 36 of the 100 sampled errors, we conjecture that the presence of the error indicates a shortcoming of the feature set, resulting in segmentations that make sense locally but are not plausible given the full token.
Experiments | Experimental results are given in Table 2, where we also provide the number of features in each feature set . |
Experiments | Figure 1: ROC curves for classifiers trained using different feature sets (English SVO and AN test sets). |
Experiments | According to ROC plots in Figure 1, all three feature sets are effective, both for SVO and for AN tasks. |
Related Work | The current work builds on this study, incorporating new syntactic relations as metaphor candidates and adding several new feature sets, as well as different, more reliable datasets for evaluating results.
Annotations | Table 2 shows the performance of our feature set in grammars with several different levels of structural annotation. Klein and Manning (2003) find large gains (6% absolute improvement, 20% relative improvement) going from v = 0, h = 0 to v = 1, h = 1; however, we do not find the same level of benefit.
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Introduction | Our parser can be easily adapted to this task by replacing the X-bar grammar over treebank symbols with a grammar over the sentiment values to encode the output variables and then adding n-gram indicators to our feature set to capture the bulk of the lexical effects. |
Evaluation framework | 4.2 Performance indicator and feature set |
Evaluation framework | As our focus is on the algorithmic aspect, in all experiments we use the same feature set, which consists of the seventeen features proposed by Specia et al. (2009).
Evaluation framework | This feature set, fully described by Callison-Burch et al. (2012), takes into account the complexity of the source sentence (e.g., number of tokens, number of translations per source word) and the fluency of the target translation (e.g., language model probabilities).
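A few features of this kind can be sketched as follows. The function covers only a representative subset (token counts, average translations per source word, language-model log-probability per token) with hypothetical names; it is not the official seventeen-feature set.

```python
def baseline_qe_features(source_tokens, target_tokens,
                         translations_per_source_word, target_lm_logprob):
    """Sketch of a few quality-estimation features in the spirit of the
    seventeen baseline features: source-complexity signals and
    target-fluency signals. Inputs: tokenized source and target sentences,
    a per-source-word translation count, and the target LM log-probability."""
    return {
        "src_token_count": len(source_tokens),
        "src_avg_translations": (sum(translations_per_source_word)
                                 / len(translations_per_source_word)),
        "tgt_token_count": len(target_tokens),
        "tgt_lm_logprob_per_token": target_lm_logprob / len(target_tokens),
    }
```

Each value is a single number, so the output can be fed directly to any regression or classification model as a fixed-length feature vector.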
Copula Models for Text Regression | This nice property essentially allows us to fuse distinctive lexical, syntactic, and semantic feature sets naturally into a single compact model. |
Experiments | Feature sets: |
Experiments | To do this, we sample an equal number of features from each feature set and concatenate them.
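The equal-size sampling and concatenation step can be sketched as follows; the grouping into lexical, syntactic, and semantic sets below is hypothetical and only illustrates the procedure.

```python
import random

def sample_and_concatenate(feature_sets, k, seed=0):
    """Draw the same number k of features from each named feature set and
    concatenate the samples into one combined feature list, so every set
    contributes equally regardless of its original size."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    combined = []
    for feats in feature_sets.values():
        combined.extend(rng.sample(feats, k))
    return combined

sets = {
    "lexical":   [f"lex_{i}" for i in range(10)],
    "syntactic": [f"syn_{i}" for i in range(10)],
    "semantic":  [f"sem_{i}" for i in range(10)],
}
print(len(sample_and_concatenate(sets, k=3)))  # 9
```

Balancing the contribution of each set this way keeps any single large feature family from dominating the concatenated representation.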
Abstract | Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features. |
Experiment | We built three new parsing systems based on the StateAlign system: the Nonlocal system extends the feature set of StateAlign with nonlocal features, the Cluster system extends it with semi-supervised word cluster features, and the Nonlocal & Cluster system extends it with both groups of features.
Related Work | Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features. |