Approaches | We create binary indicator features for each model using feature templates.
Approaches | Our feature template definitions build on those used by the top-performing system in the CoNLL-2009 Shared Task, Zhao et al.
Approaches | Template Creation: Feature templates are defined over triples of (property, positions, order).
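Approaches | As a rough illustration of such a triple (our sketch, not the authors' code), a template like (form, (0, 1), ordered) can be instantiated into a binary indicator feature as follows; all identifiers here are ours:

```python
# A minimal sketch of instantiating (property, positions, order) feature
# templates as binary indicator features; every name here is illustrative.
sentence = [{"form": "John", "pos": "NNP"}, {"form": "runs", "pos": "VBZ"}]

def instantiate(template, sent, i):
    """Instantiate one template at anchor position i of a sentence."""
    prop, positions, ordered = template
    values = [sent[i + p][prop] for p in positions if 0 <= i + p < len(sent)]
    if not ordered:              # unordered templates ignore surface order
        values = sorted(values)
    return f"{prop}|{positions}|{'+'.join(values)}"

# The template (form, (0, 1), True) anchored at token 0 yields the string
# "form|(0, 1)|John+runs"; the binary feature fires (value 1) whenever
# this string occurs in an instance.
print(instantiate(("form", (0, 1), True), sentence, 0))
```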
Experiments | 4.2 Feature Template Sets |
Experiments | Further, the best-performing low-resource features found in this work are those based on coarse feature templates and selected by information gain. |
Experiments | #FT indicates the number of feature templates used (unigrams+bigrams). |
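Experiments | As a sketch of the information-gain selection mentioned above (our illustration; `labels` and `fires` are hypothetical inputs, not this paper's code), each template can be scored by how much its firing indicator reduces label entropy:

```python
import math
from collections import Counter

def information_gain(labels, fires):
    """Information gain of a binary template indicator w.r.t. the labels.
    labels[k] is the gold label of instance k; fires[k] is True if any
    feature generated by the template fires on instance k."""
    def entropy(ys):
        n = len(ys)
        return -sum(c / n * math.log2(c / n) for c in Counter(ys).values()) if n else 0.0
    on = [y for y, f in zip(labels, fires) if f]
    off = [y for y, f in zip(labels, fires) if not f]
    n = len(labels)
    cond = len(on) / n * entropy(on) + len(off) / n * entropy(off)
    return entropy(labels) - cond

# Hypothetical ranking: keep the k templates with the highest gain.
# ranked = sorted(templates, key=lambda t: information_gain(y, fired[t]), reverse=True)[:k]
```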
Related Work | (2009) features, who use feature templates from combinations of word properties, syntactic positions including head and children, and semantic properties; and features from Björkelund et al.
Experiment | were the ones that had been counted, using the feature templates in Table 1, at least four times for all of the (i, j) position pairs in the training sentences.
Experiment | We conjoined the features with three types of label pairs, (C, I), (I, N), or (C, N), as instances of the feature template (l_i, l_j) to produce features for SEQUENCE.
Experiment | We used the following feature templates to produce features for the outbound model: ⟨s_{i-2}⟩, ⟨s_{i-1}⟩, ⟨s_i⟩, ⟨s_{i+1}⟩, ⟨s_{i+2}⟩, ⟨t_i⟩, ⟨t_{i-1}, t_i⟩, ⟨t_i, t_{i+1}⟩, and ⟨s_i, t_i⟩.
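Experiment | Under the reconstruction above, one possible reading of these window templates in code (identifiers and padding scheme are ours):

```python
def outbound_features(s, t, i):
    """Instantiate the window templates at position i; s are source
    tokens and t are tags, padded so the window never runs off the
    ends. An illustrative reading of the reconstructed template list."""
    pad_s = ["<pad>"] * 2 + s + ["<pad>"] * 2
    pad_t = ["<pad>"] * 2 + t + ["<pad>"] * 2
    j = i + 2  # shift into the padded sequences
    return [
        f"s[-2]={pad_s[j-2]}", f"s[-1]={pad_s[j-1]}", f"s[0]={pad_s[j]}",
        f"s[+1]={pad_s[j+1]}", f"s[+2]={pad_s[j+2]}",
        f"t[0]={pad_t[j]}",
        f"t[-1],t[0]={pad_t[j-1]}+{pad_t[j]}",
        f"t[0],t[+1]={pad_t[j]}+{pad_t[j+1]}",
        f"s[0],t[0]={pad_s[j]}+{pad_t[j]}",
    ]
```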
Proposed Method | Table 1: Feature templates.
Proposed Method | Table 1 shows the feature templates used to produce the features. |
Proposed Method | A feature is an instance of a feature template.
Abstract | Moreover, when deterministic constraints have been applied to contextual words of w_0, it is also possible to include some lookahead feature templates, such as:
Abstract | Character-based feature templates |
Abstract | We adopt the 'non-lexical-target' feature templates in (Jiang et al., 2008a).
Conclusion | We build up a small set of feature templates as part of a discriminative constituency parser and outperform the Berkeley parser on a wide range of languages. |
Features | Subsequent lines in Table 1 indicate additional surface feature templates computed over the span, which are then conjoined with the rule identity as shown in Figure 1 to give additional features.
Features | Note that many of these features have been used before (Taskar et al., 2004; Finkel et al., 2008; Petrov and Klein, 2008b); our goal here is not to amass as many feature templates as possible, but rather to examine the extent to which a simple set of features can replace a complicated state space. |
Features | Because heads of constituents are often at the beginning or the end of a span, these feature templates can (noisily) capture monolexical properties of heads without having to incur the inferential cost of lexicalized annotations. |
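Features | As a hedged sketch of this conjunction scheme (our identifiers; the actual template inventory is the paper's Table 1), each surface descriptor of a span is crossed with the rule identity:

```python
def span_surface_features(words, i, j):
    """Simple surface descriptors of the span words[i:j]: first word,
    last word, length bucket -- the kind of templates listed in Table 1."""
    length = min(j - i, 5)  # bucket long spans together
    return [f"first={words[i]}", f"last={words[j-1]}", f"len={length}"]

def conjoined_features(rule, words, i, j):
    """Conjoin each surface feature with the rule identity."""
    return [f"{rule}&{f}" for f in span_surface_features(words, i, j)]

# conjoined_features("NP->DT NN", "the cat sat".split(), 0, 2)
# -> ["NP->DT NN&first=the", "NP->DT NN&last=cat", "NP->DT NN&len=2"]
```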
Sentiment Analysis | We exploit this by adding an additional feature template similar to our span shape feature from Section 4.4 which uses the (deterministic) tag for each word as its descriptor. |
Surface Feature Framework | To improve the performance of our X-bar grammar, we will add a number of surface feature templates derived only from the words in the sentence. |
Features | In this section, we first introduce how the different types of feature templates are designed, and then show an example of how the features help transfer syntactic structure information.
Features | Note that the same feature templates are used for all the target grammar formalisms. |
Features | We define the following feature templates: f_binary for binary derivations, f_unary for unary derivations, and f_root for the root nodes.
Abbreviator with Nonlocal Information | Feature templates #4 to #7 in Table 1 are used for Chinese abbreviations.
Abbreviator with Nonlocal Information | Feature templates #8 to #11 are designed for English abbreviations.
Abbreviator with Nonlocal Information | Since the Chinese character inventory (more than 10K characters) is much larger than the English alphabet (26 letters), we did not apply these feature templates to Chinese abbreviations in order to avoid possible overfitting.
Experiments | We employ the feature templates defined in Section 2.3, which yield 81,827 features for the Chinese abbreviation generation task and 50,149 features for the English abbreviation generation task.
Recognition as a Generation Task | In implementing the recognizer, we simply use the model from the abbreviation generator, with the same feature templates (31,868 features) and training method; the major difference is in the restriction (according to the PE) of the decoding stage and penalizing the probability values of the NULL labelings.
System Architecture | The word-based feature templates derived for the label y are as follows:
System Architecture | For each label y, we use the feature templates as follows: |
System Architecture | The latter two feature templates are designed to detect character or word reduplication, a morphological phenomenon that can influence word segmentation in Chinese. |
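System Architecture | A minimal sketch of such a reduplication detector (our reading of the idea, not the system's actual code):

```python
def reduplication_features(chars, i):
    """Binary indicators for character reduplication around position i,
    e.g. AA patterns such as the Chinese verb 看看 'take a look'."""
    feats = []
    if i + 1 < len(chars) and chars[i] == chars[i + 1]:
        feats.append("redup(c0,c1)")   # c_i == c_{i+1}, AA pattern
    if i + 2 < len(chars) and chars[i] == chars[i + 2]:
        feats.append("redup(c0,c2)")   # c_i == c_{i+2}, e.g. ABAB pattern
    return feats

# reduplication_features(list("看看书"), 0) -> ["redup(c0,c1)"]
```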
Background | The baseline feature templates of joint S&T are the ones used in (Ng and Low, 2004; Jiang et al., 2008), as shown in Table 1. Λ = {λ_1, λ_2, ..., λ_K} ∈ R^K are the weight parameters to be learned.
Background | Table 1: The feature templates of joint S&T.
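Background | For concreteness, the usual log-linear form behind such a weight vector, hedged as our reading of the standard setup rather than this paper's exact model:

```latex
% Score of a joint S&T label sequence y for input x under weights \Lambda;
% f_k counts how often the k-th template instance fires in (x, y).
\[
  \mathrm{score}(x, y) = \sum_{k=1}^{K} \lambda_k f_k(x, y),
  \qquad
  p(y \mid x) = \frac{\exp \mathrm{score}(x, y)}{\sum_{y'} \exp \mathrm{score}(x, y')}
\]
```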
Method | The feature templates are from Zhao et al. |
Method | The same feature templates as in (Wang et al., 2011) are used, i.e., "+n-gram+cluster+lexicon".
Method | The feature templates introduced in Section 3.1 are used. |
Related Work | But overall, our approach differs in three important aspects: first, novel feature templates are defined for measuring the similarity between vertices. |
Experimental Setup | Table 1: POS tag feature templates.
Features | First- to Third-Order Features: The feature templates of first- to third-order features are mainly drawn from previous work on graph-based parsing (McDonald and Pereira, 2006), transition-based parsing (Nivre et al., 2006), and dual decomposition-based parsing (Martins et al., 2011).
Features | The feature templates are inspired by previous feature-rich POS tagging work (Toutanova et al., 2003). |
Features | In our work we use feature templates up to 5-gram. |
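Features | A small sketch of what "up to 5-gram" templates over a tagging window can look like (names, window size, and padding are our assumptions, not the paper's exact design):

```python
def ngram_features(words, i, max_n=5):
    """All n-grams (n <= max_n) from a +/-2 window around position i,
    in the spirit of feature-rich POS tagging."""
    window = [words[i + d] if 0 <= i + d < len(words) else "<pad>"
              for d in range(-2, 3)]
    feats = []
    for n in range(1, max_n + 1):
        for start in range(len(window) - n + 1):
            # record the n-gram together with its offset from the center
            feats.append(f"{n}g@{start - 2}=" + "_".join(window[start:start + n]))
    return feats
```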
Experimental Setup | Features: For the arc feature vector φ_{h,m}, we use the same set of feature templates as MST v0.5.1.
Experimental Setup | For the head/modifier vectors φ_h and φ_m, we show the complete set of feature templates used by our model in Table 1.
Experimental Setup | Finally, we use a similar set of feature templates as Turbo v2.1 for third-order parsing.
Introduction | A predominant way to counter the high dimensionality of features is to manually design or select a meaningful set of feature templates, which are used to generate different types of features (McDonald et al., 2005a; Koo and Collins, 2010; Martins et al., 2013).
Problem Formulation | Table 1: Word feature templates used by our model. |
Web-Derived Selectional Preference Features | N-gram feature templates: ⟨hw, mw, PMI(hw, mw)⟩; ⟨hw, ht, mw, PMI(hw, mw)⟩; ⟨hw, mw, mt, PMI(hw, mw)⟩; ⟨hw, ht, mw, mt, PMI(hw, mw)⟩
Web-Derived Selectional Preference Features | Table 2: Examples of N-gram feature templates.
Web-Derived Selectional Preference Features | 3.3 N-gram feature templates |
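Web-Derived Selectional Preference Features | Under the template list above, one illustrative instantiation of ⟨hw, mw, PMI(hw, mw)⟩; the real PMI value is typically discretized before being used in an indicator feature, and all counts and names below are made up:

```python
import math

def pmi(count_hm, count_h, count_m, total):
    """Pointwise mutual information of a head/modifier word pair
    estimated from web n-gram counts."""
    return math.log((count_hm * total) / (count_h * count_m))

def pmi_bucket(value, step=1.0):
    """Discretize PMI so it can be conjoined into an indicator feature."""
    return round(value / step) * step

# One instantiation of <hw, mw, PMI(hw, mw)> with invented counts:
hw, mw = "eat", "pizza"
feat = f"hw={hw}&mw={mw}&pmi={pmi_bucket(pmi(120, 50_000, 8_000, 10**9))}"
```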
Joint POS Tagging and Parsing with Nonlocal Features | However, some feature templates in Table 1 become unavailable, because POS tags for the lookahead words are not specified yet under the joint framework. |
Joint POS Tagging and Parsing with Nonlocal Features | However, all the feature templates given in Table 1 are just simple structural features.
Transition-based Constituent Parsing | Type Feature Templates |
Transition-based Constituent Parsing | Table 1 lists the feature templates used in our baseline parser, which are adopted from Zhang and Clark (2009).
Experimental Assessment | For the arc-eager parser, we use the feature template of Zhang and Nivre (2011). |
Experimental Assessment | It turns out that our feature template, described in §4.3, is the exact merge of the templates used for the arc-eager and the arc-standard parsers.
Model and Training | These features are then combined into complex features, according to some feature template, and joined with the available transition types.
Model and Training | Our feature template is an extended version of the feature template of Zhang and Nivre (2011), originally developed for the arc-eager model. |
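Model and Training | A hedged sketch of this combine-then-join step (all identifiers are ours, not the parser's actual feature code):

```python
def transition_features(atomic, template, transitions):
    """Conjoin atomic features according to one template, then join the
    result with each available transition type."""
    body = "&".join(f"{name}={atomic[name]}" for name in template)
    return [f"{body}::{t}" for t in transitions]

atomic = {"s0.word": "saw", "s0.pos": "VBD", "b0.word": "her"}
template = ("s0.word", "b0.word")   # one complex-feature template
# transition_features(atomic, template, ["SHIFT", "LEFT-ARC", "RIGHT-ARC"])
# -> ["s0.word=saw&b0.word=her::SHIFT", ..., "s0.word=saw&b0.word=her::RIGHT-ARC"]
```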
Experiments | We used 5-fold cross-validation performed using the training data to tweak the included feature templates and optimize training parameters. |
Experiments | The following feature templates are used to generate features from the above words. |
Experiments | To evaluate the importance of the different types of features, the same experiment was rerun multiple times, each time including or excluding exactly one feature template.
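Experiments | The include/exclude protocol can be pictured as a simple ablation loop; `train_and_score` and `templates` below are hypothetical stand-ins for the actual pipeline:

```python
def ablation(templates, train_and_score):
    """Retrain once per template, leaving exactly one template out each
    time, and report the score drop attributable to each template."""
    baseline = train_and_score(templates)
    deltas = {}
    for t in templates:
        reduced = [u for u in templates if u != t]
        deltas[t] = baseline - train_and_score(reduced)  # drop when t is removed
    return deltas
```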
Character-based Chinese Parsing | Table 1 shows the feature templates of our model. |
Character-based Chinese Parsing | The feature templates in bold are novel and are designed to encode head character information.
Experiments | We find that the parsing accuracy decreases by about 0.6% when the head-character-related features (the bold feature templates in Table 1) are removed, which demonstrates the usefulness of these features.
Experiments | Table 3 shows the feature templates we set initially, which generate 722,999,637 features.
Experiments | Feature Templates |
Experiments | Table 3: Feature templates for CRF training.
Additional Experiments | The sparse feature templates resulted here in a total of 4.9 million possible features, of which again only a fraction were active, as shown in Table 2. |
Experiments | Table 2: Active sparse feature templates |
Experiments | These feature templates resulted in a total of 3.4 million possible features, of which only a fraction were active for the respective tuning set and optimizer, as shown in Table 2. |
Discussion on Related Work | We need to redesign our ranking feature templates to encode the reordering information in the source part of the translation rules. |
Ranking Model Training | The detailed feature templates are shown in Table 1.
Ranking Model Training | Table 1: Feature templates for the ranking function.
MWE-dedicated Features | In order to make these models comparable, we use two comparable sets of feature templates: one adapted to sequence labelling (CRF-based MWER) and the other adapted to reranking (MaxEnt-based reranker).
MWE-dedicated Features | All feature templates are given in Table 2.
MWE-dedicated Features | Table 2: Feature templates (f) used in both the MWER and the reranker models: n is the current position in the sentence; w(i) is the word at position i; t(i) is the part-of-speech tag of w(i); if the word at absolute position i is part of a compound in the Shortest Path Segmentation, mwt(i) and mws(i) are respectively the part-of-speech tag and the internal structure of the compound, and mwpos(i) indicates its relative position in the compound (B or I).
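MWE-dedicated Features | A rough sketch of instantiating a few of these templates (the token data layout and all names are our guesses, not the systems' code):

```python
def mwe_features(tokens, n):
    """Instantiate some Table 2 templates at position n. Each token is a
    dict with keys 'w' and 't', plus 'mwt', 'mws', and 'mwpos' when it
    belongs to a compound; this layout is our assumption."""
    tok = tokens[n]
    feats = [f"w(n)={tok['w']}", f"t(n)={tok['t']}"]
    if "mwt" in tok:  # token is part of a compound
        feats += [f"mwt(n)={tok['mwt']}",
                  f"mws(n)={tok['mws']}",
                  f"mwpos(n)={tok['mwpos']}"]  # B or I
    return feats
```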
Decoding | Table 1: DLM-based feature templates |
Parsing with dependency language model | 3.3 DLM-based feature templates |
Parsing with dependency language model | The feature templates are outlined in Table 1, where TYPE refers to one of the types PL or PR, h_pos refers to the part-of-speech tag of x_h, h_word refers to the lexical form of x_h, ch_pos refers to the part-of-speech tag of x_ch, and ch_word refers to the lexical form of x_ch.
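Parsing with dependency language model | Given the reconstruction above, one illustrative instantiation of these DLM-based templates (identifiers are ours, not the authors'):

```python
def dlm_features(TYPE, head, child):
    """Instantiate DLM-based templates for a head/child pair; head and
    child are (word, pos) tuples and TYPE is 'PL' or 'PR'. A sketch of
    the kind of conjunctions listed in Table 1, not the exact inventory."""
    h_word, h_pos = head
    ch_word, ch_pos = child
    return [
        f"{TYPE}&h_pos={h_pos}&ch_pos={ch_pos}",
        f"{TYPE}&h_word={h_word}&ch_word={ch_word}",
        f"{TYPE}&h_pos={h_pos}&ch_word={ch_word}",
        f"{TYPE}&h_word={h_word}&ch_pos={ch_pos}",
    ]
```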
Character-Level Dependency Tree | Feature templates |
Character-Level Dependency Tree | Table 1: Feature templates encoding intra-word dependencies. |
Character-Level Dependency Tree | It adjusts the weights of segmentation and POS-tagging features, because the number of feature templates for these two tasks is much smaller than for parsing.