Results | 5.1 Effect of Feature Set Choice
Results | Table 3 illustrates the result of taking a baseline feature set (containing word as the only feature) and adding a single feature from the Simple set to it. |
Results | Feature Set          F-score  %Imp
Results | word                 43.85    —
Results | word+nw              43.86    0.0
Results | word+na              44.78    2.1
Results | word+lem             45.85    4.6
Results | word+pos             45.91    4.7
Results | word+nw+pos+lem+na   46.34    5.7
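Results | A minimal sketch of the ablation behind Table 3, assuming a generic classifier and hypothetical token dictionaries (the meanings of nw, na, lem, and pos are taken from the table's row labels, not defined here):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def extract_features(token, active):
    # Baseline is the word form alone; each name in `active` switches on
    # one Simple-set feature stored in the (hypothetical) token dict.
    feats = {"word": token["word"]}
    for name in active:
        feats[name] = token[name]
    return feats

def ablate(train, dev):
    # One row of Table 3 per entry: baseline, each single addition, then all.
    for active in [(), ("nw",), ("na",), ("lem",), ("pos",),
                   ("nw", "pos", "lem", "na")]:
        vec = DictVectorizer()
        X = vec.fit_transform(extract_features(t, active) for t, _ in train)
        y = [label for _, label in train]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        Xd = vec.transform(extract_features(t, active) for t, _ in dev)
        yd = [label for _, label in dev]
        print(active, f1_score(yd, clf.predict(Xd), average="micro"))
```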
Abstract | Previous work in traditional text classification and its variants, such as sentiment analysis, has achieved strong results using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies and training a classifier on a large feature set of the word unigrams that appear in the corpus.
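Abstract | A minimal bag-of-words sketch of the kind of classifier the abstract describes, with a toy corpus standing in for real data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the plot was gripping", "dull and predictable plot"]  # toy corpus
labels = ["pos", "neg"]

vectorizer = CountVectorizer()       # one feature per word unigram in the corpus
X = vectorizer.fit_transform(docs)   # word order and interdependencies are discarded
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["a gripping plot"])))
```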
Abstract | To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram takes either the form word or the form tag):
Abstract | While the feature set was too small to produce notable results, we identified which features were actually indicative of lect.
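Abstract | A hedged sketch of how such mixed word/tag n-grams could be enumerated (the tagged sentence and the enumeration strategy are assumptions, not the paper's method):

```python
from itertools import product

def mixed_ngrams(tagged, n):
    """All n-grams where each slot is realised as either the word or its tag."""
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # each term independently chooses the word form or the tag form
        for choice in product(*[(word, tag) for word, tag in window]):
            yield " ".join(choice)

sentence = [("shall", "MD"), ("we", "PRP"), ("go", "VB")]  # toy tagged input
print(sorted(set(mixed_ngrams(sentence, 2))))
```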
Experiments | Second, note that the parsers incorporating the N-gram feature sets consistently outperform the models using the baseline features in all test data sets, regardless of model order or label usage. |
Web-Derived Selectional Preference Features | In this paper, we employ two different feature sets: a baseline feature set3 that draws upon “normal” information sources, such as word forms and part-of-speech (POS) tags, without including the web-derived selectional preference features, and a feature set that conjoins the baseline features with the web-derived selectional preference features.
Web-Derived Selectional Preference Features | 3 This kind of feature set is similar to others in the literature (McDonald et al., 2005; Carreras, 2007), so we will not attempt an exhaustive description.
Web-Derived Selectional Preference Features | For example, the baseline feature set includes indicators for word-to-word and tag-to-tag interactions between the head and modifier of a dependency. |
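Web-Derived Selectional Preference Features | A minimal sketch of such arc-level indicator features, assuming simple word/POS dictionaries for the head and the modifier (the field names are placeholders):

```python
def arc_features(head, mod):
    # Indicator features for one head -> modifier dependency arc:
    # a word-to-word conjunction and a tag-to-tag conjunction.
    return {
        f"hw={head['word']}|mw={mod['word']}": 1,
        f"ht={head['pos']}|mt={mod['pos']}": 1,
    }

head = {"word": "ate", "pos": "VBD"}
mod = {"word": "quickly", "pos": "RB"}
print(arc_features(head, mod))
```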
Approach | 3.2 Feature set |
Approach | Our full feature set is as follows: |
Conclusions and future work | The addition of an incoherence metric to the feature set of an AA system has been shown to improve performance significantly (Miltsakaki and Kukich, 2000; Miltsakaki and Kukich, 2004). |
Validity tests | Although the above modifications do not exhaust the potential challenges a deployed AA system might face, they represent a threat to the validity of our system since we are using a highly related feature set.
Learning Algorithm | The held-out portion is used to tune the feature set, and results are reported for the test split only, i.e., using unseen instances.
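Learning Algorithm | A hedged sketch of this protocol, with build_model and the data splits as hypothetical placeholders:

```python
def select_and_evaluate(candidate_feature_sets, train, heldout, test, build_model):
    # Tune: score each candidate feature set on the held-out split only.
    best_fs, best_score = None, float("-inf")
    for fs in candidate_feature_sets:
        model = build_model(fs).fit(*train)
        score = model.score(*heldout)
        if score > best_score:
            best_fs, best_score = fs, score
    # Report: refit the chosen feature set and evaluate once, on unseen
    # test instances.
    final = build_model(best_fs).fit(*train)
    return best_fs, final.score(*test)
```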
Learning Algorithm | We improve BASIC with an extended feature set that especially targets A1 and the verb (Table 5).
Negation in Natural Language | The main contributions are: (1) an interpretation of negation using focus detection; (2) focus-of-negation annotation over all PropBank negated sentences; (3) a feature set to detect the focus of negation; and (4) a model to semantically represent negation and reveal its underlying positive meaning.
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, and TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding one.
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
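Automated Approaches to Deceptive Opinion Spam Detection | One way to realise these subsuming feature sets, sketched here with scikit-learn's CountVectorizer as an assumed stand-in for the paper's tooling:

```python
from sklearn.feature_extraction.text import CountVectorizer

# BIGRAMS+ subsumes UNIGRAMS; TRIGRAMS+ subsumes both (features lowercased,
# unstemmed, per the text above).
FEATURE_SETS = {
    "UNIGRAMS":  CountVectorizer(ngram_range=(1, 1), lowercase=True),
    "BIGRAMS+":  CountVectorizer(ngram_range=(1, 2), lowercase=True),
    "TRIGRAMS+": CountVectorizer(ngram_range=(1, 3), lowercase=True),
}

docs = ["the room was clean", "we loved the clean room"]  # toy reviews
for name, vec in FEATURE_SETS.items():
    vec.fit(docs)
    print(name, len(vec.get_feature_names_out()), "features")
```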
Automated Approaches to Deceptive Opinion Spam Detection | We use SVMlight (Joachims, 1999) to train our linear SVM models on all of the approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.