Tri-Training for Authorship Attribution with Limited Training Data
Tieyun Qian, Bing Liu, Li Chen, and Zhiyong Peng

Article Structure

Introduction

Existing approaches to authorship attribution (AA) are mainly based on supervised classification (Stamatatos, 2009; Kim et al., 2011; Seroussi et al., 2012).

Related Work

Existing AA methods have focused either on finding suitable features or on developing effective learning techniques.

Proposed Tri-Training Algorithm

Overall Framework
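
The digest only names the overall framework, so as a rough illustration, a generic tri-training loop in the style of Zhou and Li (2005) might look as follows. The paper's variant trains its three classifiers on three different views of a document (character, lexical, syntactic) rather than on bootstrap samples, so treat this as a sketch of the general technique, not the authors' exact method; all names and parameters are illustrative.

    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression

    def tri_train(base, X_l, y_l, X_u, rounds=5):
        # Generic tri-training: three classifiers label unlabeled data for
        # one another; an example is added for classifier i whenever the
        # other two classifiers agree on its label.
        rng = np.random.default_rng(0)
        clfs = []
        for _ in range(3):
            # Bootstrap-sample the labeled data to get diverse classifiers.
            idx = rng.integers(0, len(y_l), len(y_l))
            clfs.append(clone(base).fit(X_l[idx], y_l[idx]))
        for _ in range(rounds):
            for i in range(3):
                j, k = [m for m in range(3) if m != i]
                pj, pk = clfs[j].predict(X_u), clfs[k].predict(X_u)
                agree = pj == pk
                if agree.any():
                    # Retrain on labeled plus peer-labeled examples.
                    X_aug = np.vstack([X_l, X_u[agree]])
                    y_aug = np.concatenate([y_l, pj[agree]])
                    clfs[i] = clone(base).fit(X_aug, y_aug)
        return clfs

    # Example: clfs = tri_train(LogisticRegression(max_iter=1000), X_l, y_l, X_u)

A full implementation would also check the error-rate conditions of Zhou and Li (2005) before adding peer-labeled examples; they are omitted here for brevity.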

Experimental Evaluation

We now evaluate the proposed method.

Conclusion

In this paper, we investigated the problem of authorship attribution with very few labeled examples.

Topics

SVM

Appears in 13 sentences as: SVM (13)
In Tri-Training for Authorship Attribution with Limited Training Data
  1. However, the self-training method in (Kourtis and Stamatatos, 2011) uses two classifiers (CNG and SVM) on one view.
    Page 1, “Introduction”
  2. On developing effective learning techniques, supervised classification has been the dominant approach, e.g., neural networks (Graham et al., 2005; Zheng et al., 2006), decision tree (Uzuner and Katz, 2005; Zhao and Zobel, 2005), logistic regression (Madigan et al., 2005), SVM (Diederich et al., 2000; Gamon, 2004; Li et al., 2006; Kim et al., 2011), etc.
    Page 2, “Related Work”
  3. Many classification algorithms give such scores, e.g., SVM and logistic regression.
    Page 3, “Proposed Tri-Training Algorithm” (see the classifier sketch after this list)
  4. We use logistic regression (LR) with L2 regularization (Fan et al., 2008) and the SVMmulticlass (SVM) system (Joachims, 2007) with its default settings as the classifiers.
    Page 3, “Experimental Evaluation”
  5. It self-trains two classifiers from the character 3-gram, lexical, and syntactic views using CNG and SVM classifiers (Kourtis and Stamatatos, 2011).
    Page 4, “Experimental Evaluation”
  6. The original method applied only CNG and SVM on the character n-gram view.
    Page 4, “Experimental Evaluation”
  7. It again uses the character, lexical, and syntactic views and SVM as one of the two classifiers.
    Page 4, “Experimental Evaluation”
  8. We use SVM and LR as the learners since they are among the best methods.
    Page 4, “Experimental Evaluation”
  9. Effects of SVM and LR on tri-training
    Page 4, “Experimental Evaluation”
  10. The effects of SVM and LR on tri-training are shown in Fig.
    Page 4, “Experimental Evaluation”
  11. It is clear that LR outperforms SVM by a large margin for tri-training when the number of iterations (k) is
    Page 4, “Experimental Evaluation”
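
A minimal sketch tying items 3, 4, and 8 above together: scikit-learn's LogisticRegression wraps the LIBLINEAR library of Fan et al. (2008), and LinearSVC stands in here for the SVMmulticlass system, so this approximates rather than reproduces the paper's setup; the data below is a toy placeholder for document feature vectors.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    # Toy stand-ins for document feature vectors and author labels.
    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(30, 8)), rng.integers(0, 3, 30)
    X_unlabeled = rng.normal(size=(10, 8))

    # L2-regularized logistic regression via the LIBLINEAR solver.
    lr = LogisticRegression(penalty="l2", solver="liblinear").fit(X_train, y_train)
    proba = lr.predict_proba(X_unlabeled)   # per-author probabilities
    confidence = proba.max(axis=1)          # score of the predicted author

    # Linear one-vs-rest SVM; per-class margins can also rank confidence.
    svm = LinearSVC().fit(X_train, y_train)
    margins = svm.decision_function(X_unlabeled)

Both classifiers thus expose the per-class scores that item 3 refers to, which is what lets unlabeled documents be ranked by prediction confidence.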

n-grams

Appears in 5 sentences as: n-grams (5)
In Tri-Training for Authorship Attribution with Limited Training Data
  1. Example features include function words (Argamon et al., 2007), richness features (Gamon, 2004), punctuation frequencies (Graham et al., 2005), character (Grieve, 2007), word (Burrows, 1992) and POS n-grams (Gamon, 2004; Hirst and Feiguina, 2007), rewrite rules (Halteren et al., 1996), and similarities (Qian and Liu, 2013).
    Page 2, “Related Work”
  2. The features in the character view are the character n-grams of a document.
    Page 3, “Proposed Tri-Training Algorithm” (see the feature sketches after this list)
  3. Character n-grams are simple and easily available for any natural language.
    Page 3, “Proposed Tri-Training Algorithm”
  4. We use four content-independent structures including n-grams of POS tags (n = 1..3) and rewrite rules (Kim et al., 2011).
    Page 3, “Proposed Tri-Training Algorithm”
  5. CNG is a profile-based method which represents the author as the N most frequent character n-grams of all his/her training texts.
    Page 4, “Experimental Evaluation”
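
To make the feature views in items 2-4 concrete, here is an illustrative sketch: the character view via scikit-learn's character n-gram counts, and POS-tag n-grams for n = 1..3 via NLTK's off-the-shelf tagger. The exact n-gram range and the choice of tagger are assumptions, not the paper's specification, and rewrite-rule features are omitted.

    import nltk  # needs the punkt and averaged_perceptron_tagger data
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["I never said she stole my money.",
            "He would never say such a thing!"]

    # Character view: character 3-gram counts (the n range is illustrative).
    char_vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
    X_char = char_vec.fit_transform(docs)

    def pos_ngrams(text, n_max=3):
        # Syntactic view, in part: n-grams of POS tags for n = 1..3.
        tags = [t for _, t in nltk.pos_tag(nltk.word_tokenize(text))]
        return ["_".join(tags[i:i + n])
                for n in range(1, n_max + 1)
                for i in range(len(tags) - n + 1)]

Item 5's CNG is the common-n-grams approach usually attributed to Kešelj et al. (2003): an author profile is the N most frequent character n-grams of the author's training texts, and a test document goes to the author whose profile is least dissimilar. A rough sketch, with illustrative n and N:

    from collections import Counter

    def cng_profile(texts, n=3, N=2000):
        # Profile: the N most frequent character n-grams, as relative freqs.
        counts = Counter()
        for t in texts:
            counts.update(t[i:i + n] for i in range(len(t) - n + 1))
        total = sum(counts.values()) or 1
        return {g: c / total for g, c in counts.most_common(N)}

    def cng_dissimilarity(p1, p2):
        # Relative distance over the union of n-grams (absent freq = 0);
        # attribute a document to the author with the smallest distance.
        score = 0.0
        for g in set(p1) | set(p2):
            f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
            score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
        return score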

logistic regression

Appears in 3 sentences as: logistic regression (3)
In Tri-Training for Authorship Attribution with Limited Training Data
  1. On developing effective learning techniques, supervised classification has been the dominant approach, e.g., neural networks (Graham et al., 2005; Zheng et al., 2006), decision tree (Uzuner and Katz, 2005; Zhao and Zobel, 2005), logistic regression (Madigan et al., 2005), SVM (Diederich et al., 2000; Gamon, 2004; Li et al., 2006; Kim et al., 2011), etc.
    Page 2, “Related Work”
  2. Many classification algorithms give such scores, e.g., SVM and logistic regression.
    Page 3, “Proposed Tri-Training Algorithm”
  3. We use logistic regression (LR) with L2 regularization (Fan et al., 2008) and the SVMmulticlass (SVM) system (Joachims, 2007) with its default settings as the classifiers.
    Page 3, “Experimental Evaluation”
