Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
Mayfield, Elijah and Adamson, David and Penstein Rosé, Carolyn

Article Structure

Abstract

Automated annotation of social behavior in conversation is necessary for large-scale analysis of real-world conversational data.

Introduction

Quantitative social science research has experienced a recent expansion, out of controlled settings and into natural environments.

Background

We ground this paper’s discussion of machine learning with a real problem, turning to the annotation of empowerment language in chat.

Data

Our data consists of a set of chatroom conversation transcripts from the Cancer Support Community.

Cue Discovery for Content Selection

Our algorithm performs content selection by learning a set of cue features.
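The “feature space” excerpts later in this index note that this content selection searches a very high-dimensional unigram space with a fixed-width search (Gutlein et al., 2009). As a rough, hedged sketch of what a fixed-width (beam-limited) forward selection of cue features could look like, the Python fragment below ranks candidate unigrams with a cheap univariate score and evaluates only the top few at each step; the scoring function, beam width, and use of scikit-learn are illustrative assumptions, not the authors' implementation.

    # Sketch: fixed-width forward selection of cue features over a dense
    # binary unigram matrix X (n_docs x n_unigrams) and binary labels y.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def fixed_width_cue_selection(X, y, n_cues=10, beam_width=20, cv=5):
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(n_cues):
            # Cheap univariate ranking (absolute correlation with the label);
            # only the top beam_width candidates get the expensive evaluation.
            scores = [abs(np.nan_to_num(np.corrcoef(X[:, j], y)[0, 1]))
                      for j in remaining]
            candidates = [remaining[i]
                          for i in np.argsort(scores)[::-1][:beam_width]]
            best_j, best_score = None, -np.inf
            for j in candidates:
                cols = selected + [j]
                clf = LogisticRegression(max_iter=1000)
                score = cross_val_score(clf, X[:, cols], y,
                                        cv=cv, scoring="f1").mean()
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
            remaining.remove(best_j)
        return selected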

Prediction

Our prediction algorithm begins with a standard implementation of cross-validated committees (Parmanto et al., 1996), whose results are aggregated with a confidence voting method intended to favor rare labels (Erp et al., 2002).
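As a rough illustration of that two-part design, the sketch below trains one committee member per cross-validation training split and predicts the rare label whenever the committee's averaged confidence in it clears a deliberately low threshold. The threshold rule, fold count, and scikit-learn components are assumptions made for illustration; the paper itself follows Parmanto et al. (1996) and Erp et al. (2002) rather than this exact code.

    # Sketch: cross-validated committee with a confidence vote biased toward
    # the rare label. Assumes a binary task with numpy arrays X and y.
    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    class CrossValidatedCommittee:
        def __init__(self, base=None, n_folds=7, rare_label=1, threshold=0.3):
            self.base = base or LogisticRegression(max_iter=1000)
            self.n_folds = n_folds
            self.rare_label = rare_label
            self.threshold = threshold  # below 0.5, so weak evidence can win

        def fit(self, X, y):
            # One member per cross-validation training split.
            self.members_ = []
            skf = StratifiedKFold(n_splits=self.n_folds, shuffle=True,
                                  random_state=0)
            for train_idx, _ in skf.split(X, y):
                self.members_.append(clone(self.base).fit(X[train_idx],
                                                          y[train_idx]))
            return self

        def predict(self, X):
            # Average each member's confidence in the rare label; fall back to
            # the majority label only when that confidence stays low.
            classes = list(self.members_[0].classes_)
            rare_col = classes.index(self.rare_label)
            probs = np.mean([m.predict_proba(X)[:, rare_col]
                             for m in self.members_], axis=0)
            majority = [c for c in classes if c != self.rare_label][0]
            return np.where(probs >= self.threshold, self.rare_label, majority)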

Experimental Results

All experiments were performed using LightSIDE (Mayfield and Rose, 2013).

Discussion

These results show promise for our techniques, which are able to distinguish features of rare labels, previously awash in a sea of irrelevance.

Conclusion

We have demonstrated an algorithm for improving automated classification accuracy on highly skewed tasks for conversational data.

Topics

machine learning

Appears in 9 sentences as: machine learning (9)
In Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
  1. This makes supervised machine learning difficult, through a combination of noisy features and unbalanced class distributions.
    Page 1, “Abstract”
  2. While machine learning is highly effective for annotation tasks with relatively balanced labels, such as sentiment analysis (Pang and Lee, 2004), more complex social functions are often rarer.
    Page 1, “Introduction”
  3. We propose adaptations to existing machine learning algorithms which improve recognition of rare annotations in conversational text data.
    Page 1, “Introduction”
  4. We introduce the domain of empowerment in support contexts, along with previous studies on the challenges that these annotations (and similar others) bring to machine learning.
    Page 2, “Introduction”
  5. We introduce our new technique for improving the ability to automate this annotation, along with other optimizations to the machine learning workflow which are tailored to this skewed class balance.
    Page 2, “Introduction”
  6. We ground this paper’s discussion of machine learning with a real problem, turning to the annotation of empowerment language in chat.
    Page 2, “Background”
  7. Users, of course, do not express empowerment in every thread in which they participate, which leads to a challenge for machine learning.
    Page 2, “Background”
  8. This approach is designed to bias the prediction of our machine learning algorithms in favor of minority classes in a coherent manner.
    Page 6, “Prediction”
  9. Our experiments show that this model significantly improves machine learning performance.
    Page 8, “Conclusion”

feature space

Appears in 5 sentences as: feature space (5) feature spaces (1)
In Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
  1. Our feature space X = {x1, x2, ..., xm} consists of m unigram features representing the observed vocabulary used in our corpus.
    Page 4, “Cue Discovery for Content Selection”
  2. We search only top candidates for efficiency, following the fixed-width search methodology for feature selection in very high-dimensionality feature spaces (Gutlein et al., 2009).
    Page 5, “Cue Discovery for Content Selection”
  3. One challenge of this approach is our underlying unigram feature space - tree-based algorithms are generally poor classifiers for the high-dimensionality, low-information features in a lexical feature space (Han et al., 2001).
    Page 6, “Prediction”
  4. We exhaustively sweep this feature space, and report the most successful stump rules for each annotation task.
    Page 7, “Prediction”
  5. We use a binary unigram feature space, and we perform 7-fold cross-validation.
    Page 7, “Experimental Results”
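The excerpts above describe a binary (presence/absence) unigram feature space evaluated with 7-fold cross-validation. A minimal sketch of that setup, with scikit-learn standing in for the LightSIDE workflow the paper reports, might look as follows; the pipeline components and the F1 scoring choice are illustrative assumptions.

    # Sketch: binary unigram features plus a logistic regression baseline,
    # evaluated with 7-fold cross-validation.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    pipeline = make_pipeline(
        CountVectorizer(binary=True),       # word presence/absence, not counts
        LogisticRegression(max_iter=1000),  # L2-regularized baseline
    )
    # texts: one string per conversation thread; labels: 1 = empowerment, 0 = not
    # scores = cross_val_score(pipeline, texts, labels, cv=7, scoring="f1")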

significant improvements

Appears in 5 sentences as: Significant improvement (1) significant improvement (1) significant improvements (2) significantly improves (1)
In Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
  1. Using these techniques, we demonstrate a significant improvement in classifier performance when recognizing the language of empowerment in support group chatrooms, a critical application area for researchers studying conversational interactions in healthcare (Uden-Kraan et al., 2009).
    Page 2, “Introduction”
  2. For self-empowerment recognition, all methods that we introduce are significant improvements in H, the
    Page 7, “Experimental Results”
  3. Statistically significant improvements over baseline are marked (p < .01, †; p < .05, *; p < 0.1, +).
    Page 7, “Experimental Results”
  4. Significant improvement (p < 0.05) in-
    Page 8, “Discussion”
  5. Our experiments show that this model significantly improves machine learning performance.
    Page 8, “Conclusion”
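The excerpts mark improvements at the p < .01, p < .05, and p < .1 levels, but the test itself is not named in these snippets. Purely as a hypothetical illustration, the fragment below assumes a paired t-test over per-fold scores and returns the corresponding table marker.

    # Sketch only: the choice of paired t-test here is an assumption, not the
    # paper's stated procedure.
    from scipy.stats import ttest_rel

    def significance_marker(baseline_fold_scores, method_fold_scores):
        _, p = ttest_rel(method_fold_scores, baseline_fold_scores)
        if p < 0.01:
            return "†"
        if p < 0.05:
            return "*"
        if p < 0.1:
            return "+"
        return ""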

unigram

Appears in 4 sentences as: unigram (3) unigrams (1)
In Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
  1. Our feature space X = {x1, x2, ..., xm} consists of m unigram features representing the observed vocabulary used in our corpus.
    Page 4, “Cue Discovery for Content Selection”
  2. One challenge of this approach is our underlying unigram feature space - tree-based algorithms are generally poor classifiers for the high-dimensionality, low-information features in a lexical feature space (Han et al., 2001).
    Page 6, “Prediction”
  3. splits than would unigrams alone.
    Page 7, “Prediction”
  4. We use a binary unigram feature space, and we perform 7-fold cross-validation.
    Page 7, “Experimental Results”

logistic regression

Appears in 3 sentences as: logistic regression (4)
In Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
  1. Before describing extensions to the baseline logistic regression model, we define notation.
    Page 4, “Cue Discovery for Content Selection”
  2. We define classifiers as functions f(x̄ → y ∈ Y); in practice, we use logistic regression via LibLINEAR (Fan et al., 2008).
    Page 4, “Cue Discovery for Content Selection”
  3. We compare our methods against baselines including a majority baseline, a baseline logistic regression classifier with L2 regularized features, and two common ensemble methods, AdaBoost (Freund and Schapire, 1996) and bagging (Breiman, 1996) with logistic regression base classifiers.
    Page 7, “Experimental Results”
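A minimal sketch of the baseline comparison listed in the last excerpt, with scikit-learn standing in for the LibLINEAR and LightSIDE tooling the paper uses; the specific estimators and parameter names here are illustrative assumptions.

    # Sketch: majority baseline, L2 logistic regression, and AdaBoost/bagging
    # ensembles over logistic regression base classifiers (the `estimator`
    # parameter name is for recent scikit-learn; older versions use
    # `base_estimator`).
    from sklearn.dummy import DummyClassifier
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.linear_model import LogisticRegression

    baselines = {
        "majority": DummyClassifier(strategy="most_frequent"),
        "logreg_l2": LogisticRegression(penalty="l2", max_iter=1000),
        "adaboost": AdaBoostClassifier(estimator=LogisticRegression(max_iter=1000)),
        "bagging": BaggingClassifier(estimator=LogisticRegression(max_iter=1000)),
    }
    # Each would be evaluated with the same 7-fold cross-validation described
    # in the excerpts above.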
