Extracting Social Power Relationships from Natural Language
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael

Article Structure

Abstract

Sociolinguists have long argued that social context influences language use in all manner of ways, resulting in lects.

Topics

n-grams

Appears in 29 sentences as: N-grams (1) n-grams (34) n-grams: (1) n-gram’s (1)
In Extracting Social Power Relationships from Natural Language
  1. Our approach first identifies statistically salient phrases of words and parts of speech — known as n-grams — in training texts generated in conditions where the social power
    Page 1, “Abstract”
  2. Then, we apply machine learning to train classifiers with groups of these n-grams as features.
    Page 2, “Abstract”
  3. Unlike their works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, and other measures of statistical salience in training data.
    Page 3, “Abstract”
  4. Many would be better modeled by POS tag unigrams (with no word information) or by longer n-grams consisting of either words, POS tags, or a combination of the two.
    Page 4, “Abstract”
  5. Because considering such features would increase the size of the feature space, we suspected that including these features would also benefit from algorithmic means of selecting n-grams that are indicative of particular lects, and even from binning these relevant n-grams into sets to be used as features.
    Page 4, “Abstract”
  6. Therefore, we focused on an approach where each feature is associated with a set of one or more n-grams.
    Page 4, “Abstract”
  7. (“mixed” n-grams).
    Page 4, “Abstract”
  8. Let S represent a set {n1, …, nk} of n-grams.
    Page 4, “Abstract”
  9. Defining features in this manner allows us to both explore the bag-of-words representation as well as use groups of n-grams as features, which we believed would be a better fit for this problem.
    Page 5, “Abstract”
  10. To identify n-grams which would be useful features, frequencies of n-grams in only the training set are considered.
    Page 5, “Abstract”
  11. Different types of frequency measures were explored to capture different types of information about an n-gram’s usage.
    Page 5, “Abstract”
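The “mixed” n-grams above (sequences mixing words and POS tags) can be enumerated directly from tagged text. A minimal sketch, assuming tokens arrive as (word, tag) pairs and borrowing the ^tag notation from the paper's examples; the function name is mine:

```python
from itertools import product

def mixed_ngrams(tagged, n):
    """Enumerate every length-n 'mixed' n-gram in a (word, tag) sequence:
    each position is rendered either as the surface word or as its POS
    tag prefixed with '^'."""
    grams = set()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # try every word-vs-tag choice for the n positions
        for choice in product((False, True), repeat=n):
            grams.add(tuple("^" + t if use_tag else w
                            for (w, t), use_tag in zip(window, choice)))
    return grams

# a 3-token sample yields word-only, tag-only, and mixed bigrams
print(sorted(mixed_ngrams([("please", "RB"), ("bring", "VB"), ("it", "PRP")], 2)))
```

Each window of n tokens contributes 2^n candidate n-grams, which is why the feature space grows quickly once mixed n-grams are admitted.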

See all papers in Proc. ACL 2011 that mention n-grams.

See all papers in Proc. ACL that mention n-grams.

Back to top.

n-gram

Appears in 21 sentences as: N-Gram (1) N-gram (1) n-gram (21)
In Extracting Social Power Relationships from Natural Language
  1. Each n-gram is a sequence of words, POS tags, or a combination of words and POS tags.
    Page 4, “Abstract”
  2. To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
    Page 4, “Abstract”
  3. The first n-gram of the set, please ^VB, would match please^RB bring^VB from the text.
    Page 4, “Abstract”
  4. The frequency of this n-gram in T would then be 1/9, where 1 is the number of substrings in T that match
    Page 4, “Abstract”
  5. The other n-gram, the trigram please ^‘comma’ ^VB, does not have any match, so the final value of the feature is 1/9.
    Page 5, “Abstract”
  6. 3.2 N-Gram Selection
    Page 5, “Abstract”
  7. • Absolute frequency: The total number of times a particular n-gram occurs in the text of a given class (social power lect).
    Page 5, “Abstract”
  8. • Relative frequency: The total number of times a particular n-gram occurs in a given class, divided by the total number of n-grams in that class.
    Page 5, “Abstract”
  9. Normalization by the size of the class makes relative frequency a better metric for comparing n-gram usage across classes.
    Page 5, “Abstract”
  10. • We set a minimum threshold for the absolute frequency of the n-gram in a class.
    Page 5, “Abstract”
  11. • We require that the ratio of the relative frequency of the n-gram in one class to its relative frequency in the other class is also greater than a threshold.
    Page 5, “Abstract”
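Items 3 through 5 describe how a feature tied to a set of n-grams gets its value: count the substrings that match any member n-gram and divide by the number of n-grams in the text. A hedged sketch of that computation (helper names are mine):

```python
def set_feature_value(ngram_set, tagged, n):
    """Feature value for a set of mixed n-grams: matches of any member
    n-gram in the tagged text, divided by the number of n-grams in the
    text.  A term matches a (word, tag) token if it equals the word or
    the '^'-prefixed tag."""
    def matches(term, token):
        word, tag = token
        return term == word or term == "^" + tag

    windows = [tagged[i:i + n] for i in range(len(tagged) - n + 1)]
    hits = sum(
        all(matches(t, tok) for t, tok in zip(gram, window))
        for gram in ngram_set
        for window in windows
    )
    return hits / len(windows) if windows else 0.0

# ten tokens give nine bigrams; 'please ^VB' matches once, so the value is 1/9
tagged = [("please", "RB"), ("bring", "VB"), ("the", "DT"), ("report", "NN"),
          ("to", "TO"), ("me", "PRP"), ("by", "IN"), ("noon", "NN"),
          ("today", "NN"), (".", ".")]
print(set_feature_value({("please", "^VB")}, tagged, 2))
```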


text classification

Appears in 10 sentences as: text classification (10)
In Extracting Social Power Relationships from Natural Language
  1. This paper explores a text classification problem we will call lect modeling, an example of what has been termed computational sociolinguistics.
    Page 1, “Abstract”
  2. Our results validate the treatment of lect modeling as a text classification problem — albeit a hard one — and constitute a case for future research in computational sociolinguistics.
    Page 1, “Abstract”
  3. Given, then, that there are distinct differences among what we term UpSpeak and DownSpeak, we treat Social Power Modeling as an instance of text classification (or categorization): we seek to assign a class (UpSpeak or DownSpeak) to a text sample.
    Page 2, “Abstract”
  4. Unlike their works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, and other measures of statistical salience in training data.
    Page 3, “Abstract”
  5. Text-based emotion prediction is another instance of text classification, where the goal is to detect the emotion appropriate to a text (Alm, Roth & Sproat, 2005) or provoked by an author, for example (Strapparava & Mihalcea, 2008).
    Page 3, “Abstract”
  6. Personality classification seems to be the application of text classification which is the most relevant to Social Power Modeling.
    Page 3, “Abstract”
  7. Apart from text classification, work from the topic modeling community is also closely related to Social Power Modeling.
    Page 3, “Abstract”
  8. Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
    Page 4, “Abstract”
  9. However, we generally achieved the best results using support vector machines, a machine learning classifier which has been successfully applied to many previous text classification problems.
    Page 5, “Abstract”
  10. Our text classification problem is similar to sentiment analysis in that there are class dependencies; for example, DownSpeak is more closely related to PeerSpeak than to UpSpeak.
    Page 9, “Abstract”


unigrams

Appears in 8 sentences as: unigram (2) Unigrams (1) unigrams (6)
In Extracting Social Power Relationships from Natural Language
  1. Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
    Page 4, “Abstract”
  2. Few of these tactics would be effectively encapsulated by word unigrams.
    Page 4, “Abstract”
  3. Many would be better modeled by POS tag unigrams (with no word information) or by longer n-grams consisting of either words, POS tags, or a combination of the two.
    Page 4, “Abstract”
  4. Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words based classifier.
    Page 7, “Abstract”
  5. bigram frequency minimized the difference in the number of features between the unigram and bigram experiments.
    Page 7, “Abstract”
  6. While the bigrams on their own were less successful than the unigrams, as seen in line (2), adding them to the unigram features improved accuracy against the test set, shown in line (3).
    Page 7, “Abstract”
  7. As we had speculated that including surface-level grammar information in the form of tag n-grams would be beneficial to our problem, we performed experiments using all tag unigrams and all tag bigrams occurring in the training set as features.
    Page 7, “Abstract”
  8. For example, one feature might represent the set of all word unigrams which have a relative frequency ratio between 1.5 and 1.6.
    Page 8, “Abstract”
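Item 8's binning (one feature per range of relative frequency ratios) can be sketched as follows; the bin width and range are illustrative, not the paper's exact settings:

```python
def bin_by_ratio(ratios, lo=1.5, hi=3.0, width=0.1):
    """Partition n-grams into feature bins by relative-frequency ratio,
    so that e.g. all n-grams with a ratio in [1.5, 1.6) share one
    feature (bin edges here are illustrative)."""
    bins = {}
    for gram, r in ratios.items():
        if lo <= r < hi:
            # left edge of the width-sized interval containing r
            edge = round(lo + width * int((r - lo) / width), 1)
            bins.setdefault(edge, []).append(gram)
    return bins

print(bin_by_ratio({"please": 1.55, "kindly": 1.58, "asap": 2.05}))
```

Collapsing many n-grams into one bin per ratio range is what keeps the feature count small relative to the bag-of-words baselines.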


bigrams

Appears in 7 sentences as: bigram (3) Bigrams (1) bigrams (4)
In Extracting Social Power Relationships from Natural Language
  1. To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
    Page 4, “Abstract”
  2. please ^VB and 9 is the number of bigrams in T, excluding sentence initial and final markers.
    Page 5, “Abstract”
  3. Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words based classifier.
    Page 7, “Abstract”
  4. We then performed experiments with word bigrams, selecting as features those which occurred at least seven times in the relevant lects of the training set.
    Page 7, “Abstract”
  5. bigram frequency minimized the difference in the number of features between the unigram and bigram experiments.
    Page 7, “Abstract”
  6. While the bigrams on their own were less successful than the unigrams, as seen in line (2), adding them to the unigram features improved accuracy against the test set, shown in line (3).
    Page 7, “Abstract”
  7. As we had speculated that including surface-level grammar information in the form of tag n-grams would be beneficial to our problem, we performed experiments using all tag unigrams and all tag bigrams occurring in the training set as features.
    Page 7, “Abstract”
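The experiment in items 5 and 6, where bigram features are added to the unigram set, amounts to concatenating two count vectors. A toy sketch with hand-picked vocabularies (the vocabularies and function name are mine):

```python
from collections import Counter

def combined_features(tokens, uni_vocab, bi_vocab):
    """One feature vector holding unigram counts followed by bigram
    counts, mirroring the experiment where adding bigram features to
    the unigram set improved test accuracy."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return [uni[u] for u in uni_vocab] + [bi[b] for b in bi_vocab]

tokens = ["please", "bring", "the", "report", "please"]
print(combined_features(tokens, ["please", "report"], [("please", "bring")]))  # → [2, 1, 1]
```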


bag-of-words

Appears in 6 sentences as: bag-of-words (6)
In Extracting Social Power Relationships from Natural Language
  1. Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
    Page 4, “Abstract”
  2. Defining features in this manner allows us to both explore the bag-of-words representation as well as use groups of n-grams as features, which we believed would be a better fit for this problem.
    Page 5, “Abstract”
  3. In experiments based on the bag-of-words model, we only consider an absolute frequency threshold, whereas in later experiments, we also take into account the relative frequency ratio threshold.
    Page 5, “Abstract”
  4. Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words based classifier.
    Page 7, “Abstract”
  5. It far outperforms the bag-of-words baselines, despite significantly fewer features.
    Page 8, “Abstract”
  6. Line (6) of Table 2 shows that while this approach is an improvement over the basic bag-of-words method, grouping features still improves results.
    Page 8, “Abstract”
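Item 3's two cutoffs, an absolute frequency threshold and a relative frequency ratio threshold, might look like this in code; the add-one smoothing and the specific cutoff values (echoing the seven-occurrence and 1.5 figures mentioned elsewhere in the text) are my assumptions:

```python
from collections import Counter

def select_ngrams(this_class, other_class, min_abs=7, min_ratio=1.5):
    """Keep n-grams whose absolute frequency in this_class meets a
    minimum and whose relative frequency here, divided by the relative
    frequency in other_class, exceeds min_ratio.  Add-one smoothing of
    the other class avoids division by zero (an assumption of mine)."""
    total_this = sum(this_class.values())
    total_other = sum(other_class.values())
    kept = {}
    for gram, freq in this_class.items():
        if freq < min_abs:
            continue  # absolute-frequency threshold
        rel_this = freq / total_this
        rel_other = (other_class.get(gram, 0) + 1) / (total_other + 1)
        if rel_this / rel_other > min_ratio:
            kept[gram] = rel_this / rel_other
    return kept

up = Counter({"please ^VB": 10, "the": 50})
down = Counter({"the": 60, "now": 10})
print(select_ngrams(up, down))  # only 'please ^VB' survives both thresholds
```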


feature set

Appears in 6 sentences as: feature set (5) feature sets (1)
In Extracting Social Power Relationships from Natural Language
  1. Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
    Page 4, “Abstract”
  2. To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
    Page 4, “Abstract”
  3. While the feature set was too small to produce notable results, we identified which features actually were indicative of lect.
    Page 7, “Abstract”
  4. We explored possible feature sets with cross validation.
    Page 8, “Abstract”
  5. Our goal was to have successful results using only statistically extracted features; however, we examined the effect of augmenting this feature set with the most indicative of the human-identified features — polite imperatives.
    Page 8, “Abstract”
  6. We attempted to increase the training set size by performing exploratory experiments with self-training, an iterative semi-supervised learning method (Zhu, 2005) with the feature set from (7).
    Page 8, “Abstract”
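Item 5 names polite imperatives as the most indicative human-identified feature. The paper does not spell out the matching rule, so the pattern below (the word “please” immediately followed by a base-form verb) is purely an illustrative guess:

```python
def polite_imperative_count(tagged):
    """Count one possible 'polite imperative' cue: the word 'please'
    immediately followed by a base-form verb (VB).  This particular
    pattern is an illustrative guess, not the authors' definition."""
    return sum(
        1
        for (word, _), (_, next_tag) in zip(tagged, tagged[1:])
        if word.lower() == "please" and next_tag == "VB"
    )

print(polite_imperative_count([("Please", "RB"), ("bring", "VB"), ("it", "PRP")]))  # → 1
```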


sentiment analysis

Appears in 6 sentences as: Sentiment analysis (1) sentiment analysis (5)
In Extracting Social Power Relationships from Natural Language
  1. If SPM were yoked with sentiment analysis, we might identify which opinions belong to respected members of online communities or lay the groundwork for understanding how respect is earned in social networks.
    Page 2, “Abstract”
  2. Closely related natural language processing problems are authorship attribution, sentiment analysis, emotion detection, and personality classification: all aim to extract higher-level information from language.
    Page 2, “Abstract”
  3. Sentiment analysis, which strives to determine the attitude of an author from text, has recently garnered much attention (e.g.
    Page 2, “Abstract”
  4. Their attempt to classify each personality trait as either “high” or “low” echoes early sentiment analysis work that reduced sentiments to either positive or negative (Pang, Lee, & Vaithyanathan, 2002), and supports initially treating Social Power Modeling as a binary classification task.
    Page 3, “Abstract”
  5. Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus.
    Page 4, “Abstract”
  6. Our text classification problem is similar to sentiment analysis in that there are class dependencies; for example, DownSpeak is more closely related to PeerSpeak than to UpSpeak.
    Page 9, “Abstract”


machine learning

Appears in 5 sentences as: machine learning (5)
In Extracting Social Power Relationships from Natural Language
  1. In particular, we use machine learning techniques to identify social power relationships between members of a social network, based purely on the content of their interpersonal communication.
    Page 1, “Abstract”
  2. Then, we apply machine learning to train classifiers with groups of these n-grams as features.
    Page 2, “Abstract”
  3. Our approach is corpus-driven like the Naïve Bayes approach, but we interject statistically driven feature selection between the corpus and the machine learning classifiers.
    Page 3, “Abstract”
  4. They applied machine learning classifiers to map individuals to their roles in the hierarchy based on features related to email traffic patterns.
    Page 4, “Abstract”
  5. However, we generally achieved the best results using support vector machines, a machine learning classifier which has been successfully applied to many previous text classification problems.
    Page 5, “Abstract”


POS tags

Appears in 5 sentences as: POS tag (2) POS tagger (1) POS tags (4)
In Extracting Social Power Relationships from Natural Language
  1. Many would be better modeled by POS tag unigrams (with no word information) or by longer n-grams consisting of either words, POS tags, or a combination of the two.
    Page 4, “Abstract”
  2. Each n-gram is a sequence of words, POS tags, or a combination of words and POS tags.
    Page 4, “Abstract”
  3. or a POS tag.
    Page 4, “Abstract”
  4. As our approach requires text to be POS-tagged, we employed Stanford’s POS tagger (http://nlp.stanford.edu/software/tagger.shtml).
    Page 6, “Abstract”
  5. Binning: Next, we wished to explore longer n-grams of words or POS tags and to reduce the sparsity of the feature vectors.
    Page 7, “Abstract”


feature space

Appears in 3 sentences as: feature space (3)
In Extracting Social Power Relationships from Natural Language
  1. Because considering such features would increase the size of the feature space, we suspected that including these features would also benefit from algorithmic means of selecting n-grams that are indicative of particular lects, and even from binning these relevant n-grams into sets to be used as features.
    Page 4, “Abstract”
  2. Although this approach to partitioning is simple and worthy of improvement, it effectively reduced the dimensionality of the feature space.
    Page 5, “Abstract”
  3. Therefore, as we explored the feature space, small bins of different n-gram lengths were merged.
    Page 8, “Abstract”


feature vectors

Appears in 3 sentences as: feature vectors (3)
In Extracting Social Power Relationships from Natural Language
  1. The results of these experiments were not particularly strong, likely owing to the increased sparsity of the feature vectors.
    Page 7, “Abstract”
  2. Binning: Next, we wished to explore longer n-grams of words or POS tags and to reduce the sparsity of the feature vectors.
    Page 7, “Abstract”
  3. Self-Training: Besides sparse feature vectors, another factor likely to be hurting our classifier was the limited amount of training data.
    Page 8, “Abstract”
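The self-training mentioned in item 3 (and in the feature-set section) can be sketched as the usual iterative loop from Zhu (2005): train, label the unlabelled pool, absorb confident predictions, repeat. The classifier interface and the confidence threshold below are assumptions of mine, not the authors' setup:

```python
def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    """Iterative self-training: fit on labelled data, predict the
    unlabelled pool, absorb high-confidence predictions as new labels,
    repeat.  Assumes clf exposes fit(X, y) and predict_proba(X) with
    probability columns ordered by class index; the threshold and
    round count are illustrative."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        clf.fit(X_lab, y_lab)
        keep = []
        for x, probs in zip(pool, clf.predict_proba(pool)):
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] >= threshold:
                X_lab.append(x)      # confident: treat prediction as a label
                y_lab.append(best)
            else:
                keep.append(x)       # still unlabelled next round
        if len(keep) == len(pool):
            break                    # nothing confident enough; stop early
        pool = keep
    clf.fit(X_lab, y_lab)
    return clf
```

Any classifier with that fit/predict_proba shape can be dropped in; the loop simply grows the labelled set with its own most confident guesses.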


SVM

Appears in 3 sentences as: SVM (3)
In Extracting Social Power Relationships from Natural Language
  1. Oberlander and Nowson explore using a Naïve Bayes and an SVM classifier to perform binary classification of text on each personality dimension.
    Page 3, “Abstract”
  2. The results of the SVM classifier, shown in line (1) of Table 2, were fairly poor.
    Page 7, “Abstract”
  3. Training a multiclass SVM on the binned n-gram features from (5) produces 51.6% cross-validation accuracy on training data and 44.4% accuracy on the weighted test set (both numbers should be compared to a 33% baseline).
    Page 9, “Abstract”
