Discovering User Interactions in Ideological Discussions
Mukherjee, Arjun and Liu, Bing

Article Structure

Abstract

Online discussion forums are a popular platform for people to voice their opinions on any subject matter and to discuss or debate any issue of interest.

Introduction

Online discussion/debate forums allow people with common interests to freely ask and answer questions, to express their views and opinions on any subject matter, and to discuss issues of common interest.

Related Work

There are several research areas that are related to our work.

Model

We now introduce the JTE-P model with additional details.

Phrase Ranking based on Relevance

We now detail our method of preprocessing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building.

Classifying Pair Interaction Nature

We now determine whether two users (also called a user pair) mostly agree or disagree with each other in their exchanges, i.e., their pair interaction or arguing nature.

Empirical Evaluation

We now evaluate the proposed techniques in the context of the JTE-P model.

Conclusion

This paper studied the problem of modeling user pair interactions in online discussions with the purpose of discovering the interaction or arguing nature of each author pair and various AD-expressions emitted in debates.

Topics

n-grams

Appears in 18 sentences as: n-grams (23)
In Discovering User Interactions in Ideological Discussions
  1. Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary.
    Page 4, “Model”
  2. Instead of using all n-grams, a relevance based ranking method is proposed to select a subset of highly relevant n-grams for model building (details in §4).
    Page 4, “Model”
  3. For notational convenience, we use terms to denote both words (unigrams) and phrases (n-grams).
    Page 4, “Model”
  4. For phrasal terms (n-grams), all POS tags and lexemes of the constituent words are considered as contextual information for computing feature functions in Max-Ent.
    Page 4, “Model”
  5. We now detail our method of preprocessing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building.
    Page 5, “Phrase Ranking based on Relevance”
  6. A large number of irrelevant n-grams slow inference.
    Page 5, “Phrase Ranking based on Relevance”
  7. This method, however, is computationally expensive and has limitations for arbitrary-length n-grams.
    Page 5, “Phrase Ranking based on Relevance”
  8. This approach considers adjacent word pairs and identifies n-grams which occur much more often than one would expect by chance alone by computing likelihood ratios.
    Page 5, “Phrase Ranking based on Relevance”
  9. As our goal is to enhance the expressiveness of our models by considering relevant n-grams while preserving the advantages of exchangeable modeling, we employ a preprocessing technique that ranks n-grams by relevance and adds a certain number of top-ranked n-grams, chosen based on coverage (details follow), to our vocabulary.
    Page 5, “Phrase Ranking based on Relevance”
  10. Next, we rank the candidate phrases (n-grams) using our probabilistic ranking function.
    Page 5, “Phrase Ranking based on Relevance”
  11. log ε, being a constant, is ignored because it cancels out when comparing candidate n-grams.
    Page 5, “Phrase Ranking based on Relevance”
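
Entry 8 above refers to the classical likelihood-ratio test for collocations (Dunning, 1993). Below is a minimal Python sketch of scoring adjacent word pairs that way; the contingency-table approximation, the tokenisation, and the significance cut-off are illustrative assumptions, not the paper's implementation.

    # Dunning's G^2 log-likelihood ratio for adjacent word pairs (a sketch).
    from collections import Counter
    from math import log

    def llr(k11, k12, k21, k22):
        """G^2 statistic for the 2x2 contingency table of a bigram."""
        def h(*ks):                      # sum of k*log(k/N), with 0*log(0) := 0
            n = sum(ks)
            return sum(k * log(k / n) for k in ks if k > 0)
        return 2 * (h(k11, k12, k21, k22)
                    - h(k11 + k12, k21 + k22)
                    - h(k11 + k21, k12 + k22))

    def candidate_bigrams(tokens, min_llr=10.83):    # 10.83 ~ chi^2, p < 0.001 (assumed cut-off)
        uni = Counter(tokens)
        bi = Counter(zip(tokens, tokens[1:]))
        n = len(tokens) - 1                          # number of adjacent pairs
        keep = []
        for (w1, w2), k11 in bi.items():
            k12 = uni[w1] - k11                      # w1 followed by something else (approx.)
            k21 = uni[w2] - k11                      # w2 preceded by something else (approx.)
            k22 = max(n - k11 - k12 - k21, 0)
            if llr(k11, k12, k21, k22) >= min_llr:
                keep.append((w1, w2))
        return keep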

topic models

Appears in 12 sentences as: topic model (1) topic modeling (3) Topic models (2) topic models (7)
In Discovering User Interactions in Ideological Discussions
  1. We present a topic model based approach to answer these questions.
    Page 1, “Abstract”
  2. Since agreement and disagreement expressions are usually multi-word phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling.
    Page 1, “Abstract”
  3. In our earlier work (Mukherjee and Liu, 2012a), we proposed three topic models to mine contention points, which also extract AD-expressions.
    Page 2, “Introduction”
  4. In this paper, we further improve the work by coupling an information retrieval method to rank good candidate phrases with topic modeling in order to discover more accurate AD-expressions.
    Page 2, “Introduction”
  5. Topic models: Our work is also related to topic modeling and joint modeling of topics and other information as we jointly model several aspects of discussions/debates.
    Page 2, “Related Work”
  6. Topic models like pLSA (Hofmann, 1999) and LDA (Blei et al., 2003) have proved to be very successful in mining topics from large text collections.
    Page 2, “Related Work”
  7. There have been various extensions to multi-grain (Titov and McDonald, 2008), labeled (Ramage et al., 2009), and sequential (Du et al., 2010) topic models.
    Page 2, “Related Work”
  8. Yet other approaches extend topic models to produce author specific topics (Rosen-Zvi et al., 2004), author persona (Mimno and McCallum, 2007), social roles (McCallum et al., 2007), etc.
    Page 2, “Related Work”
  9. Also related are topic models in sentiment analysis which are often referred to as Aspect and Sentiment models (ASMs).
    Page 2, “Related Work”
  10. Topics in most topic models like LDA are usually unigram distributions.
    Page 5, “Phrase Ranking based on Relevance”
  11. For quantitative evaluation, topic models are often compared using perplexity.
    Page 7, “Empirical Evaluation”
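
Entry 11 notes that topic models are usually compared by perplexity on held-out text. The sketch below shows that computation, assuming a document-topic matrix theta (D x T) and a smoothed topic-word matrix phi (T x V) have already been estimated; the variable names and shapes are illustrative, not taken from the paper.

    # Held-out per-word perplexity of a topic model (a sketch).
    import numpy as np

    def perplexity(docs, theta, phi):
        """docs: list of lists of word ids; theta: D x T; phi: T x V (no zero entries)."""
        log_lik, n_tokens = 0.0, 0
        for d, words in enumerate(docs):
            p_w = theta[d] @ phi                 # p(w|d) = sum_z theta[d,z] * phi[z,w]
            log_lik += float(np.sum(np.log(p_w[words])))
            n_tokens += len(words)
        return float(np.exp(-log_lik / n_tokens))    # lower is better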

unigrams

Appears in 9 sentences as: unigram (5) unigrams (6)
In Discovering User Interactions in Ideological Discussions
  1. For notational convenience, we use terms to denote both words (unigrams) and phrases (n-grams).
    Page 4, “Model”
  2. Topics in most topic models like LDA are usually unigram distributions.
    Page 5, “Phrase Ranking based on Relevance”
  3. For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution.
    Page 5, “Phrase Ranking based on Relevance”
  4. Yet another thread of research post-processes the discovered topical unigrams to form multi-word phrases using likelihood scores (Blei and Lafferty, 2009).
    Page 5, “Phrase Ranking based on Relevance”
  5. We first induce a unigram JTE-P whereby we cluster the relevant AD-expression unigrams in $\varphi^E_{Ag}$ and $\varphi^E_{DisAg}$.
    Page 5, “Phrase Ranking based on Relevance”
  6. The ranking function is grounded on the following hypothesis: a relevant phrase is one whose unigrams are closely related to (or appear with high probabilities in) the given AD-expression type, e: Agreement (Ag) or Disagreement (DisAg).
    Page 5, “Phrase Ranking based on Relevance”
  7. $\log \frac{P(W \mid Rel=1, e)}{P(W \mid Rel=0, e)} = \sum_{i=1}^{n} \log \frac{P(w_i \mid Rel=1, e)}{P(w_i \mid Rel=0, e)}$ (5). Given the expression model $\varphi^E$ previously learned by inducing the unigram JTE-P, it is intuitive to set $P(w_i \mid Rel = 1, e)$ to the point estimate $\frac{n^E_{e,w_i} + \beta_E}{\sum_v n^E_{e,v} + V \beta_E}$, where $n^E_{e,w_i}$ is the number of times $w_i$ was …
    Page 6, “Phrase Ranking based on Relevance”
  8. Thus, we choose the top 5000 candidate n-grams for each expression type and add them to the vocabulary beyond all unigrams.
    Page 6, “Phrase Ranking based on Relevance”
  9. Note that phrases for topics are not as crucial as for AD-expressions because topics can more or less be defined by unigrams.
    Page 6, “Phrase Ranking based on Relevance”
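
Entries 6-8 describe the relevance ranking itself: a candidate phrase is scored by summing, over its unigrams, the log of a smoothed point estimate of P(w_i | Rel = 1, e) derived from the unigram JTE-P, and the top 5000 candidates per expression type are added to the vocabulary. The sketch below follows that reading; the count structure counts[e][w], the smoothing name beta, and the function names are assumptions, not the paper's code.

    # Relevance scoring of candidate phrases against an expression type (a sketch).
    from math import log

    def relevance_score(phrase, e, counts, vocab_size, beta):
        """Sum of log point estimates P(w_i | Rel=1, e) over the phrase's unigrams.
        The constant irrelevance term (log epsilon) is dropped, since it cancels
        when comparing candidates."""
        denom = sum(counts[e].values()) + vocab_size * beta
        return sum(log((counts[e].get(w, 0) + beta) / denom) for w in phrase.split())

    def top_candidates(candidates, e, counts, vocab_size, beta, k=5000):
        """Rank candidate n-grams for expression type e and keep the top k."""
        score = lambda p: relevance_score(p, e, counts, vocab_size, beta)
        return sorted(candidates, key=score, reverse=True)[:k]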

generative process

Appears in 5 sentences as: generative process (5)
In Discovering User Interactions in Ideological Discussions
  1. The generative process of ASMs is, however, different from our model.
    Page 2, “Related Work”
  2. This observation motivates the generative process of our model.
    Page 3, “Model”
  3. Referring to the notations in Table 1, we explain the generative process of JTE-P.
    Page 3, “Model”
  4. We now detail the generative process of JTE-P (plate notation in Figure 1) as follows:
    Page 4, “Model”
  5. This thread of research models bigrams by encoding them into the generative process.
    Page 5, “Phrase Ranking based on Relevance”

n-gram

Appears in 5 sentences as: n-gram (6)
In Discovering User Interactions in Ideological Discussions
  1. Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary.
    Page 4, “Model”
  2. While this is reasonable, a significant n-gram with high likelihood score may not necessarily be relevant to the problem domain.
    Page 5, “Phrase Ranking based on Relevance”
  3. There is nothing wrong with this per se because the statistical tests only judge the significance of an n-gram, but a significant n-gram may not necessarily be relevant in a given problem domain.
    Page 5, “Phrase Ranking based on Relevance”
  4. The reduced dataset consists of 1,095,586 tokens (after n-gram preprocessing in §4) and 40,102 posts, with an average of 27 posts or interactions per pair.
    Page 7, “Empirical Evaluation”
  5. A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model.
    Page 9, “Conclusion”

sentiment analysis

Appears in 5 sentences as: Sentiment analysis (2) sentiment analysis (4)
In Discovering User Interactions in Ideological Discussions
  1. AD-expressions are crucial for the analysis of interactive discussions and debates just as sentiment expressions are instrumental in sentiment analysis (Liu, 2012).
    Page 1, “Introduction”
  2. Sentiment analysis: Sentiment analysis determines positive and negative opinions expressed on entities and aspects (Hu and Liu, 2004).
    Page 2, “Related Work”
  3. Thus, this work expands the sentiment analysis research.
    Page 2, “Related Work”
  4. Also related are topic models in sentiment analysis which are often referred to as Aspect and Sentiment models (ASMs).
    Page 2, “Related Work”
  5. p@n is commonly used to evaluate a ranking when the total number of correct items is unknown (e.g., Web search results, aspect terms in topic models for sentiment analysis (Zhao et al., 2010), etc.).
    Page 7, “Empirical Evaluation”
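
Entry 5 evaluates ranked AD-expressions with precision at n (p@n), the usual choice when the total number of correct items is unknown. A minimal sketch with illustrative names:

    # Precision at n over a ranked list of expressions (a sketch).
    def precision_at_n(ranked, judged_correct, n):
        """Fraction of the top-n ranked items that the judges marked correct."""
        top = ranked[:n]
        return sum(1 for item in top if item in judged_correct) / len(top)

    # e.g. precision_at_n(ranked_disagreement_expressions, correct_set, 50) -> p@50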

generative model

Appears in 4 sentences as: generative model (3) generative models (1)
In Discovering User Interactions in Ideological Discussions
  1. We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics simultaneously in a single framework.
    Page 2, “Introduction”
  2. JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions.
    Page 3, “Model”
  3. Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary.
    Page 4, “Model”
  4. A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model.
    Page 9, “Conclusion”

randomly sampled

Appears in 4 sentences as: random sample (1) randomly sampled (3)
In Discovering User Interactions in Ideological Discussions
  1. To compute coverage, we randomly sampled 500 documents from the corpus and listed the candidate n-grams in the collection of 500 sampled documents.
    Page 6, “Phrase Ranking based on Relevance”
  2. We then computed the coverage to see how many of the relevant terms in the random sample were also present in top k phrases from the ranked candidate n-grams.
    Page 6, “Phrase Ranking based on Relevance”
  3. To learn the Max-Ent parameters λ, we randomly sampled 500 terms appearing at least 10 times from the held-out data (10 threads in our corpus that were excluded from the evaluation of the tasks in §6.2 and §6.3), labeled them as topical (361) or AD-expressions (139), and used the corresponding features of each term (in the context of the posts where it occurs, §3) to train the Max-Ent model.
    Page 7, “Empirical Evaluation”
  4. Instead, we randomly sampled 500 pairs (≈ 34% of the population) for evaluation.
    Page 9, “Empirical Evaluation”
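
Entries 1-2 compute coverage: how many of the phrases judged relevant in a random 500-document sample are recovered among the top-k ranked candidate n-grams. A minimal sketch under that reading; the inputs are assumed to be precomputed and the names are illustrative.

    # Coverage of relevant phrases by the top-k ranked candidates (a sketch).
    import random

    def sample_documents(corpus, n=500, seed=0):
        """Draw the random document sample whose candidate n-grams are then listed."""
        random.seed(seed)
        return random.sample(corpus, min(n, len(corpus)))

    def coverage_at_k(relevant_in_sample, ranked_candidates, k):
        """Fraction of phrases judged relevant in the sample that also appear
        among the top-k ranked candidate n-grams."""
        top_k = set(ranked_candidates[:k])
        return len(set(relevant_in_sample) & top_k) / len(relevant_in_sample)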

bigram

Appears in 3 sentences as: bigram (3) bigrams (1)
In Discovering User Interactions in Ideological Discussions
  1. This thread of research models bigrams by encoding them into the generative process.
    Page 5, “Phrase Ranking based on Relevance”
  2. For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution.
    Page 5, “Phrase Ranking based on Relevance”
  3. In (Tomokiyo and Hurst, 2003), a language model approach is used for bigram phrase extraction.
    Page 5, “Phrase Ranking based on Relevance”

feature settings

Appears in 3 sentences as: feature sets (1) feature settings (2)
In Discovering User Interactions in Ideological Discussions
  1. To compare classification performance, we use two feature sets: (i) standard word + POS 1-4 grams and (ii) AD-expressions from §5.
    Page 9, “Empirical Evaluation”
  2. Predicting an agreeing arguing nature is harder than predicting a disagreeing one across all feature settings.
    Page 9, “Empirical Evaluation”
  3. Using the discovered AD-expressions (Table 6, last row) as features renders a statistically significant (see Table 6 caption) improvement over the other baseline feature settings.
    Page 9, “Empirical Evaluation”
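
Entry 1 compares two feature sets for classifying pair interaction nature. The sketch below builds the first one, word + POS 1-4 grams, from an already POS-tagged post; treating word and POS n-grams as two separate feature streams, and the bag-of-features representation, are illustrative assumptions rather than the paper's exact setup.

    # Word and POS 1-4 gram features from a POS-tagged post (a sketch).
    from collections import Counter

    def word_pos_ngrams(tagged_post, max_n=4):
        """tagged_post: list of (word, pos) tuples; returns 1-4 gram feature counts."""
        words = [w.lower() for w, _ in tagged_post]
        tags = [t for _, t in tagged_post]
        feats = Counter()
        for seq, prefix in ((words, "W"), (tags, "P")):
            for n in range(1, max_n + 1):
                for i in range(len(seq) - n + 1):
                    feats[prefix + ":" + "_".join(seq[i:i + n])] += 1
        return feats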

human judges

Appears in 3 sentences as: human judges (2) human judgment (1)
In Discovering User Interactions in Ideological Discussions
  1. For this and subsequent human judgment tasks, we use two judges (graduate students well versed in English).
    Page 6, “Phrase Ranking based on Relevance”
  2. The evaluation of this task requires human judges to read all the posts where the two users forming the pair have interacted.
    Page 9, “Empirical Evaluation”
  3. Two human judges were asked to independently read all the post interactions of 500 pairs and label each pair as overall “disagreeing” or overall “agreeing” or “none”.
    Page 9, “Empirical Evaluation”

semi-supervised

Appears in 3 sentences as: semi-supervised (3)
In Discovering User Interactions in Ideological Discussions
  1. We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics simultaneously in a single framework.
    Page 2, “Introduction”
  2. JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions.
    Page 3, “Model”
  3. A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model.
    Page 9, “Conclusion”
