SciSurf: Index of 'Summarizing Emails with Conversational Cohesion and Subjectivity'

Topics

semantic similarity (17)
cosine similarity (11)
WordNet (5)
edge weight (4)
graph-based (4)
significantly improve (3)
statistical significance (3)

Summarizing Emails with Conversational Cohesion and Subjectivity

Carenini, Giuseppe and Ng, Raymond T. and Zhou, Xiaodong

Published in Proc. ACL, 2008

Article Structure

Abstract

In this paper, we study the problem of summarizing email conversations.

Introduction

With the ever increasing popularity of emails, it is very common nowadays that people discuss specific issues, events or tasks among a group of people by emails (Fisher and Moody, 2002).

Related Work

Rambow et al.

Extracting Conversations from Multiple Emails

In this section, we first review how to build a fragment quotation graph through an example.

Summarization Based on the Sentence Quotation Graph

Having built the sentence quotation graph with different measures of cohesion, in this section, we develop two summarization approaches.

Summarization with Subjective Opinions

Other than the conversation structure, the measures of cohesion and the graph-based summarization methods we have proposed, the importance of a sentence in emails can be captured from other aspects.

Empirical Evaluation

6.1 Dataset Setup

Conclusions

We study how to summarize email conversations based on the conversational cohesion and the subjective opinions.

Topics

semantic similarity

Appears in 17 sentences as: semantic similarities (2) Semantic Similarity (1) semantic similarity (14) semantically similar (1)

In Summarizing Emails with Conversational Cohesion and Subjectivity

We adopt three cohesion measures: clue words, semantic similarity and cosine similarity as the weight of the edges.
Page 1, “Abstract”
(Carenini et al., 2007), semantic similarity and cosine similarity.
Page 2, “Introduction”
Second, we only adopted one cohesion measure (clue words that are based on stemming), and did not consider more sophisticated ones such as semantically similar words.
Page 2, “Related Work”
3.3.2 Semantic Similarity Based on WordNet
Page 4, “Extracting Conversations from Multiple Emails”
Based on this observation, we propose to use semantic similarity to measure the cohesion between two sentences.
Page 4, “Extracting Conversations from Multiple Emails”
We use the well-known lexical database WordNet to get the semantic similarity of two words.
Page 4, “Extracting Conversations from Multiple Emails”
Specifically, we use the package by (Pedersen et al., 2004), which includes several methods to compute the semantic similarity.
Page 4, “Extracting Conversations from Multiple Emails”
Similar to the clue words, we measure the semantic similarity of two sentences by the total semantic similarity of the words in both sentences.
Page 4, “Extracting Conversations from Multiple Emails”
In the rest of this paper, let CWS denote the Generalized ClueWordSummarizer when the edge weight is based on clue words, and let CWS-Cosine and CWS-Semantic denote the summarizer when the edge weight is cosine similarity and semantic similarity respectively.
Page 5, “Summarization Based on the Sentence Quotation Graph”
In Section 3.3, we developed three ways to compute the weight of an edge in the sentence quotation graph, i.e., clue words, semantic similarity based on WordNet and cosine similarity.
Page 7, “Empirical Evaluation”
Table 1 shows the aggregated pyramid precision over all five summary lengths of CWS, CWS-Cosine, two semantic similarities, i.e., CWS-lesk and CWS-jcn.
Page 7, “Empirical Evaluation”
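
The sentence-level measure quoted above (the total semantic similarity of the words in both sentences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy lookup table are hypothetical, and the word-level scores are made-up stand-ins for what a WordNet-based measure would return.

```python
# Hypothetical word-pair scores standing in for a WordNet-based measure.
TOY_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "bus")): 0.5,
}


def toy_word_sim(word_a, word_b):
    """Illustrative word similarity: 1.0 for identical words, else a table lookup."""
    if word_a == word_b:
        return 1.0
    return TOY_SCORES.get(frozenset((word_a, word_b)), 0.0)


def sentence_similarity(sent_a, sent_b, word_sim):
    """Sum the word-level similarity over every cross-sentence word pair."""
    return sum(word_sim(a, b) for a in sent_a for b in sent_b)


# Usage: score one edge between two tokenized sentences.
edge_weight = sentence_similarity(["car", "red"], ["automobile", "bus"], toy_word_sim)
```

Passing the word-level function as a parameter keeps the sketch independent of any particular WordNet package; the quoted text does not say whether the total is normalized, so none is applied here.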


cosine similarity

Appears in 11 sentences as: Cosine Similarity (1) Cosine similarity (1) cosine similarity (9)

In Summarizing Emails with Conversational Cohesion and Subjectivity

We adopt three cohesion measures: clue words, semantic similarity and cosine similarity as the weight of the edges.
Page 1, “Abstract”
(Carenini et al., 2007), semantic similarity and cosine similarity.
Page 2, “Introduction”
and (3) cosine similarity that is based on the word TFIDF vector.
Page 4, “Extracting Conversations from Multiple Emails”
3.3.3 Cosine Similarity
Page 4, “Extracting Conversations from Multiple Emails”
Cosine similarity is a popular metric to compute the similarity of two text units.
Page 4, “Extracting Conversations from Multiple Emails”
Hence, the cosine similarity of two sentences 52-53
Page 4, “Extracting Conversations from Multiple Emails”
In the rest of this paper, let CWS denote the Generalized ClueWordSummarizer when the edge weight is based on clue words, and let CWS-Cosine and CWS-Semantic denote the summarizer when the edge weight is cosine similarity and semantic similarity respectively.
Page 5, “Summarization Based on the Sentence Quotation Graph”
In Section 3.3, we developed three ways to compute the weight of an edge in the sentence quotation graph, i.e., clue words, semantic similarity based on WordNet and cosine similarity.
Page 7, “Empirical Evaluation”
The widely used cosine similarity does not perform well.
Page 7, “Empirical Evaluation”
The above experiments show that the widely used cosine similarity and the more sophisticated semantic similarity in WordNet are less accurate than the basic CWS in the summarization framework.
Page 7, “Empirical Evaluation”
We adopt three cohesion metrics, clue words, semantic similarity and cosine similarity, to measure the weight of the edges.
Page 8, “Conclusions”
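
The cosine similarity over word TFIDF vectors mentioned in the sentences above can be sketched as follows. This is a rough illustration under stated assumptions: the index does not quote the paper's exact TFIDF weighting, so raw term frequency times log(N / document frequency) is assumed here, each sentence is treated as one document, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per tokenized sentence.

    Assumed weighting: raw term frequency * log(N / document frequency).
    """
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for sent in sentences for word in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: three toy sentences; the first two share the word "meeting".
sentences = [
    ["budget", "meeting", "monday"],
    ["meeting", "moved", "tuesday"],
    ["lunch", "plans"],
]
vecs = tfidf_vectors(sentences)
edge_weight = cosine(vecs[0], vecs[1])
```

Sentences with no word in common get a weight of exactly 0.0, which is one reason a purely lexical measure can miss cohesion that is expressed through different words.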


WordNet

Appears in 5 sentences as: WordNet (5)

In Summarizing Emails with Conversational Cohesion and Subjectivity

We explore three types of cohesion measures: (1) clue words that are based on stems, (2) semantic distance based on WordNet
Page 4, “Extracting Conversations from Multiple Emails”
3.3.2 Semantic Similarity Based on WordNet
Page 4, “Extracting Conversations from Multiple Emails”
We use the well-known lexical database WordNet to get the semantic similarity of two words.
Page 4, “Extracting Conversations from Multiple Emails”
In Section 3.3, we developed three ways to compute the weight of an edge in the sentence quotation graph, i.e., clue words, semantic similarity based on WordNet and cosine similarity.
Page 7, “Empirical Evaluation”
The above experiments show that the widely used cosine similarity and the more sophisticated semantic similarity in WordNet are less accurate than the basic CWS in the summarization framework.
Page 7, “Empirical Evaluation”


edge weight

Appears in 4 sentences as: edge weight (3) Edge Weights (1) edge weights (1)

In Summarizing Emails with Conversational Cohesion and Subjectivity

In the rest of this paper, let CWS denote the Generalized ClueWordSummarizer when the edge weight is based on clue words, and let CWS-Cosine and CWS-Semantic denote the summarizer when the edge weight is cosine similarity and semantic similarity respectively.
Page 5, “Summarization Based on the Sentence Quotation Graph”
Table 1: Generalized CWS with Different Edge Weights
Page 7, “Empirical Evaluation”
Table 2 compares Page-Rank and CWS under different edge weights.
Page 8, “Empirical Evaluation”
Moreover, when we compare Table 1 and 2 together, we can find that, for each kind of edge weight, Page-Rank has a lower accuracy than the corresponding Generalized CWS.
Page 8, “Empirical Evaluation”


graph-based

Appears in 4 sentences as: graph-based (4)

In Summarizing Emails with Conversational Cohesion and Subjectivity

Second, we use two graph-based summarization approaches, Generalized ClueWordSummarizer and Page-Rank, to extract sentences as summaries.
Page 1, “Abstract”
Third, we propose a summarization approach based on subjective opinions and integrate it with the graph-based ones.
Page 1, “Abstract”
Finally, we did not compare CWS to other possible graph-based approaches as we propose in this paper.
Page 2, “Related Work”
Other than the conversation structure, the measures of cohesion and the graph-based summarization methods we have proposed, the importance of a sentence in emails can be captured from other aspects.
Page 5, “Summarization with Subjective Opinions”


significantly improve

Appears in 3 sentences as: significantly improve (3)

In Summarizing Emails with Conversational Cohesion and Subjectivity

Moreover, subjective words can significantly improve accuracy.
Page 1, “Abstract”
Our empirical evaluations show that subjective words and phrases can significantly improve email summarization.
Page 2, “Introduction”
Their experiments showed that features about emails and the email thread could significantly improve the accuracy of summarization.
Page 2, “Related Work”


statistical significance

Appears in 3 sentences as: statistical significance (2) statistically significant (1)

In Summarizing Emails with Conversational Cohesion and Subjectivity

As to the statistical significance, we use the 2-tail pairwise student t-test in all the experiments to compare two specific methods.
Page 6, “Empirical Evaluation”
Meanwhile, both semantic similarities have lower accuracy than CWS, and the differences are also statistically significant even with the conservative Bonferroni adjustment (i.e., the p-values in Table 1 are multiplied by three).
Page 7, “Empirical Evaluation”
The increase in precision is at least 0.04 with statistical significance.
Page 8, “Empirical Evaluation”
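
The Bonferroni adjustment mentioned above (multiplying each p-value by the number of comparisons, three in the quoted case) can be sketched as follows. The function name is hypothetical and the p-values in the usage example are invented for illustration; they are not results from the paper.

```python
def bonferroni(p_values, n_comparisons=None):
    """Multiply each p-value by the number of comparisons, capping the result at 1.0."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]


# Usage: three made-up p-values from three pairwise comparisons.
adjusted = bonferroni([0.01, 0.02, 0.5])
significant = [p < 0.05 for p in adjusted]
```

The adjustment is conservative: a difference has to reach a raw p-value below 0.05 / 3 to remain significant at the 0.05 level after three comparisons.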
