Using Syntax to Disambiguate Explicit Discourse Connectives in Text
Pitler, Emily and Nenkova, Ani

Article Structure

Abstract

Discourse connectives are words or phrases such as “once”, “since”, and “on the contrary” that explicitly signal the presence of a discourse relation.

Introduction

Discourse connectives are often used to explicitly mark the presence of a discourse relation between two textual units.

Corpus and features

2.1 Penn Discourse Treebank

Discourse vs. non-discourse usage

Of the 100 connectives annotated in the PDTB, only 11 appear as a discourse connective more than 90% of the time: “although”, “in turn”, “afterward”, “consequently”, “additionally”, “alternatively”, “whereas”, “on the contrary”, “if and when”, “lest”, and “on the one hand...on the other hand”.
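The 90% criterion above amounts to a simple filter over per-connective counts. The sketch below is illustrative only: the function name `mostly_discourse` is hypothetical, and the counts are invented toy numbers, not PDTB figures. It uses a strict comparison to match "more than 90%".

```python
from typing import Dict, List, Tuple


def mostly_discourse(counts: Dict[str, Tuple[int, int]],
                     threshold: float = 0.9) -> List[str]:
    """Return connectives whose discourse-usage rate is strictly above `threshold`.

    `counts` maps a connective string to (discourse_uses, total_occurrences).
    """
    selected = []
    for connective, (discourse, total) in counts.items():
        # Skip connectives that never occur, to avoid dividing by zero.
        if total > 0 and discourse / total > threshold:
            selected.append(connective)
    return sorted(selected)


# Hypothetical toy counts, NOT taken from the PDTB.
toy_counts = {
    "although": (95, 100),
    "once": (20, 100),
    "since": (60, 100),
}
print(mostly_discourse(toy_counts))  # ['although']
```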

Sense classification

While most connectives almost always occur with just one sense (for example, “because” almost always signals Contingency), a few are quite ambiguous.
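To make "ambiguous connective" concrete, the sketch below computes, for each connective, its most frequent sense and the share of its occurrences that sense accounts for. The (connective, sense) pairs are invented toy data, not PDTB annotations; the sense labels follow the PDTB top-level classes (Temporal, Contingency, Comparison, Expansion), and the function name `majority_sense` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def majority_sense(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Map each connective to (its most frequent sense, that sense's share)."""
    by_conn: DefaultDict[str, Counter] = defaultdict(Counter)
    for connective, sense in pairs:
        by_conn[connective][sense] += 1
    result = {}
    for connective, senses in by_conn.items():
        sense, count = senses.most_common(1)[0]
        result[connective] = (sense, count / sum(senses.values()))
    return result


# Invented toy annotations, NOT PDTB data.
toy = ([("because", "Contingency")] * 4
       + [("since", "Temporal")] * 3
       + [("since", "Contingency")] * 2)
print(majority_sense(toy))
# {'because': ('Contingency', 1.0), 'since': ('Temporal', 0.6)}
```

A share close to 1.0 marks an essentially unambiguous connective; a lower share flags one whose sense has to be decided from context. Predicting each connective's majority sense is also a natural baseline for sense classification.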

Conclusion

We have shown that using a few syntactic features leads to state-of-the-art accuracy for discourse vs. non-discourse usage classification.
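The results for this task are reported as accuracy and f-score. A minimal sketch of both metrics follows, assuming that discourse usage is the positive class for the f-score and that "f-score" means the balanced F1; neither assumption is stated in the text, and the function name `accuracy_and_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Accuracy over all items and F1 for the positive (discourse) class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)          # discourse, predicted discourse
    fp = sum(1 for g, p in pairs if not g and p)      # non-discourse, predicted discourse
    fn = sum(1 for g, p in pairs if g and not p)      # discourse, predicted non-discourse
    correct = sum(1 for g, p in pairs if g == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


print(accuracy_and_f1([True, True, False, False], [True, False, True, False]))  # (0.5, 0.5)
```

Because non-discourse usages are common for many connectives, accuracy and f-score can diverge; this is why the results report both.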

Topics

f-score

Appears in 7 sentences as: f-score (7)
In Using Syntax to Disambiguate Explicit Discourse Connectives in Text
  1. Using the string of the connective as the only feature sets a reasonably high baseline, with an f-score of 75.33% and an accuracy of 85.86%.
    Page 3, “Discourse vs. non-discourse usage”
  2. Interestingly, using only the syntactic features, ignoring the identity of the connective, is even better, resulting in an f-score of 88.19% and accuracy of 92.25%.
    Page 3, “Discourse vs. non-discourse usage”
  3. Using both the connective and syntactic features is better than either individually, with an f-score of 92.28% and accuracy of 95.04%.
    Page 3, “Discourse vs. non-discourse usage”
  4. Including pairwise interaction features between the connective and each syntactic feature (features like connective=also-RightSibling=SBAR) raised the f-score by about 1.5%, to 93.63%.
    Page 3, “Discourse vs. non-discourse usage”
  5. Adding interaction terms between pairs of syntactic features raises the f-score
    Page 3, “Discourse vs. non-discourse usage”
  6. [Table column headers: Features, Accuracy, f-score]
    Page 3, “Discourse vs. non-discourse usage”
  7. These results amount to a 10% absolute improvement over those obtained by Marcu (2000) in his corpus-based approach, which achieves an f-score of 84.9%³ for identifying discourse connectives in text.
    Page 3, “Discourse vs. non-discourse usage”
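The feature sets compared in these results (connective string, syntactic features, connective-by-syntax interactions, and syntax-by-syntax interactions) can be sketched as string-valued indicator features. This is an illustrative assumption about the encoding, not the authors' code: the function `build_features`, the `Parent` feature in the example, the lowercasing, and the sorted feature order are all hypothetical; only the `connective=also-RightSibling=SBAR` pattern comes from the text.

```python
from itertools import combinations
from typing import Dict, List


def build_features(connective: str, syntax: Dict[str, str],
                   conn_syn: bool = True, syn_syn: bool = True) -> List[str]:
    """Indicator features for one connective occurrence (hypothetical encoding)."""
    conn = connective.lower()  # assumption: connective identity is case-folded
    feats = [f"connective={conn}"]
    # Base syntactic features, e.g. "RightSibling=SBAR", in a fixed (sorted) order.
    base = [f"{name}={value}" for name, value in sorted(syntax.items())]
    feats.extend(base)
    if conn_syn:
        # Connective x syntactic feature, e.g. "connective=also-RightSibling=SBAR".
        feats.extend(f"connective={conn}-{b}" for b in base)
    if syn_syn:
        # Every unordered pair of syntactic features.
        feats.extend(f"{a}-{b}" for a, b in combinations(base, 2))
    return feats


for feat in build_features("also", {"RightSibling": "SBAR", "Parent": "VP"}):
    print(feat)
```

Run as written, this prints six features: the connective, the two base syntactic features, their two connective interactions, and the single syntax-by-syntax pair. The toggles make it easy to reproduce the kind of ablation listed above with any binary classifier that accepts sparse indicator features.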

See all papers in Proc. ACL 2009 that mention f-score.


Treebank

Appears in 3 sentences as: Treebank (3)
In Using Syntax to Disambiguate Explicit Discourse Connectives in Text
  1. 2.1 Penn Discourse Treebank
    Page 2, “Corpus and features”
  2. In our work we use the Penn Discourse Treebank (PDTB) (Prasad et al., 2008), the largest public resource containing discourse annotations.
    Page 2, “Corpus and features”
  3. The syntactic features we used were extracted from the gold standard Penn Treebank (Marcus et al., 1994) parses of the PDTB articles:
    Page 2, “Corpus and features”
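Features of this kind can be read off a constituency parse. The sketch below assumes the syntactic features are the connective's self category (the highest node dominating exactly the connective's words), its parent, and its left and right siblings; only RightSibling is named in the results, so the other names and the whole feature set are assumptions. The tiny `Node` tree class and the hand-built toy parse are hypothetical stand-ins for real Penn Treebank parses.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    label: str                                   # constituent or POS label, e.g. "SBAR", "IN"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on preterminal (POS) nodes
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def T(label: str, *children: Node) -> Node:
    return Node(label, list(children))


def W(label: str, word: str) -> Node:
    return Node(label, word=word)


def link(node: Node) -> Node:
    """Fill in parent pointers for the whole tree; returns the root."""
    for child in node.children:
        child.parent = node
        link(child)
    return node


def covered(node: Node) -> List[str]:
    """Lowercased words dominated by `node`."""
    if node.word is not None:
        return [node.word.lower()]
    return [w for c in node.children for w in covered(c)]


def preterminals(node: Node) -> Iterator[Node]:
    if node.word is not None:
        yield node
    for child in node.children:
        yield from preterminals(child)


def self_category(root: Node, connective: str) -> Optional[Node]:
    """Highest node dominating exactly the connective's words, or None."""
    words = connective.lower().split()
    for pre in preterminals(root):
        if pre.word is None or pre.word.lower() != words[0]:
            continue
        node: Optional[Node] = pre
        # Climb until the node covers as many words as the connective has.
        while node is not None and len(covered(node)) < len(words):
            node = node.parent
        if node is None or covered(node) != words:
            continue  # the connective is not a single constituent here
        # Keep climbing while the parent still covers only the connective.
        while node.parent is not None and covered(node.parent) == words:
            node = node.parent
        return node
    return None


def syntactic_features(root: Node, connective: str) -> Dict[str, str]:
    node = self_category(root, connective)
    if node is None:
        return {}
    parent = node.parent
    feats = {"SelfCategory": node.label,
             "Parent": parent.label if parent else "NONE",
             "LeftSibling": "NONE", "RightSibling": "NONE"}
    if parent is not None:
        sibs = parent.children
        i = next(k for k, c in enumerate(sibs) if c is node)
        if i > 0:
            feats["LeftSibling"] = sibs[i - 1].label
        if i + 1 < len(sibs):
            feats["RightSibling"] = sibs[i + 1].label
    return feats


# Toy parse of "I left once she arrived" (hand-built, not from the PDTB).
tree = link(T("S",
    T("NP", W("PRP", "I")),
    T("VP", W("VBD", "left"),
        T("SBAR", W("IN", "once"),
            T("S", T("NP", W("PRP", "she")),
                   T("VP", W("VBD", "arrived")))))))
print(syntactic_features(tree, "once"))
# {'SelfCategory': 'IN', 'Parent': 'SBAR', 'LeftSibling': 'NONE', 'RightSibling': 'S'}
```

Here "once" heads an SBAR whose right sibling is a full clause (S), the kind of configuration one would expect for a discourse usage. The resulting dictionary has the shape consumed by a feature builder like the one sketched for the f-score results.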
