Genre distinctions for discourse in the Penn TreeBank
Webber, Bonnie

Article Structure

Abstract

Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports.

Introduction

It is well known that texts differ from each other in a variety of ways, including their topic, the reading level of their intended audience, and their intended purpose (e.g., to instruct, to inform, to express an opinion, to summarize, to take issue with or disagree, to correct, to entertain, etc.).

Two Perspectives on Genre

The dimension of text variation of interest here is genre, which can be viewed externally, in terms of the communicative purpose of a text (Swales, 1990), or internally, in terms of features common to texts sharing a communicative purpose.

Genre in the Penn TreeBank

Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7).

The Penn Discourse TreeBank

Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008).

Connective Frequency by Genre

The analysis that follows distinguishes between two kinds of relations associated with explicit connectives in the PDTB: (1) intra-sentential discourse relations, which hold between clauses within the same sentence and are associated with subordinating conjunctions, intra-sentential coordinating conjunctions, and discourse adverbials whose arguments occur within the same sentence; and (2) explicit inter-sentential discourse relations, which hold across sentences and are associated with explicit inter-sentential connectives (inter-sentential coordinating conjunctions and discourse adverbials whose arguments are not

Connective Sense by Genre

Pitler et al. (2008) show a difference across Level 1 senses (COMPARISON, CONTINGENCY, TEMPORAL and EXPANSION) in the PDTB in terms of their tendency to be realised by explicit connectives (a tendency of COMPARISON and TEMPORAL relations) or by implicit connectives (a tendency of CONTINGENCY and EXPANSION).
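
The kind of tally behind such a comparison can be sketched as follows. This is a minimal illustration, not the procedure used in the paper or by Pitler et al.: the records in `RELATIONS`, the genre labels, and the helper `explicit_share` are all hypothetical, and the values are toy data rather than PDTB counts.

```python
from collections import Counter

# Hypothetical toy records: (genre, Level 1 sense, realisation).
# Illustrative values only; these are NOT counts taken from the PDTB.
RELATIONS = [
    ("news", "COMPARISON", "explicit"),
    ("news", "EXPANSION", "implicit"),
    ("news", "CONTINGENCY", "implicit"),
    ("letters", "COMPARISON", "explicit"),
    ("letters", "CONTINGENCY", "explicit"),
    ("letters", "TEMPORAL", "explicit"),
    ("summaries", "EXPANSION", "implicit"),
    ("summaries", "EXPANSION", "explicit"),
]


def explicit_share(relations, key_index):
    """Return {key: fraction of relations realised by an explicit connective}.

    key_index selects the grouping field: 0 for genre, 1 for sense.
    """
    totals = Counter()
    explicit = Counter()
    for record in relations:
        key = record[key_index]
        totals[key] += 1
        if record[2] == "explicit":
            explicit[key] += 1
    return {key: explicit[key] / totals[key] for key in totals}


by_sense = explicit_share(RELATIONS, key_index=1)
by_genre = explicit_share(RELATIONS, key_index=0)

for sense, share in sorted(by_sense.items()):
    print(f"{sense:12s} explicit share: {share:.2f}")
for genre, share in sorted(by_genre.items()):
    print(f"{genre:12s} explicit share: {share:.2f}")
```

Grouping the same records once by sense and once by genre makes it possible to see whether a difference in explicit marking follows the sense, the genre, or both.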

Automated Sense Labelling of Discourse Connectives

The focus here is on automated sense labelling of discourse connectives (Elwell and Baldridge, 2008; Marcu and Echihabi, 2002; Pitler et al., 2009; Wellner and Pustejovsky, 2007; Wellner,

Conclusion

This paper has, for the first time, provided genre information about the articles in the Penn TreeBank.

Topics

TreeBank

Appears in 10 sentences as: TreeBank (11)
In Genre distinctions for discourse in the Penn TreeBank
  1. Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports.
    Page 1, “Abstract”
  2. All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank — discourse connectives and their senses.
    Page 1, “Abstract”
  3. This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008).
    Page 1, “Introduction”
  4. After a brief introduction to the Penn Discourse TreeBank (hereafter, PDTB) in Section 4, Sections 5 and 6 show that these four genres display differences in connective frequency and in terms of the senses associated with intra-sentential connectives (e.g., subordinating conjunctions), inter-sentential connectives (e.g., inter-sentential coordinating conjunctions) and those inter-sentential relations that are not lexically marked.
    Page 1, “Introduction”
  5. Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7).
    Page 2, “Genre in the Penn TreeBank”
  6. In lieu of any informative meta-data in the PTB files, I looked at line-level patterns in the 2159 files that make up the Penn Discourse TreeBank subset of the PTB, and then manually confirmed the text types I found. The resulting set includes all the
    Page 2, “Genre in the Penn TreeBank”
  7. the Penn TreeBank that aren’t included in the PDTB.
    Page 3, “Genre in the Penn TreeBank”
  8. Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008).
    Page 4, “The Penn Discourse TreeBank”
  9. This paper has, for the first time, provided genre information about the articles in the Penn TreeBank.
    Page 8, “Conclusion”
  10. It has characterised each genre in terms of features manually annotated in the Penn Discourse TreeBank, and used this to show that genre should be made a factor in automated sense labelling of discourse relations that are not explicitly marked.
    Page 8, “Conclusion”

See all papers in Proc. ACL 2009 that mention TreeBank.

Penn TreeBank

Appears in 5 sentences as: Penn TreeBank (5)
In Genre distinctions for discourse in the Penn TreeBank
  1. Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports.
    Page 1, “Abstract”
  2. This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008).
    Page 1, “Introduction”
  3. Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7).
    Page 2, “Genre in the Penn TreeBank”
  4. the Penn TreeBank that aren’t included in the PDTB.
    Page 3, “Genre in the Penn TreeBank”
  5. This paper has, for the first time, provided genre information about the articles in the Penn TreeBank.
    Page 8, “Conclusion”

manually annotated

Appears in 4 sentences as: manual annotations (1) manually annotated (3)
In Genre distinctions for discourse in the Penn TreeBank
  1. All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank — discourse connectives and their senses.
    Page 1, “Abstract”
  2. Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008).
    Page 4, “The Penn Discourse TreeBank”
  3. These have been manually annotated using the three-level sense hierarchy described in detail in (Miltsakaki et al., 2008).
    Page 4, “The Penn Discourse TreeBank”
  4. It has characterised each genre in terms of features manually annotated in the Penn Discourse TreeBank, and used this to show that genre should be made a factor in automated sense labelling of discourse relations that are not explicitly marked.
    Page 8, “Conclusion”
