PDTB-style Discourse Annotation of Chinese Text
Zhou, Yuping and Xue, Nianwen

Article Structure

Abstract

We describe a discourse annotation scheme for Chinese and report on the preliminary results.

Introduction

In the realm of discourse annotation, the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008) separates itself by adopting a lexically grounded approach: Discourse relations are lexically anchored by discourse connectives (e.g., because, but, therefore), which are viewed as predicates that take abstract objects such as propositions, events and states as their arguments.

The PDTB annotation scheme

As mentioned in the introduction, the PDTB framework views a discourse relation as a predication with two arguments.
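To make the predicate-argument view concrete, here is a minimal sketch of how one such relation might be stored. It is illustrative only and is not the PDTB or the authors' data format: the class name `DiscourseRelation`, its field names, and the `as_predication` helper are hypothetical, and the label "Explicit" is an assumed name for relations realized by an overt connective. The labels Implicit, AltLex and NoRel are the ones quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Labels quoted on this page: Implicit, AltLex, NoRel.
# "Explicit" is an assumed label for relations with an overt connective.
RELATION_TYPES = {"Explicit", "Implicit", "AltLex", "NoRel"}


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical container: a relation as a predicate over two arguments."""

    rel_type: str
    arg1: str
    arg2: str
    connective: Optional[str] = None  # the lexical anchor, when there is one

    def __post_init__(self) -> None:
        if self.rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.rel_type}")
        # Assumption of this sketch: an overt relation must name its connective.
        if self.rel_type == "Explicit" and not self.connective:
            raise ValueError("an Explicit relation needs a connective")

    def as_predication(self) -> str:
        """Render the relation as predicate(arg1, arg2)."""
        predicate = self.connective if self.connective else self.rel_type.upper()
        return f"{predicate}({self.arg1!r}, {self.arg2!r})"


rel = DiscourseRelation(
    rel_type="Explicit",
    arg1="the road was closed",
    arg2="we took a detour",
    connective="therefore",
)
print(rel.as_predication())
```

The connective fills the predicate slot when present. For relations without one, the sketch falls back to the relation label, which is a display choice and not part of the scheme.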

Adapted scheme for Chinese

3.1 Key characteristics of Chinese text

Annotation experiment

To test our adapted annotation scheme, we have conducted annotation experiments on a modest, yet significant, amount of data and computed agreement statistics.

Conclusions

We have presented a discourse annotation scheme for Chinese that adopts the lexically grounded approach of the PDTB while making systematic adaptations motivated by characteristics of Chinese text.

Topics

TreeBank

Appears in 4 sentences as: TreeBank (2) Treebank (2)
In PDTB-style Discourse Annotation of Chinese Text
  1. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text.
    Page 1, “Abstract”
  2. In the realm of discourse annotation, the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008) separates itself by adopting a lexically grounded approach: Discourse relations are lexically anchored by discourse connectives (e.g., because, but, therefore), which are viewed as predicates that take abstract objects such as propositions, events and states as their arguments.
    Page 1, “Introduction”
  3. According to a rough count on 20 randomly selected files from Chinese Treebank (Xue et al., 2005), 82% are tokens of implicit relation, compared to 54.5% in the PDTB 2.0.
    Page 5, “Adapted scheme for Chinese”
  4. The data set consists of 98 files taken from the Chinese Treebank (Xue et al., 2005).
    Page 7, “Annotation experiment”

See all papers in Proc. ACL 2012 that mention TreeBank.


lexicalized

Appears in 3 sentences as: lexicalized (3)
In PDTB-style Discourse Annotation of Chinese Text
  1. AltLex: when insertion of a connective leads to redundancy due to the presence of an alternatively lexicalized expression, as in (2).
    Page 2, “The PDTB annotation scheme”
  2. NoRel: when neither a lexicalized discourse relation nor entity-based coherence is present.
    Page 2, “The PDTB annotation scheme”
  3. Implicit) or overt, lexicalized as discourse connectives (i.e.
    Page 8, “Conclusions”
