Kernel Based Discourse Relation Recognition with Temporal Ordering Information
Wang, WenTing and Su, Jian and Tan, Chew Lim

Article Structure

Abstract

Syntactic knowledge is important for discourse relation recognition.

Introduction

Discourse relations capture the internal structure and logical relationship of coherent text, including Temporal, Causal and Contrastive relations etc.

Penn Discourse Tree Bank

The Penn Discourse Treebank (PDTB) is the largest available annotated corpus of discourse relations (Prasad et al., 2008) over 2,312 Wall Street Journal articles.

Related Work

Tree Kernel based Approach in NLP.

The Recognition Framework

In the learning framework, a training or testing instance is formed by a non-overlapping clause(s)/sentence(s) pair.

Incorporating Structural Syntactic Information

A parse tree that covers both discourse arguments could provide us much syntactic information related to the pair.

Using Temporal Ordering Information

In our discourse analyzer, we also add in temporal information to be used as features to predict discourse relations.

Experiments and Results

In this section we provide the results of a set of experiments focused on the task of simultaneous discourse identification and classification.

Conclusions and Future Works

The purpose of this paper is to explore how to make use of the structural syntactic knowledge to do discourse relation recognition.

Topics

tree kernel

Appears in 33 sentences as: Tree Kernel (8) Tree kernel (3) tree kernel (24) tree kernels (1)
In Kernel Based Discourse Relation Recognition with Temporal Ordering Information
  1. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly.
    Page 1, “Abstract”
  2. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature.
    Page 1, “Abstract”
  3. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further incorporate information from a higher dimension space for possible better discrimination.
    Page 1, “Abstract”
  4. In this paper we propose using tree kernel based method to automatically mine the syntactic
    Page 1, “Introduction”
  5. The experiment shows that tree kernel is able to effectively incorporate syntactic structural information and produce statistical significant improvements over flat syntactic path feature for the recognition of both explicit and implicit relation in Penn Discourse Treebank (PDTB; Prasad et al., 2008).
    Page 2, “Introduction”
  6. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further work on a higher dimensional space for possible better discrimination.
    Page 2, “Introduction”
  7. Section 3 gives the related work on tree kernel approach in NLP and its difference with production rules, and also linguistic study on tense and discourse anaphor.
    Page 2, “Introduction”
  8. Tree Kernel based Approach in NLP.
    Page 3, “Related Work”
  9. While the feature based approach may not be able to fully utilize the syntactic information in a parse tree, an alternative to the feature-based methods, tree kernel methods (Haussler, 1999) have been proposed to implicitly explore features in a high dimensional space by employing a kernel function to calculate the similarity between two objects directly.
    Page 3, “Related Work”
  10. Other sub-trees beyond 2-level (e.g. T6–T7) are only captured in the tree kernel, which allows tree kernel to further leverage on information from higher dimension space for possible better discrimination.
    Page 3, “Related Work”
  11. Different subtree sets for T1 used by 2-level production rules and convolution tree kernel approaches.
    Page 3, “Related Work”
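
The convolution tree kernel the excerpts above refer to can be sketched as follows. This is a minimal illustration of the Collins and Duffy (2001) kernel, not the authors' implementation: K(T1, T2) implicitly counts the common subtree fragments of two parse trees, so no explicit high-dimensional feature vector is ever built. Trees are encoded here as nested tuples (label, child, ...), with bare strings as words.

```python
def nodes(tree):
    """All internal nodes of a tuple-encoded tree."""
    if isinstance(tree, str):
        return []
    result = [tree]
    for child in tree[1:]:
        result.extend(nodes(child))
    return result

def production(node):
    """The grammar production at a node: (label, tuple of child labels)."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def common_subtrees(n1, n2, lam):
    """C(n1, n2): weighted count of common fragments rooted at n1 and n2."""
    if isinstance(n1, str) or isinstance(n2, str):
        return 0.0                      # no fragments are rooted at words
    if production(n1) != production(n2):
        return 0.0
    score = lam                         # lam in (0, 1] downweights large fragments
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + common_subtrees(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2) = sum over all node pairs of C(n1, n2)."""
    return sum(common_subtrees(n1, n2, lam)
               for n1 in nodes(t1) for n2 in nodes(t2))
```

For example, with lam=1.0 the self-kernel of ("NP", ("DT", "the"), ("NN", "dog")) is 6.0, matching its six subtree fragments; comparing it against ("NP", ("DT", "the"), ("NN", "cat")) gives 3.0, counting only the fragments the two trees share.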

See all papers in Proc. ACL 2010 that mention tree kernel.

See all papers in Proc. ACL that mention tree kernel.

parse tree

Appears in 25 sentences as: Parse Tree (1) parse tree (15) parse tree: (1) parse trees (8) parsing trees (1)
  1. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly.
    Page 1, “Abstract”
  2. Nevertheless, Ben and James (2007) only uses flat syntactic path connecting connective and arguments in the parse tree .
    Page 1, “Introduction”
  3. (2009) uses 2-level production rules to represent parse tree information.
    Page 1, “Introduction”
  4. information from the parse trees for discourse analysis, applying kernel function to the parse tree structures directly.
    Page 2, “Introduction”
  5. While the feature based approach may not be able to fully utilize the syntactic information in a parse tree , an alternative to the feature-based methods, tree kernel methods (Haussler, 1999) have been proposed to implicitly explore features in a high dimensional space by employing a kernel function to calculate the similarity between two objects directly.
    Page 3, “Related Work”
  6. One advantage of SVM is that we can use tree kernel approach to capture syntactic parse tree information in a particular high-dimension space.
    Page 4, “The Recognition Framework”
  7. A parse tree that covers both discourse arguments could provide us much syntactic information related to the pair.
    Page 4, “Incorporating Structural Syntactic Information”
  8. Both the syntactic flat path connecting connective and arguments and the 2-level production rules in the parse tree used in previous study can be directly described by the tree structure.
    Page 4, “Incorporating Structural Syntactic Information”
  9. To present their syntactic properties and relations in a single tree structure, we construct a syntax tree for each paragraph by attaching the parsing trees of all its sentences to an upper paragraph node.
    Page 4, “Incorporating Structural Syntactic Information”
  10. ragraph, thus paragraph parse trees are sufficient.
    Page 5, “Incorporating Structural Syntactic Information”
  11. Having obtained the parse tree of a paragraph, we shall consider how to select the appropriate portion of the tree as the structured feature for a given instance.
    Page 5, “Incorporating Structural Syntactic Information”
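
The paragraph-tree construction described in excerpt 9 can be sketched in a few lines. This is an illustrative encoding (tuple-based trees, an assumed "PARAGRAPH" label), not the authors' code: each sentence's parse tree is attached under one new upper node, so two discourse arguments in different sentences still share a single syntax tree.

```python
def paragraph_tree(sentence_trees, root_label="PARAGRAPH"):
    """Attach tuple-encoded sentence parse trees (label, child, ...)
    under one upper paragraph node."""
    return (root_label,) + tuple(sentence_trees)

# Two toy sentence trees joined into one paragraph-level tree:
s1 = ("S", ("NP", "It"), ("VP", "rained"))
s2 = ("S", ("NP", "We"), ("VP", "stayed"))
tree = paragraph_tree([s1, s2])
```

The appropriate portion of this combined tree (excerpt 11) can then be selected as the structured feature for a given argument pair.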

statistical significant

Appears in 8 sentences as: statistical significance (1) statistical significant (9)
  1. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature.
    Page 1, “Abstract”
  2. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well.
    Page 1, “Abstract”
  3. The experiment shows that tree kernel is able to effectively incorporate syntactic structural information and produce statistical significant improvements over flat syntactic path feature for the recognition of both explicit and implicit relation in Penn Discourse Treebank (PDTB; Prasad et al., 2008).
    Page 2, “Introduction”
  4. Besides, inspired by the linguistic study on tense and discourse anaphor (Webber, 1988), we further propose to incorporate temporal ordering information to constrain the interpretation of discourse relation, which also demonstrates statistical significant improvements for discourse relation recognition on PDTB v2.0 for both explicit and implicit relations.
    Page 2, “Introduction”
  5. conduct chi square statistical significance test on All relations between flat path approach and Simple-Expansion approach, which shows the performance improvements are statistical significant (p < 0.05) through incorporating tree kernel.
    Page 8, “Experiments and Results”
  6. We conduct chi square statistical significant test on All relations, which shows the performance improvement is statistical significant (p < 0.05).
    Page 9, “Experiments and Results”
  7. The experimental results on PDTB v2.0 show that our kernel-based approach is able to give statistical significant improvement over flat syntactic path method.
    Page 9, “Conclusions and Future Works”
  8. In addition, we also propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrate statistical significant improvements for discourse relation recognition, both explicit and implicit.
    Page 9, “Conclusions and Future Works”
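
The significance tests quoted above (chi-square, p < 0.05) can be illustrated with the textbook Pearson chi-square statistic for a 2x2 contingency table; the excerpts do not give the authors' exact table construction, so the layout below (e.g. correct/incorrect counts for two systems) is an assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].
    With 1 degree of freedom, values above ~3.841 are significant at p < 0.05."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For instance, chi_square_2x2(20, 10, 10, 20) is about 6.67, above the 3.841 critical value, so that difference would be significant at p < 0.05.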

Natural Language

Appears in 7 sentences as: Natural Language (7) natural language (1)
  1. The ability of recognizing such relations between text units including identifying and classifying provides important information to other natural language processing systems, such as language generation, document summarization, and question answering.
    Page 1, “Introduction”
  2. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and
    Page 9, “Conclusions and Future Works”
  3. Computational Natural Language Learning, pages 92—101.
    Page 9, “Conclusions and Future Works”
  4. Convolution Kernels for Natural Language .
    Page 10, “Conclusions and Future Works”
  5. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), Singapore, August.
    Page 10, “Conclusions and Future Works”
  6. The Stanford Natural Language Processing Group, June 6.
    Page 10, “Conclusions and Future Works”
  7. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009).
    Page 10, “Conclusions and Future Works”

significant improvements

Appears in 6 sentences as: significant improvement (1) significant improvements (5)
  1. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature.
    Page 1, “Abstract”
  2. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well.
    Page 1, “Abstract”
  3. The experiment shows that tree kernel is able to effectively incorporate syntactic structural information and produce statistical significant improvements over flat syntactic path feature for the recognition of both explicit and implicit relation in Penn Discourse Treebank (PDTB; Prasad et al., 2008).
    Page 2, “Introduction”
  4. Besides, inspired by the linguistic study on tense and discourse anaphor (Webber, 1988), we further propose to incorporate temporal ordering information to constrain the interpretation of discourse relation, which also demonstrates statistical significant improvements for discourse relation recognition on PDTB v2.0 for both explicit and implicit relations.
    Page 2, “Introduction”
  5. The experimental results on PDTB v2.0 show that our kernel-based approach is able to give statistical significant improvement over flat syntactic path method.
    Page 9, “Conclusions and Future Works”
  6. In addition, we also propose to incorporate temporal ordering information to constrain the interpretation of discourse relations, which also demonstrate statistical significant improvements for discourse relation recognition, both explicit and implicit.
    Page 9, “Conclusions and Future Works”

Treebank

Appears in 6 sentences as: TreeBank (2) Treebank (3) Treebanks (1)
  1. The experiment shows that tree kernel is able to effectively incorporate syntactic structural information and produce statistical significant improvements over flat syntactic path feature for the recognition of both explicit and implicit relation in Penn Discourse Treebank (PDTB; Prasad et al., 2008).
    Page 2, “Introduction”
  2. The Penn Discourse Treebank (PDTB) is the largest available annotated corpus of discourse relations (Prasad et al., 2008) over 2,312 Wall Street Journal articles.
    Page 2, “Penn Discourse Tree Bank”
  3. We directly use the golden standard parse trees in Penn TreeBank .
    Page 7, “Experiments and Results”
  4. In Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories.
    Page 10, “Conclusions and Future Works”
  5. Recognizing Implicit Discourse Relations in the Penn Discourse Treebank .
    Page 10, “Conclusions and Future Works”
  6. The Penn Discourse TreeBank 2.0.
    Page 10, “Conclusions and Future Works”

SVM

Appears in 5 sentences as: SVM (5)
  1. Section 4 introduces the frame work for discourse recognition, as well as the baseline feature space and the SVM classifier.
    Page 2, “Introduction”
  2. The classifier learned by SVM is:
    Page 4, “The Recognition Framework”
  3. One advantage of SVM is that we can use tree kernel approach to capture syntactic parse tree information in a particular high-dimension space.
    Page 4, “The Recognition Framework”
  4. And thus an SVM classifier can be learned and then used for recognition.
    Page 4, “Incorporating Structural Syntactic Information”
  5. We employ an SVM coreference resolver trained and tested on ACE 2005 with 79.5% Precision, 66.7% Recall and 72.5% F1 to label coreference mentions of the same named entity in an article.
    Page 7, “Experiments and Results”
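
Excerpt 2 above quotes "The classifier learned by SVM is:" with the formula itself elided; the standard kernelized SVM decision function that the surrounding excerpts describe has the form f(x) = sum_i alpha_i * y_i * K(x_i, x) + b. A minimal sketch (all names illustrative, not the authors' code) shows why a kernel such as the tree kernel can be plugged in directly: only K(x_i, x) is ever evaluated, never an explicit feature mapping.

```python
def svm_decision(x, support_vectors, alphas, labels, b, K):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
    support_vectors: training examples x_i; labels: y_i in {-1, +1}."""
    return sum(a * y * K(xi, x)
               for a, y, xi in zip(alphas, labels, support_vectors)) + b

def dot(u, v):
    """Linear kernel over plain feature vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))
```

Replacing `dot` with a tree kernel lets the same decision function classify parse-tree-structured instances without constructing their (possibly infeasible) explicit feature vectors.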

Relation Extraction

Appears in 4 sentences as: Relation Extraction (3) relation extraction (1)
  1. Indeed, using kernel methods to mine structural knowledge has shown success in some NLP applications like parsing (Collins and Duffy, 2001; Moschitti, 2004) and relation extraction (Zelenko et al., 2003; Zhang et al., 2006).
    Page 3, “Related Work”
  2. Dependency Tree Kernel for Relation Extraction .
    Page 9, “Conclusions and Future Works”
  3. Kernel Methods for Relation Extraction .
    Page 10, “Conclusions and Future Works”
  4. Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel.
    Page 10, “Conclusions and Future Works”

feature vector

Appears in 3 sentences as: feature vector (3)
  1. Suppose the training set S consists of labeled vectors {(xi, yi)}, where xi is the feature vector
    Page 4, “The Recognition Framework”
  2. where ai is the learned parameter for a feature vector xi, and b is another parameter which can be derived from ai.
    Page 4, “The Recognition Framework”
  3. Thus, it is computationally infeasible to directly use the feature vector φ(T).
    Page 6, “Incorporating Structural Syntactic Information”
