A Novel Discourse Parser Based on Support Vector Machine Classification
duVerle, David and Prendinger, Helmut

Article Structure

Abstract

This paper introduces a new algorithm to parse discourse within the framework of Rhetorical Structure Theory (RST).

Introduction

According to Mann and Thompson (1988), all well-written text is supported by a hierarchically structured set of coherence relations which reflect the author's intent.

Building a Discourse Parser

2.1 Assumptions and Restrictions

Features

Instrumental to our system’s performance is the choice of a set of salient characteristics (“features”) to be used as input to the SVM algorithm for training and classification.

Evaluation

4.1 General Considerations

Conclusions and Future Work

In this paper, we have shown that it is possible to build an accurate automatic text-level discourse parser based on supervised machine-learning algorithms, using a feature-driven approach and a manually annotated corpus.

Topics

discourse parsing

Appears in 10 sentences as: discourse parse (1) discourse parser (2) Discourse parsing (1) discourse parsing (6)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. The goal of discourse parsing is to extract this high-level, rhetorical structure.
    Page 1, “Introduction”
  2. Discourse parsing, on the other hand, focuses on a higher-level view of text, allowing some flexibility in the choice of formal representation while providing a wide range of applications in both analytical and computational linguistics.
    Page 1, “Introduction”
  3. Several attempts to automate discourse parsing have been made.
    Page 1, “Introduction”
  4. To the best of our knowledge, Reitter’s (2003b) was the only previous research based exclusively on feature-rich supervised learning to produce text-level RST discourse parse trees.
    Page 2, “Introduction”
  5. In this paper, we advanced the state of the art in general discourse parsing, with an implemented solution that is computationally efficient and sufficiently accurate for use in real-time interactive applications.
    Page 2, “Introduction”
  6. In our work, we focused exclusively on the second step of the discourse parsing problem, i.e., constructing the RST tree from a sequence of edus that have been segmented beforehand (a minimal sketch of the target tree structure follows this list).
    Page 2, “Building a Discourse Parser”
  7. The motivation for leaving aside segmenting was both practical, since previous discourse parsing efforts (Soricut and Marcu, 2003; LeThanh et al., 2004) already provide standalone segmenting tools, and scientific, namely the greater need for improvements in labeling.
    Page 2, “Building a Discourse Parser”
  8. To the best of our knowledge, only two fully functional text-level discourse parsing algorithms for general text have published results: Marcu's decision-tree-based parser (Marcu, 2000) and the multilevel rule-based system built by LeThanh et al. (2004).
    Page 7, “Evaluation”
  9. In this paper, we have shown that it is possible to build an accurate automatic text-level discourse parser based on supervised machine-learning algorithms, using a feature-driven approach and a manually annotated corpus.
    Page 8, “Conclusions and Future Work”
  10. A complete online discourse parser, incorporating the parsing tool presented above combined with a new segmenting method, has since been made freely available at http://nlp.
    Page 8, “Conclusions and Future Work”
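
The sentences above describe the target structure: a tree whose terminal nodes are edus and whose internal nodes carry a rhetorical relation and a nuclearity pattern. The following is a minimal Python sketch of such a structure; the names (Span, RSTNode) and the binary-tree layout (implied by the pairwise bottom-up construction described later) are our assumptions, not the authors' implementation.

```python
# Illustrative sketch only, not the authors' code; field names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    """A contiguous stretch of text, identified by first and last edu index."""
    first_edu: int
    last_edu: int

@dataclass
class RSTNode:
    """One node of a (binary) RST tree: leaves hold single edus; internal
    nodes carry a rhetorical relation label and a nuclearity pattern."""
    span: Span
    relation: Optional[str] = None     # e.g. 'Elaboration' (one of 18 classes)
    nuclearity: Optional[str] = None   # e.g. 'NS', 'SN', or 'NN'
    left: Optional["RSTNode"] = None
    right: Optional["RSTNode"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None
```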


SVM

Appears in 10 sentences as: SVM (12)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. [Figure: system pipeline. SVM training and feature extraction feed classification with the SVM models (binary and multiclass), producing scored RS sub-trees for bottom-up tree construction.]
    Page 3, “Building a Discourse Parser”
  2. Support Vector Machines (SVM) (Vapnik, 1995) are used to model classifiers S and L. SVM refers to a set of supervised learning algorithms that are based on margin maximization (a training sketch follows this list).
    Page 3, “Building a Discourse Parser”
  3. This makes SVM well-fitted to treat classification problems involving relatively large feature spaces such as ours (≈ 10^5 features).
    Page 3, “Building a Discourse Parser”
  4. SVM algorithms make use of the ‘kernel trick’ (Aizerman et al., 1964), a method for using linear classifiers to solve nonlinear problems.
    Page 3, “Building a Discourse Parser”
  5. Because the original SVM algorithms build binary classifiers, multi-label classification requires some adaptation.
    Page 3, “Building a Discourse Parser”
  6. Since our SVM classifiers work in linear time, the overall time-complexity of our algorithm is O(n).
    Page 4, “Building a Discourse Parser”
  7. Instrumental to our system’s performance is the choice of a set of salient characteristics (“features”) to be used as input to the SVM algorithm for training and classification.
    Page 4, “Features”
  8. 4.2 Raw SVM Classification
    Page 6, “Evaluation”
  9. Although our final goal is to achieve good performance on the entire tree-building task, a useful intermediate evaluation of our system can be conducted by measuring raw performance of SVM classifiers.
    Page 6, “Evaluation”
  10. Table 1: SVM Classifier performance.
    Page 6, “Evaluation”
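
As a concrete, hedged illustration of items 2 through 5: below is a sketch of how the two classifiers might be trained with scikit-learn. The library choice and function names are our assumptions, not the paper's; feature extraction into ≈ 10^5-dimensional sparse vectors is abstracted away.

```python
# Hedged sketch: scikit-learn stand-ins for the paper's S and L classifiers.
from sklearn.svm import LinearSVC

def train_classifiers(X_struct, y_struct, X_label, y_label):
    # S: binary classifier for "is there a structural relation between
    # these two adjacent spans?" (labels are 0/1).
    S = LinearSVC(C=1.0).fit(X_struct, y_struct)
    # L: multiclass relation labeler; LinearSVC reduces the multiclass
    # problem to one-vs-rest binary SVMs internally (cf. item 5).
    L = LinearSVC(C=1.0).fit(X_label, y_label)
    return S, L
```

A linear kernel keeps per-instance classification linear in the number of active features, and S.decision_function(x) then provides the margin score used to rank candidate merges during bottom-up tree construction.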


edus

Appears in 9 sentences as: EDUS (1) EDUs (2) edus (5) ‘edus’ (1)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Segmentation of the input text into elementary discourse units ('edus').
    Page 1, “Introduction”
  2. Generation of the rhetorical structure tree based on ‘rhetorical relations’ (or ‘coherence relations’) as labels of the tree, with the edus constituting its terminal nodes.
    Page 1, “Introduction”
  3. In our work, we focused exclusively on the second step of the discourse parsing problem, i.e., constructing the RST tree from a sequence of edus that have been segmented beforehand.
    Page 2, “Building a Discourse Parser”
  4. At the core of our system is a set of classifiers, trained through supervised learning, which, given two consecutive spans (atomic edus or RST sub-trees) in an input document, will score the likelihood of a direct structural relation as well as probabilities for such a relation's label and nuclearity.
    Page 2, “Building a Discourse Parser”
  5. [Figure: preprocessing pipeline. Syntax parsing (Charniak's nlparse) yields syntax trees for the EDUs, followed by tokenization and lexicalization.]
    Page 3, “Building a Discourse Parser”
  6. [Figure labels: lexicalized (SPADE) syntax trees; tokenized EDUs.]
    Page 3, “Building a Discourse Parser”
  7. [Figure label: tokenized EDUs.]
    Page 3, “Building a Discourse Parser”
  8. The algorithm starts with a list of all atomic discourse sub-trees (made of single edus in their text order) and recursively selects the best match between adjacent sub-trees (using binary classifier S), labels the newly created subtree (using multi-label classifier L) and updates scoring for S, until only one subtree is left: the complete rhetorical parse tree for the input text (sketched in code after this list).
    Page 4, “Building a Discourse Parser”
  9. Therefore, it seems useful to encode different measures of span size and positioning, using either tokens or edus as a distance unit:
    Page 4, “Features”
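
Item 8 compresses the whole tree-construction loop into one sentence; the sketch below unpacks it. RSTNode and Span come from the earlier structure sketch, S and L from the training sketch; extract_features is a hypothetical helper returning a 1×d feature matrix, and splitting L's prediction into relation and nuclearity assumes labels like 'Elaboration-NS', which is our guess at an encoding, not the paper's.

```python
# Hedged sketch of the greedy bottom-up construction (item 8 above).
def build_tree(edus, S, L, extract_features):
    # One leaf sub-tree per edu, in text order.
    subtrees = [RSTNode(span=Span(i, i)) for i in range(len(edus))]
    # S-scores (margins) for every adjacent pair of sub-trees.
    scores = [S.decision_function(extract_features(a, b))[0]
              for a, b in zip(subtrees, subtrees[1:])]
    while len(subtrees) > 1:
        i = max(range(len(scores)), key=lambda j: scores[j])  # best adjacent pair
        left, right = subtrees[i], subtrees[i + 1]
        label = L.predict(extract_features(left, right))[0]   # e.g. 'Elaboration-NS'
        relation, nuclearity = label.rsplit('-', 1)           # assumed encoding
        merged = RSTNode(span=Span(left.span.first_edu, right.span.last_edu),
                         relation=relation, nuclearity=nuclearity,
                         left=left, right=right)
        subtrees[i:i + 2] = [merged]
        # Update scoring for S: only pairs touching the new node change.
        del scores[i]
        if i > 0:
            scores[i - 1] = S.decision_function(
                extract_features(subtrees[i - 1], merged))[0]
        if i < len(subtrees) - 1:
            scores[i] = S.decision_function(
                extract_features(merged, subtrees[i + 1]))[0]
    return subtrees[0]  # the complete rhetorical parse tree
```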


F-score

Appears in 7 sentences as: F-Score (2) F-score (5)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Using a rich set of shallow lexical, syntactic and structural features from the input text, our parser achieves, in linear time, 73.9% of professional annotators’ human agreement F-score.
    Page 1, “Abstract”
  2. Current state-of-the-art results in automatic segmenting are much closer to human levels than full structure labeling (F-score ratios of automatic performance over gold standard reported in LeThanh et al.
    Page 2, “Building a Discourse Parser”
  3. Standard performance indicators for such a task are precision, recall and F-score as measured by the PARSEVAL metrics (Black et al., 1991), with the specific adaptations to the case of RST trees made by Marcu (2000, pages 143-144) (a scoring sketch follows this list).
    Page 7, “Evaluation”
  4. Table residue, reconstructed (the two column groups are inferred from the parallel table in item 5: manual vs. SPADE segmentation):

                 Manual segmentation        SPADE segmentation
                 S     N     R     F        S     N     R     F
     Precision   83.0  68.4  55.3  54.8     69.5  56.1  44.9  44.4
     Recall      83.0  68.4  55.3  54.8     69.2  55.8  44.7  44.2
     F-Score     83.0  68.4  55.3  54.8     69.3  56.0  44.8  44.3
    Page 7, “Evaluation”
  5. Table residue, reconstructed (the unlabeled third column group matches the human inter-annotator agreement discussed under "gold standard" below):

                 Manual segmentation      SPADE segmentation       Human agreement
                 S     N     R     F      S     N     R     F      S     N     R     F
     Precision   84.1  70.6  55.6  55.1   70.6  58.1  46.0  45.6   88.0  77.5  66.0  65.2
     Recall      84.1  70.6  55.6  55.1   71.2  58.6  46.4  46.0   88.1  77.6  66.1  65.3
     F-Score     84.1  70.6  55.6  55.1   70.9  58.3  46.2  45.8   88.1  77.5  66.0  65.3
    Page 7, “Evaluation”
  6. [Table header residue: Precision, Recall, F-score.]
    Page 8, “Evaluation”
  7. In order to take into account possibly varying levels of difficulty between corpora, we therefore divided each F-score by the value for human agreement, as measured by each author (see Table 5).
    Page 8, “Evaluation”
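
To make the S/N/R/F matching criteria and the precision/recall/F-score numbers above concrete, here is a hedged sketch of PARSEVAL-style scoring over RST trees, reusing the RSTNode sketch from earlier. It compares two trees as sets of labeled constituents; the constituent encoding is our simplification, not a reimplementation of Marcu's (2000) adaptation.

```python
# Hedged sketch of PARSEVAL-style scoring adapted to RST trees.
def constituents(node, criterion='F'):
    """Yield one hashable key per internal node. criterion: 'S' (span only),
    'N' (span + nuclearity), 'R' (span + relation), 'F' (fully labeled)."""
    if node is None or node.is_leaf:
        return
    key = (node.span.first_edu, node.span.last_edu)
    if criterion in ('N', 'F'):
        key += (node.nuclearity,)
    if criterion in ('R', 'F'):
        key += (node.relation,)
    yield key
    yield from constituents(node.left, criterion)
    yield from constituents(node.right, criterion)

def parseval(predicted, gold, criterion='F'):
    # Assumes non-trivial trees (at least one internal node on each side).
    p = set(constituents(predicted, criterion))
    g = set(constituents(gold, criterion))
    precision = len(p & g) / len(p)
    recall = len(p & g) / len(g)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```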


binary classifier

Appears in 5 sentences as: Binary classifier (1) binary classifier (2) binary classifiers (2)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. S: A binary classifier, for structure (existence of a connecting node between the two input sub-trees).
    Page 3, “Building a Discourse Parser”
  2. Because the original SVM algorithms build binary classifiers, multi-label classification requires some adaptation.
    Page 3, “Building a Discourse Parser”
  3. A possible approach is to reduce the multi-classification problem through a set of binary classifiers, each trained either on a single class (“one vs. all”) or by pair (“one vs. one”) (see the sketch after this list).
    Page 3, “Building a Discourse Parser”
  4. The algorithm starts with a list of all atomic discourse sub-trees (made of single edus in their text order) and recursively selects the best match between adjacent sub-trees (using binary classifier S), labels the newly created subtree (using multi-label classifier L) and updates scoring for S, until only one subtree is left: the complete rhetorical parse tree for the input text.
    Page 4, “Building a Discourse Parser”
  5. Binary classifier S is trained on 52,683 instances (split approximately 1/3 to 2/3 between positive and negative examples), extracted from 350 documents, and tested on 8,558 instances extracted from 50 documents.
    Page 6, “Evaluation”
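
Item 3 names the two standard reductions of multiclass to binary classification. scikit-learn already ships this as OneVsRestClassifier; the explicit version below (an illustration under the same library assumption as the earlier training sketch) only makes the "one vs. all" mechanics visible.

```python
# Explicit "one vs. all" reduction (illustration; sklearn's
# OneVsRestClassifier implements the same idea).
import numpy as np
from sklearn.svm import LinearSVC

class OneVsAll:
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        # One binary margin classifier per class: "this class" vs. the rest.
        self.models_ = [LinearSVC().fit(X, [1 if yi == c else 0 for yi in y])
                        for c in self.classes_]
        return self

    def predict(self, X):
        # n_samples x n_classes matrix of margin scores; pick the argmax.
        margins = np.stack([m.decision_function(X) for m in self.models_],
                           axis=1)
        return [self.classes_[j] for j in margins.argmax(axis=1)]
```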


Lexicalized

Appears in 5 sentences as: Lexicalization (1) Lexicalized (2) lexicalized (1) “lexicalized” (1)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. [Figure: preprocessing pipeline. Syntax parsing (Charniak's nlparse) yields syntax trees for the EDUs, followed by tokenization and lexicalization.]
    Page 3, “Building a Discourse Parser”
  2. [Figure labels: lexicalized (SPADE) syntax trees; tokenized EDUs.]
    Page 3, “Building a Discourse Parser”
  3. [Figure label: lexicalized syntax trees.]
    Page 3, “Building a Discourse Parser”
  4. In training mode, classification instances are built by parsing manually annotated trees from the RST-DT corpus paired with lexicalized syntax trees (LS Trees; a lexicalization sketch follows this list) for each sentence (see Sect.
    Page 3, “Building a Discourse Parser”
  5. directly from the Penn Treebank corpus (which covers a superset of the RST-DT corpus), then “lexicalized” (i.e.
    Page 4, “Building a Discourse Parser”
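
Items 4 and 5 describe "lexicalizing" syntax trees, i.e., annotating each internal node with a head word propagated up from its children. The sketch below uses NLTK's Tree type (a library assumption) and a toy rightmost-child head rule as a stand-in for proper per-category head-percolation tables.

```python
# Hedged sketch of lexicalization: propagate a head word up a syntax tree.
from nltk import Tree

def lexicalize(tree, head_index=lambda t: len(t) - 1):
    """Return (head_word, lexicalized_tree). head_index picks the
    head-bearing child; the rightmost-child default is a toy stand-in
    for real head rules."""
    if isinstance(tree, str):            # leaf token: its own head
        return tree, tree
    heads, children = zip(*(lexicalize(c, head_index) for c in tree))
    head = heads[head_index(tree)]
    return head, Tree(f"{tree.label()}[{head}]", list(children))

# Example:
# _, lt = lexicalize(Tree.fromstring("(S (NP (DT the) (NN parser)) (VP (VBZ works)))"))
# lt.label() == 'S[works]'
```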


sentence-level

Appears in 5 sentences as: sentence-level (5)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Marcu and Soricut focused on sentence-level parsing and developed two probabilistic models that use syntactic and lexical information (Soricut and Marcu, 2003).
    Page 1, “Introduction”
  2. As evidenced by a number of discourse-parsing efforts focusing on intra-sentential parsing (Marcu, 2000; Soricut and Marcu, 2003), there is a strong correlation between different organizational levels of textual units and sub-trees of the RST tree both at the sentence-level and the paragraph level.
    Page 4, “Features”
  3. While not always present, discourse markers (connectives, cue-words or cue-phrases, etc.) have been shown to give good indications of discourse structure and labeling, particularly at the sentence-level (Marcu, 2000) (a feature sketch follows this list).
    Page 4, “Features”
  4. A promising concept introduced by Soricut and Marcu (2003) in their sentence-level parser is the identification of ‘dominance sets’ in the syntax parse trees associated with each input sentence.
    Page 5, “Features”
  5. A large majority of the features considered so far focus exclusively on sentence-level information.
    Page 5, “Features”
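
Item 3's claim about discourse markers translates naturally into classifier features. Below is a hedged sketch of cue-phrase indicator features; the marker list is a tiny illustrative sample and the feature naming scheme is our own, not the paper's inventory.

```python
# Hedged sketch: binary cue-phrase indicators for a tokenized span.
CUE_PHRASES = ["because", "however", "although", "but",
               "for example", "on the other hand"]

def cue_features(span_tokens):
    """Map a tokenized span to {feature_name: bool} indicators."""
    text = " ".join(t.lower() for t in span_tokens)
    feats = {}
    for cue in CUE_PHRASES:
        feats[f"cue_present={cue}"] = cue in text       # anywhere in the span
        feats[f"cue_initial={cue}"] = text.startswith(cue)  # span-initial only
    return feats

# Example: cue_features(["However", ",", "results", "improved"])
# -> {'cue_present=however': True, 'cue_initial=however': True, ...}
```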


gold standard

Appears in 4 sentences as: gold standard (4)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Current state-of-the-art results in automatic segmenting are much closer to human levels than full structure labeling (F-score ratios of automatic performance over gold standard reported in LeThanh et al.
    Page 2, “Building a Discourse Parser”
  2. validation levels (our gold standard) in linear time-complexity (see Fig.
    Page 3, “Building a Discourse Parser”
  3. A measure of our full system’s performance is realized by comparing structure and labeling of the RST tree produced by our algorithm to that obtained through manual annotation (our gold standard).
    Page 7, “Evaluation”
  4. In order to more accurately compare our results to the gold standard (defined as manual agreement between human annotators), we also evaluated performance using the 52 doubly-annotated files present in the RST-DT as test set (see Table 3).
    Page 7, “Evaluation”


manually annotated

Appears in 4 sentences as: manual annotation (1) manually annotated (3)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Both S and L classifiers are trained using manually annotated documents taken from the RST-DT corpus.
    Page 3, “Building a Discourse Parser”
  2. In training mode, classification instances are built by parsing manually annotated trees from the RST-DT corpus paired with lexicalized syntax trees (LS Trees) for each sentence (see Sect.
    Page 3, “Building a Discourse Parser”
  3. A measure of our full system’s performance is realized by comparing structure and labeling of the RST tree produced by our algorithm to that obtained through manual annotation (our gold standard).
    Page 7, “Evaluation”
  4. In this paper, we have shown that it is possible to build an accurate automatic text-level discourse parser based on supervised machine-learning algorithms, using a feature-driven approach and a manually annotated corpus.
    Page 8, “Conclusions and Future Work”


parse trees

Appears in 4 sentences as: parse tree (1) parse trees (3)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. To the best of our knowledge, Reitter’s (2003b) was the only previous research based exclusively on feature-rich supervised learning to produce text-level RST discourse parse trees.
    Page 2, “Introduction”
  2. The algorithm starts with a list of all atomic discourse sub-trees (made of single edus in their text order) and recursively selects the best match between adjacent sub-trees (using binary classifier S), labels the newly created subtree (using multi-label classifier L) and updates scoring for S, until only one subtree is left: the complete rhetorical parse tree for the input text.
    Page 4, “Building a Discourse Parser”
  3. A promising concept introduced by Soricut and Marcu (2003) in their sentence-level parser is the identification of ‘dominance sets’ in the syntax parse trees associated to each input sentence.
    Page 5, “Features”
  4. In each case, parse trees are evaluated using the four following, increasingly complex, matching criteria: blank tree structure (‘S’), tree structure with nuclearity (‘N’), tree structure with rhetorical relations (‘R’) and our final goal: fully labeled structure with both nuclearity and rhetorical relation labels (‘F’).
    Page 7, “Evaluation”


feature space

Appears in 3 sentences as: feature space (2) feature spaces (1)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Our method is based on recent advances in the field of statistical machine learning (multivariate capabilities of Support Vector Machines) and a rich feature space.
    Page 1, “Abstract”
  2. This makes SVM well-fitted to treat classification problems involving relatively large feature spaces such as ours (≈ 10^5 features).
    Page 3, “Building a Discourse Parser”
  3. The feature space dimension is 136,987.
    Page 6, “Evaluation”


Support Vector

Appears in 3 sentences as: Support Vector (3)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Our method is based on recent advances in the field of statistical machine learning (multivariate capabilities of Support Vector Machines) and a rich feature space.
    Page 1, “Abstract”
  2. 2.2 Support Vector Machines
    Page 2, “Building a Discourse Parser”
  3. Support Vector Machines (SVM) (Vapnik, 1995) are used to model classifiers S and L. SVM refers to a set of supervised learning algorithms that are based on margin maximization.
    Page 3, “Building a Discourse Parser”


Treebank

Appears in 3 sentences as: Treebank (3)
In A Novel Discourse Parser Based on Support Vector Machine Classification
  1. Figure 1: Example of a simple RST tree (Source: RST Discourse Treebank (Carlson et al., 2001),
    Page 2, “Introduction”
  2. In this set, the 75 relations originally used in the RST Discourse Treebank (RST-DT) corpus (Carlson et al., 2001) are partitioned into 18 classes according to rhetorical similarity (e.g.
    Page 2, “Building a Discourse Parser”
  3. directly from the Penn Treebank corpus (which covers a superset of the RST-DT corpus), then “lexicalized” (i.e.
    Page 4, “Building a Discourse Parser”
