Text-level Discourse Dependency Parsing
Li, Sujian and Wang, Liang and Cao, Ziqiang and Li, Wenjie

Article Structure

Abstract

Previous research on text-level discourse parsing has mainly made use of constituency structure to parse the whole document into one discourse tree.

Introduction

It is widely agreed that no unit of a text can be understood in isolation, but only in relation to its context.

Discourse Dependency Structure and Tree Bank

2.1 Discourse Dependency Structure

Discourse Dependency Parsing

3.1 System Overview

Add arc <e_C, e_j> to G_C with

e_p(e_C, e_j) = argmax_{e_i ∈ C} λ(e_i, e_j)

Topics

discourse parsing

Appears in 23 sentences as: discourse parser (1) discourse parsers (1) discourse parsing (22)
In Text-level Discourse Dependency Parsing
  1. Previous research on text-level discourse parsing has mainly made use of constituency structure to parse the whole document into one discourse tree.
    Page 1, “Abstract”
  2. In this paper, we present the limitations of constituency based discourse parsing and first propose to use dependency structure to directly represent the relations between elementary discourse units (EDUs).
    Page 1, “Abstract”
  3. Experiments show that our discourse dependency parsers achieve a competitive performance on text-level discourse parsing .
    Page 1, “Abstract”
  4. Research in discourse parsing aims to acquire such relations in text, which is fundamental to many natural language processing applications such as question answering, automatic summarization and so on.
    Page 1, “Introduction”
  5. One important issue behind discourse parsing is the representation of discourse structure.
    Page 1, “Introduction”
  6. EDU segmentation is a relatively trivial step in discourse parsing.
    Page 1, “Introduction”
  7. Our focus here is not EDU segmentation but discourse parsing.
    Page 1, “Introduction”
  8. for discourse parsing (Soricut and Marcu, 2003; Joty et al., 2012; Reitter, 2003; LeThanh et al., 2004; Baldridge and Lascarides, 2005; Subba and Di Eugenio, 2009; Sagae, 2009; Hernault et al., 2010b; Feng and Hirst, 2012).
    Page 1, “Introduction”
  9. Section 3 presents the discourse parsing approach based on the Eisner and MST algorithms.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  10. Figure 5 shows the details of the Chu-Liu/Edmonds algorithm for discourse parsing .
    Page 4, “Discourse Dependency Parsing”
  11. The third feature type (Position) is also very helpful to discourse parsing .
    Page 7, “Add arc <eC,ej> to GC with”

EDUs

Appears in 23 sentences as: EDUs (26)
In Text-level Discourse Dependency Parsing
  1. In this paper, we present the limitations of constituency based discourse parsing and first propose to use dependency structure to directly represent the relations between elementary discourse units ( EDUs ).
    Page 1, “Abstract”
  2. Rhetorical Structure Theory (RST) (Mann and Thompson, 1988), one of the most influential discourse theories, posits a hierarchical generative tree representation, as illustrated in Figure 1. The leaves of a tree correspond to contiguous text spans called Elementary Discourse Units (EDUs).
    Page 1, “Introduction”
  3. The adjacent EDUs are combined into
    Page 1, “Introduction”
  4. We assume EDUs are already known.
    Page 1, “Introduction”
  5. EDUs or larger text spans) occurring in the generative process are better represented with different features, and thus a uniform framework for discourse analysis is hard to develop.
    Page 1, “Introduction”
  6. Here is the basic idea: the discourse structure consists of EDUs which are linked by the binary, asymmetrical relations called dependency relations.
    Page 1, “Introduction”
  7. Now, we can analyze the relations between EDUs directly, without worrying about any interior text spans.
    Page 2, “Introduction”
  8. Then, discourse dependency structure can be formalized as a labeled directed graph, where nodes correspond to EDUs and labeled arcs correspond to labeled dependency relations.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  9. We assume that the text T is composed of n+1 EDUs including the artificial e0.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  10. Let R = {r1, r2, ..., rm} denote a finite set of functional relations that hold between two EDUs.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  11. The third condition assures that each EDU has one and only one head, and the fourth that only one kind of dependency relation holds between two EDUs.
    Page 3, “Discourse Dependency Structure and Tree Bank”
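
Taken together, excerpts 6, 8, and 11 above define the target structure: a labeled directed graph over EDUs in which an artificial e0 serves as the root, every EDU has exactly one head, and only one relation holds between any two EDUs. The following Python sketch illustrates such a structure and its well-formedness checks; the class and function names are our own illustration, not code from the paper.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Arc:
        head: int       # index of the head EDU (0 denotes the artificial root e0)
        dependent: int  # index of the dependent EDU (1..n)
        relation: str   # functional relation label from R, e.g. "Elaboration"; "ROOT" for arcs from e0

    def is_well_formed(arcs, n):
        """Check two of the conditions quoted above: every EDU e1..en has one and
        only one head, and at most one relation holds between two EDUs.
        (A complete check would also verify that the arcs are acyclic.)"""
        heads = {}
        for arc in arcs:
            if arc.dependent in heads:   # an EDU may not have two heads
                return False
            heads[arc.dependent] = arc
        return set(heads) == set(range(1, n + 1))  # every EDU must be attached

    # Toy example with three EDUs; the relation labels are illustrative only.
    arcs = [Arc(0, 1, "ROOT"), Arc(1, 2, "Elaboration"), Arc(1, 3, "Attribution")]
    print(is_well_formed(arcs, n=3))  # True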

dependency trees

Appears in 22 sentences as: dependency tree (8) dependency trees (14)
In Text-level Discourse Dependency Parsing
  1. The state-of-the-art dependency parsing techniques, the Eisner algorithm and maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.
    Page 1, “Abstract”
  2. Since dependency trees contain much fewer nodes and on average they are simpler than constituency based trees, the current dependency parsers can have a relatively low computational complexity.
    Page 2, “Introduction”
  3. In our work, we adopt the graph based dependency parsing techniques learned from large sets of annotated dependency trees .
    Page 2, “Introduction”
  4. and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).
    Page 2, “Discourse Dependency Structure and Tree Bank”
  5. According to the definition, we illustrate all the 9 possible unlabeled dependency trees for a text containing three EDUs in Figure 2.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  6. The dependency trees 1’ to 7’ are projective while 8’ and 9’ are non-projective with crossing arcs.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  7. We use dependency trees to simulate the headed constituency based trees.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  8. Contrasting Figure 1 with Figure 2, we use dependency tree 1’ to simulate binary trees 1 and 8, and dependency trees 2’-7’ to simulate binary trees 2-7 correspondingly.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  9. The rhetorical relations in RST trees are kept as the functional relations which link the two EDUs in dependency trees .
    Page 3, “Discourse Dependency Structure and Tree Bank”
  10. Here we follow the arc factored method and define the score of a dependency tree as the sum of the scores of all the arcs in the tree.
    Page 3, “Discourse Dependency Parsing”
  11. Thus, the optimal dependency tree for T is a spanning tree with the highest score, obtained through the function DT(T, w): DT(T, w) = argmax_{G_T ⊆ V×R×V_0} score(T, G_T)
    Page 3, “Discourse Dependency Parsing”
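
Excerpts 10 and 11 above state the arc-factored model: the score of a dependency tree is the sum of the scores of its arcs, and the parser searches for the spanning tree with the highest score. A small illustration in Python follows (the arc scores are made-up numbers, not weights learned by the paper's model):

    def tree_score(arcs, arc_score):
        """Arc-factored model: the score of a tree is the sum of its arc scores."""
        return sum(arc_score[(head, dep)] for head, dep in arcs)

    # Made-up unlabeled arc scores for a text with three EDUs plus the artificial root e0.
    arc_score = {
        (0, 1): 4.0, (0, 2): 1.0, (0, 3): 0.5,
        (1, 2): 3.0, (1, 3): 2.5,
        (2, 1): 0.5, (2, 3): 2.0,
        (3, 1): 0.2, (3, 2): 0.8,
    }

    # Two of the nine possible unlabeled trees over three EDUs (cf. Figure 2 in the paper).
    tree_a = [(0, 1), (1, 2), (1, 3)]   # e1 heads both e2 and e3
    tree_b = [(0, 1), (1, 2), (2, 3)]   # the chain e1 -> e2 -> e3
    print(tree_score(tree_a, arc_score), tree_score(tree_b, arc_score))  # 9.5 9.0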

dependency parsing

Appears in 16 sentences as: dependency parsers (2) Dependency Parsing (1) dependency parsing (12) dependency parsing, (1)
In Text-level Discourse Dependency Parsing
  1. The state-of-the-art dependency parsing techniques, the Eisner algorithm and maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.
    Page 1, “Abstract”
  2. Experiments show that our discourse dependency parsers achieve a competitive performance on text-level discourse parsing.
    Page 1, “Abstract”
  3. Dependency Parsing
    Page 1, “Introduction”
  4. Since dependency trees contain much fewer nodes and on average they are simpler than constituency based trees, the current dependency parsers can have a relatively low computational complexity.
    Page 2, “Introduction”
  5. In our work, we adopt the graph based dependency parsing techniques learned from large sets of annotated dependency trees.
    Page 2, “Introduction”
  6. To the best of our knowledge, we are the first to apply the dependency structure and introduce the dependency parsing techniques into discourse analysis.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  7. To automatically conduct discourse dependency parsing , constructing a discourse dependency treebank is fundamental.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  8. The goal of discourse dependency parsing is to parse an optimal spanning tree from V×R×V_0.
    Page 3, “Discourse Dependency Parsing”
  9. It is well known that projective dependency parsing can be handled with the Eisner algorithm (1996) which is based on the bottom-up dynamic programming techniques with the time complexity of O(n^3).
    Page 4, “Discourse Dependency Parsing”
  10. Following the work of McDonald (2005b), we formalize discourse dependency parsing as searching for a maximum spanning tree (MST) in a directed graph.
    Page 4, “Discourse Dependency Parsing”
  11. Referring to the evaluation of syntactic dependency parsing,
    Page 6, “Add arc <eC,ej> to GC with”
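
Excerpt 9 above refers to the Eisner (1996) algorithm, a bottom-up dynamic program over spans that finds the best projective tree in O(n^3). Below is a compact, score-only Python sketch of that dynamic program; the indexing conventions and the omission of backpointers are our simplifications, and a real parser would also recover the arcs and their relation labels.

    def eisner_best_score(score, n):
        """Score of the best projective dependency tree over e0..en, where e0 is the
        artificial root and score[h][d] is the score of the arc from e_h to e_d.
        E[i][j][d][c]: best span from e_i to e_j; d = 0 if the head is on the left,
        1 if it is on the right; c = 0 for incomplete spans, 1 for complete spans."""
        NEG = float("-inf")
        E = [[[[NEG, NEG] for _ in range(2)] for _ in range(n + 1)] for _ in range(n + 1)]
        for i in range(n + 1):
            for d in range(2):
                for c in range(2):
                    E[i][i][d][c] = 0.0
        for m in range(1, n + 1):              # span length
            for i in range(0, n + 1 - m):
                j = i + m
                # incomplete spans: add one new arc between e_i and e_j
                best = max(E[i][q][0][1] + E[q + 1][j][1][1] for q in range(i, j))
                E[i][j][0][0] = best + score[i][j]   # arc e_i -> e_j
                E[i][j][1][0] = best + score[j][i]   # arc e_j -> e_i
                # complete spans: merge an incomplete span with a complete one
                E[i][j][0][1] = max(E[i][q][0][0] + E[q][j][0][1] for q in range(i + 1, j + 1))
                E[i][j][1][1] = max(E[i][q][1][1] + E[q][j][1][0] for q in range(i, j))
        return E[0][n][0][1]

    # Toy usage: the artificial root e0 and two EDUs, with illustrative scores.
    s = [[0.0, 1.0, 1.0],
         [0.0, 0.0, 5.0],
         [0.0, 0.0, 0.0]]
    print(eisner_best_score(s, n=2))  # 6.0, i.e. the chain e0 -> e1 -> e2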

treebank

Appears in 15 sentences as: TreeBank (1) Treebank (3) treebank (10) treebanks (1)
In Text-level Discourse Dependency Parsing
  1. Section 2 formally defines discourse dependency structure and introduces how to build a discourse dependency treebank from the existing RST corpus.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  2. 2.2 Our Discourse Dependency Treebank
    Page 3, “Discourse Dependency Structure and Tree Bank”
  3. To automatically conduct discourse dependency parsing, constructing a discourse dependency treebank is fundamental.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  4. It is costly to manually construct such a treebank from scratch.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  5. Fortunately, RST Discourse Treebank (RST-DT) (Carlson et al., 2001) is an available resource to help with.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  6. With this kind of conversion, we can get our discourse dependency treebank .
    Page 3, “Discourse Dependency Structure and Tree Bank”
  7. It is worth noting that the non-projective trees like 8’ and 9’ do not exist in our dependency treebank , though they are eligible according to the definition of discourse dependency graph.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  8. We use the syntactic trees from the Penn Treebank to find the dominating nodes.
    Page 5, “Add arc <eC,ej> to GC with”
  9. But we think that the MST algorithm has more potential in discourse dependency parsing, because our converted discourse dependency treebank contains only projective trees and thus somewhat prevents the MST algorithm from exhibiting its advantage in parsing non-projective trees.
    Page 8, “Add arc <eC,ej> to GC with”
  10. In fact, we observe that some non-projective dependencies produced by the MST algorithm are even more reasonable than those in the dependency treebank.
    Page 8, “Add arc <eC,ej> to GC with”
  11. Thus, it is important to build a manually labeled discourse dependency treebank , which will be our future work.
    Page 8, “Add arc <eC,ej> to GC with”
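
Excerpts 3-7 above explain that the dependency treebank was obtained by converting the constituency trees of RST-DT rather than by annotating from scratch. The exact conversion procedure is not quoted here, so the sketch below only illustrates the general nucleus-based head-percolation idea behind such conversions: the head EDU of an internal node is the head of its nucleus child, the satellite's head depends on it, and the rhetorical relation is kept as the arc label. The tuple-based tree encoding and the restriction to mononuclear (nucleus-satellite) relations are simplifying assumptions on our part.

    def head_edu(node):
        """A leaf EDU is its own head; an internal node inherits the head of its nucleus child."""
        if isinstance(node, int):                # a leaf is just an EDU index
            return node
        relation, nucleus, satellite = node      # node = (relation, nucleus_subtree, satellite_subtree)
        return head_edu(nucleus)

    def to_dependencies(node, arcs=None):
        """Collect labeled arcs <head, relation, dependent> from a binary nucleus-satellite tree."""
        if arcs is None:
            arcs = []
        if isinstance(node, int):
            return arcs
        relation, nucleus, satellite = node
        arcs.append((head_edu(nucleus), relation, head_edu(satellite)))
        to_dependencies(nucleus, arcs)
        to_dependencies(satellite, arcs)
        return arcs

    # Toy constituency tree: e1 is elaborated by a span over e2 and e3, with e2 the nucleus there.
    tree = ("Elaboration", 1, ("Attribution", 2, 3))
    arcs = [(0, "ROOT", head_edu(tree))] + to_dependencies(tree)
    print(arcs)  # [(0, 'ROOT', 1), (1, 'Elaboration', 2), (2, 'Attribution', 3)]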

dependency relation

Appears in 7 sentences as: dependency relation (5) dependency relations (2)
In Text-level Discourse Dependency Parsing
  1. Here is the basic idea: the discourse structure consists of EDUs which are linked by the binary, asymmetrical relations called dependency relations .
    Page 1, “Introduction”
  2. A dependency relation holds between a subordinate EDU called the dependent, and another EDU on
    Page 1, “Introduction”
  3. Similar to the syntactic dependency structure defined by McDonald (2005a, 2005b), we insert an artificial EDU e0 at the beginning of each document and label the dependency relation linking from e0 as ROOT.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  4. A labeled directed arc is used to represent the dependency relation from one head to its dependent.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  5. Then, discourse dependency structure can be formalized as a labeled directed graph, where nodes correspond to EDUs and labeled arcs correspond to labeled dependency relations.
    Page 2, “Discourse Dependency Structure and Tree Bank”
  6. The third condition assures that each EDU has one and only one head, and the fourth that only one kind of dependency relation holds between two EDUs.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  7. For example, the sixth feature in Table 5 represents that the dependency relation is preferred to be labeled Explanation when “because” is the first word of the dependent EDU.
    Page 7, “Add arc <eC,ej> to GC with”
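
As a concrete illustration of excerpts 3, 4, and 7 above, a two-EDU document such as "[The schedule slipped]e1 [because the parts arrived late.]e2" (our own toy example, not from the corpus) would be represented by two labeled arcs, one from the artificial root and one labeled with the functional relation:

    # Labeled arcs written as <head, relation, dependent>; e0 is the artificial root EDU.
    arcs = [
        (0, "ROOT", 1),          # the arc linking from e0 is labeled ROOT
        (1, "Explanation", 2),   # e2 ("because ...") is the dependent of e1
    ]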

spanning tree

Appears in 7 sentences as: Spanning Tree (1) spanning tree (6)
In Text-level Discourse Dependency Parsing
  1. The state-of-the-art dependency parsing techniques, the Eisner algorithm and maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.
    Page 1, “Abstract”
  2. and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).
    Page 2, “Discourse Dependency Structure and Tree Bank”
  3. The goal of discourse dependency parsing is to parse an optimal spanning tree from V×R×V_0.
    Page 3, “Discourse Dependency Parsing”
  4. Thus, the optimal dependency tree for T is a spanning tree with the highest score, obtained through the function DT(T, w): DT(T, w) = argmax_{G_T ⊆ V×R×V_0} score(T, G_T)
    Page 3, “Discourse Dependency Parsing”
  5. score(T, G_T) = Σ_{<e_i, r, e_j> ∈ G_T} λ(e_i, r, e_j), where G_T means a possible spanning tree with score(T, G_T) and λ(e_i, r, e_j) denotes the score of
    Page 3, “Discourse Dependency Parsing”
  6. 3.3 Maximum Spanning Tree Algorithm
    Page 4, “Discourse Dependency Parsing”
  7. Following the work of McDonald (2005b), we formalize discourse dependency parsing as searching for a maximum spanning tree (MST) in a directed graph.
    Page 4, “Discourse Dependency Parsing”
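
Excerpts 5-7 above cast parsing as finding the maximum spanning tree of a directed graph over EDUs, for which the paper uses the Chu-Liu/Edmonds algorithm (Figure 5) in Tarjan's efficient implementation. The sketch below is a naive recursive Python version of the same idea for unlabeled arcs: every node greedily selects its best incoming arc, and any cycle is contracted into a single node and rescored. It is meant to illustrate the algorithm, not to reproduce the paper's O(n^2) implementation.

    def _find_cycle(heads):
        """Return one cycle in the greedy head assignment, or None if there is none."""
        for start in heads:
            seen, v = set(), start
            while v in heads and v not in seen:
                seen.add(v)
                v = heads[v]
            if v in seen:                     # the walk came back to a visited node
                cycle, u = [v], heads[v]
                while u != v:
                    cycle.append(u)
                    u = heads[u]
                return cycle
        return None

    def chu_liu_edmonds(score, root=0):
        """Naive recursive maximum spanning arborescence. score[(h, d)] is the score
        of the arc e_h -> e_d; node 0 is the artificial root e0. Returns {dependent: head}."""
        nodes = {h for h, _ in score} | {d for _, d in score}
        # 1. Every non-root node greedily selects its highest-scoring incoming arc.
        best_in = {d: max((s, h) for (h, dd), s in score.items() if dd == d)[1]
                   for d in nodes if d != root}
        cycle = _find_cycle(best_in)
        if cycle is None:                     # the greedy arcs already form a tree
            return best_in
        # 2. Contract the cycle C into a new node and rescore the arcs that cross it.
        C, c_node = set(cycle), max(nodes) + 1
        new_score, enter, leave = {}, {}, {}
        for (h, d), s in score.items():
            if h not in C and d in C:         # arc entering the cycle: keep its relative gain
                adj = s - score[(best_in[d], d)]
                if (h, c_node) not in new_score or adj > new_score[(h, c_node)]:
                    new_score[(h, c_node)], enter[h] = adj, (h, d)
            elif h in C and d not in C:       # arc leaving the cycle: keep the best one
                if (c_node, d) not in new_score or s > new_score[(c_node, d)]:
                    new_score[(c_node, d)], leave[d] = s, (h, d)
            elif h not in C and d not in C:
                new_score[(h, d)] = s
        # 3. Solve the contracted graph, then expand the contracted node again.
        heads = chu_liu_edmonds(new_score, root)
        result = {}
        for d, h in heads.items():
            if d == c_node:                   # the arc chosen to enter the cycle breaks it open
                h0, d0 = enter[h]
                result.update({v: best_in[v] for v in C if v != d0})
                result[d0] = h0
            elif h == c_node:
                result[d] = leave[d][0]
            else:
                result[d] = h
        return result

    # Toy scores: the greedy choices e2 -> e1 and e1 -> e2 form a cycle that must be contracted.
    scores = {(0, 1): 5.0, (0, 2): 1.0, (0, 3): 1.0, (1, 2): 11.0, (2, 1): 10.0,
              (2, 3): 8.0, (3, 2): 2.0, (1, 3): 7.0}
    print(sorted(chu_liu_edmonds(scores).items()))  # [(1, 0), (2, 1), (3, 2)]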

feature weights

Appears in 6 sentences as: Feature Weights (2) feature weights (3) Features Weight (1)
In Text-level Discourse Dependency Parsing
  1. In fact, the score of each arc is calculated as a linear combination of feature weights .
    Page 5, “Add arc <eC,ej> to GC with”
  2. (2005a; 2005b), we use the Margin Infused Relaxed Algorithm (MIRA) to learn the feature weights based on a training set of documents annotated with dependency structures yi where yi denotes the correct dependency tree for the text Ti.
    Page 5, “Add arc <eC,ej> to GC with”
  3. Table 5: Top 10 Feature Weights for Coarse-grained Relation Labeling (Eisner Algorithm)
    Page 7, “Add arc <eC,ej> to GC with”
  4. Features and weights:
     Last two words in dependent EDU are “appeals court” & List: 0.576
     First two words in head EDU are “I 'd” & Attribution: 0.385
     First two words in dependent EDU is “that the” & Elaboration-object-attribute-e: 0.348
     First POS in head EDU is “DT” & List: -0.323
     Last word in dependent EDU is “in” & List: -0.286
     First word in dependent EDU is “racked” & Elaboration-object-attribute-e: 0.445
     First two word pairs are <“In an”, “But even”> & List: -0.252
     Dependent EDU has a dominating node tagged “CD” & Elaboration-object-attribute-e: -0.244
     First two words in dependent EDU are “patents disputes” & Purpose: 0.231
     First word in dependent EDU is “to” & Purpose: 0.230
    Page 7, “Add arc <eC,ej> to GC with”
  5. Table 6: Top 10 Feature Weights for Fine-grained Relation Labeling (Eisner Algorithm)
    Page 7, “Add arc <eC,ej> to GC with”
  6. To calculate the score for each arc, six types of features are explored to represent the arcs and the feature weights are learned based on the MIRA learning technique.
    Page 9, “Add arc <eC,ej> to GC with”
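
Excerpts 1, 2, and 6 above describe the scoring and learning setup: each arc score is a linear combination (dot product) of feature weights, and the weights are learned with MIRA from trees annotated with dependency structures. The sketch below shows the dot-product scoring and a simplified single-best MIRA-style update; the feature template, the toy trees, and the reduction to a single constraint are our illustrative assumptions rather than the paper's exact training procedure (which follows McDonald et al., 2005a; 2005b).

    from collections import defaultdict

    def arc_score(weights, features):
        """Arc score = linear combination of the weights of the arc's active features."""
        return sum(weights[f] for f in features)

    def tree_features(tree, feat_fn):
        """Concatenate the feature lists of all arcs <head, relation, dependent> in a tree."""
        feats = []
        for head, rel, dep in tree:
            feats.extend(feat_fn(head, rel, dep))
        return feats

    def mira_update(weights, gold_tree, pred_tree, feat_fn, loss):
        """Single-best MIRA-style step: change the weights as little as possible so that
        the gold tree outscores the prediction by a margin of `loss` (e.g. the number of
        EDUs with a wrong head). This is a teaching simplification of the learning step."""
        diff = defaultdict(float)
        for f in tree_features(gold_tree, feat_fn):
            diff[f] += 1.0
        for f in tree_features(pred_tree, feat_fn):
            diff[f] -= 1.0
        margin = sum(weights[f] * v for f, v in diff.items())
        norm = sum(v * v for v in diff.values())
        if norm > 0:
            tau = max(0.0, (loss - margin) / norm)
            for f, v in diff.items():
                weights[f] += tau * v

    # Illustrative feature template: first word of the dependent EDU conjoined with the relation,
    # in the spirit of the "because" & Explanation feature discussed in the paper. Words are made up.
    first_word = {1: "The", 2: "because"}
    feat_fn = lambda h, r, d: ["first_word=%s&rel=%s" % (first_word.get(d, "<root>"), r)]

    weights = defaultdict(float)
    gold = [(0, "ROOT", 1), (1, "Explanation", 2)]
    pred = [(0, "ROOT", 1), (1, "Elaboration", 2)]
    mira_update(weights, gold, pred, feat_fn, loss=1.0)
    print(dict(weights))  # the Explanation feature gains weight, the Elaboration feature loses it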

fine-grained

Appears in 6 sentences as: fine-grained (6)
In Text-level Discourse Dependency Parsing
  1. A total of 110 fine-grained relations (e.g.
    Page 3, “Discourse Dependency Structure and Tree Bank”
  2. One is composed of 19 coarse-grained relations and the other 111 fine-grained relations.
    Page 5, “Add arc <eC,ej> to GC with”
  3. From Table 3 and Table 4, we can see that the addition of more feature types, except the 6th feature type (semantic similarity), can promote the performance of relation labeling, whether using the coarse-grained 19 relations or the fine-grained 111 relations.
    Page 7, “Add arc <eC,ej> to GC with”
  4. Table 5 selects 10 features with the highest weights in absolute value for the parser which uses the coarse-grained relations, while Table 6 selects the top 10 features for the parser using the fine-grained relations.
    Page 7, “Add arc <eC,ej> to GC with”
  5. From Table 3 and Table 4, we can see that fine-grained relations are more helpful to building unlabeled discourse
    Page 7, “Add arc <eC,ej> to GC with”
  6. We can also see that the labeled accuracy using the fine-grained relations can achieve 0.4309, only 0.06 lower than the best labeled accuracy (0.4915) using the coarse-grained relations.
    Page 8, “Add arc <eC,ej> to GC with”

maximum spanning

Appears in 4 sentences as: Maximum Spanning (1) maximum spanning (3)
In Text-level Discourse Dependency Parsing
  1. The state-of-the-art dependency parsing techniques, the Eisner algorithm and maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and the large-margin learning techniques.
    Page 1, “Abstract”
  2. and maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees with the large-margin learning technique (Crammer and Singer, 2003).
    Page 2, “Discourse Dependency Structure and Tree Bank”
  3. 3.3 Maximum Spanning Tree Algorithm
    Page 4, “Discourse Dependency Parsing”
  4. Following the work of McDonald (2005b), we formalize discourse dependency parsing as searching for a maximum spanning tree (MST) in a directed graph.
    Page 4, “Discourse Dependency Parsing”

syntactic parsing

Appears in 4 sentences as: syntactic parsing (4)
In Text-level Discourse Dependency Parsing
  1. Since such a hierarchical discourse tree is analogous to a constituency based syntactic tree except that the constituents in the discourse trees are text spans, previous research has explored different constituency based syntactic parsing techniques (e.g.
    Page 1, “Introduction”
  2. First, it is difficult to design a set of production rules as in syntactic parsing , since there are no determinate generative rules for the interior text spans.
    Page 1, “Introduction”
  3. The other two types of features, which are related to length and syntactic parsing, only promote the performance slightly.
    Page 7, “Add arc <eC,ej> to GC with”
  4. Since the RST tree is similar to the constituency based syntactic tree except that the constituent nodes are different, the syntactic parsing techniques have been borrowed for discourse parsing (Soricut and Marcu, 2003; Baldridge and Lascarides, 2005; Sagae, 2009; Hernault et al., 2010b; Feng and Hirst, 2012).
    Page 9, “Add arc <eC,ej> to GC with”

time complexity

Appears in 4 sentences as: time complexity (4)
In Text-level Discourse Dependency Parsing
  1. Third, to reduce the time complexity of the state-of-the-art constituency based parsing techniques, the approximate parsing approaches are prone to being trapped in local maxima.
    Page 1, “Introduction”
  2. It is well known that projective dependency parsing can be handled with the Eisner algorithm (1996) which is based on the bottom-up dynamic programming techniques with the time complexity of O(n^3).
    Page 4, “Discourse Dependency Parsing”
  3. (2005b), we adopt an efficient implementation of the Chu-Liu/Edmonds algorithm that is proposed by Tarjan (1977) with O(n^2) time complexity.
    Page 4, “Discourse Dependency Parsing”
  4. Based on dependency structure, we are able to directly analyze the relations between the EDUs without worrying about the additional interior text spans, and apply the existing state-of-the-art dependency parsing techniques which have a relatively low time complexity .
    Page 9, “Add arc <eC,ej> to GC with”

discourse structure

Appears in 3 sentences as: discourse structure (3)
In Text-level Discourse Dependency Parsing
  1. One important issue behind discourse parsing is the representation of discourse structure .
    Page 1, “Introduction”
  2. Here is the basic idea: the discourse structure consists of EDUs which are linked by the binary, asymmetrical relations called dependency relations.
    Page 1, “Introduction”
  3. Soricut and Marcu (2003) use a standard bottom-up chart parsing algorithm to determine the discourse structure of sentences.
    Page 9, “Add arc <eC,ej> to GC with”

highest score

Appears in 3 sentences as: highest score (2) highest scored (1)
In Text-level Discourse Dependency Parsing
  1. Thus, the optimal dependency tree for T is a spanning tree with the highest score, obtained through the function DT(T, w): DT(T, w) = argmax_{G_T ⊆ V×R×V_0} score(T, G_T)
    Page 3, “Discourse Dependency Parsing”
  2. A dynamic programming table E[i,j,d,c] is used to represent the highest scored subtree spanning e_i to e_j.
    Page 4, “Discourse Dependency Parsing”
  3. Each node in the graph greedily selects the incoming arc with the highest score .
    Page 4, “Discourse Dependency Parsing”

learning algorithm

Appears in 3 sentences as: learning algorithm (3)
In Text-level Discourse Dependency Parsing
  1. As we employed the MIRA learning algorithm, it is possible to identify which specific features are useful by looking at the weights learned for each feature from the training data.
    Page 7, “Add arc <eC,ej> to GC with”
  2. Other text-level discourse parsing methods include: (1) Percep-coarse: we replace MIRA with the averaged perceptron learning algorithm and the other settings are the same as Our-coarse; (2) HILDA-manual and HILDA-seg are from Hernault (2010b)’s work, and their input EDUs are from RST-DT and their own EDU segmenter respectively; (3) LeThanh indicates the results given by LeThanh et al.
    Page 8, “Add arc <eC,ej> to GC with”
  3. We can also see that the averaged perceptron learning algorithm , though simple, can achieve a comparable performance, better than HILDA-manual.
    Page 8, “Add arc <eC,ej> to GC with”
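
Excerpts 2 and 3 above compare MIRA with an averaged perceptron baseline (Percep-coarse). For contrast with the MIRA sketch given earlier, here is a minimal averaged structured perceptron in Python; the `parse` argument stands for any decoder (Eisner or MST) run with the current weights, and all names are illustrative rather than taken from the paper.

    from collections import defaultdict

    def averaged_perceptron(train, parse, feat_fn, epochs=10):
        """Structured averaged perceptron: after decoding each training text, add the
        features of the gold tree, subtract those of the prediction, and finally return
        the average of the weight vectors seen after every instance (naive averaging)."""
        weights, totals, steps = defaultdict(float), defaultdict(float), 0
        for _ in range(epochs):
            for text, gold_tree in train:
                pred_tree = parse(text, weights)          # e.g. Eisner or Chu-Liu/Edmonds decoding
                if pred_tree != gold_tree:
                    for h, r, d in gold_tree:
                        for f in feat_fn(text, h, r, d):
                            weights[f] += 1.0
                    for h, r, d in pred_tree:
                        for f in feat_fn(text, h, r, d):
                            weights[f] -= 1.0
                steps += 1
                for f, w in weights.items():              # accumulate for averaging
                    totals[f] += w
        return {f: total / steps for f, total in totals.items()}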

subtrees

Appears in 3 sentences as: subtrees (3)
In Text-level Discourse Dependency Parsing
  1. The algorithm begins by initializing all length-one subtrees to a score of 0.0.
    Page 4, “Discourse Dependency Parsing”
  2. over all the internal indices (i ≤ q < j) in the span, and calculating the value of merging the two subtrees and adding one new arc.
    Page 4, “Discourse Dependency Parsing”
  3. This algorithm considers all the possible subtrees .
    Page 4, “Discourse Dependency Parsing”
