Abstract | For efficient inference over taxonomy structures, we use loopy belief propagation along with a directed spanning tree algorithm for the core hypernymy factor. |
Experiments | We present a development set ablation study where we start with the edges-only model (Figure 2a) and its random tree baseline (which chooses an arbitrary spanning tree for the term set).
Features | While spanning trees are familiar from non-projective dependency parsing, features based on the linear order of the words or on lexical identities …
Introduction | Our model takes a log-linear form and is represented using a factor graph that includes both 1st-order scoring factors on directed hypernymy edges (a parent and child in the taxonomy) and 2nd-order scoring factors on sibling edge pairs (pairs of hypernym edges with a shared parent), as well as incorporating a global (directed spanning tree) structural constraint.
Introduction | Inference for both learning and decoding uses structured loopy belief propagation (BP), incorporating standard spanning tree algorithms (Chu and Liu, 1965; Edmonds, 1967; Tutte, 1984). |
Introduction | … (2006), the longest path approach of Kozareva and Hovy (2010), and the maximum spanning tree (MST) approach of Navigli et al. …
Related Work | Our work differs from the two systems above in that ours is the first discriminatively trained, structured probabilistic model over the full space of taxonomy trees that uses structured inference via spanning tree algorithms (MST and MTT) through both the learning and decoding phases. |
Structured Taxonomy Induction | Of course, not all variable assignments y form legal taxonomy trees (i.e., directed spanning trees).
Structured Taxonomy Induction | Note that finding taxonomy trees is structurally identical to finding directed spanning trees (and thereby to non-projective dependency parsing), for which belief propagation has previously been worked out in depth (Smith and Eisner, 2008).
Structured Taxonomy Induction | However, due to the structure of that particular factor, all of its outgoing messages can be computed simultaneously in O(n³) time via an efficient adaptation of Kirchhoff's Matrix Tree Theorem (MTT) (Tutte, 1984), which computes partition functions and marginals for directed spanning trees.
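Structured Taxonomy Induction | To make the O(n³) computation concrete, the following is a minimal numpy sketch of MTT inference for directed spanning trees rooted at node 0 (an illustration of the standard construction, not the paper's implementation): the partition function is the determinant of the Laplacian with the root row and column deleted, and edge marginals follow from that minor's inverse.

```python
import numpy as np

def arborescence_marginals(W):
    """Log partition function and edge marginals over directed spanning
    trees rooted at node 0, via the Matrix Tree Theorem.

    W[h, m] holds exp(score) for the edge h -> m; the diagonal and
    column 0 (edges into the root) are ignored.  Total cost is O(n^3).
    """
    n = W.shape[0]
    W = W.copy()
    np.fill_diagonal(W, 0.0)          # no self-loops
    W[:, 0] = 0.0                     # the root has no incoming edge

    # Laplacian: L[m, m] = total weight into m, L[h, m] = -W[h, m].
    L = -W.copy()
    np.fill_diagonal(L, W.sum(axis=0))

    Lhat = L[1:, 1:]                  # delete the root row and column
    logZ = np.linalg.slogdet(Lhat)[1]
    Linv = np.linalg.inv(Lhat)

    # Marginal of edge (h, m) = W[h, m] * d(log Z)/d(W[h, m]).
    mu = np.zeros_like(W)
    for m in range(1, n):
        mu[0, m] = W[0, m] * Linv[m - 1, m - 1]
        for h in range(1, n):
            if h != m:
                mu[h, m] = W[h, m] * (Linv[m - 1, m - 1] - Linv[m - 1, h - 1])
    return logZ, mu
```

Structured Taxonomy Induction | Run on exponentiated edge beliefs, these marginals yield the tree factor's outgoing messages once the incoming messages are divided out (Smith and Eisner, 2008).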
Abstract | We recover approximate solutions to this problem using LP relaxation and maximum spanning tree algorithms, yielding techniques that can be combined with the efficient bigram-based inference approach using Lagrange multipliers. |
Experiments | As discussed in §2.4, a maximum spanning tree is recovered from the output of the LP and greedily pruned in order to generate a valid integral solution while observing the imposed compression rate. |
Introduction | Maximum spanning tree algorithms, commonly used in non-projective dependency parsing (McDonald et al., 2005), are not easily adaptable to this task since the maximum-weight subtree is not necessarily a part of the maximum spanning tree.
Introduction | In addition, we can recover approximate solutions to this problem by using the Chu-Liu Edmonds algorithm for recovering maximum spanning trees (Chu and Liu, 1965; Edmonds, 1967) over the relatively sparse subgraph defined by a solution to the relaxed LP. |
Multi-Structure Sentence Compression | Figure 1: An example of the difficulty of recovering the maximum-weight subtree (B→C, B→D) from the maximum spanning tree (A→C, C→B, B→D).
Multi-Structure Sentence Compression | Although the maximum spanning tree for a given token configuration can be recovered efficiently, Figure 1 illustrates that the maximum-scoring subtree is not necessarily found within it. |
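Multi-Structure Sentence Compression | One hypothetical weighting that realizes Figure 1 (the scores below are invented for illustration, not taken from the paper): with score(A→C) = -5, score(C→B) = 2, score(B→C) = 3, and score(B→D) = 4, the only spanning tree rooted at A is {A→C, C→B, B→D} with score 1, while the best connected subtree, {B→C, B→D}, scores 7 and is not contained in it. A brute-force check:

```python
from itertools import chain, combinations

# Hypothetical edge scores consistent with Figure 1; A is the root.
scores = {('A', 'C'): -5, ('C', 'B'): 2, ('B', 'C'): 3, ('B', 'D'): 4}

def is_tree(edges, nodes):
    """True if `edges` form a directed tree (arborescence) over `nodes`."""
    heads = {}
    for h, m in edges:
        if m in heads:                 # two parents: not a tree
            return False
        heads[m] = h
    if len(nodes - set(heads)) != 1:   # exactly one root
        return False
    for v in nodes:                    # no cycles
        seen = set()
        while v in heads:
            if v in seen:
                return False
            seen.add(v)
            v = heads[v]
    return True

def best(spanning):
    """Highest-scoring tree; spanning=True forces all four nodes in."""
    top, top_score = None, float('-inf')
    edge_list = list(scores)
    for edges in chain.from_iterable(
            combinations(edge_list, k) for k in range(1, len(edge_list) + 1)):
        nodes = {v for e in edges for v in e}
        if spanning and nodes != {'A', 'B', 'C', 'D'}:
            continue
        if is_tree(edges, nodes):
            s = sum(scores[e] for e in edges)
            if s > top_score:
                top, top_score = set(edges), s
    return top, top_score

print(best(spanning=True))   # ({('A','C'), ('C','B'), ('B','D')}, 1)
print(best(spanning=False))  # ({('B','C'), ('B','D')}, 7)
```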
Multi-Structure Sentence Compression | Since the commodity flow constraints in (9)–(11) ensure a connected solution, it is therefore possible to recover a maximum-weight spanning tree from G using the Chu-Liu Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967). Although the runtime of this algorithm is cubic in the size of the input graph, it is fairly speedy when applied on relatively sparse graphs such as the solutions to the LP described above.
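Multi-Structure Sentence Compression | A sketch of that recovery step, assuming a mapping `x_frac` from arcs to fractional LP values and a mapping `arc_score` holding the original arc scores (both names are hypothetical), with networkx's Edmonds implementation standing in for a hand-rolled Chu-Liu Edmonds:

```python
import networkx as nx

def round_lp_to_tree(x_frac, arc_score, eps=1e-6):
    """Maximum-weight spanning arborescence over the sparse subgraph
    of arcs that the relaxed LP assigns nonzero value."""
    G = nx.DiGraph()
    for (h, m), v in x_frac.items():
        if v > eps:                          # keep only the LP's support
            G.add_edge(h, m, weight=arc_score[(h, m)])
    # Chu-Liu/Edmonds; cubic in the (sparse) subgraph, so fast in practice.
    return nx.maximum_spanning_arborescence(G, attr='weight')
```

Multi-Structure Sentence Compression | Since the flow constraints guarantee the LP's support is connected, a spanning arborescence always exists; greedy pruning to the compression rate then proceeds as described in §2.4.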
Abstract | Two state-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and large-margin learning techniques.
Discourse Dependency Parsing | The goal of discourse dependency parsing is to parse an optimal spanning tree from V × R × V_0.
Discourse Dependency Parsing | Thus, the optimal dependency tree for T is the spanning tree with the highest score, obtained through the function DT(T, w): $DT(T, w) = \arg\max_{G_T \subseteq V \times R \times V_0} \text{score}(T, G_T) = \arg\max_{G_T \subseteq V \times R \times V_0} \sum_{\langle e_i, r, e_j \rangle \in G_T} \lambda(e_i, r, e_j)$, where $G_T$ denotes a possible spanning tree and $\lambda(e_i, r, e_j)$ denotes the score of the arc $\langle e_i, r, e_j \rangle$.
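Discourse Dependency Parsing | Because the score factors over arcs, decoding can first label every candidate arc with its best relation and then solve a single unlabeled MST problem. A minimal sketch (the `arc_score` interface and the artificial root `e0` are assumptions for illustration):

```python
import networkx as nx

def decode_discourse_tree(edus, relations, arc_score):
    """Arc-factored MST decoding: arc_score(h, r, m) plays the role of
    lambda(e_i, r, e_j) above; names are illustrative, not the paper's."""
    G = nx.DiGraph()
    best_rel = {}
    for h in ['e0'] + list(edus):            # 'e0' is the artificial root
        for m in edus:
            if h == m:
                continue
            # Arc-factored scoring: pick the best relation per arc first.
            r = max(relations, key=lambda rel: arc_score(h, rel, m))
            best_rel[(h, m)] = r
            G.add_edge(h, m, weight=arc_score(h, r, m))
    tree = nx.maximum_spanning_arborescence(G)   # Chu-Liu/Edmonds
    return [(h, best_rel[(h, m)], m) for h, m in tree.edges()]
```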
Discourse Dependency Structure and Tree Bank | The Eisner algorithm and the maximum spanning tree (MST) algorithm are used respectively to parse the optimal projective and non-projective dependency trees, with the large-margin learning technique (Crammer and Singer, 2003).
Introduction | To solve the latter problem, we introduce an apparently novel O(|V|² log |V|) algorithm that is similar to the maximum spanning tree (MST) algorithms that are widely used for dependency parsing (McDonald et al., 2005).
Introduction | The approach can be understood as an alternative to parsing approaches using graph transducers such as (synchronous) hyperedge replacement grammars (Chiang et al., 2013; Jones et al., 2012; Drewes et al., 1997), in much the same way that spanning tree algorithms are an alternative to using shift-reduce and dynamic programming algorithms for dependency parsing. While a detailed …
Related Work | Minimum spanning tree algorithms, specifically the optimum branching algorithm of Chu and Liu (1965) and Edmonds (1967), were first used for dependency parsing by McDonald et al.
Relation Identification | This algorithm is a (to our knowledge novel) modification of the minimum spanning tree algorithm of Kruskal (1956). |
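Relation Identification | The exact modification is not spelled out in the excerpt above, but a plausible Kruskal-style skeleton for a maximum spanning connected subgraph (an assumed reconstruction: keep every positive edge, then connect the remaining components greedily, best edge first, treating arcs as undirected for connectivity) looks like this:

```python
class DSU:
    """Union-find, the core data structure behind Kruskal's algorithm."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def max_spanning_connected_subgraph(n, edges):
    """edges: list of (weight, u, v) with nodes numbered 0..n-1.
    Positive edges always improve the objective, so take them all;
    a Kruskal pass over the rest then restores connectivity as cheaply
    as possible.  An illustrative sketch, not the paper's algorithm."""
    dsu, chosen = DSU(n), []
    for w, u, v in edges:
        if w > 0:
            chosen.append((w, u, v))
            dsu.union(u, v)
    for w, u, v in sorted(edges, reverse=True):
        if w <= 0 and dsu.union(u, v):       # joins two components
            chosen.append((w, u, v))
    return chosen
```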
Introducing Nonlocal Features | In case there are multiple maximum spanning trees, the best-first decoder will return one of them.
Introducing Nonlocal Features | With proper definitions, a proof can be constructed showing that both search algorithms return trees belonging to the set of maximum spanning trees of the graph.
Representation and Learning | … (Chu and Liu, 1965; Edmonds, 1967), which is a maximum spanning tree algorithm that finds the optimal tree over a connected directed graph.