Concise Integer Linear Programming Formulations for Dependency Parsing
André F. T. Martins, Noah A. Smith, and Eric P. Xing

Article Structure

Abstract

We formulate the problem of non-projective dependency parsing as a polynomial-sized integer linear program.

Introduction

Much attention has recently been devoted to integer linear programming (ILP) formulations of NLP problems, with interesting results in applications like semantic role labeling (Roth and Yih, 2005; Punyakanok et al., 2004), dependency parsing (Riedel and Clarke, 2006), word alignment for machine translation (Lacoste-Julien et al., 2006), summarization (Clarke and Lapata, 2008), and coreference resolution (Denis and Baldridge, 2007), among others.

Dependency Parsing

2.1 Preliminaries

Dependency Parsing as an ILP

Our approach will build a graph-based parser without the drawback of a restriction to local features.

Experiments

We report experiments on seven languages, six (Danish, Dutch, Portuguese, Slovene, Swedish and Turkish) from the CoNLL-X shared task (Buchholz and Marsi, 2006), and one (English) from the CoNLL-2008 shared task (Surdeanu et al., 2008). All experiments are evaluated using the unlabeled attachment score (UAS), using the default settings. We used the same arc-factored features as McDonald et al.

Conclusions

We presented new dependency parsers based on concise ILP formulations.

Topics

dependency parsing

Appears in 21 sentences as: dependency parse (10), dependency parsers (2), dependency parsing (11)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. We formulate the problem of non-projective dependency parsing as a polynomial-sized integer linear program.
    Page 1, “Abstract”
  2. Much attention has recently been devoted to integer linear programming (ILP) formulations of NLP problems, with interesting results in applications like semantic role labeling (Roth and Yih, 2005; Punyakanok et al., 2004), dependency parsing (Riedel and Clarke, 2006), word alignment for machine translation (Lacoste-Julien et al., 2006), summarization (Clarke and Lapata, 2008), and coreference resolution (Denis and Baldridge, 2007), among others.
    Page 1, “Introduction”
  3. Riedel and Clarke (2006) cast dependency parsing as an ILP, but efficient formulations remain an open problem.
    Page 1, “Introduction”
  4. Let us first describe formally the set of legal dependency parse trees.
    Page 1, “Dependency Parsing”
  5. We define the set of legal dependency parse trees of x (denoted Y(x)) as the set of 0-arborescences of D, i.e., we admit each arborescence as a potential dependency tree.
    Page 2, “Dependency Parsing”
  6. In this paper, we consider unlabeled dependency parsing, where only the backbone structure (i.e., the arcs without the labels depicted in Fig. 1) is to be predicted.
    Page 2, “Dependency Parsing”
  7. Figure 1: A projective dependency parse (top), and a non-projective dependency parse (bottom) for two English sentences; examples from McDonald and Satta (2007).
    Page 2, “Dependency Parsing”
  8. Given a labeled dataset {(x_1, y_1), ..., (x_m, y_m)} ⊆ (X × Y)^m, we aim to learn a parser, i.e., a function h : X → Y that, given x ∈ X, outputs a legal dependency parse y ∈ Y(x). The fact that there are exponentially many candidates in Y(x) makes dependency parsing a structured classification problem.
    Page 2, “Dependency Parsing”
  9. There has been much recent work on dependency parsing using graph-based, transition-based, and hybrid methods; see Nivre and McDonald (2008) for an overview.
    Page 2, “Dependency Parsing”
  10. Combinatorial algorithms (Chu and Liu, 1965; Edmonds, 1967) can solve this problem in cubic time. If the dependency parse trees are restricted to be projective, cubic-time algorithms are available via dynamic programming (Eisner, 1996).
    Page 3, “Dependency Parsing”
  11. Riedel and Clarke (2006) proposed an ILP formulation for dependency parsing which refines the arc-factored model by imposing linguistically motivated “hard” constraints that forbid some arc configurations.
    Page 3, “Dependency Parsing as an ILP”
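
The "concise" formulations announced in the title replace the exponentially many cycle-elimination constraints of earlier ILP parsers with a polynomial number of flow constraints. A minimal single-commodity-flow sketch is given below; the notation is ours rather than quoted from the paper: z_{ij} indicates that arc (i, j) enters the tree, f_{ij} is the flow it carries, 0 is the artificial root, n the number of words, and s_{ij} an arc score.

\[
\begin{aligned}
\max_{z, f}\ \ & \sum_{(i,j) \in A} s_{ij} z_{ij} \\
\text{s.t.}\ \ & \sum_{i : (i,j) \in A} z_{ij} = 1 \quad (j = 1, \dots, n) && \text{each word has exactly one head} \\
& \sum_{j} f_{0j} = n && \text{the root emits one unit of flow per word} \\
& \sum_{i} f_{ij} - \sum_{k} f_{jk} = 1 \quad (j = 1, \dots, n) && \text{each word consumes one unit} \\
& 0 \le f_{ij} \le n \, z_{ij} && \text{flow may travel only on selected arcs} \\
& z_{ij} \in \{0, 1\}
\end{aligned}
\]

Flow conservation forces every word to be connected to the root, which rules out cycles with only O(n^2) variables and constraints; hence "polynomial-sized".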


ILP

Appears in 19 sentences as: ILP (19)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. Much attention has recently been devoted to integer linear programming (ILP) formulations of NLP problems, with interesting results in applications like semantic role labeling (Roth and Yih, 2005; Punyakanok et al., 2004), dependency parsing (Riedel and Clarke, 2006), word alignment for machine translation (Lacoste-Julien et al., 2006), summarization (Clarke and Lapata, 2008), and coreference resolution (Denis and Baldridge, 2007), among others.
    Page 1, “Introduction”
  2. In general, the rationale for the development of ILP formulations is to incorporate nonlocal features or global constraints, which are often difficult to handle with traditional algorithms.
    Page 1, “Introduction”
  3. ILP formulations focus more on the modeling of problems, rather than algorithm design.
    Page 1, “Introduction”
  4. While solving an ILP is NP-hard in general, fast solvers are available today that make it a practical solution for many NLP problems.
    Page 1, “Introduction”
  5. This paper presents new, concise ILP formulations for projective and non-projective dependency parsing.
    Page 1, “Introduction”
  6. Riedel and Clarke (2006) cast dependency parsing as an ILP, but efficient formulations remain an open problem.
    Page 1, “Introduction”
  7. The problem of finding the dependency tree that maximizes some linear function of the incidence vectors can be cast as an ILP.
    Page 2, “Dependency Parsing”
  8. By formulating inference as an ILP, nonlocal features can be easily accommodated in our model; furthermore, by using a relaxation technique we can still make learning tractable.
    Page 3, “Dependency Parsing as an ILP”
  9. If we add the constraint x ∈ Z^d, then the above is called an integer linear program (ILP).
    Page 3, “Dependency Parsing as an ILP”
  10. Of course, this need not happen: solving a general ILP is an NP-complete problem.
    Page 3, “Dependency Parsing as an ILP”
  11. Riedel and Clarke (2006) proposed an ILP formulation for dependency parsing which refines the arc-factored model by imposing linguistically motivated “hard” constraints that forbid some arc configurations.
    Page 3, “Dependency Parsing as an ILP”
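
To make items 4 and 9-10 concrete, here is a minimal sketch using the PuLP modeling library. The sentence length, arc scores, and variable names are toy choices of ours, and the connectivity (flow) constraints of a full formulation are deliberately left out; switching the variable category from "Integer" to "Continuous" turns the ILP into its LP relaxation.

```python
# Toy arc-factored parsing ILP for a 3-word sentence, sketched with PuLP.
# z[(i, j)] = 1 iff word i heads word j; node 0 is the artificial root.
# Only "one head per word" is enforced here, enough to show the LP/ILP step.
from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value

n = 3
score = {(0, 1): 10, (0, 2): 5, (0, 3): 5,
         (1, 2): 8,  (1, 3): 2,
         (2, 1): 3,  (2, 3): 7,
         (3, 1): 1,  (3, 2): 1}

prob = LpProblem("toy_parser", LpMaximize)
z = {a: LpVariable(f"z_{a[0]}_{a[1]}", lowBound=0, upBound=1, cat="Integer")
     for a in score}                  # cat="Continuous" gives the LP relaxation

prob += lpSum(score[a] * z[a] for a in score)      # arc-factored objective
for j in range(1, n + 1):                          # exactly one head per word
    prob += lpSum(z[i, j] for i in range(n + 1) if (i, j) in score) == 1

prob.solve()
print(sorted(a for a in score if value(z[a]) > 0.5))  # [(0, 1), (1, 2), (2, 3)]
```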


dependency tree

Appears in 14 sentences as: dependency tree (11), dependency trees (4)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. A dependency tree is a lightweight syntactic representation that attempts to capture functional relationships between words.
    Page 1, “Dependency Parsing”
  2. We define the set of legal dependency parse trees of x (denoted Y(x)) as the set of 0-arborescences of D, i.e., we admit each arborescence as a potential dependency tree.
    Page 2, “Dependency Parsing”
  3. Let y ∈ Y(x) be a legal dependency tree for x; if the arc a = (i, j) ∈ y, we refer to i as the parent of j (denoted i = π(j)) and j as a child of i.
    Page 2, “Dependency Parsing”
  4. A dependency tree is called projective if it only contains projective arcs.
    Page 2, “Dependency Parsing”
  5. The formulation to be introduced in §3 makes use of the notion of the incidence vector associated with a dependency tree y ∈ Y(x).
    Page 2, “Dependency Parsing”
  6. Considering simultaneously all incidence vectors of legal dependency trees and taking the convex hull, we obtain a polyhedron that we call the arborescence polytope, denoted by Z(x). Each vertex of Z(x) can be identified with a dependency tree in Y(x). The Minkowski-Weyl theorem (Rockafellar, 1970) ensures that Z(x) has a representation of the form Z(x) = {z ∈ R^|A| | Az ≤ b}, for some p-by-|A| matrix A and some vector b in R^p.
    Page 2, “Dependency Parsing”
  7. In §3, we will provide a compact representation of an outer polytope Z̄ ⊇ Z whose integer vertices correspond to dependency trees.
    Page 2, “Dependency Parsing”
  8. Hence, the problem of finding the dependency tree that maximizes some linear function of the incidence vectors can be cast as an ILP.
    Page 2, “Dependency Parsing”
  9. The general case where A ⊆ V² is also of interest; it arises whenever a constraint or a lexicon forbids some arcs from appearing in the dependency tree.
    Page 2, “Dependency Parsing”
  10. A subgraph y = (V, B) is a legal dependency tree (i.e., y ∈ Y(x)) if and only if the following conditions are met:
    Page 4, “Dependency Parsing as an ILP”
  11. Furthermore, the integer points of Z̄ are precisely the incidence vectors of dependency trees in Y(x); these are obtained by replacing Eq.
    Page 4, “Dependency Parsing as an ILP”
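
For reference, the incidence vector and polytope that items 5-8 rely on can be written out explicitly (our notation, consistent with the reconstruction in item 6):

\[
z(y) \in \{0, 1\}^{|A|}, \quad z_a(y) = 1 \iff a \in y
\]
\[
Z(x) = \mathrm{conv}\{\, z(y) : y \in Y(x) \,\} \quad \text{(the arborescence polytope)}
\]
\[
Z(x) = \{\, z \in \mathbb{R}^{|A|} : Az \le b \,\} \quad \text{(its Minkowski-Weyl representation)}
\]

Maximizing a linear score over Y(x) is then the same as maximizing it over the vertices of Z(x), which is what licenses the ILP view.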


parse trees

Appears in 9 sentences as: parse tree (1), parse trees (8)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. Let us first describe formally the set of legal dependency parse trees.
    Page 1, “Dependency Parsing”
  2. We define the set of legal dependency parse trees of x (denoted Y(x)) as the set of 0-arborescences of D, i.e., we admit each arborescence as a potential dependency tree.
    Page 2, “Dependency Parsing”
  3. Combinatorial algorithms (Chu and Liu, 1965; Edmonds, 1967) can solve this problem in cubic time. If the dependency parse trees are restricted to be projective, cubic-time algorithms are available via dynamic programming (Eisner, 1996).
    Page 3, “Dependency Parsing”
  4. Our formulations rely on a concise polyhedral representation of the set of candidate dependency parse trees, as sketched in §2.1.
    Page 3, “Dependency Parsing as an ILP”
  5. For most languages, dependency parse trees tend to be nearly projective (cf.
    Page 6, “Dependency Parsing as an ILP”
  6. It would be straightforward to adapt the constraints in §3.5 to allow only projective parse trees: simply force z_a^np = 0 for any a ∈ A.
    Page 6, “Dependency Parsing as an ILP”
  7. Then, Y(x) will be redefined as the set of projective dependency parse trees.
    Page 7, “Dependency Parsing as an ILP”
  8. Note that, unlike reranking approaches, there are still exponentially many candidate parse trees after pruning.
    Page 7, “Experiments”
  9. Unlike our model, the hybrid models used here as baselines make use of the dependency labels at training time; indeed, the transition-based parser is trained to predict a labeled dependency parse tree, and the graph-based parser uses these predicted labels as input features.
    Page 7, “Experiments”
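
Since items 4-7 all turn on projectivity, a minimal checker is sketched below. It assumes a plain heads array (heads[j] is the head of word j, with 0 standing for the dummy root); this representation and the function name are ours, not the paper's.

```python
def is_projective(heads):
    """Return True iff the dependency tree is projective.

    heads[j] = head of word j for j = 1..n; heads[j] == 0 attaches j
    directly to the root. Index 0 of the list is unused. An arc (h, d)
    is projective iff every word strictly between h and d descends from h.
    """
    n = len(heads) - 1
    for d in range(1, n + 1):
        h = heads[d]
        for k in range(min(h, d) + 1, max(h, d)):
            a = k
            while a not in (0, h):   # walk up from k toward the root
                a = heads[a]
            if a != h:               # k escapes the span of arc (h, d)
                return False
    return True

print(is_projective([None, 0, 1, 2]))     # True: a simple left-to-right chain
print(is_projective([None, 0, 4, 1, 1]))  # False: arc (4, 2) crosses arc (1, 3)
```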


graph-based

Appears in 7 sentences as: graph-based (7)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. There has been much recent work on dependency parsing using graph-based, transition-based, and hybrid methods; see Nivre and McDonald (2008) for an overview.
    Page 2, “Dependency Parsing”
  2. Typical graph-based methods consider linear classifiers of the form (see the equation sketched after this list).
    Page 2, “Dependency Parsing”
  3. Our approach will build a graph-based parser without the drawback of a restriction to local features.
    Page 3, “Dependency Parsing as an ILP”
  4. baselines, all of them state-of-the-art parsers based on non-arc-factored models: the second-order model of McDonald and Pereira (2006), the hybrid model of Nivre and McDonald (2008), which combines a (labeled) transition-based and a graph-based parser, and a refinement of the latter, due to Martins et al. (2008).
    Page 7, “Experiments”
  5. Comparing with the baselines, we observe that our full model outperforms that of McDonald and Pereira (2006), and is in line with the most accurate dependency parsers (Nivre and McDonald, 2008; Martins et al., 2008), obtained by combining transition-based and graph-based parsers. Notice that our model, compared with these hybrid parsers, has the advantage of not requiring an ensemble configuration (eliminating, for example, the need to tune two parsers).
    Page 7, “Experiments”
  6. Unlike our model, the hybrid models used here as baselines make use of the dependency labels at training time; indeed, the transition-based parser is trained to predict a labeled dependency parse tree, and the graph-based parser uses these predicted labels as input features.
    Page 7, “Experiments”
  7. See also Zhang and Clark (2008) for a different approach that combines transition-based and graph-based methods.
    Page 7, “Experiments”
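
Item 2 above breaks off where the paper displays its formula. The standard shape of such a classifier, reproduced here from the general graph-based parsing literature rather than recovered from this page, is

\[
h(x) = \operatorname{argmax}_{y \in Y(x)} \; w^{\top} f(x, y),
\]

which becomes the arc-factored model when the feature vector decomposes over arcs as f(x, y) = \sum_{a \in y} f(x, a).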


linear program

Appears in 5 sentences as: linear program (3), linear programming (2)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. We formulate the problem of non-projective dependency parsing as a polynomial-sized integer linear program.
    Page 1, “Abstract”
  2. The model parameters are learned in a max-margin framework by employing a linear programming relaxation.
    Page 1, “Abstract”
  3. Much attention has recently been devoted to integer linear programming (ILP) formulations of NLP problems, with interesting results in applications like semantic role labeling (Roth and Yih, 2005; Punyakanok et al., 2004), dependency parsing (Riedel and Clarke, 2006), word alignment for machine translation (Lacoste-Julien et al., 2006), summarization (Clarke and Lapata, 2008), and coreference resolution (Denis and Baldridge, 2007), among others.
    Page 1, “Introduction”
  4. A linear program (LP) is an optimization problem of the form (see the sketch after this list).
    Page 3, “Dependency Parsing as an ILP”
  5. If we add the constraint x ∈ Z^d, then the above is called an integer linear program (ILP).
    Page 3, “Dependency Parsing as an ILP”
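
Item 4 is likewise truncated at its displayed equation. In standard notation (ours, not recovered verbatim from the paper), the two problems are

\[
\text{LP:} \quad \max_{x \in \mathbb{R}^d} \; c^{\top} x \quad \text{s.t.} \quad Ax \le b,
\qquad
\text{ILP:} \quad \text{the same with } x \in \mathbb{Z}^d.
\]

Dropping the integrality constraint from the ILP yields its LP relaxation, the device the abstract's max-margin learner exploits.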


soft constraints

Appears in 3 sentences as: Soft constraints (1), soft constraints (2)
In Concise Integer Linear Programming Formulations for Dependency Parsing
  1. Our formulation is able to handle nonlocal output features in an efficient manner; not only is it compatible with prior knowledge encoded as hard constraints, it can also learn soft constraints from data.
    Page 1, “Abstract”
  2. Soft constraints may be automatically learned from data.
    Page 1, “Introduction”
  3. These features can act as soft constraints whose penalty values are automatically learned from data; in addition, our model is also compatible with expert knowledge in the form of hard constraints.
    Page 8, “Conclusions”
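
The contrast running through these three sentences can be made concrete in ILP terms (illustration ours): a hard constraint shrinks the feasible set outright, while a soft constraint leaves the set intact but prices violations in the objective, the price being learned like any other feature weight:

\[
\text{hard:} \quad z_a = 0 \quad \text{(arc } a \text{ is forbidden)},
\qquad
\text{soft:} \quad \text{objective term } -\lambda_a z_a, \ \lambda_a \ge 0 \text{ learned from data.}
\]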
