Dependency Grammar Induction via Bitext Projection Constraints
Ganchev, Kuzman and Gillenwater, Jennifer and Taskar, Ben

Article Structure

Abstract

Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages.

Introduction

For English and a handful of other languages, there are large, well-annotated corpora with a variety of linguistic information ranging from named entity to discourse structure.

Approach

At a high level our approach is illustrated in Figure 1(a).

Parsing Models

We explored two parsing models: a generative model used by several authors for unsupervised induction and a discriminative model used for fully supervised training.

Posterior Regularization

Graca et al.

Experiments

We conducted experiments on two languages: Bulgarian and Spanish, using each of the parsing models.

Related Work

Our work most closely relates to Hwa et al.

Conclusion

In this paper, we proposed a novel and effective learning scheme for transferring dependency parses across bitext.

Topics

generative model

Appears in 11 sentences as: Generative model (1) generative model (8) generative models (1) generic model (1)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. We explored two parsing models: a generative model used by several authors for unsupervised induction and a discriminative model used for fully supervised training.
    Page 2, “Parsing Models”
  2. We also used a generative model based on dependency model with valence (Klein and Manning, 2004).
    Page 3, “Parsing Models”
  3. Generative model
    Page 5, “Experiments”
  4. We found that the generative model gets confused by punctuation and tends to predict that periods at the end of sentences are the parents of words in the sentence.
    Page 5, “Experiments”
  5. We call the generic model described above “no-rules” to distinguish it from the language-specific constraints we introduce in the sequel.
    Page 5, “Experiments”
  6. Discriminative models outperform the generative models in the majority of cases.
    Page 5, “Experiments”
  7. Using the straightforward approach outlined above is a dramatic improvement over the standard link-left baseline (and the unsupervised generative model, as we discuss below); however, it doesn't have any information about the annotation guidelines used for the testing corpus.
    Page 6, “Experiments”
  8. The generative model we use is a state-of-the-art model for unsupervised parsing and is our only ...
    Page 6, “Experiments”
  9. Unfortunately, we found generative model performance was disappointing overall.
    Page 7, “Experiments”
  10. As we shall see below, the discriminative parser performs even better than the generative model .
    Page 7, “Experiments”
  11. While the generative model can get confused and perform poorly when the training data contains very long sentences, the discriminative parser does not appear to have this drawback.
    Page 8, “Experiments”
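
The excerpts above say only that the generative model is based on the dependency model with valence (DMV) of Klein and Manning (2004). As a hedged sketch of that family of models (the paper's own parameterization is summarized under the "parse tree" topic below; the form here is the standard DMV, not a quotation from the paper), the joint probability of a sentence x and projective tree z factors roughly as

\[
p(\mathbf{x}, \mathbf{z}) \;=\; p_{\mathrm{root}}\!\big(r(\mathbf{x})\big)\;
\prod_{z \in \mathbf{z}} p_{\neg\mathrm{stop}}(z_p, z_d, v_z)\, p_{\mathrm{child}}(z_c \mid z_p, z_d)\;
\prod_{x \in \mathbf{x}} p_{\mathrm{stop}}(x, \mathit{left}, v_l)\, p_{\mathrm{stop}}(x, \mathit{right}, v_r),
\]

with the root, child, and stop distributions estimated with EM in the unsupervised setting.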

parse tree

Appears in 6 sentences as: parse tree (5) parse trees (1)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. In particular, we address challenges (1) and (2) by avoiding commitment to an entire projected parse tree in the target language during training.
    Page 2, “Introduction”
  2. (2005) also note these problems and solve them by introducing dozens of rules to transform the transferred parse trees .
    Page 2, “Approach”
  3. The parsing model defines a conditional distribution p_θ(z | x) over each projective parse tree z for a particular sentence x, parameterized by a vector θ.
    Page 2, “Parsing Models”
  4. where z is a directed edge contained in the parse tree z and φ is a feature function.
    Page 3, “Parsing Models”
  5. where r(x) is the part of speech tag of the root of the parse tree z, z is an edge from parent z_p to child z_c in direction z_d, either left or right, and v_z indicates valency: false if z_p has no other children further from it in direction z_d than z_c, true otherwise.
    Page 3, “Parsing Models”
  6. The baseline constructs a full parse tree from the incomplete and possibly conflicting transferred edges using a simple random process.
    Page 5, “Experiments”
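
Excerpts 3 and 4 above describe the discriminative model as a conditional distribution over projective parse trees that factors over directed edges with a feature function φ. A minimal reconstruction of that log-linear form (the feature set and training details are in the paper):

\[
p_\theta(\mathbf{z} \mid \mathbf{x}) \;=\;
\frac{\exp\!\Big(\sum_{z \in \mathbf{z}} \theta \cdot \phi(z, \mathbf{x})\Big)}
     {\sum_{\mathbf{z}'} \exp\!\Big(\sum_{z \in \mathbf{z}'} \theta \cdot \phi(z, \mathbf{x})\Big)},
\]

where z' ranges over all projective parse trees of x.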

POS tag

Appears in 5 sentences as: POS tag (3) POS tagged (1) POS tagger (1) POS tags (1)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. We used the Tokyo tagger (Tsuruoka and Tsujii, 2005) to POS tag the English tokens, and generated parses using the first-order model of McDonald et al. (2005).
    Page 4, “Experiments”
  2. For Bulgarian we trained the Stanford POS tagger (Toutanova et al., 2003) on the Bulgtreebank corpus from CoNLL X.
    Page 4, “Experiments”
  3. The Spanish Europarl data was POS tagged with the FreeLing language analyzer (Atserias et al., 2006).
    Page 5, “Experiments”
  4. To prevent such false transfer, we filter out alignments between incompatible POS tags .
    Page 5, “Experiments”
  5. The left panel of Table 3 shows the most common errors by child POS tag, as well as by true parent and guessed parent POS tag .
    Page 5, “Experiments”
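
Excerpt 4 above mentions filtering out alignment links between incompatible POS tags before transferring dependencies. A minimal sketch of what such a filter could look like; the compatibility pairs and names below are hypothetical illustrations, not the rules used in the paper:

    # Hypothetical POS-compatibility filter for word-alignment links.
    # An alignment link (i, j) connects source token i to target token j.
    COMPATIBLE_POS = {
        ("VERB", "VERB"),
        ("NOUN", "NOUN"),
        ("ADJ", "ADJ"),
    }

    def filter_alignments(links, src_pos, tgt_pos):
        """Keep only links whose source and target POS tags are compatible."""
        return [(i, j) for (i, j) in links
                if (src_pos[i], tgt_pos[j]) in COMPATIBLE_POS]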

word alignments

Appears in 5 sentences as: word alignment (2) word alignments (4)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. Nevertheless, several challenges to accurate training and evaluation from aligned bitext remain: (1) partial word alignment due to non-literal or distant translation; (2) errors in word alignments and source language parses; (3) grammatical annotation choices that differ across languages and linguistic theories (e.g., how to analyze auxiliary verbs, conjunctions).
    Page 1, “Introduction”
  2. First, parser and word alignment errors cause much of the transferred information to be wrong.
    Page 2, “Approach”
  3. For both corpora, we performed word alignments with the open source PostCAT (Graca et al., 2009) toolkit.
    Page 4, “Experiments”
  4. Preliminary experiments showed that our word alignments were not always appropriate for syntactic transfer, even when they were correct for translation.
    Page 5, “Experiments”
  5. (2005) found that transferring dependencies directly was not sufficient to get a parser with reasonable performance, even when both the source language parses and the word alignments are performed by hand.
    Page 8, “Related Work”
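
The excerpts above (and those under "sentence pair" below) describe transferring only source dependencies whose endpoints are both aligned, since alignment and parser errors make full-tree projection unreliable. A minimal sketch, under simplifying assumptions (one-to-one alignments, stub data structures), of how such transferable edges could be collected:

    def transferable_edges(src_edges, alignment):
        """Project source dependency edges through a word alignment.

        src_edges: iterable of (head, child) source token indices.
        alignment: dict from source index to aligned target index
                   (a simplification; real alignments may be many-to-many).
        Returns target-side (head, child) pairs for edges whose endpoints
        are both aligned, i.e. the edges that can be transferred.
        """
        return [(alignment[h], alignment[c])
                for h, c in src_edges
                if h in alignment and c in alignment]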

CoNLL

Appears in 4 sentences as: CoNLL (4)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. We evaluate our approach on Bulgarian and Spanish CoNLL shared task data and show that we consistently outperform unsupervised methods and can outperform supervised learning for limited training data.
    Page 1, “Abstract”
  2. We evaluate our results on the Bulgarian and Spanish corpora from the CoNLL X shared task.
    Page 2, “Introduction”
  3. For Bulgarian we trained the Stanford POS tagger (Toutanova et al., 2003) on the Bulgtreebank corpus from CoNLL X.
    Page 5, “Experiments”
  4. Figure 2: Learning curve of the discriminative no-rules transfer model on Bulgarian bitext, testing on CoNLL train sentences of up to 10 words.
    Page 6, “Experiments”

dependency parser

Appears in 4 sentences as: dependency parser (2) dependency parses (1) dependency parsing (1)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. Recently, dependency parsing has gained popularity as a simpler, computationally more efficient alternative to constituency parsing and has spurred several supervised learning approaches (Eisner, 1996; Yamada and Matsumoto, 2003a; Nivre and Nilsson, 2005; McDonald et al., 2005) as well as unsupervised induction (Klein and Manning, 2004; Smith and Eisner, 2006).
    Page 1, “Introduction”
  2. A parallel corpus is word-level aligned using an alignment toolkit (Graca et al., 2009) and the source (English) is parsed using a dependency parser (McDonald et al., 2005).
    Page 2, “Approach”
  3. In this volume, Druck et al. (2009) use this framework to train a dependency parser based on constraints stated as corpus-wide expected values of linguistic rules.
    Page 8, “Related Work”
  4. In this paper, we proposed a novel and effective learning scheme for transferring dependency parses across bitext.
    Page 8, “Conclusion”

parsing model

Appears in 4 sentences as: parsing model (2) parsing models (2)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. After filtering to identify well-behaved sentences and high confidence projected dependencies, we learn a probabilistic parsing model using the posterior regularization framework (Graca et al., 2008).
    Page 2, “Approach”
  2. We explored two parsing models : a generative model used by several authors for unsupervised induction and a discriminative model used for fully supervised training.
    Page 2, “Parsing Models”
  3. The parsing model defines a conditional distribution p9(z | x) over each projective parse tree 2 for a particular sentence X, parameterized by a vector 6.
    Page 2, “Parsing Models”
  4. We conducted experiments on two languages: Bulgarian and Spanish, using each of the parsing models .
    Page 4, “Experiments”

sentence pair

Appears in 4 sentences as: sentence pair (4)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. Figure 1(b) shows an aligned sentence pair example where dependencies are perfectly conserved across the alignment.
    Page 2, “Approach”
  2. For example, in some sentence pair we might find 10 edges that have both end points aligned and can be transferred.
    Page 2, “Approach”
  3. In grammar transfer, our basic constraint is of the form: the expected proportion of conserved edges in a sentence pair is at least η (the exact proportion we used was 0.9, which was determined using unlabeled data as described in Section 5).
    Page 3, “Posterior Regularization”
  4. Our basic model uses constraints of the form: the expected proportion of conserved edges in a sentence pair is at least η = 90%.
    Page 5, “Experiments”
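
Excerpts 3 and 4 above state the basic posterior regularization constraint as a bound on the expected proportion of conserved edges. A hedged reconstruction of its form (the precise formulation, and how the constraint is imposed on the model posterior during training, is given in the paper and in Graca et al. (2008)): for a sentence pair with a set C of transferable edges, the constrained posterior q over target trees z must satisfy

\[
\mathbb{E}_{q(\mathbf{z})}\!\left[\frac{1}{|C|}\sum_{z \in \mathbf{z}} \mathbf{1}\big[z \text{ is conserved}\big]\right] \;\ge\; \eta,
\qquad \eta = 0.9 .
\]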

treebank

Appears in 4 sentences as: treebank (2) treebanks (2)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages.
    Page 1, “Abstract”
  2. We evaluate our approach by transferring from an English parser trained on the Penn treebank to Bulgarian and Spanish.
    Page 2, “Introduction”
  3. In our experiments we evaluate the learned models on dependency treebanks (Nivre et al., 2007).
    Page 2, “Approach”
  4. We used the Tokyo tagger (Tsuruoka and Tsujii, 2005) to POS tag the English tokens, and generated parses using the first-order model of McDonald et al. (2005) with projective decoding, trained on sections 2-21 of the Penn treebank with dependencies extracted using the head rules of Yamada and Matsumoto (2003b).
    Page 4, “Experiments”

word-level

Appears in 3 sentences as: word-level (3)
In Dependency Grammar Induction via Bitext Projection Constraints
  1. We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees.
    Page 1, “Abstract”
  2. For example, several early works (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Merlo et al., 2002) demonstrate transfer of shallow processing tools such as part-of-speech taggers and noun-phrase chunkers by using word-level alignment models (Brown et al., 1994; Och and Ney, 2000).
    Page 1, “Introduction”
  3. A parallel corpus is word-level aligned using an alignment toolkit (Graca et al., 2009) and the source (English) is parsed using a dependency parser (McDonald et al., 2005).
    Page 2, “Approach”
