Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Mareček, David and Straka, Milan

Article Structure

Abstract

Even though the quality of unsupervised dependency parsers grows, they often fail in recognition of very basic dependencies.

Introduction

The task of unsupervised dependency parsing (which strongly relates to the grammar induction task) has become popular in the last decade, and its quality has been greatly increasing during this period.

Related Work

Reducibility: The notion of reducibility belongs to the traditional linguistic criteria for recognizing dependency relations.

STOP-probability estimation

3.1 Recognition of reducible sequences

Model

We use the standard generative Dependency Model with Valence (Klein and Manning, 2004).
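
For orientation, the following is a minimal sketch of the textbook DMV factorization of a single tree's probability (Klein and Manning, 2004). It omits the valence (adjacency) conditioning of the full model and the generation of the root from an artificial ROOT node, and it is not necessarily the exact parameterization used in this paper; a head with POS tag c_h keeps generating dependents in direction dir until a STOP decision is drawn:

    P_tree(T) = \prod_{h \in T} \prod_{dir \in \{left, right\}}
                \Bigl[ \prod_{d \in deps(h, dir)} \bigl(1 - P_stop(c_h, dir)\bigr) \, P_choose(c_d \mid c_h, dir) \Bigr] \, P_stop(c_h, dir)

Here P_stop(c_h, dir) is exactly the quantity that, per the abstract and conclusions, the paper estimates from a large raw corpus and plugs into the model as prior knowledge.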

Inference

We employ the Gibbs sampling algorithm (Gilks et al., 1996).

Experiments

6.1 Data

Conclusions and Future Work

In this work, we studied the possibility of estimating the DMV stop-probabilities from a large raw corpus.

Topics

treebanks

Appears in 24 sentences as: treebank (11) treebanks (14)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. By incorporating this knowledge into Dependency Model with Valence, we managed to considerably outperform the state-of-the-art results in terms of average attachment score over 20 treebanks from CoNLL 2006 and 2007 shared tasks.
    Page 1, “Abstract”
  2. This is still far below the supervised approaches, but their indisputable advantage is the fact that no annotated treebanks are needed and the induced structures are not burdened by any linguistic conventions.
    Page 1, “Introduction”
… supervised parsers always only simulate the treebanks they were trained on, whereas unsupervised parsers have an ability to be fitted to different particular applications.
    Page 1, “Introduction”
… stop words in the treebank should be 2/3 (see the note after this list).
    Page 4, “STOP-probability estimation”
  5. Finally, we obtain the probability of the whole generated treebank as a product over the trees:
    Page 5, “Model”
… no matter how the trees are ordered in the treebank, P_treebank is always the same (see the sketch after this list).
    Page 5, “Model”
  7. The first type are CoNLL treebanks from the year 2006 (Buchholz and Marsi, 2006) and 2007 (Nivre et al., 2007), which we use for inference and for evaluation.
    Page 6, “Experiments”
The Wikipedia texts were automatically tokenized and segmented to sentences so that their tokenization was similar to the one in the CoNLL evaluation treebanks.
    Page 6, “Experiments”
To evaluate the quality of our estimations, we compare them with the stop probabilities P_stop computed directly on the evaluation treebanks.
    Page 7, “Experiments”
The individual points represent the individual POS tags; their size (area) shows their frequency in the particular treebank.
    Page 7, “Experiments”
  11. Their estimated stop probability is not very high (about 0.65), while in the real treebank they are almost always leaves.
    Page 7, “Experiments”
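
Two side notes on the excerpts above, both hedged readings rather than the paper's own derivations. The 2/3 figure in item 4 is consistent with simple DMV counting: a sentence of n words triggers 2n STOP decisions (one per word and direction) but only n - 1 continue decisions (one per dependency edge), so STOPs account for roughly 2n / (3n - 1) ≈ 2/3 of all stop/continue decisions. The product over trees in item 5 has the form

    P_treebank = \prod_{i=1}^{N} P_tree(T_i)

with N the number of sentences; in the paper's model the per-tree factors are exchangeable rather than independent (item 6 and the Inference excerpts below), which is exactly why the ordering of the trees does not matter.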

POS tags

Appears in 21 sentences as: POS tag (9) POS tags (15)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. Rasooli and Faili (2012) and Bisk and Hockenmaier (2012) made some efforts to boost the verbocentricity of the inferred structures; however, both of the approaches require manual identification of the POS tags marking the verbs, which renders them useless when unsupervised POS tags are employed.
    Page 1, “Introduction”
Our dependency model contained a submodel which directly prioritized subtrees that form reducible sequences of POS tags.
    Page 2, “Related Work”
  3. Reducibility scores of given POS tag sequences were estimated using a large corpus of Wikipedia articles.
    Page 2, “Related Work”
  4. The weakness of this approach was the fact that longer sequences of POS tags are very sparse and no reducibility scores could be estimated for them.
    Page 2, “Related Work”
  5. In this paper, we avoid this shortcoming by estimating the STOP probabilities for individual POS tags only.
    Page 2, “Related Work”
  6. (2011) identify the POS tags by a cross-lingual transfer.
    Page 3, “Related Work”
Hereinafter, P_stop(c_h, dir) denotes the STOP-probability we want to estimate from a large corpus; c_h is the head’s POS tag and dir is the direction in which the STOP probability is estimated.
    Page 3, “STOP-probability estimation”
For each POS tag c_h in the given corpus, we first compute its left and right “raw” scores S_stop(c_h, left) and S_stop(c_h, right) as the relative number of times a word with POS tag c_h was in the first (or last) position in a reducible sequence found in the corpus (a sketch follows this list).
    Page 3, “STOP-probability estimation”
  9. Their main purpose is to sort the POS tags according to their “reducibility”.
    Page 3, “STOP-probability estimation”
  10. It may happen that for many POS tags there are no reducible sequences found.
    Page 3, “STOP-probability estimation”
Such smoothing ensures that more frequent irreducible POS tags get a lower S_stop score than the less frequent ones.
    Page 4, “STOP-probability estimation”
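
The Python sketch below shows one way the raw scores in items 8-11 could be computed; it is a hypothetical reading of the excerpts, not the paper's implementation. The normalization by each tag's corpus frequency and the additive smoothing constant alpha are assumptions (one plausible way to obtain the behaviour described in item 11), and the detection of the reducible sequences themselves (Section 3.1 of the paper) is taken as given.

    from collections import Counter

    def raw_stop_scores(corpus_tags, reducible_spans, alpha=1.0):
        # Hypothetical sketch of the raw S_stop(c_h, dir) scores.
        # corpus_tags: list of POS-tag sequences, one per sentence.
        # reducible_spans: {sentence index: [(start, end), ...]} spans judged
        # reducible; how they are found is not reproduced here.
        tag_count = Counter(t for sent in corpus_tags for t in sent)
        first, last = Counter(), Counter()
        for i, sent in enumerate(corpus_tags):
            for start, end in reducible_spans.get(i, []):
                first[sent[start]] += 1     # tag at the left edge of a reducible span
                last[sent[end - 1]] += 1    # tag at the right edge of a reducible span
        scores = {}
        for tag, n in tag_count.items():
            # Additive smoothing (assumed form): frequent tags never seen at a
            # span edge get lower scores than rare ones, as item 11 requires.
            scores[(tag, "left")] = (first[tag] + alpha) / (n + alpha)
            scores[(tag, "right")] = (last[tag] + alpha) / (n + alpha)
        return scores

For example, raw_stop_scores([["DT", "NN", "VBD"]], {0: [(0, 2)]}) credits DT with a left-edge occurrence and NN with a right-edge one, so DT receives a higher left score and NN a higher right score than VBD.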

dependency tree

Appears in 10 sentences as: dependency tree (7) dependency trees (3)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. 1The adjacent-word baseline is a dependency tree in which each word is attached to the previous (or the following) word.
    Page 1, “Introduction”
Figure 1: Example of a dependency tree.
    Page 2, “Introduction”
Figure 1 shows an example of a dependency tree.
    Page 2, “Introduction”
  4. The probability of the whole dependency tree T is the following:
    Page 4, “Model”
  5. A random projective dependency tree is assigned to each sentence in the corpus.
    Page 5, “Inference”
  6. For each sentence, we sample a new dependency tree based on all other trees that are currently in the corpus.
    Page 5, “Inference”
  7. Parsing: Based on the collected counts, we compute the final dependency trees using the Chu-Liu/Edmonds’ algorithm (1965) for finding maximum directed spanning trees.
    Page 5, “Inference”
Our goal is to sample a new projective dependency tree T with probability proportional to P_tree. Since the factors are exchangeable, we can deal with any tree as if it were the last one in the corpus (an outline of this loop follows the list).
    Page 5, “Inference”
Even if we average the strictly projective dependency trees, some non-projective edges may appear in the result.
    Page 6, “Inference”
This is quite high, but still, it is one of the lowest among the more frequent tags, and thus verbs tend to be the roots of the dependency trees.
    Page 8, “Experiments”
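
Read together, items 5-8 describe the shape of the inference loop, and item 7 its final decoding step. The outline below is a hypothetical sketch under stated assumptions: the helper functions passed in (init_tree, resample_tree, collect_counts, maximum_spanning_tree) stand for steps the excerpts mention but whose internals are not reproduced here, and the iteration and burn-in counts are illustrative rather than the paper's settings.

    import random

    def gibbs_parse(sentences, init_tree, resample_tree, collect_counts,
                    maximum_spanning_tree, n_iterations=200, burn_in=100):
        # Item 5: start from a random projective tree for every sentence.
        trees = [init_tree(s) for s in sentences]
        edge_counts = [dict() for _ in sentences]
        for it in range(n_iterations):
            order = list(range(len(sentences)))
            random.shuffle(order)               # visit sentences in random order
            for i in order:
                # Items 6 and 8: resample tree i conditioned on all other trees;
                # exchangeability lets us treat it as the last tree in the corpus.
                trees[i] = resample_tree(sentences[i], trees, i)
                if it >= burn_in:
                    collect_counts(edge_counts[i], trees[i])   # accumulate edge statistics
        # Item 7: decode each sentence with the Chu-Liu/Edmonds maximum directed
        # spanning tree over the collected counts; as item 9 notes, the result
        # may contain non-projective edges even though every sample is projective.
        return [maximum_spanning_tree(counts) for counts in edge_counts]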

CoNLL

Appears in 9 sentences as: CoNLL (9)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. By incorporating this knowledge into Dependency Model with Valence, we managed to considerably outperform the state-of-the-art results in terms of average attachment score over 20 treebanks from CoNLL 2006 and 2007 shared tasks.
    Page 1, “Abstract”
  2. The first type are CoNLL treebanks from the year 2006 (Buchholz and Marsi, 2006) and 2007 (Nivre et al., 2007), which we use for inference and for evaluation.
    Page 6, “Experiments”
  3. The Wikipedia texts were automatically tokenized and segmented to sentences so that their tokenization was similar to the one in the CoNLL evaluation treebanks.
    Page 6, “Experiments”
… the respective CoNLL training data.
    Page 7, “Experiments”
The y-axis shows the stop probabilities estimated on Wikipedia by our algorithm, while the x-axis shows the stop probabilities computed on the evaluation CoNLL data.
    Page 7, “Experiments”
… estimated from raw Wikipedia corpora (y-axis) and of the P_stop probabilities computed from CoNLL treebanks (x-axis).
    Page 7, “Experiments”
  7. We do not provide results for Chinese since we do not have any appropriate tokenizer at our disposal (see Section 3), and also for Turkish from CoNLL 2006 since the data is not available to us.
    Page 8, “Experiments”
We proved that such prior knowledge about stop-probabilities incorporated into the standard DMV model significantly improves unsupervised dependency parsing and, since we are not aware of any other fully unsupervised dependency parser with a higher average attachment score over the CoNLL data, we state that we reached a new state-of-the-art result.
    Page 8, “Conclusions and Future Work”
  9. However, they do not provide scores measured on other CoNLL treebanks.
    Page 8, “Conclusions and Future Work”

dependency parsing

Appears in 7 sentences as: dependency parser (1) dependency parsers (2) dependency parsing (5)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. Even though the quality of unsupervised dependency parsers grows, they often fail in recognition of very basic dependencies.
    Page 1, “Abstract”
  2. The task of unsupervised dependency parsing (which strongly relates to the grammar induction task) has become popular in the last decade, and its quality has been greatly increasing during this period.
    Page 1, “Introduction”
We have directly utilized the aforementioned criteria for dependency relations in unsupervised dependency parsing in our previous paper (Mareček and Žabokrtský, 2012).
    Page 2, “Related Work”
  4. Dependency Model with Valence (DMV) has been the most popular approach to unsupervised dependency parsing in the recent years.
    Page 2, “Related Work”
Other approaches to unsupervised dependency parsing were described e.g. …
    Page 3, “Related Work”
We proved that such prior knowledge about stop-probabilities incorporated into the standard DMV model significantly improves unsupervised dependency parsing and, since we are not aware of any other fully unsupervised dependency parser with a higher average attachment score over the CoNLL data, we state that we reached a new state-of-the-art result.
    Page 8, “Conclusions and Future Work”
  7. We suppose that many of the current works on unsupervised dependency parsers use gold POS tags only as a simplification of this task, and that the ultimate purpose of this effort is to develop a fully unsupervised induction of linguistic structure from raw texts that would be useful across many languages, domains, and applications.
    Page 9, “Conclusions and Future Work”

Gibbs sampling

Appears in 3 sentences as: Gibbs sampler (1) Gibbs sampling (2)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Section 5 describes the inference algorithm based on Gibbs sampling.
    Page 2, “Introduction”
  2. We employ the Gibbs sampling algorithm (Gilks et al., 1996).
    Page 5, “Inference”
  3. We have also found that the Gibbs sampler does not always converge to a similar grammar.
    Page 8, “Experiments”

grammar induction

Appears in 3 sentences as: grammar inducer (1) grammar induction (2)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
  1. The task of unsupervised dependency parsing (which strongly relates to the grammar induction task) has become popular in the last decade, and its quality has been greatly increasing during this period.
    Page 1, “Introduction”
  2. Such approaches achieve better results; however, they are useless for grammar induction from plain text.
    Page 3, “Related Work”
The first one (spi12) is the DMV-based grammar inducer by Spitkovsky et al.
    Page 8, “Experiments”

subtrees

Appears in 3 sentences as: subtrees (3)
In Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Our hypothesis is a generalization of the original hypothesis since it allows a reducible sequence to form several adjacent subtrees.
    Page 2, “Introduction”
  2. Our dependency model contained a submodel which directly prioritized subtrees that form reducible sequences of POS tags.
    Page 2, “Related Work”
Or, in terms of dependency structure: A reducible sequence consists of one or more adjacent subtrees.
    Page 3, “STOP-probability estimation”
