Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Wenliang Chen, Min Zhang, and Haizhou Li

Article Structure

Abstract

Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding.

Introduction

In recent years, many data-driven models have been proposed for dependency parsing (McDonald and Nivre, 2007).

Dependency language model

Language models play a very important role in statistical machine translation (SMT).

Parsing with dependency language model

In this section, we propose a parsing model which includes the dependency language model by extending the model of McDonald et al.

Decoding

In this section, we turn to the problem of adding the DLM in the decoding algorithm.

Implementation Details

5.1 Baseline parser

Experiments

We conducted experiments on English and Chinese data.

Analysis

Dependency parsers tend to perform worse on heads which have many children.

Related work

Several previous studies related to our work have been conducted.

Conclusion

We have presented an approach to utilizing the dependency language model to improve graph-based dependency parsing.

Topics

parsing model

Appears in 19 sentences as: parsing model (14) parsing models (5)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding.
    Page 1, “Abstract”
  2. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
    Page 1, “Abstract”
  3. Based on the dependency language model, we represent a set of features for the parsing model.
    Page 1, “Abstract”
  4. Finally, the features are efficiently integrated into the parsing model during decoding using beam search.
    Page 1, “Abstract”
  5. Among them, graph-based dependency parsing models have achieved state-of-the-art performance for a wide range of languages, as shown in recent CoNLL shared tasks.
    Page 1, “Introduction”
  6. The parsing model searches for the final dependency trees by considering the original scores and the scores of DLM.
    Page 2, “Introduction”
  7. The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model.
    Page 2, “Introduction”
  8. Our new parsing model can utilize rich high-order feature representations but without increasing the complexity.
    Page 2, “Introduction”
  9. We utilize the dependency language model to enhance the graph-based parsing model.
    Page 2, “Introduction”
  10. The new parsing model uses the rich high-order features defined over a view of large scope and an additional large raw corpus, but without increasing the decoding complexity.
    Page 2, “Introduction”
  11. In this paper, we use a linear model to calculate the scores for the parsing models (defined in Section 3.1).
    Page 2, “Dependency language model”

language model

Appears in 15 sentences as: language model (13) Language models (1) language models (1)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
    Page 1, “Abstract”
  2. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser.
    Page 1, “Abstract”
  3. Based on the dependency language model, we represent a set of features for the parsing model.
    Page 1, “Abstract”
  4. In this paper, we solve this issue by enriching the feature representations for a graph-based model using a dependency language model (DLM) (Shen et al., 2008).
    Page 1, “Introduction”
  5. We utilize the dependency language model to enhance the graph-based parsing model.
    Page 2, “Introduction”
  6. Language models play a very important role for statistical machine translation (SMT).
    Page 2, “Dependency language model”
  7. The standard N-gram based language model predicts the next word based on the N-1 immediate previous words.
    Page 2, “Dependency language model”
  8. However, the traditional N-gram language model can not capture long-distance word relations.
    Page 2, “Dependency language model”
  9. (2008) proposed a dependency language model (DLM) to exploit long-distance word relations for SMT.
    Page 2, “Dependency language model”
  10. Suppose we use an N-gram dependency language model.
    Page 2, “Dependency language model”
  11. In this section, we propose a parsing model which includes the dependency language model by extending the model of McDonald et al.
    Page 2, “Parsing with dependency language model”
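
The excerpts above contrast a standard N-gram language model, which conditions on the N-1 immediately preceding words, with the dependency language model of Shen et al. (2008), which predicts the next child of a head from the head and the N-1 previously generated children. The following is a minimal Python sketch of that conditioning structure; the class name, storage layout, and smoothing floor are illustrative assumptions, not the authors' implementation.

    class DependencyLM:
        """Illustrative trigram (N=3) dependency language model (a sketch).

        Scores P(child | head, N-1 previously generated children on the same
        side), the conditioning described in the excerpts above; the storage
        layout and smoothing floor are assumptions of this sketch.
        """

        def __init__(self, n=3):
            self.n = n
            self.prob = {}  # (context tuple, child) -> probability

        def score_child(self, head, prev_children, child):
            # Condition on the head plus the N-1 children generated before
            # `child` on the same side of the head (dependency context),
            # rather than on the N-1 adjacent surface words.
            context = (head,) + tuple(prev_children[-(self.n - 1):])
            return self.prob.get((context, child), 1e-7)  # assumed floor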

graph-based

Appears in 13 sentences as: Graph-based (1) graph-based (13)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding.
    Page 1, “Abstract”
  2. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
    Page 1, “Abstract”
  3. Among them, graph-based dependency parsing models have achieved state-of-the-art performance for a wide range of languages, as shown in recent CoNLL shared tasks.
    Page 1, “Introduction”
  4. In the graph-based models, dependency parsing is treated as a structured prediction problem in which the graphs are usually represented as factored structures.
    Page 1, “Introduction”
  5. How to enrich high-order feature representations without increasing the decoding complexity for graph-based models becomes a very challenging problem in the dependency parsing task.
    Page 1, “Introduction”
  6. In this paper, we solve this issue by enriching the feature representations for a graph-based model using a dependency language model (DLM) (Shen et al., 2008).
    Page 1, “Introduction”
  7. We utilize the dependency language model to enhance the graph-based parsing model.
    Page 2, “Introduction”
  8. 3.1 Graph-based parsing model
    Page 3, “Parsing with dependency language model”
  9. The graph-based parsing model aims to search for the maximum spanning tree (MST) in a graph (McDonald et al., 2005).
    Page 3, “Parsing with dependency language model”
  10. We implement our parsers based on the MSTParser, a freely available implementation of the graph-based model proposed by McDonald and Pereira (2006).
    Page 5, “Implementation Details”
  11. Table 7 shows the performance of the graph-based systems that were compared, where McDonald06 refers to the second-order parser of McDonald
    Page 7, “Experiments”

dependency parsing

Appears in 10 sentences as: Dependency parsers (1) dependency parsing (9)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
    Page 1, “Abstract”
  2. In recent years, many data-driven models have been proposed for dependency parsing (McDonald and Nivre, 2007).
    Page 1, “Introduction”
  3. Among them, graph-based dependency parsing models have achieved state-of-the-art performance for a wide range of languages, as shown in recent CoNLL shared tasks.
    Page 1, “Introduction”
  4. In the graph-based models, dependency parsing is treated as a structured prediction problem in which the graphs are usually represented as factored structures.
    Page 1, “Introduction”
  5. How to enrich high-order feature representations without increasing the decoding complexity for graph-based models becomes a very challenging problem in the dependency parsing task.
    Page 1, “Introduction”
  6. Dependency parsers tend to perform worse on heads which have many children.
    Page 8, “Analysis”
  7. (2008) used a clustering algorithm to produce word clusters on a large amount of unannotated data and represented new features based on the clusters for dependency parsing models.
    Page 8, “Related work”
  8. (2009) proposed an approach that extracted partial tree structures from a large amount of data and used them as the additional features to improve dependency parsing.
    Page 8, “Related work”
  9. They extended a Semi-supervised Structured Conditional Model (SS-SCM) (Suzuki and Isozaki, 2008) to the dependency parsing problem and combined their method with the approach of Koo et al. (2008).
    Page 8, “Related work”
  10. We have presented an approach to utilizing the dependency language model to improve graph-based dependency parsing.
    Page 8, “Conclusion”

dependency tree

Appears in 8 sentences as: dependency tree (4) dependency trees (4)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The basic idea behind this is that we use the DLM to evaluate whether a valid dependency tree (McDonald and Nivre, 2007)
    Page 1, “Introduction”
  2. The parsing model searches for the final dependency trees by considering the original scores and the scores of DLM.
    Page 2, “Introduction”
  3. (2008), to score entire dependency trees.
    Page 2, “Dependency language model”
  4. Let y be a dependency tree for x and H be a set that includes the words that have at least one dependent.
    Page 2, “Dependency language model”
  5. For a dependency tree, we calculate the probability as follows:
    Page 2, “Dependency language model”
  6. Let T(Gx) be the set of all the subgraphs of Gx that are valid dependency trees (McDonald and Nivre, 2007) for sentence x.
    Page 3, “Parsing with dependency language model”
  7. The formulation defines the score of a dependency tree y ∈ T(Gx) to be the sum of the edge scores,
    Page 3, “Parsing with dependency language model”
  8. Given the dependency trees, we estimate the probability distribution by relative frequency:
    Page 5, “Implementation Details”
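
Two of the excerpts above mention computing the probability of an entire dependency tree and estimating the DLM distribution by relative frequency over auto-parsed trees. The sketch below illustrates both steps under an assumed representation (a tree as a mapping from each head to the ordered list of its children on one side); it reuses the score_child interface from the DLM sketch earlier on this page and does not reproduce the paper's exact equations.

    import math
    from collections import Counter

    def tree_log_prob(tree, dlm):
        """Log-probability of a dependency tree under an N-gram DLM (a sketch).

        `tree` maps each head in H (heads with at least one dependent) to the
        ordered list of its children; the tree score is the sum, over heads,
        of the DLM scores of their child sequences.
        """
        logp = 0.0
        for head, children in tree.items():
            prev = []
            for child in children:
                logp += math.log(dlm.score_child(head, prev, child))
                prev.append(child)
        return logp

    def estimate_by_relative_frequency(parsed_trees, n=3):
        """Estimate P(child | head, previous children) by relative frequency
        over auto-parsed trees, as mentioned in the excerpts (a sketch only)."""
        ngram, context = Counter(), Counter()
        for tree in parsed_trees:
            for head, children in tree.items():
                prev = []
                for child in children:
                    ctx = (head,) + tuple(prev[-(n - 1):])
                    ngram[(ctx, child)] += 1
                    context[ctx] += 1
                    prev.append(child)
        return {key: count / context[key[0]] for key, count in ngram.items()}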

UAS

Appears in 7 sentences as: UAS (10)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. We measured the parser quality by the unlabeled attachment score (UAS), i.e., the percentage of tokens (excluding all punctuation tokens) with the correct HEAD.
    Page 6, “Experiments”
  2. Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the K-best list size for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores.
    Page 6, “Experiments”
  3. UAS
    Page 6, “Experiments”
  4. Order1 UAS Order2 UAS
    Page 7, “Experiments”
  5. Order1       UAS     Order2       UAS
     MST1         86.38   MST2         88.11
     MST-DLM1     90.66   MST-DLM2     91.62
     MSTB1        88.38   MSTB2        88.66
     MSTB-DLM1    91.38   MSTB-DLM2    91.59
    Page 7, “Experiments”
  6. System            UAS     Cost
     McDonald06        91.5    O(n^3)
     Koo08-standard    92.02   O(n^4)
     Koo10-model1      93.04   O(n^4)
     Koo08-dep2c       93.16   O(n^4)
     Suzuki09          93.79   O(n^4)
     Chen09-ord2s      92.51   O(n^3)
     Zhou11            92.64   O(n^4)
    Page 7, “Experiments”
  7. System   UAS
     Chen08   86.52
    Page 8, “Experiments”
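
The first excerpt defines UAS as the percentage of tokens, excluding punctuation, whose predicted HEAD is correct. A small sketch of that computation follows; treating a fixed set of Penn Treebank punctuation tags as punctuation is an assumption of the sketch.

    def unlabeled_attachment_score(gold_heads, pred_heads, tags,
                                   punct_tags=(".", ",", ":", "``", "''")):
        """UAS: percentage of tokens (excluding punctuation tokens) whose
        predicted head index matches the gold head index (a sketch)."""
        correct = total = 0
        for gold, pred, tag in zip(gold_heads, pred_heads, tags):
            if tag in punct_tags:  # assumed punctuation tag set
                continue
            total += 1
            correct += int(gold == pred)
        return 100.0 * correct / total if total else 0.0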

N-gram

Appears in 7 sentences as: N-gram (7)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The N-gram DLM has the ability to predict the next child based on the N-1 immediate previous children and their head (Shen et al., 2008).
    Page 1, “Introduction”
  2. The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model.
    Page 2, “Introduction”
  3. The standard N-gram based language model predicts the next word based on the N-1 immediate previous words.
    Page 2, “Dependency language model”
  4. However, the traditional N-gram language model can not capture long-distance word relations.
    Page 2, “Dependency language model”
  5. The N-gram DLM predicts the next child of a head based on the N-1 immediate previous children and the head itself.
    Page 2, “Dependency language model”
  6. Suppose we use an N-gram dependency language model.
    Page 2, “Dependency language model”
  7. Then, we studied the effect of adding different N-gram DLMs to MST1.
    Page 6, “Experiments”

development set

Appears in 4 sentences as: development set (3) development sets (1)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The numbers, 10% and 30%, are tuned on the development sets in the experiments.
    Page 5, “Implementation Details”
  2. Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the K-best list size for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores.
    Page 6, “Experiments”
  3. Table 3: The parsing times on the development set (seconds for all the sentences)
    Page 6, “Experiments”
  4. Table 3 shows the parsing times of Intersect on the development set for English.
    Page 6, “Experiments”

Semi-supervised

Appears in 4 sentences as: Semi-supervised (2) semi-supervised (2)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. Suzuki2009 (Suzuki et al., 2009) reported the best published result by combining a Semi-supervised Structured Conditional Model (Suzuki and Isozaki, 2008) with the method of Koo et al. (2008).
    Page 7, “Experiments”
  2. G denotes the supervised graph-based parsers, S denotes the graph-based parsers with semi-supervised methods, and D denotes our new parsers.
    Page 7, “Experiments”
  3. (2009) presented a semi-supervised learning approach.
    Page 8, “Related work”
  4. They extended a Semi-supervised Structured Conditional Model (SS-SCM) (Suzuki and Isozaki, 2008) to the dependency parsing problem and combined their method with the approach of Koo et al. (2008).
    Page 8, “Related work”

beam search

Appears in 4 sentences as: beam search (4)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
    Page 1, “Abstract”
  2. Finally, the features are efficiently integrated into the parsing model during decoding using beam search.
    Page 1, “Abstract”
  3. We use beam search to choose the one having the overall best score as the final parse, where K spans are built at each step (Zhang and Clark, 2008).
    Page 5, “Decoding”
  4. Since the setting of K (for beam search) affects our parsers, we studied its influence on the development set.
    Page 6, “Experiments”
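
The excerpts describe integrating the DLM-based features during decoding with beam search, keeping the K best items at each step rather than performing exact high-order inference. The generic sketch below shows only that pruning pattern; the expand and score callables are placeholders, and the paper's decoder actually builds K spans per step inside a bottom-up chart, which is not reproduced here.

    def beam_search(initial_items, expand, score, K, n_steps):
        """Generic beam search (a sketch): keep only the K highest-scoring
        hypotheses after every step, so expensive features (such as the
        DLM-based ones) can be used without exact high-order inference.

        expand(item) yields successor hypotheses; score(item) returns the
        combined base + DLM score.  Both are placeholders here.
        """
        beam = sorted(initial_items, key=score, reverse=True)[:K]
        for _ in range(n_steps):
            candidates = [succ for item in beam for succ in expand(item)]
            if not candidates:
                break
            beam = sorted(candidates, key=score, reverse=True)[:K]
        return beam[0] if beam else None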

feature templates

Appears in 3 sentences as: feature templates (3)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. 3.3 DLM-based feature templates
    Page 3, “Parsing with dependency language model”
  2. The feature templates are outlined in Table 1, where TYPE refers to one of the types PL or PR, h_pos refers to the part-of-speech tag of x_h, h_word refers to the lexical form of x_h, ch_pos refers to the part-of-speech tag of x_ch, and ch_word refers to the lexical form of x_ch.
    Page 3, “Parsing with dependency language model”
  3. Table 1: DLM-based feature templates
    Page 4, “Decoding”
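
The template excerpt builds features from TYPE (PL or PR), the head's word and part-of-speech tag, and the child's word and part-of-speech tag. The sketch below shows how such feature strings could be generated; the particular combinations are illustrative and do not reproduce the paper's full Table 1.

    def dlm_feature_strings(TYPE, h_word, h_pos, ch_word, ch_pos):
        """Generate DLM-based feature strings from the fields named in the
        template excerpt (an illustrative subset, not the paper's Table 1)."""
        return [
            f"TYPE={TYPE}",
            f"TYPE={TYPE}|h_pos={h_pos}",
            f"TYPE={TYPE}|h_word={h_word}",
            f"TYPE={TYPE}|ch_pos={ch_pos}",
            f"TYPE={TYPE}|ch_word={ch_word}",
            f"TYPE={TYPE}|h_pos={h_pos}|ch_pos={ch_pos}",
            f"TYPE={TYPE}|h_word={h_word}|ch_word={ch_word}",
        ]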

maximum spanning

Appears in 3 sentences as: maximum spanning (3)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The graph-based parsing model aims to search for the maximum spanning tree (MST) in a graph (McDonald et al., 2005).
    Page 3, “Parsing with dependency language model”
  2. The parsing model finds a maximum spanning tree (MST), which is the highest scoring tree in T(Gx).
    Page 3, “Parsing with dependency language model”
  3. In our approach, we consider the scores of the DLM when searching for the maximum spanning tree.
    Page 3, “Parsing with dependency language model”
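
The excerpts state that the parser searches T(Gx) for the highest-scoring tree and, in the proposed approach, also considers the DLM scores during that search. A minimal sketch of such a combined objective is given below, assuming an edge-factored base score plus one additively weighted DLM term; per the other excerpts, the paper instead represents the DLM information as features whose weights are learned inside the linear model.

    def tree_score(tree_edges, edge_score, dlm_log_prob, dlm_weight=1.0):
        """Score of a candidate tree y in T(Gx) (a sketch):

            score(x, y) = sum of edge-factored scores + dlm_weight * DLM term

        tree_edges is a list of (head, dependent) pairs, edge_score(head, dep)
        is the base graph-based score, and dlm_log_prob(tree_edges) is a
        dependency-LM term such as the tree_log_prob sketch above.  Combining
        them with a single additive weight is an assumption of this sketch.
        """
        base = sum(edge_score(head, dep) for head, dep in tree_edges)
        return base + dlm_weight * dlm_log_prob(tree_edges)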

feature set

Appears in 3 sentences as: feature set (1) feature setting (1) feature settings (1)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. With the rich feature set in Table 1, the running time of Intersect is longer than the time of Rescoring.
    Page 5, “Decoding”
  2. Table 2 shows the feature settings of the systems, where MST1/2 refers to the basic first-/second-order parser and MSTB1/2 refers to the enhanced first-/second-order parser.
    Page 6, “Experiments”
  3. MSTB1 and MSTB2 used the same feature setting, but used different order models.
    Page 6, “Experiments”

parsing algorithm

Appears in 3 sentences as: parsing algorithm (4)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The algorithm was an extension of the parsing algorithm of Eisner (1996), which was a modified version of the CKY chart parsing algorithm.
    Page 4, “Decoding”
  2. The parsing algorithm independently parses the left and right dependents of a word and combines them later.
    Page 4, “Decoding”
  3. Because the parsing algorithm is in the bottom-up style, the nearer children are generated earlier than the farther ones of the same head.
    Page 4, “Decoding”
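
The excerpts note that the bottom-up, Eisner-style decoder handles a head's left and right dependents independently and generates nearer children before farther ones. The small sketch below only illustrates that ordering (the chart algorithm itself is not reproduced); indices are word positions in the sentence.

    def children_in_generation_order(head_index, dependent_indices):
        """Split a head's dependents into left and right children and order
        each side from nearest to farthest, mirroring the bottom-up decoding
        order described in the excerpts: nearer children are generated
        earlier, and the two sides are parsed independently before being
        combined."""
        left = sorted((d for d in dependent_indices if d < head_index),
                      reverse=True)   # nearest left child first
        right = sorted(d for d in dependent_indices if d > head_index)
        return left, right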

part-of-speech

Appears in 3 sentences as: part-of-speech (4)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The feature templates are outlined in Table 1, where TYPE refers to one of the types PL or PR, h_pos refers to the part-of-speech tag of x_h, h_word refers to the lexical form of x_h, ch_pos refers to the part-of-speech tag of x_ch, and ch_word refers to the lexical form of x_ch.
    Page 3, “Parsing with dependency language model”
  2. We first perform word segmentation (if needed) and part-of-speech tagging.
    Page 5, “Implementation Details”
  3. After that, we obtain the word-segmented sentences with the part-of-speech tags.
    Page 5, “Implementation Details”

part-of-speech tag

Appears in 3 sentences as: part-of-speech tag (2) part-of-speech tagging (1) part-of-speech tags (1)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The feature templates are outlined in Table 1, where TYPE refers to one of the types PL or PR, h_pos refers to the part-of-speech tag of x_h, h_word refers to the lexical form of x_h, ch_pos refers to the part-of-speech tag of x_ch, and ch_word refers to the lexical form of x_ch.
    Page 3, “Parsing with dependency language model”
  2. We first perform word segmentation (if needed) and part-of-speech tagging.
    Page 5, “Implementation Details”
  3. After that, we obtain the word-segmented sentences with the part-of-speech tags.
    Page 5, “Implementation Details”

spanning tree

Appears in 3 sentences as: spanning tree (3)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. The graph-based parsing model aims to search for the maximum spanning tree (MST) in a graph (McDonald et al., 2005).
    Page 3, “Parsing with dependency language model”
  2. The parsing model finds a maximum spanning tree (MST), which is the highest scoring tree in T(Gx).
    Page 3, “Parsing with dependency language model”
  3. In our approach, we consider the scores of the DLM when searching for the maximum spanning tree.
    Page 3, “Parsing with dependency language model”

Treebank

Appears in 3 sentences as: Treebank (3)
In Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
  1. For English, we used the Penn Treebank (Marcus et al., 1993) in our experiments.
    Page 5, “Experiments”
  2. For Chinese, we used the Chinese Treebank (CTB) version 4.0 in the experiments.
    Page 5, “Experiments”
  3. We ensured that the text used for extracting subtrees did not include the sentences of the Penn Treebank.
    Page 5, “Experiments”
