Transfer Learning for Constituency-Based Grammars
Zhang, Yuan and Barzilay, Regina and Globerson, Amir

Article Structure

Abstract

In this paper, we consider the problem of cross-formalism transfer in parsing.

Introduction

Over the last several decades, linguists have introduced many different grammars for describing the syntax of natural languages.

Related Work

Our work belongs to a broader class of research on transfer learning in parsing.

The Learning Problem

Recall that our goal is to learn how to parse the target formalisms while using two annotated sources: a small set of sentences annotated in the target formalism (e.g., CCG), and a large set of sentences with coarse annotations.

A Joint Model for Two Formalisms

The key idea behind our work is to learn a joint distribution over CCG and CFG parses.
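The joint distribution the paper learns is log-linear in features of the paired parses (see "A Joint Model for Two Formalisms"). The sketch below shows only the generic mechanics of normalizing such a distribution over a candidate set of (CCG, CFG) parse pairs; the sparse feature names and the candidate list are hypothetical, not the paper's actual feature set or decoder.

```python
import math

def joint_logscore(theta, features):
    """Dot product theta . f(y_ccg, y_cfg) for one candidate parse pair."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())

def joint_distribution(theta, candidates):
    """Normalize exp(score) over all candidate (CCG, CFG) parse pairs."""
    scores = [joint_logscore(theta, f) for f in candidates]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: two candidate pairs represented as sparse feature dicts
# (feature names are invented for illustration).
theta = {"rule:S->NP VP & S[dcl]": 1.2, "rule:S->VP & S[dcl]": -0.3}
cands = [{"rule:S->NP VP & S[dcl]": 1.0}, {"rule:S->VP & S[dcl]": 1.0}]
probs = joint_distribution(theta, cands)
```

In a real parser the candidate set would come from a packed forest rather than an explicit list, but the normalization step is the same.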

Implementation

This section introduces important implementation details, including supertagging, feature forest pruning and binarization methods.

Features

Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree.

Evaluation Setup

Experiment and Analysis

Impact of Coarse Annotations on Target Formalism: To analyze the effectiveness of annotation transfer, we fix the number of annotated trees in the target formalism and vary the amount of coarse annotations available to the algorithm during training.

Conclusions

We present a method for cross-formalism transfer in parsing.

Topics

CCG

Appears in 45 sentences as: CCG (46)
In Transfer Learning for Constituency-Based Grammars
  1. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific for the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank.
    Page 1, “Abstract”
  2. We evaluate our approach on three constituency-based grammars — CCG, HPSG, and LFG, augmented with the Penn Treebank.
    Page 1, “Abstract”
  3. A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammars, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982) or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
    Page 1, “Introduction”
  4. We evaluate our approach on three constituency-based grammars — CCG, HPSG, and LFG.
    Page 2, “Introduction”
  5. S CFG / S[dcl] CCG (node labels from Figure 1)
    Page 2, “Introduction”
  6. Figure 1: Derivation trees for CFG as well as CCG, HPSG and LFG formalisms.
    Page 2, “Introduction”
  7. also observed on CCG and LFG formalisms.
    Page 2, “Introduction”
  8. There have been several attempts to map annotations in coarse grammars like CFG to annotations in richer grammar, like HPSG, LFG, or CCG.
    Page 2, “Related Work”
  9. For instance, Hockenmaier and Steedman (2002) made thousands of POS and constituent modifications to the Penn Treebank to facilitate transfer to CCG.
    Page 3, “Related Work”
  10. Recall that our goal is to learn how to parse the target formalisms while using two annotated sources: a small set of sentences annotated in the target formalism (e.g., CCG), and a large set of sentences with coarse annotations.
    Page 3, “The Learning Problem”
  11. For simplicity we focus on the CCG formalism in what follows.
    Page 3, “The Learning Problem”

See all papers in Proc. ACL 2013 that mention CCG.

See all papers in Proc. ACL that mention CCG.

Back to top.

Treebank

Appears in 27 sentences as: Treebank (20) treebank (3) treebanks (4)
In Transfer Learning for Constituency-Based Grammars
  1. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific for the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank.
    Page 1, “Abstract”
  2. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features.
    Page 1, “Abstract”
  3. The standard solution to this bottleneck has relied on manually crafted transformation rules that map readily available syntactic annotations (e.g., the Penn Treebank) to the desired formalism.
    Page 1, “Introduction”
  4. In addition, designing these rules frequently requires external resources such as Wordnet, and even involves correction of the existing treebank.
    Page 1, “Introduction”
  5. A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammars, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982) or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
    Page 1, “Introduction”
  6. All of these formalisms share a similar basic syntactic structure with Penn Treebank CFG.
    Page 1, “Introduction”
  7. For instance, Penn Treebank annotations do not make an explicit distinction between complement and adjunct, while all the above grammars mark these
    Page 1, “Introduction”
  8. In LFG, this information is captured in the mapping equation, namely ↑SBJ = ↓, while Penn Treebank represents it as a functional tag, such as NP-SBJ.
    Page 2, “Introduction”
  9. Specifically, each node in the target tree has two labels: an entry which is specific to the target formalism, and a latent label containing a value from the Penn Treebank tagset, such as NP (see Figure 2).
    Page 2, “Introduction”
  10. Adding 15,000 Penn Treebank sentences during training leads to 78.5% labeled dependency F-score, an absolute improvement of 6.2%.
    Page 2, “Introduction”
  11. For instance, mappings may specify how to convert traces and functional tags in Penn Treebank to the f-structure in LFG (Cahill, 2004).
    Page 2, “Related Work”


Penn Treebank

Appears in 20 sentences as: Penn Treebank (20)
In Transfer Learning for Constituency-Based Grammars
  1. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific for the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank.
    Page 1, “Abstract”
  2. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features.
    Page 1, “Abstract”
  3. The standard solution to this bottleneck has relied on manually crafted transformation rules that map readily available syntactic annotations (e.g., the Penn Treebank) to the desired formalism.
    Page 1, “Introduction”
  4. A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammars, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982) or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
    Page 1, “Introduction”
  5. All of these formalisms share a similar basic syntactic structure with Penn Treebank CFG.
    Page 1, “Introduction”
  6. For instance, Penn Treebank annotations do not make an explicit distinction between complement and adjunct, while all the above grammars mark these
    Page 1, “Introduction”
  7. In LFG, this information is captured in the mapping equation, namely ↑SBJ = ↓, while Penn Treebank represents it as a functional tag, such as NP-SBJ.
    Page 2, “Introduction”
  8. Specifically, each node in the target tree has two labels: an entry which is specific to the target formalism, and a latent label containing a value from the Penn Treebank tagset, such as NP (see Figure 2).
    Page 2, “Introduction”
  9. Adding 15,000 Penn Treebank sentences during training leads to 78.5% labeled dependency F-score, an absolute improvement of 6.2%.
    Page 2, “Introduction”
  10. For instance, mappings may specify how to convert traces and functional tags in Penn Treebank to the f-structure in LFG (Cahill, 2004).
    Page 2, “Related Work”
  11. For instance, Hockenmaier and Steedman (2002) made thousands of POS and constituent modifications to the Penn Treebank to facilitate transfer to CCG.
    Page 3, “Related Work”


binarization

Appears in 10 sentences as: Binarization (1) binarization (7) binarized (2) binarizing (1)
In Transfer Learning for Constituency-Based Grammars
  1. Note that the two derivations share the same (binarized) tree structure.
    Page 3, “The Learning Problem”
  2. This section introduces important implementation details, including supertagging, feature forest pruning and binarization methods.
    Page 4, “Implementation”
  3. 5.3 Binarization
    Page 5, “Implementation”
  4. Since Penn Treebank trees are not binarized, we construct a simple procedure for binarizing them.
    Page 5, “Implementation”
  5. This makes it possible to transfer binarization rules between formalisms.
    Page 5, “Implementation”
  6. Suppose we want to learn the binarization rule of the following derivation in CFG:
    Page 5, “Implementation”
  7. We now look for binary derivations with these POS in the target formalism corpus, and take the most common binarization form.
    Page 5, “Implementation”
  8. For example, we may find that the most common binarization to binarize the CFG derivation in Equation 4 is:
    Page 5, “Implementation”
  9. We also experiment with using fixed binarization rules such as left/right branching, instead of learning them.
    Page 5, “Implementation”
  10. Note that after binarization, grandparent and sibling information becomes very important in encoding the structure.
    Page 6, “Features”
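Items 6 and 7 above describe the binarization-rule transfer: look up binary derivations with the same POS sequence in the target-formalism corpus and take the most common binarization form. That counting step can be sketched as follows; the observation format and the "left"/"right" form labels are assumptions for illustration, not the paper's data structures.

```python
from collections import Counter, defaultdict

def learn_binarization_rules(target_derivations):
    """
    target_derivations: iterable of (pos_sequence, binarized_form) pairs
    observed in the target-formalism corpus. For each POS sequence,
    return its most frequent binarization form.
    """
    counts = defaultdict(Counter)
    for pos_seq, form in target_derivations:
        counts[tuple(pos_seq)][form] += 1
    return {seq: c.most_common(1)[0][0] for seq, c in counts.items()}

# Hypothetical observations of how (DT, JJ, NN) was binarized
# in a target-formalism corpus.
obs = [(("DT", "JJ", "NN"), "right"),
       (("DT", "JJ", "NN"), "right"),
       (("DT", "JJ", "NN"), "left")]
rules = learn_binarization_rules(obs)
```

The learned rule table can then be applied to unbinarized Penn Treebank trees, falling back to a fixed left/right-branching scheme (also tried in the paper) when a POS sequence is unseen.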


F-score

Appears in 10 sentences as: F-score (10)
In Transfer Learning for Constituency-Based Grammars
  1. For instance, the model trained on 500 HPSG sentences achieves labeled dependency F-score of 72.3%.
    Page 2, “Introduction”
  2. Adding 15,000 Penn Treebank sentences during training leads to 78.5% labeled dependency F-score, an absolute improvement of 6.2%.
    Page 2, “Introduction”
  3. This results in a drop in the dependency F-score by about 5%.
    Page 5, “Implementation”
  4. First, following previous work, we evaluate our method using the labeled and unlabeled predicate-argument dependency F-score.
    Page 7, “Evaluation Setup”
  5. The dependency F-score captures both the target-
    Page 7, “Evaluation Setup”
  6. For instance, there is a gain of 6.2% in labeled dependency F-score for HPSG formalism when 15,000 CFG trees are used.
    Page 7, “Experiment and Analysis”
  7. Across all three grammars, we can observe that adding CFG data has a more pronounced effect on the PARSEVAL measure than the dependency F-score.
    Page 8, “Experiment and Analysis”
  8. On the other hand, predicate-argument dependency F-score (Figure 5ac) also relies on the target grammar information.
    Page 8, “Experiment and Analysis”
  9. treebank, the gains of PARSEVAL are expected to be larger than that of dependency F-score.
    Page 8, “Experiment and Analysis”
  10. Table 4: The labeled/unlabeled dependency F-score comparisons between our model and state-of-the-art parsers.
    Page 8, “Experiment and Analysis”
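The labeled predicate-argument dependency F-score used throughout these excerpts compares gold and predicted dependency sets. A minimal sketch of that computation; representing each dependency as a (head, dependent, label) triple is an assumption, not the paper's exact encoding.

```python
def dependency_fscore(gold, predicted):
    """
    gold, predicted: sets of labeled predicate-argument dependencies,
    e.g. (head, dependent, label) triples. Returns (precision, recall, F1).
    """
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example: one of two predicted dependencies carries the wrong label.
gold = {("likes", "John", "ARG1"), ("likes", "Mary", "ARG2")}
pred = {("likes", "John", "ARG1"), ("likes", "Mary", "ARG0")}
p, r, f1 = dependency_fscore(gold, pred)
```

The unlabeled variant simply drops the label component from each triple before comparing.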


feature templates

Appears in 7 sentences as: feature templates (7)
In Transfer Learning for Constituency-Based Grammars
  1. In this section, we first introduce how different types of feature templates are designed, and then show an example of how the features help transfer the syntactic structure information.
    Page 5, “Features”
  2. Note that the same feature templates are used for all the target grammar formalisms.
    Page 5, “Features”
  3. We define the following feature templates: fbinary for binary derivations, funary for unary derivations, and froot for the root nodes.
    Page 5, “Features”
  4. The final list of binary feature templates is shown in Table 2.
    Page 6, “Features”
  5. Table 2: Binary feature templates used in f(y, s). Unary and root features follow a similar pattern.
    Page 6, “Features”
  6. In order to apply the same feature templates to other target formalisms, we only need to assign the atomic features r and hl with the formalism-specific values.
    Page 6, “Features”
  7. We do not need extra engineering work on redesigning the feature templates.
    Page 6, “Features”
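Item 6 above says that porting the templates to a new formalism only requires reassigning the atomic features r (the node's formalism-specific symbol) and hl (its head lexeme). A hypothetical instantiation of a few binary-derivation templates along these lines; the template strings themselves are invented for illustration, not taken from Table 2.

```python
def binary_features(parent, left, right, head_word, grandparent=None):
    """
    Instantiate a few binary-derivation feature templates. `r` stands for
    the node's formalism-specific symbol and `hl` for its head lexeme
    (names follow the excerpt); the template set itself is hypothetical.
    """
    feats = {
        f"r={parent}|left={left}|right={right}": 1.0,
        f"r={parent}|hl={head_word}": 1.0,
    }
    if grandparent is not None:
        # Grandparent context matters after binarization (see item 10
        # under "binarization" above).
        feats[f"gp={grandparent}|r={parent}"] = 1.0
    return feats

# CCG-flavored example: the same call would work for HPSG or LFG symbols.
f = binary_features("S[dcl]", "NP", "S[dcl]\\NP", "runs", grandparent="TOP")
```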


log-linear

Appears in 5 sentences as: log-linear (5)
In Transfer Learning for Constituency-Based Grammars
  1. As is standard in such settings, the distribution will be log-linear in a set of features of these parses.
    Page 3, “A Joint Model for Two Formalisms”
  2. Instead, we assume that the distribution over yCFG is a log-linear model with parameters θCFG (i.e., a sub-vector of θ), namely:
    Page 4, “A Joint Model for Two Formalisms”
  3. Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree.
    Page 5, “Features”
  4. In this setup, the model reduces to a normal log-linear model for the target formalism.
    Page 7, “Evaluation Setup”
  5. It’s not surprising that Cahill’s model outperforms our log-linear model because it relies heavily on handcrafted rules optimized for the dataset.
    Page 9, “Experiment and Analysis”


log-linear model

Appears in 4 sentences as: log-linear model (3) log-linear models (1)
In Transfer Learning for Constituency-Based Grammars
  1. Instead, we assume that the distribution over yCFG is a log-linear model with parameters θCFG (i.e., a sub-vector of θ), namely:
    Page 4, “A Joint Model for Two Formalisms”
  2. Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree.
    Page 5, “Features”
  3. In this setup, the model reduces to a normal log-linear model for the target formalism.
    Page 7, “Evaluation Setup”
  4. It’s not surprising that Cahill’s model outperforms our log-linear model because it relies heavily on handcrafted rules optimized for the dataset.
    Page 9, “Experiment and Analysis”


evaluation metrics

Appears in 3 sentences as: Evaluation Metrics (1) evaluation metrics (3)
In Transfer Learning for Constituency-Based Grammars
  1. Evaluation Metrics: We use two evaluation metrics.
    Page 7, “Evaluation Setup”
  2. Moreover, increasing the number of coarse annotations used in training leads to further improvement on different evaluation metrics.
    Page 7, “Experiment and Analysis”
  3. Figure 5 also illustrates slightly different characteristics of transfer performance between two evaluation metrics.
    Page 8, “Experiment and Analysis”
