Shift-Reduce CCG Parsing with a Dependency Model
Xu, Wenduan and Clark, Stephen and Zhang, Yue

Article Structure

Abstract

This paper presents the first dependency model for a shift-reduce CCG parser.

Introduction

Combinatory Categorial Grammar (CCG; Steedman (2000)) is able to derive typed dependency structures (Hockenmaier, 2003; Clark and Curran, 2007), providing a useful approximation to the underlying predicate-argument relations of “who did what to whom”.
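
The typed dependencies referred to here are CCG predicate-argument dependencies in the style of CCGBank and C&C, which can be viewed as tuples pairing a functor head word and its lexical category with an argument slot and the argument's head word. A minimal Python sketch with an illustrative sentence (not taken from the paper):

from typing import NamedTuple

class Dependency(NamedTuple):
    functor: str    # head word of the functor category
    category: str   # the functor's lexical category
    slot: int       # which argument slot of the category is filled
    argument: str   # head word of the argument

# "IBM bought Lotus": the transitive verb (S[dcl]\NP)/NP fills two slots.
deps = {
    Dependency("bought", r"(S[dcl]\NP)/NP", 1, "IBM"),    # slot 1: subject
    Dependency("bought", r"(S[dcl]\NP)/NP", 2, "Lotus"),  # slot 2: object
}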

Shift-Reduce with Beam-Search

This section describes how shift-reduce techniques can be applied to CCG, following Zhang and Clark (2011).

The Dependency Model

Categories in CCG are either basic (such as NP and PP) or complex (such as (S[dcl]\NP)/NP).
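
As a rough illustration of the basic/complex distinction (the type and field names below are mine, not the paper's or the C&C code base's), a category can be represented recursively: a basic category is an atomic label, possibly carrying a feature such as [dcl], while a complex category pairs a result and an argument under a slash:

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    label: str                  # e.g. "NP", "PP", "S[dcl]"

@dataclass(frozen=True)
class Complex:
    result: "Category"          # what remains once the argument is consumed
    slash: str                  # "/" seeks its argument to the right, "\" to the left
    argument: "Category"

Category = Union[Basic, Complex]

# (S[dcl]\NP)/NP: looks right for an object NP, then left for a subject NP,
# yielding a declarative sentence.
transitive_verb = Complex(Complex(Basic("S[dcl]"), "\\", Basic("NP")), "/", Basic("NP"))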

Experiments

We implement our shift-reduce parser on top of the core C&C code base (Clark and Curran, 2007) and evaluate it against the shift-reduce parser of Zhang and Clark (2011) (henceforth Z&C) and the chart-based normal-form and hybrid models of Clark and Curran (2007).

Conclusion

We have presented a dependency model for a shift-reduce CCG parser, which fully aligns CCG parsing with the left-to-right, incremental nature of a shift-reduce parser.

Topics

CCG

Appears in 31 sentences as: CCG (34)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. This paper presents the first dependency model for a shift-reduce CCG parser.
    Page 1, “Abstract”
  2. Modelling dependencies is desirable for a number of reasons, including handling the “spurious” ambiguity of CCG; fitting well with the theory of CCG; and optimizing for structures which are evaluated at test time.
    Page 1, “Abstract”
  3. Standard CCGBank tests show the model achieves up to 1.05 labeled F-score improvements over three existing, competitive CCG parsing models.
    Page 1, “Abstract”
  4. Combinatory Categorial Grammar (CCG; Steedman (2000)) is able to derive typed dependency structures (Hockenmaier, 2003; Clark and Curran, 2007), providing a useful approximation to the underlying predicate-argument relations of “who did what to whom”.
    Page 1, “Introduction”
  5. To date, CCG remains the most competitive formalism for recovering “deep” dependencies arising from many linguistic phenomena such as raising, control, extraction and coordination (Rimell et al., 2009; Nivre et al., 2010).
    Page 1, “Introduction”
  6. To achieve its expressiveness, CCG exhibits so-called “spurious” ambiguity, permitting many nonstandard surface derivations which ease the recovery of certain dependencies, especially those arising from type-raising and composition.
    Page 1, “Introduction”
  7. But this raises the question of what is the most suitable model for CCG: should we model the derivations, the dependencies, or both?
    Page 1, “Introduction”
  8. Modelling dependencies, as a proxy for the semantic interpretation, fits well with the theory of CCG, in which Steedman (2000) argues that the derivation is merely a “trace” of the underlying syntactic process, and that the structure which is built, and predicated over when applying constraints on grammaticality, is the semantic interpretation.
    Page 1, “Introduction”
  9. And third, it has been argued that dependencies are an ideal representation for parser evaluation, especially for CCG (Briscoe and Carroll, 2006; Clark and Hockenmaier, 2002), and so optimizing for dependency recovery makes sense from an evaluation perspective.
    Page 1, “Introduction”
  10. In this paper, we fill a gap in the literature by developing the first dependency model for a shift-reduce CCG parser.
    Page 1, “Introduction”
  11. Shift-reduce parsing applies naturally to CCG (Zhang and Clark, 2011), and the left-to-right, incremental nature of the decoding fits with CCG’s cognitive claims.
    Page 1, “Introduction”
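
Sentence 6 above attributes the “spurious” ambiguity to type-raising and composition, which license additional surface derivations yielding the same dependencies. The toy sketch below (my own example, not the paper's grammar machinery) derives "John likes Mary" twice, once by pure function application (the normal-form derivation) and once via type-raising plus forward composition; both routes reach the same category and fill the same predicate-argument dependencies:

def strip_parens(c):
    return c[1:-1] if c.startswith("(") and c.endswith(")") else c

def fapply(fn, arg):                 # forward application:  X/Y  Y  =>  X
    x, _, y = fn.rpartition("/")
    assert strip_parens(y) == strip_parens(arg)
    return strip_parens(x)

def bapply(arg, fn):                 # backward application: Y  X\Y  =>  X
    x, _, y = fn.rpartition("\\")
    assert strip_parens(y) == strip_parens(arg)
    return strip_parens(x)

def fcompose(f, g):                  # forward composition:  X/Y  Y/Z  =>  X/Z
    x, _, y1 = f.rpartition("/")
    y2, _, z = g.rpartition("/")
    assert y1 == y2
    return strip_parens(x) + "/" + z

def traise(x, t):                    # type-raising:  X  =>  T/(T\X)
    return t + "/(" + t + "\\" + x + ")"

JOHN, LIKES, MARY = "NP", r"(S[dcl]\NP)/NP", "NP"

normal_form = bapply(JOHN, fapply(LIKES, MARY))                     # John [likes Mary]
non_normal = fapply(fcompose(traise(JOHN, "S[dcl]"), LIKES), MARY)  # [John likes] Mary
assert normal_form == non_normal == "S[dcl]"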


gold-standard

Appears in 23 sentences as: gold-standard (24)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder.
    Page 1, “Abstract”
  2. One approach (Clark and Curran, 2007) is to model derivations directly, restricting the gold-standard to be the normal-form derivations (Eisner, 1996) from CCGBank (Hockenmaier and Steedman, 2007).
    Page 1, “Introduction”
  3. Clark and Curran (2006) show how the dependency model from Clark and Curran (2007) extends naturally to the partial-training case, and also how to obtain dependency data cheaply from gold-standard lexical category sequences alone.
    Page 1, “Introduction”
  4. A challenge arises from the potentially exponential number of derivations leading to a gold-standard dependency structure, which the oracle needs to keep track of.
    Page 2, “Introduction”
  5. The derivations are not explicitly part of the data, since the forest is built from the gold-standard dependencies.
    Page 2, “Introduction”
  6. We refer to the shift-reduce model of Zhang and Clark (2011) as the normal-form model, where the oracle for each sentence specifies a unique sequence of gold-standard actions which produces the corresponding normal-form derivation.
    Page 2, “Shift-Reduce with Beam-Search”
  7. In the next section, we describe a dependency oracle which considers all sequences of actions producing a gold-standard dependency structure to be correct.
    Page 2, “Shift-Reduce with Beam-Search”
  8. However, the difference compared to the normal-form model is that we do not assume a single gold-standard sequence of actions.
    Page 3, “The Dependency Model”
  9. Similar to Goldberg and Nivre (2012), we define an oracle which determines, for a gold-standard dependency structure, G, what the valid transition sequences are.
    Page 3, “The Dependency Model”
  10. The dependency model requires all the conjunctive and disjunctive nodes of Φ that are part of the derivations leading to a gold-standard dependency structure G. We refer to such derivations as correct derivations and the packed forest containing all these derivations as the oracle forest, denoted as Φ_G, which is a subset of Φ.
    Page 4, “The Dependency Model”
  11. The main intuition behind the algorithm is that a gold-standard dependency structure decomposes over derivations; thus gold-standard dependencies realized at conjunctive nodes can be counted when Φ is built, and all nodes that are part of Φ_G can then be marked out of Φ by traversing it top-down.
    Page 4, “The Dependency Model”
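
Sentences 10 and 11 above outline how the oracle forest is obtained: gold-standard dependencies are counted at conjunctive nodes while the packed forest Φ is built, and the nodes belonging to Φ_G are then marked by a top-down traversal. The sketch below only renders that general idea; the node classes and the max-count criterion are my simplifications, not the paper's algorithm. A node survives exactly when it lies on a derivation realizing the maximum number of gold dependencies, and the root qualifies only if that maximum covers every gold dependency:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Conj:                            # one way of building a constituent
    gold_deps: int                     # gold dependencies realized locally at this node
    children: List["Disj"] = field(default_factory=list)
    marked: bool = False

@dataclass
class Disj:                            # alternative analyses of the same constituent
    alternatives: List[Conj] = field(default_factory=list)
    marked: bool = False

def best(d: Disj) -> int:
    """Maximum number of gold dependencies realizable under d (no memoisation, for brevity)."""
    return max(best_conj(c) for c in d.alternatives)

def best_conj(c: Conj) -> int:
    return c.gold_deps + sum(best(ch) for ch in c.children)

def mark(d: Disj) -> None:
    """Top-down marking: keep only alternatives that lie on maximum-count derivations."""
    d.marked = True
    target = best(d)
    for c in d.alternatives:
        if best_conj(c) == target:
            c.marked = True
            for ch in c.children:
                mark(ch)

def build_oracle_forest(root: Disj, num_gold_deps: int) -> bool:
    """Mark the oracle sub-forest; fail if no derivation realizes all gold dependencies."""
    if best(root) != num_gold_deps:
        return False
    mark(root)
    return True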


shift-reduce

Appears in 21 sentences as: Shift-reduce (1) shift-reduce (24)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. This paper presents the first dependency model for a shift-reduce CCG parser.
    Page 1, “Abstract”
  2. In this paper, we fill a gap in the literature by developing the first dependency model for a shift-reduce CCG parser.
    Page 1, “Introduction”
  3. Shift-reduce parsing applies naturally to CCG (Zhang and Clark, 2011), and the left-to-right, incremental nature of the decoding fits with CCG’s cognitive claims.
    Page 1, “Introduction”
  4. Results on the standard CCGBank tests show that our parser achieves absolute labeled F-score gains of up to 0.5 over the shift-reduce parser of Zhang and Clark (2011); and up to 1.05 and 0.64 over the normal-form and hybrid models of Clark and Curran (2007), respectively.
    Page 2, “Introduction”
  5. This section describes how shift-reduce techniques can be applied to CCG, following Zhang and Clark (2011).
    Page 2, “Shift-Reduce with Beam-Search”
  6. First we describe the deterministic process which a parser would follow when tracing out a single, correct derivation; then we describe how a model of normal-form derivations — or, more accurately, a sequence of shift-reduce actions leading to a normal-form derivation — can be used with beam-search to develop a nondeterministic parser which selects the highest scoring sequence of actions.
    Page 2, “Shift-Reduce with Beam-Search”
  7. Note this section only describes a normal-form derivation model for shift-reduce parsing.
    Page 2, “Shift-Reduce with Beam-Search”
  8. The shift-reduce algorithm adapted to CCG is similar to that of shift-reduce dependency parsing (Yamada and Matsumoto, 2003; Nivre and McDonald, 2008; Zhang and Clark, 2008; Huang and Sagae, 2010).
    Page 2, “Shift-Reduce with Beam-Search”
  9. Figure 1: Deterministic example of shift-reduce CCG parsing (lexical categories omitted on queue).
    Page 2, “Shift-Reduce with Beam-Search”
  10. Subtrees on the stack are partial derivations that have been built as part of the shift-reduce process.
    Page 2, “Shift-Reduce with Beam-Search”
  11. Figure 1 shows a deterministic example for the sentence Mr. President visited Paris, giving a single sequence of shift-reduce actions which produces a correct derivation.
    Page 2, “Shift-Reduce with Beam-Search”
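
Sentences 9 and 11 above refer to the deterministic trace in the paper's Figure 1. Since the figure omits lexical categories on the queue, the sketch below assumes the usual CCGBank assignments (Mr. := N/N, President := N, visited := (S[dcl]\NP)/NP, Paris := N with a unary N => NP rule); it simulates one plausible correct action sequence over a stack of (category, head word) pairs:

stack = []

def shift(word, cat):
    stack.append((cat, word))

def unary(new_cat):
    cat, word = stack.pop()
    stack.append((new_cat, word))

def reduce_(new_cat, head):            # combine the top two subtrees into one
    stack.pop(); stack.pop()
    stack.append((new_cat, head))

shift("Mr.", "N/N")
shift("President", "N")
reduce_("N", "President")              # forward application: N/N  N  =>  N
unary("NP")                            # unary rule: N => NP
shift("visited", r"(S[dcl]\NP)/NP")
shift("Paris", "N")
unary("NP")                            # N => NP
reduce_(r"S[dcl]\NP", "visited")       # forward application with the object NP
reduce_("S[dcl]", "visited")           # backward application with the subject NP
assert stack == [("S[dcl]", "visited")]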


subtrees

Appears in 7 sentences as: Subtrees (1) subtrees (6)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. Following Zhang and Clark (2011), we define each item in the parser as a pair (s, q), where q is a queue of remaining input, consisting of words and a set of possible lexical categories for each word (with q0 being the front word), and s is the stack that holds subtrees s0, s1, ... (with s0 at the top).
    Page 2, “Shift-Reduce with Beam-Search”
  2. Subtrees on the stack are partial derivations that have been built as part of the shift-reduce process.
    Page 2, “Shift-Reduce with Beam-Search”
  3. If |s| > 0 and the subtrees on s can lead to a correct derivation in Φ_G using further actions, we say s is a partial-realization of G, denoted as s ~ G. And we define s ~ G for |s| = 0.
    Page 5, “The Dependency Model”
  4. Consider the gold-standard dependency structure in Fig. 2a; then a stack containing the two subtrees in Fig. 3a is a partial-realization, while a stack containing the three subtrees in Fig. 3b is not.
    Page 5, “The Dependency Model”
  5. Note that each of the three subtrees in Fig. 3b is present in Φ_G; however, these subtrees cannot be combined into the single correct derivation, since the correct sequence of shift-reduce actions must first combine the lexical categories for Mr. and President before shifting the lexical category for visited.
    Page 5, “The Dependency Model”
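
Sentence 1 of this list defines a parser item as a pair (s, q). A minimal rendering of that state (the class and field names are mine, not those of the Z&C or C&C code):

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Subtree:
    category: str                      # CCG category at the root of the partial derivation
    head: str                          # head word of the subtree

@dataclass
class Item:
    # s: subtrees with stack[-1] = s0 at the top
    stack: List[Subtree] = field(default_factory=list)
    # q: remaining (word, possible lexical categories) pairs, queue[0] = q0 at the front
    queue: List[Tuple[str, List[str]]] = field(default_factory=list)
    # model score of the action sequence built so far
    score: float = 0.0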


perceptron

Appears in 6 sentences as: perceptron (7)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. The discriminative model is global and trained with the structured perceptron.
    Page 1, “Introduction”
  2. We also show how perceptron learning with beam-search (Collins and Roark, 2004) can be extended to handle the additional ambiguity, by adapting the “violation-fixing” perceptron of Huang et al. (2012).
    Page 2, “Introduction”
  3. We also show, in Section 3.3, how perceptron training with early-update (Collins and Roark, 2004) can be used in this setting.
    Page 3, “The Dependency Model”
  4. We use the averaged perceptron (Collins, 2002) to train a global linear model and score each action.
    Page 6, “The Dependency Model”
  5. Since there are potentially many gold items, and one gold item is required for the perceptron update, a decision needs to be made as to which one to use.
    Page 7, “The Dependency Model”
  6. We choose to reward the highest scoring gold item, in line with the violation-fixing framework; and penalize the highest scoring incorrect item, using the standard perceptron update.
    Page 7, “The Dependency Model”
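
Sentences 2 to 6 above describe training: beam-search decoding with an averaged, structured perceptron, where an update rewards the highest scoring gold item and penalizes the highest scoring incorrect item once no gold item survives in the beam (early update). The sketch below shows only the shape of that update over sparse feature vectors; the feature function is hypothetical, averaging is omitted, and this is not the paper's Algorithm 3:

from collections import Counter

def score(weights: Counter, feats: Counter) -> float:
    return sum(weights[f] * v for f, v in feats.items())

def early_update(weights: Counter, gold_items, beam, features) -> bool:
    """One early-update / violation-fixing step.

    gold_items: items consistent with the dependency oracle at this step.
    beam:       the k highest scoring items kept by the decoder, best first.
    features:   hypothetical function mapping an item to a Counter of feature counts.
    Returns True if an update was made (decoding of this sentence then stops).
    """
    if any(item in gold_items for item in beam):
        return False                                      # a gold item survives
    best_gold = max(gold_items, key=lambda it: score(weights, features(it)))
    best_wrong = beam[0]                                  # highest scoring incorrect item
    weights.update(features(best_gold))                   # reward
    weights.subtract(features(best_wrong))                # penalize
    return True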


F-score

Appears in 5 sentences as: F-score (5)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. Standard CCGBank tests show the model achieves up to 1.05 labeled F-score improvements over three existing, competitive CCG parsing models.
    Page 1, “Abstract”
  2. Results on the standard CCGBank tests show that our parser achieves absolute labeled F-score gains of up to 0.5 over the shift-reduce parser of Zhang and Clark (2011); and up to 1.05 and 0.64 over the normal-form and hybrid models of Clark and Curran (2007), respectively.
    Page 2, “Introduction”
  3. On both the full and reduced sets, our parser achieves the highest F-score.
    Page 8, “Experiments”
  4. In comparison with C&C, our parser shows significant increases across all metrics, with 0.57% and 1.06% absolute F-score improvements over the hybrid and normal-form models, respectively.
    Page 8, “Experiments”
  5. While our parser achieved lower precision than Z&C, it is more balanced and gives higher recall for all of the dependency relations except the last one, and higher F-score for over half of them.
    Page 9, “Experiments”
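
Labeled F-score here is the harmonic mean of labeled precision and recall over predicate-argument dependencies, comparing parser output against CCGBank. A minimal sketch over sets of dependency tuples:

def labeled_prf(predicted: set, gold: set):
    """Labeled precision, recall and F-score over dependency tuples,
    e.g. (functor head, lexical category, argument slot, argument head)."""
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f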


highest scoring

Appears in 5 sentences as: highest scored (1) highest scores (1) highest scoring (4)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. First we describe the deterministic process which a parser would follow when tracing out a single, correct derivation; then we describe how a model of normal-form derivations — or, more accurately, a sequence of shift-reduce actions leading to a normal-form derivation — can be used with beam-search to develop a nondeterministic parser which selects the highest scoring sequence of actions.
    Page 2, “Shift-Reduce with Beam-Search”
  2. An item becomes a candidate output once it has an empty queue, and the parser keeps track of the highest scored candidate output and returns the best one as the final output.
    Page 2, “Shift-Reduce with Beam-Search”
  3. In Algorithm 3 we abuse notation by using H_G[0] to denote the highest scoring gold item in the set.
    Page 7, “The Dependency Model”
  4. We choose to reward the highest scoring gold item, in line with the violation-fixing framework; and penalize the highest scoring incorrect item, using the standard perceptron update.
    Page 7, “The Dependency Model”
  5. Again, our parser achieves the highest scores across all metrics (for both the full and reduced test sets), except for precision and lexical category assignment, where Z&C performed better.
    Page 9, “Experiments”
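
Sentences 1 and 2 above summarize decoding: every item in the beam is extended by each legal action, the k highest scoring successors are kept, an item whose queue is empty becomes a candidate output, and the highest scored candidate is returned. A schematic loop over items like the Item sketched under the subtrees topic above; the helper functions are placeholders, not the actual parser API:

def decode(initial_item, legal_actions, apply_action, k):
    """Beam-search over shift-reduce action sequences (helpers are hypothetical)."""
    beam, best_output = [initial_item], None
    while beam:
        # extend every item in the beam by every legal action, scoring each successor
        successors = [apply_action(item, action)
                      for item in beam
                      for action in legal_actions(item)]
        beam = sorted(successors, key=lambda it: it.score, reverse=True)[:k]
        finished = [it for it in beam if not it.queue]       # empty queue: candidate output
        for item in finished:
            if best_output is None or item.score > best_output.score:
                best_output = item
        beam = [it for it in beam if it.queue]                # only unfinished items continue
    return best_output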


development set

Appears in 3 sentences as: development set (3)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. The beam size was tuned on the development set, and a value of 128 was found to achieve a reasonable balance of accuracy and speed; hence this value was used for all experiments.
    Page 8, “Experiments”
  2. Figure 4: Labeled precision and recall relative to dependency length on the development set.
    Page 8, “Experiments”
  3. Table 1 shows the accuracies of all parsers on the development set, in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank.
    Page 8, “Experiments”


POS tag

Appears in 3 sentences as: POS tag (2) POS tagging (1) POS tags (1)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. For example, one template returns the top category on the stack plus its head word, together with the first word and its POS tag on the queue.
    Page 7, “Experiments”
  2. Another template returns the second category on the stack, together with the POS tag of its head word.
    Page 7, “Experiments”
  3. We use 10-fold cross validation for POS tagging and supertagging the training data, and automatically assigned POS tags for all experiments.
    Page 8, “Experiments”
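
Sentences 1 and 2 above give two of the feature templates. As a rough illustration (the representation and feature names are mine, not the actual template definitions):

def extract_features(stack, queue):
    """Sketch of the two templates quoted above. stack holds
    (category, head word, head POS) triples with the top last; queue holds
    (word, POS tag) pairs with the front first."""
    feats = []
    if stack and queue:
        s0_cat, s0_head, _ = stack[-1]
        q0_word, q0_pos = queue[0]
        # top category on the stack + its head word + first queue word + its POS tag
        feats.append("s0c_s0h_q0w_q0p=" + "|".join([s0_cat, s0_head, q0_word, q0_pos]))
    if len(stack) > 1:
        s1_cat, _, s1_head_pos = stack[-2]
        # second category on the stack + the POS tag of its head word
        feats.append("s1c_s1hp=" + "|".join([s1_cat, s1_head_pos]))
    return feats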


precision and recall

Appears in 3 sentences as: precision and recall (3)
In Shift-Reduce CCG Parsing with a Dependency Model
  1. Figure 4: Labeled precision and recall relative to dependency length on the development set.
    Page 8, “Experiments”
  2. Table 1 shows the accuracies of all parsers on the development set, in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank.
    Page 8, “Experiments”
  3. To probe this further we compare labeled precision and recall relative to dependency length, as measured by the distance between the two words in a dependency, grouped into bins of 5 values.
    Page 8, “Experiments”
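
Sentence 3 above describes how Figure 4 is computed: dependency length is the distance between the two words in a dependency, and dependencies are grouped into bins of 5 values before precision and recall are measured per bin. A small sketch, assuming each dependency tuple carries the sentence positions of its two words in fields head_index and arg_index (illustrative names):

from collections import defaultdict

def binned_pr(predicted, gold, bin_width=5):
    """Labeled precision and recall per dependency-length bin."""
    def bucket(dep):
        length = abs(dep.head_index - dep.arg_index)
        return (length - 1) // bin_width        # lengths 1-5 -> bin 0, 6-10 -> bin 1, ...

    pred_bins, gold_bins = defaultdict(set), defaultdict(set)
    for d in predicted:
        pred_bins[bucket(d)].add(d)
    for d in gold:
        gold_bins[bucket(d)].add(d)

    results = {}
    for b in sorted(set(pred_bins) | set(gold_bins)):
        correct = len(pred_bins[b] & gold_bins[b])
        precision = correct / len(pred_bins[b]) if pred_bins[b] else 0.0
        recall = correct / len(gold_bins[b]) if gold_bins[b] else 0.0
        results[b] = (precision, recall)
    return results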
