Parsing Noun Phrase Structure with CCG
Vadas, David and Curran, James R.

Article Structure

Abstract

Statistical parsing of noun phrase (NP) structure has been hampered by a lack of gold-standard data.

Introduction

Internal noun phrase (NP) structure is not recovered by a number of widely-used parsers, e.g.

Background

Parsing of NPs is typically framed as NP bracketing, where the task is limited to discriminating between left- and right-branching NPs of three nouns only:
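The left/right-branching distinction above can be illustrated with nested tuples; a minimal sketch (the example NPs are illustrative, not drawn from the paper's data):

```python
# Left- vs right-branching analyses of a three-noun (or noun-like)
# compound, represented as nested tuples: the bracketing decides
# which pair of words forms a constituent first.

left = (("crude", "oil"), "prices")    # (crude oil) prices  -> left-branching
right = ("world", ("oil", "prices"))   # world (oil prices)  -> right-branching

def brackets(np):
    """Render a nested-tuple NP with explicit brackets."""
    if isinstance(np, str):
        return np
    return "(" + " ".join(brackets(child) for child in np) + ")"

print(brackets(left))   # ((crude oil) prices)
print(brackets(right))  # (world (oil prices))
```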

Conversion Process

This section describes the process of converting the Vadas and Curran (2007a) data to CCG derivations.

NER features

Named entity recognition (NER) provides information that is particularly relevant for NP parsing, simply because entities are nouns.

Experiments

Our experiments are run with the C&C CCG parser (Clark and Curran, 2007b), and will evaluate the changes made to CCGbank, as well as the effectiveness of the NER features.

DepBank evaluation

One problem with the evaluation in the previous section is that the original CCGbank is not expected to recover internal NP structure, making its task easier and inflating its performance.

Conclusion

The first contribution of this paper is the application of the Vadas and Curran (2007a) data to Combinatory Categorial Grammar.

Topics

F-score

Appears in 20 sentences as: F-SCORE (7) F-score (13)
In Parsing Noun Phrase Structure with CCG
  1. These features are targeted at improving the recovery of NP structure, increasing parser performance by 0.64% F-score .
    Page 1, “Introduction”
  2. | PREC | RECALL | F-SCORE
    Page 7, “Experiments”
  3. | PREC | RECALL | F-SCORE
    Page 7, “Experiments”
  4. Table 2 shows that F-score has dropped by 0.61%.
    Page 7, “Experiments”
  5. We are using the normal-form parser model and report labelled precision, recall and F-score for all dependencies.
    Page 7, “Experiments”
  6. The F-score drops by 0.31% in our new version of the corpus.
    Page 7, “Experiments”
  7. Vadas and Curran (2007a) experienced a similar drop in performance on Penn Treebank data, and noted that the F-score for NML and JJP brackets was about 20% lower than the overall figure.
    Page 7, “Experiments”
  8. | PREC | RECALL | F-SCORE
    Page 7, “Experiments”
  9. | PREC | RECALL | F-SCORE
    Page 7, “Experiments”
  10. These tags are accurate with an F-score of 96.34%, with precision 96.20% and recall 96.49%.
    Page 7, “Experiments”
  11. The NP corrected model drops an additional 0.1% F-score over the original model, suggesting that POS tags are particularly important for recovering internal NP structure.
    Page 7, “Experiments”
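The F-score quoted throughout is the balanced harmonic mean of precision and recall; a minimal sketch, checked against the POS-tagging figures quoted above (precision 96.20, recall 96.49, F-score 96.34):

```python
# Balanced F-score (F1): harmonic mean of precision and recall.

def f_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_score(96.20, 96.49), 2))  # 96.34
```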

NER

Appears in 19 sentences as: NER (21)
In Parsing Noun Phrase Structure with CCG
  1. We also implement novel NER features that generalise the lexical information needed to parse NPs and provide important semantic information.
    Page 1, “Abstract”
  2. In particular, we implement new features using NER tags from the BBN Entity Type Corpus (Weischedel and Brunstein, 2005).
    Page 1, “Introduction”
  3. Applying the NER features results in a total increase of 1.51%.
    Page 1, “Introduction”
  4. Named entity recognition ( NER ) provides information that is particularly relevant for NP parsing, simply because entities are nouns.
    Page 5, “NER features”
  5. There has also been recent work combining NER and parsing in the biomedical field.
    Page 5, “NER features”
  6. Lewin (2007) experiments with detecting base-NPs using NER information, while Buyko et al.
    Page 5, “NER features”
  7. Our experiments are run with the C&C CCG parser (Clark and Curran, 2007b), and will evaluate the changes made to CCGbank, as well as the effectiveness of the NER features.
    Page 6, “Experiments”
  8. Table 5: Parsing results with NER features
    Page 7, “Experiments”
  9. 5.3 NER features results
    Page 7, “Experiments”
  10. Table 5 shows the results of adding the NER features we described in Section 4.
    Page 7, “Experiments”
  11. This is because incorrect right-branching NPs such as Air Force contract would introduce noise to the NER features.
    Page 7, “Experiments”

CCG

Appears in 18 sentences as: CCG (19)
In Parsing Noun Phrase Structure with CCG
  1. CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar ( CCG ) (Steedman, 2000) and was created by a semiautomatic conversion from the Penn Treebank.
    Page 1, “Introduction”
  2. However, CCG is a binary branching grammar, and as such, cannot leave NP structure underspecified.
    Page 1, “Introduction”
  3. Parsing Noun Phrase Structure with CCG
    Page 1, “Introduction”
  4. The CCG parser now recovers additional structure learnt from our NP corrected corpus, increasing performance by 0.92%.
    Page 1, “Introduction”
  5. We use these brackets to determine new gold-standard CCG derivations in Section 3.
    Page 2, “Background”
  6. Combinatory Categorial Grammar ( CCG ) (Steedman, 2000) is a type-driven, lexicalised theory of
    Page 2, “Background”
  7. This is an advantage of CCG , allowing it to recover long-range dependencies without the need for postprocessing, as is the case for many other parsers.
    Page 2, “Background”
  8. Our work creates the correct CCG derivation, shown in Figure 1(b), and removes the need for the grammar rule in (3).
    Page 2, “Background”
  9. 2.2 CCG parsing
    Page 3, “Background”
  10. The C&C CCG parser (Clark and Curran, 2007b) is used to perform our experiments, and to evaluate the effect of the changes to CCGbank.
    Page 3, “Background”
  11. This section describes the process of converting the Vadas and Curran (2007a) data to CCG derivations.
    Page 3, “Conversion Process”

gold-standard

Appears in 11 sentences as: gold-standard (11)
In Parsing Noun Phrase Structure with CCG
  1. Statistical parsing of noun phrase (NP) structure has been hampered by a lack of gold-standard data.
    Page 1, “Abstract”
  2. We correct these errors in CCGbank using a gold-standard corpus of NP structure, resulting in a much more accurate corpus.
    Page 1, “Abstract”
  3. Recently, Vadas and Curran (2007a) annotated internal NP structure for the entire Penn Treebank, providing a large gold-standard corpus for NP bracketing.
    Page 2, “Background”
  4. We use these brackets to determine new gold-standard CCG derivations in Section 3.
    Page 2, “Background”
  5. PropBank (Palmer et al., 2005) is used as a gold-standard to inform these decisions, similar to the way that we use the Vadas and Curran (2007a) data.
    Page 2, “Background”
  6. Table 3: Parsing results with gold-standard POS tags
    Page 7, “Experiments”
  7. Table 4 shows that, unsur-prisingly, performance is lower without the gold-standard data.
    Page 7, “Experiments”
  8. We can see that parsing F-score has dropped by about 2% compared to using gold-standard POS and NER data; however, the NER features still improve performance by about 0.3%.
    Page 7, “Experiments”
  9. Clark and Curran (2007a) report an upper bound on performance, using gold-standard CCGbank dependencies, of 84.76% F-score.
    Page 8, “DepBank evaluation”
  10. Firstly, we show the figures achieved using gold-standard CCGbank derivations in Table 7.
    Page 8, “DepBank evaluation”
  11. Table 7: DepBank gold-standard evaluation
    Page 8, “DepBank evaluation”

POS tags

Appears in 9 sentences as: POS tag (1) POS tags (8)
In Parsing Noun Phrase Structure with CCG
  1. Since we are applying these to CCGbank NP structures rather than the Penn Treebank, the POS tag based heuristics are sufficient to determine heads accurately.
    Page 3, “Conversion Process”
  2. Some POS tags require special behaviour.
    Page 4, “Conversion Process”
  3. Accordingly, we do not alter tokens with POS tags of DT and PRP$. Instead, their sibling node is given the category N and their parent node is made the head.
    Page 4, “Conversion Process”
  4. Many of these features generalise the head words and/or POS tags that are already part of the feature set.
    Page 6, “NER features”
  5. There are already features in the model describing each combination of the children’s head words and POS tags , which we extend to include combinations with
    Page 6, “NER features”
  6. Table 3: Parsing results with gold-standard POS tags
    Page 7, “Experiments”
  7. Table 4: Parsing results with automatic POS tags
    Page 7, “Experiments”
  8. We have also experimented with using automatically assigned POS tags .
    Page 7, “Experiments”
  9. The NP corrected model drops an additional 0.1% F-score over the original model, suggesting that POS tags are particularly important for recovering internal NP structure.
    Page 7, “Experiments”

Penn Treebank

Appears in 8 sentences as: Penn Treebank (8)
In Parsing Noun Phrase Structure with CCG
  1. This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank .
    Page 1, “Abstract”
  2. This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure.
    Page 1, “Introduction”
  3. The flat structure described by the Penn Treebank can be seen in this example:
    Page 1, “Introduction”
  4. CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semiautomatic conversion from the Penn Treebank .
    Page 1, “Introduction”
  5. Recently, Vadas and Curran (2007a) annotated internal NP structure for the entire Penn Treebank , providing a large gold-standard corpus for NP bracketing.
    Page 2, “Background”
  6. We apply one preprocessing step on the Penn Treebank data, where if multiple tokens are enclosed by brackets, then an NML node is placed around those
    Page 3, “Conversion Process”
  7. Since we are applying these to CCGbank NP structures rather than the Penn Treebank , the POS tag based heuristics are sufficient to determine heads accurately.
    Page 3, “Conversion Process”
  8. Vadas and Curran (2007a) experienced a similar drop in performance on Penn Treebank data, and noted that the F-score for NML and JJP brackets was about 20% lower than the overall figure.
    Page 7, “Experiments”


named entity

Appears in 4 sentences as: named entities (1) Named entity (1) named entity (2)
In Parsing Noun Phrase Structure with CCG
  1. Named entity recognition (NER) provides information that is particularly relevant for NP parsing, simply because entities are nouns.
    Page 5, “NER features”
  2. coordinate structure in biological named entities .
    Page 6, “NER features”
  3. We identify constituents that dominate tokens that all have the same NE tag, as these nodes will not cause a “crossing bracket” with the named entity .
    Page 6, “NER features”
  4. We also take into account whether the constituent spans the entire named entity .
    Page 6, “NER features”
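The crossing-bracket test mentioned above can be sketched as a span check: a constituent "crosses" a named entity if the two spans overlap without one containing the other. This is a minimal sketch assuming spans are (start, end) token offsets with an exclusive end; the representation is an assumption, not the parser's actual data structure.

```python
def crosses(constituent, entity):
    """True if the two spans overlap but neither contains the other."""
    cs, ce = constituent
    es, ee = entity
    overlap = cs < ee and es < ce
    nested = (cs <= es and ee <= ce) or (es <= cs and ce <= ee)
    return overlap and not nested

def spans_entity(constituent, entity):
    """True if the constituent covers exactly the whole entity."""
    return constituent == entity

entity = (2, 5)                       # a hypothetical three-token entity
print(crosses((0, 3), entity))        # True: overlaps and splits the entity
print(crosses((2, 5), entity))        # False: identical spans
print(crosses((1, 6), entity))        # False: entity nested inside
print(spans_entity((2, 5), entity))   # True
```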
