Descending-Path Convolution Kernel for Syntactic Structures
Chen Lin, Timothy Miller, Alvin Kho, Steven Bethard, Dmitriy Dligach, Sameer Pradhan, and Guergana Savova

Article Structure

Abstract

Convolution tree kernels are an efficient and effective method for comparing syntactic structures in NLP applications.

Introduction

Syntactic structure can provide useful features for many natural language processing (NLP) tasks such as semantic role labeling, coreference resolution, temporal relation discovery, and others.

Background

2.1 Syntax-based Tree Kernels

Methods

Here we describe the Descending Path Kernel (DPK).

Evaluation

We applied DPK to two published temporal relation extraction systems: that of Miller et al. (2013) in the clinical domain and ClearTK-TimeML (Bethard, 2013) in the general domain.

Conclusion

In this paper, we developed a novel convolution tree kernel (DPK) for measuring syntactic similarity.

Topics

tree kernel

Appears in 18 sentences as: tree kernel (11), Tree Kernels (1), tree kernels (9)
In Descending-Path Convolution Kernel for Syntactic Structures
  1. Convolution tree kernels are an efficient and effective method for comparing syntactic structures in NLP applications.
    Page 1, “Abstract”
  2. However, current kernel methods such as subset tree kernel and partial tree kernel understate the similarity of very similar tree structures.
    Page 1, “Abstract”
  3. Convolution kernels over syntactic trees (tree kernels) offer a potential solution to this problem by providing relatively efficient algorithms for computing similarities between entire discrete structures.
    Page 1, “Introduction”
  4. However, conventional tree kernels are sensitive to pattern variations.
    Page 1, “Introduction”
  5. For example, two trees in Figure 1(a) sharing the same structure except for one terminal symbol are deemed at most 67% similar by the partial tree kernel (PTK) (Moschitti, 2006).
    Page 1, “Introduction”
  6. This approach assigns more robust similarity scores (e.g., 78% similarity in the above example) than other soft matching tree kernels, is faster than the partial tree kernel (Moschitti, 2006), and is less ad hoc than the grammar-based convolution kernel (Zhang et al., 2007).
    Page 1, “Introduction”
  7. 2.1 Syntax-based Tree Kernels
    Page 1, “Background”
  8. Syntax-based tree kernels quantify the similarity between two constituent parses by counting their common substructures.
    Page 1, “Background”
  9. The partial tree kernel (PTK) relaxes the definition of subtrees to allow partial production rule matching.
    Page 1, “Background”
  10. Many methods exist for synthesizing syntactic information for temporal relation extraction, and most use traditional tree kernels with various feature representations.
    Page 2, “Background”
  11. Compared with the previous tree kernels, our descending path kernel has the following advantages: 1) the substructures are simplified so that they are more likely to be shared among trees, and therefore the sparse-feature issues of previous kernels can be alleviated by this representation; 2) soft matching between two similar structures (e.g., NP → DT JJ NN versus NP → DT NN) yields high similarity without reference to any corpus or grammar rules (see the sketch after this list).
    Page 3, “Methods”
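
The path-counting behavior quoted above can be made concrete with a small amount of code. The following is a minimal sketch, not the authors' implementation: it assumes trees are (label, children) tuples, collects descending paths up to a fixed length, and turns a shared-path count into a cosine-normalized similarity. The helper names (paths_from, all_paths, dpk) and the normalization are our assumptions; the paper's exact 67%/78% figures depend on weighting details this sketch omits.

from collections import Counter
from math import sqrt

def paths_from(node, max_len):
    # Downward label sequences starting at this node, 1..max_len edges long.
    label, children = node
    out = Counter()
    if max_len == 0:
        return out
    for child in children:
        out[(label, child[0])] += 1                      # length-1 path
        for path, count in paths_from(child, max_len - 1).items():
            out[(label,) + path] += count                # extend the path downward
    return out

def all_paths(tree, max_len=3):
    # Descending paths rooted at every node in the tree (Definition 1).
    total = paths_from(tree, max_len)
    for child in tree[1]:
        total += all_paths(child, max_len)
    return total

def dpk(t1, t2, max_len=3):
    # Cosine-normalized count of shared descending paths (our normalization choice).
    p1, p2 = all_paths(t1, max_len), all_paths(t2, max_len)
    shared = sum(p1[p] * p2[p] for p in p1)
    norm = sqrt(sum(c * c for c in p1.values()) * sum(c * c for c in p2.values()))
    return shared / norm if norm else 0.0

# Two NPs that differ only in one terminal, as in the Figure 1(a) discussion.
t1 = ("NP", [("DT", [("the", [])]), ("NN", [("dog", [])])])
t2 = ("NP", [("DT", [("the", [])]), ("NN", [("cat", [])])])
print(dpk(t1, t2))  # about 0.67 for these tiny trees; the paper's figures use its own weighting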

relation extraction

Appears in 8 sentences as: Relation Extraction (1), relation extraction (7)
In Descending-Path Convolution Kernel for Syntactic Structures
  1. This method is evaluated on two temporal relation extraction tasks and demonstrates its advantage over rich syntactic representations.
    Page 1, “Abstract”
  2. 2.2 Temporal Relation Extraction
    Page 2, “Background”
  3. Among NLP tasks that use syntactic information, temporal relation extraction has been drawing growing attention because of its wide applications in multiple domains.
    Page 2, “Background”
  4. Many methods exist for synthesizing syntactic information for temporal relation extraction, and most use traditional tree kernels with various feature representations.
    Page 2, “Background”
  5. (2009) used the path-enclosed tree (PET) representation to represent syntactic information for temporal relation extraction on the TimeBank (Pustejovsky et al., 2003) and the AQUAINT TimeML corpus.
    Page 2, “Background”
  6. We applied DPK to two published temporal relation extraction systems: that of Miller et al. (2013) in the clinical domain and ClearTK-TimeML (Bethard, 2013) in the general domain.
    Page 4, “Evaluation”
  7. Table 2: Comparison of tree kernel performance for temporal relation extraction on THYME and TempEval-2013 data.
    Page 5, “Evaluation”
  8. Future work will explore 1) a composite kernel that uses DPK for PET trees, SST for BT and PT, and a feature kernel for flat features, so that different tree kernels can work with their ideal syntactic representations; 2) incorporating dependency structures into tree kernel analysis; and 3) applying DPK to other relation extraction tasks on various corpora.
    Page 5, “Conclusion”

similarity scores

Appears in 8 sentences as: similarity score (4), similarity scores (5)
In Descending-Path Convolution Kernel for Syntactic Structures
  1. Although soft-matching approaches can improve the similarity scores, they are corpus-dependent and match relaxations may be task-specific.
    Page 1, “Abstract”
  2. We propose an alternative approach called descending path kernel which gives intuitive similarity scores on comparable structures.
    Page 1, “Abstract”
  3. This approach assigns more robust similarity scores (e.g., 78% similarity in the above example) than other soft matching tree kernels, is faster than the partial tree kernel (Moschitti, 2006), and is less ad hoc than the grammar-based convolution kernel (Zhang et al., 2007).
    Page 1, “Introduction”
  4. For example, the similarity score between the NPs in Figure 1(b) would be zero since the production rule is different (the overall similarity score is above zero because of matching pre-terminals).
    Page 1, “Background”
  5. Unlike SST and PTK, once the root category comparison is successfully completed, DPK looks at all paths that go through it and accumulates their similarity scores independent of ordering; in other words, it ignores the ordering of the children in its processing.
    Page 4, “Methods”
  6. This means, for example, that if the rule production NP → NN JJ DT were ever found in a tree, to DPK it would be indistinguishable from the common production NP → DT JJ NN, despite having inverted word order, and thus would have a maximal similarity score.
    Page 4, “Methods”
  7. For the tree kernel K_T, the subset tree (SST) kernel was applied to each tree representation p. The final similarity score between two instances is the T-weighted sum of the similarities of all representations, combined with the flat feature (FF) similarity as measured by a feature kernel K_F (linear or polynomial); see the sketch after this list.
    Page 5, “Evaluation”
  8. This kernel uses a descending path representation in trees to allow higher similarity scores on partially matching structures, while being simpler and faster than other methods for doing the same.
    Page 5, “Conclusion”
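
The composite kernel described in item 7 can be sketched directly. This is a minimal illustration under our own assumptions: the Instance container, the representation names, and the fixed per-representation weights are illustrative, and the linear K_F is just one of the kernel choices the sentence mentions; any tree kernel with the same two-tree signature (e.g., the dpk sketch above) can be plugged in where the paper uses SST.

from dataclasses import dataclass

@dataclass
class Instance:
    trees: dict      # representation name (e.g., "PET", "BT", "PT") -> parse tree
    features: list   # flat feature vector

def linear_kernel(x, y):
    # Dot product over aligned flat-feature vectors (one choice for K_F).
    return sum(a * b for a, b in zip(x, y))

def composite_kernel(a, b, tree_kernels, weights, feature_kernel=linear_kernel):
    # Weighted sum of per-representation tree-kernel similarities,
    # plus the flat-feature similarity.
    tree_score = sum(
        weights[rep] * tree_kernels[rep](a.trees[rep], b.trees[rep])
        for rep in tree_kernels
    )
    return tree_score + feature_kernel(a.features, b.features)

# Example wiring, reusing the dpk sketch from the "tree kernel" entry:
# a = Instance(trees={"PET": t1}, features=[1.0, 0.0])
# b = Instance(trees={"PET": t2}, features=[1.0, 1.0])
# print(composite_kernel(a, b, {"PET": dpk}, {"PET": 0.7}))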

word order

Appears in 4 sentences as: word order (2), word ordering (2)
In Descending-Path Convolution Kernel for Syntactic Structures
  1. However, another view of the DPK is possible by thinking of it as cheaply calculating rule production similarity by taking advantage of relatively strict English word ordering.
    Page 4, “Methods”
  2. This means, for example, that if the rule production NP → NN JJ DT were ever found in a tree, to DPK it would be indistinguishable from the common production NP → DT JJ NN, despite having inverted word order, and thus would have a maximal similarity score.
    Page 4, “Methods”
  3. SST and PTK would assign this pair a much lower score for having completely different ordering, but we suggest that cases such as these are very rare due to the relatively strict word ordering of English (see the sketch after this list).
    Page 4, “Methods”
  4. This analysis does lead to a hypothesis for the general viability of the DPK, suggesting that in languages with freer word order it may give inflated scores to structures that are syntactically dissimilar if they have the same constituent components in different order.
    Page 4, “Methods”
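
The ordering-invariance claim in items 2 and 3 is easy to check by reusing the path-extraction sketch (all_paths, dpk) from the "tree kernel" entry above: both productions yield the same multiset of descending paths, so a path-based comparison cannot distinguish them, whereas production-rule matching would. The flat, childless preterminals below are an illustrative simplification.

# Inverted vs. common constituent order under the same NP production.
inverted = ("NP", [("NN", []), ("JJ", []), ("DT", [])])
common = ("NP", [("DT", []), ("JJ", []), ("NN", [])])

# Identical path multisets, hence a maximal DPK score.
assert all_paths(inverted) == all_paths(common)
print(dpk(inverted, common))  # 1.0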

parse tree

Appears in 3 sentences as: parse tree (3)
In Descending-Path Convolution Kernel for Syntactic Structures
  1. Figure 2: A parse tree (left) and its descending paths according to Definition 1 (l = length).
    Page 2, “Background”
  2. Definition 1 (Descending Path): Let T be a parse tree, v any nonterminal node in T, and d a descendant of v, including terminals.
    Page 3, “Methods”
  3. Figure 2 illustrates a parse tree and its descending paths of different lengths (see the sketch after this list).
    Page 3, “Methods”
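
As a companion to Definition 1 and Figure 2, the short sketch below groups a tree's descending paths by their length l, reusing the all_paths helper from the "tree kernel" entry above. The grouping function and the toy tree are our own illustrations, not the paper's figure.

from collections import defaultdict

def paths_by_length(tree, max_len=3):
    # Bucket each descending path by its length in edges.
    grouped = defaultdict(list)
    for path, count in sorted(all_paths(tree, max_len).items()):
        grouped[len(path) - 1].extend(["->".join(path)] * count)
    return dict(grouped)

tree = ("NP", [("DT", [("the", [])]), ("NN", [("dog", [])])])
for l, paths in sorted(paths_by_length(tree).items()):
    print(f"l={l}: {paths}")
# l=1: ['DT->the', 'NN->dog', 'NP->DT', 'NP->NN']
# l=2: ['NP->DT->the', 'NP->NN->dog']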
