Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Danilo Croce, Alessandro Moschitti, Roberto Basili and Martha Palmer

Article Structure

Abstract

In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet.

Introduction

Verb classification is a fundamental topic of computational linguistics research given its importance for understanding the role of verbs in conveying semantics of natural language (NL).

Related work

Our target task is verb classification; at the same time, our models exploit distributional models as well as structural kernels.

Structural Similarity Functions

In this paper we model verb classifiers by exploiting previous technology for kernel methods.

Verb Classification Models

The design of SPTK-based algorithms for our verb classification requires the modeling of two different aspects: (i) a tree representation for the verbs; and (ii) the lexical similarity suitable for the task.

Experiments

In these experiments, we tested the impact of our different verb representations using different kernels, similarities and parameters.

Model Analysis and Discussion

We carried out an analysis of system errors and of the features induced by the system.

Conclusion

We have proposed new approaches to characterize verb classes in learning algorithms.

Topics

tree kernels

Appears in 9 sentences as: tree kernel (4) Tree Kernels (2) tree kernels (8)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines.
    Page 1, “Abstract”
  2. Recently, DMs have been also proposed in integrated syntactic-semantic structures that feed advanced learning functions, such as the semantic tree kernels discussed in (Bloehdorn and Moschitti, 2007a; Bloehdorn and Moschitti, 2007b; Mehdad et al., 2010; Croce et al., 2011).
    Page 3, “Related work”
  3. In particular, we design new models for verb classification by adopting algorithms for structural similarity, known as Smoothed Partial Tree Kernels (SPTKs) (Croce et al., 2011).
    Page 3, “Structural Similarity Functions”
  4. 3.2 Tree Kernels driven by Semantic Similarity. To our knowledge, two main types of tree kernels exploit lexical similarity: the syntactic semantic tree kernel defined in (Bloehdorn and Moschitti, 2007a), applied to constituency trees, and the smoothed partial tree kernels (SPTKs) defined in (Croce et al., 2011), which generalize the former.
    Page 3, “Structural Similarity Functions”
  5. The Δ function determines the richness of the kernel space and thus induces different tree kernels, for example, the syntactic tree kernel (STK) (Collins and Duffy, 2002) or the partial tree kernel (PTK) (Moschitti, 2006).
    Page 4, “Structural Similarity Functions”
  6. Here, we apply tree pruning to reduce the computational complexity of tree kernels as it is proportional to the number of nodes in the input trees.
    Page 4, “Verb Classification Models”
  7. To encode dependency structure information in a tree (so that we can use it in tree kernels), we use (i) lexemes as nodes of our tree, (ii) their dependencies as edges between the nodes and (iii) the dependency labels, e.g., grammatical functions (GR), and POS-Tags, again as tree nodes.
    Page 5, “Verb Classification Models”
  8. available; and (ii) in 76% of the errors only two or fewer argument heads are included in the extracted tree; therefore tree kernels cannot exploit enough lexical information to disambiguate verb senses.
    Page 8, “Model Analysis and Discussion”
  9. the capability of tree kernels to implicitly trigger useful linguistic inductions for complex semantic tasks.
    Page 8, “Model Analysis and Discussion”
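
The Δ recursion mentioned in sentences 4 and 5 above is easiest to see in code. Below is a minimal, hedged sketch in Python: it follows the simpler subset-tree recursion of (Collins and Duffy, 2002) rather than the exact partial-tree matching of the SPTK of (Croce et al., 2011), and plugs a lexical similarity sigma into leaf comparisons the way the smoothed kernels do. The tree encoding, decay value, and sigma below are illustrative assumptions, not the paper's implementation.

    # Hedged sketch: a "smoothed" tree kernel in the spirit of SPTK.
    # Trees are (label, [children]) tuples; sigma is a pluggable lexical
    # similarity applied when two leaves are compared.
    def smoothed_tree_kernel(t1, t2, sigma, decay=0.4):
        def delta(n1, n2):
            (l1, c1), (l2, c2) = n1, n2
            if not c1 and not c2:               # two leaves: smoothed match
                return decay * sigma(l1, l2)
            if l1 != l2 or len(c1) != len(c2):  # crude "same production" test
                return 0.0
            prod = decay
            for a, b in zip(c1, c2):
                prod *= 1.0 + delta(a, b)
            return prod

        def nodes(t):
            yield t
            for c in t[1]:
                yield from nodes(c)

        # K(t1, t2) = sum of delta over all pairs of nodes
        return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

    # An exact-match sigma reduces this to a purely syntactic kernel;
    # a distributional sigma yields the smoothed variant.
    sigma = lambda a, b: 1.0 if a == b else 0.0
    t = ("VC", [("VB", [("target", [])]), ("OBJ", [])])
    print(smoothed_tree_kernel(t, t, sigma))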

IM

Appears in 5 sentences as: IM (6)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. (IM(VB(target))(OBJ))
    Page 8, “Model Analysis and Discussion”
  2. (VC(VB(target))(OBJ)) (VC(VBG(target))(OBJ)) (OPRD(TO)(IM(VB(target))(OBJ))) (PMOD(VBG(target))(OBJ))
    Page 8, “Model Analysis and Discussion”
  3. (PRP(TO)(IM(VB(target))(OBJ)))
    Page 8, “Model Analysis and Discussion”
  4. (IM(VB(target))(OBJ)(ADV(IN)(PMOD))) (OPRD(TO)(IM(VB(target))(OBJ)(ADV(IN)(PMOD))))
    Page 8, “Model Analysis and Discussion”
  5. (VC(VB(target))(OBJ)) (NMOD(VBG(target))(OPRD)) (VC(VBN(target))(OPRD)) (NMOD(VBN(target))(OPRD)) (PMOD(VBG(target))(OBJ)) (ROOT(SBJ)(VBD(target))(OBJ)(P(,))) (VC(VB(target))(OPRD)) (ROOT(SBJ)(VBZ(target))(OBJ)(P(,))) (NMOD(SBJ(WDT))(VBZ(target))(OPRD)) (NMOD(SBJ)(VBZ(target))(OPRD(SBJ)(TO)(IM)))
    Page 8, “Model Analysis and Discussion”
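
These are bracketed dependency-tree fragments centered on the target verb, extracted from the support vectors (see the Support Vector topic below); IM appears to be the CoNLL-style dependency label for an infinitive marker ("to"). A small, hedged helper (not from the paper) for turning such strings into the (label, [children]) tuples used by the kernel sketch above:

    import re

    # Parse a bracketed fragment such as "(VC(VB(target))(OBJ))" into
    # nested (label, [children]) tuples.
    def parse_fragment(s):
        tokens = re.findall(r"[()]|[^()\s]+", s)
        pos = 0
        def parse():
            nonlocal pos
            assert tokens[pos] == "("
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] == "(":
                children.append(parse())
            assert tokens[pos] == ")"
            pos += 1
            return (label, children)
        return parse()

    print(parse_fragment("(OPRD(TO)(IM(VB(target))(OBJ)))"))
    # -> ('OPRD', [('TO', []), ('IM', [('VB', [('target', [])]), ('OBJ', [])])])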

semantic similarity

Appears in 4 sentences as: Semantic Similarity (1) semantic similarity (3)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. Sec. 3 and 4 describe previous and our models for syntactic and semantic similarity, respectively.
    Page 2, “Introduction”
  2. as a direct object is in fact a strong characterization for semantic similarity, as all the nouns m similar to n tend to collocate with the same verbs.
    Page 2, “Related work”
  3. By inducing geometrical notions of vectors and norms through corpus analysis, they provide a topological definition of semantic similarity, i.e., distance in a space.
    Page 2, “Related work”
  4. 3.2 Tree Kernels driven by Semantic Similarity. To our knowledge, two main types of tree kernels exploit lexical similarity: the syntactic semantic tree kernel defined in (Bloehdorn and Moschitti, 2007a), applied to constituency trees, and the smoothed partial tree kernels (SPTKs) defined in (Croce et al., 2011), which generalize the former.
    Page 3, “Structural Similarity Functions”
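
Sentences 2 and 3 above describe the distributional view: nouns are characterized by the verbs they collocate with, and similarity is distance in a vector space. A minimal, hedged illustration with toy co-occurrence counts (the vectors and vocabulary are invented for the example):

    import numpy as np

    # Toy co-occurrence vectors: rows are nouns, columns are verb contexts
    # (e.g., counts of occurring as direct object of eat/drink/read).
    nouns = {"apple": np.array([8., 1., 0.]),
             "bread": np.array([7., 0., 0.]),
             "novel": np.array([0., 0., 9.])}

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Nouns collocating with the same verbs come out close in the space:
    print(cosine(nouns["apple"], nouns["bread"]))  # high
    print(cosine(nouns["apple"], nouns["novel"]))  # near zero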

Support Vector

Appears in 4 sentences as: Support Vector (2) support vectors (2)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines.
    Page 1, “Abstract”
  2. The nice property of kernel functions is that they can be used in place of the scalar product of feature vectors to train algorithms such as Support Vector Machines (SVMs).
    Page 2, “Introduction”
  3. In line with the method discussed in (Pighin and Moschitti, 2009b), these fragments are extracted as they appear in most of the support vectors selected during SVM training.
    Page 8, “Model Analysis and Discussion”
  4. the underlying support vectors) confirm very interesting grammatical generalizations, i.e.
    Page 8, “Model Analysis and Discussion”
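
Sentence 2 above states the key property: a kernel value can replace the dot product inside the learner. A hedged sketch of how that looks with scikit-learn's precomputed-kernel interface, reusing smoothed_tree_kernel and parse_fragment from the sketches above (trees and labels below are toy placeholders, not the paper's data):

    import numpy as np
    from sklearn.svm import SVC

    sigma = lambda a, b: 1.0 if a == b else 0.0
    trees = [parse_fragment(s) for s in
             ["(VC(VB(target))(OBJ))", "(IM(VB(target))(OBJ))",
              "(NMOD(VBG(target))(OPRD))", "(VC(VBN(target))(OPRD))"]]
    labels = [0, 0, 1, 1]  # toy verb-class labels

    # The Gram matrix of kernel values stands in for dot products of the
    # implicit (exponentially large) fragment feature vectors.
    gram = np.array([[smoothed_tree_kernel(a, b, sigma) for b in trees]
                     for a in trees])
    clf = SVC(kernel="precomputed").fit(gram, labels)

    # Prediction also needs kernel values against the training trees:
    test = parse_fragment("(VC(VB(target))(OPRD))")
    row = np.array([[smoothed_tree_kernel(test, t, sigma) for t in trees]])
    print(clf.predict(row))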

learning algorithms

Appears in 3 sentences as: learning algorithms (3)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. Note that their coding in learning algorithms is rather complex: we need to take into account syntactic structures, which may require an exponential number of syntactic features (i.e., all their possible substructures).
    Page 1, “Introduction”
  2. We have proposed new approaches to characterize verb classes in learning algorithms.
    Page 8, “Conclusion”
  3. The advantage of kernel methods is that they can be directly used in some learning algorithms, e.g., SVMs, to train verb classifiers.
    Page 8, “Conclusion”
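
Sentence 1 above is the motivation for kernels: the implicit fragment space is exponential. A small, hedged illustration of the blow-up, reusing parse_fragment from above (the recursion equals delta(n, n) from the kernel sketch with decay 1 and exact matching; the example tree is one of the fragments listed earlier):

    # Why explicit syntactic features explode: the number of fragment
    # matches the subset-tree recursion generates at a node multiplies
    # over its children, so it grows exponentially with tree size.
    # Kernels sidestep this by never enumerating the fragments.
    def self_matches(tree):
        label, children = tree
        count = 1
        for c in children:
            count *= 1 + self_matches(c)
        return count

    t = parse_fragment("(ROOT(SBJ)(VBZ(target))(OBJ)(P(,)))")
    print(self_matches(t))  # already 36 for this small fragment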

lexicals

Appears in 3 sentences as: lexicalized (1) lexicals (2)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. We define and automatically extract a lexicalized approximation of the latter.
    Page 2, “Introduction”
  2. (ii) The second type of tree uses lexicals as central nodes on which both GR and POS-Tag are added as the rightmost children.
    Page 5, “Verb Classification Models”
  3. For LEX type, we apply a lexical similarity learned with LSA only to pairs of lexicals associated with the same POS-Tag.
    Page 5, “Verb Classification Models”
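
A hedged sketch of the similarity gating described in sentence 3: LSA vectors from a toy co-occurrence matrix via truncated SVD, with similarity forced to zero across different POS-Tags. The matrix, vocabulary, and dimensionality below are assumptions for illustration only:

    import numpy as np

    # Toy word-by-context counts; rows follow `vocab` order. Invented data.
    vocab = {"buy": 0, "purchase": 1, "car": 2}
    cooc = np.array([[5., 0., 3.],
                     [4., 1., 3.],
                     [0., 6., 1.]])

    # LSA: truncated SVD of the co-occurrence matrix.
    U, S, _ = np.linalg.svd(cooc, full_matrices=False)
    vectors = U[:, :2] * S[:2]

    def sigma_lex(w1, pos1, w2, pos2):
        if pos1 != pos2:  # similarity only between same-POS lexicals
            return 0.0
        u, v = vectors[vocab[w1]], vectors[vocab[w2]]
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(sigma_lex("buy", "VB", "purchase", "VB"))  # high
    print(sigma_lex("buy", "VB", "car", "NN"))       # gated to 0.0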

statistically significant

Appears in 3 sentences as: statistical significance (1) statistically significant (2)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. We also derive statistical significance of the results by using the model described in (Yeh, 2000) and implemented in (Padó, 2006).
    Page 6, “Experiments”
  2. Third, PTK, which produces more general structures, improves over BR by almost 1.5 (a statistically significant result) when using our dependency structures GRCT and LCT.
    Page 7, “Experiments”
  3. Finally, the best model of SPTK (i.e., using LCT) improves over the best PTK (i.e., using LCT) by almost 1 point (a statistically significant result): this difference is only given by lexical similarity.
    Page 7, “Experiments”
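
The significance model of (Yeh, 2000) is an approximate randomization test. A minimal, hedged sketch over paired per-instance correctness indicators (the data layout is our assumption; (Padó, 2006) provides a full implementation):

    import random

    # Approximate randomization: repeatedly swap the two systems' outputs
    # on each instance with probability 0.5 and check how often the
    # shuffled accuracy gap is at least as large as the observed one.
    def approx_randomization(correct_a, correct_b, trials=10_000, seed=0):
        rng = random.Random(seed)
        observed = abs(sum(correct_a) - sum(correct_b))
        hits = 0
        for _ in range(trials):
            a = b = 0
            for x, y in zip(correct_a, correct_b):
                if rng.random() < 0.5:
                    x, y = y, x
                a += x
                b += y
            if abs(a - b) >= observed:
                hits += 1
        return (hits + 1) / (trials + 1)  # estimated p-value

    # Toy usage: 1 = classified correctly, 0 = error, per test instance.
    print(approx_randomization([1, 1, 1, 0, 1, 1], [1, 0, 1, 0, 1, 0]))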

SVM

Appears in 3 sentences as: SVM (3)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. This uses a set of binary SVM classifiers, one for each verb class (frame).
    Page 6, “Experiments”
  2. In the classification phase, the binary classifiers are applied by (i) only considering classes that are compatible with the target verbs; and (ii) selecting the class associated with the maximum positive SVM margin.
    Page 6, “Experiments”
  3. In line with the method discussed in (Pighin and Moschitti, 2009b), these fragments are extracted as they appear in most of the support vectors selected during SVM training.
    Page 8, “Model Analysis and Discussion”
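
Sentence 2 describes the decoding step. A hedged sketch follows; the classifier objects, the compatibility map, and the feature encoding are assumptions, and any binary classifiers exposing a margin (e.g., scikit-learn SVCs via decision_function) would fit:

    # One-vs-all decoding as described above: restrict to classes that
    # are compatible with the target verb, then pick the class whose
    # binary SVM returns the largest margin. The paper selects the
    # maximum positive margin; tie-breaking and the all-negative case
    # are left open in this sketch.
    def classify(example, classifiers, compatible_classes):
        best_cls, best_margin = None, float("-inf")
        for cls in compatible_classes:
            margin = classifiers[cls].decision_function([example])[0]
            if margin > best_margin:
                best_cls, best_margin = cls, margin
        return best_cls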

WordNet

Appears in 3 sentences as: WordNet (3)
In Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
  1. In (Brown et al., 2011), the so-called dynamic dependency neighborhoods (DDN), i.e., the set of verbs that are typically collocated with a direct object, are shown to be more helpful than lexical information (e.g., WordNet).
    Page 2, “Related work”
  2. In supervised language learning, when few examples are available, DMs support cost-effective lexical generalizations, often outperforming knowledge-based resources (such as WordNet, as in (Pantel et al., 2007)).
    Page 2, “Related work”
  3. The contribution of (ii) is proportional to the lexical similarity of the tree lexical nodes, where the latter can be evaluated according to distributional models or lexical resources, e.g., WordNet.
    Page 3, “Structural Similarity Functions”
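
The dynamic dependency neighborhoods of sentence 1 characterize a noun by the verbs that take it as a direct object. A hedged toy sketch; the overlap measure (Jaccard) is our illustrative choice, not necessarily the one used by (Brown et al., 2011), and the data are invented:

    # DDN-style comparison: each noun is represented by the set of verbs
    # observed taking it as a direct object.
    ddn = {"beer":  {"drink", "brew", "spill", "buy"},
           "wine":  {"drink", "pour", "spill", "buy"},
           "novel": {"read", "write", "buy"}}

    def ddn_similarity(n1, n2):
        inter = ddn[n1] & ddn[n2]
        union = ddn[n1] | ddn[n2]
        return len(inter) / len(union) if union else 0.0

    print(ddn_similarity("beer", "wine"))   # high overlap
    print(ddn_similarity("beer", "novel"))  # low overlap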
