Which Are the Best Features for Automatic Verb Classification
Li, Jianguo and Brew, Chris

Article Structure

Abstract

In this work, we develop and evaluate a wide range of feature spaces for deriving Levin-style verb classifications (Levin, 1993).

Introduction

Much research in lexical acquisition of verbs has concentrated on the relation between verbs and their argument frames.

Related Work

Earlier work on verb classification has generally adopted one of the two approaches for devising statistical, corpus-based features.

Integration of Syntactic and Lexical Information

In this study, we explore a wider range of features for AVC, focusing particularly on various ways to mix syntactic with lexical information.

Experiment Setup

4.1 Corpus

To collect each type of feature, we use the Gigaword Corpus, which consists of samples of recent newswire text data collected from four distinct in-

Machine Learning Method

5.1 Preprocessing Data

Results and Discussion

6.1 Evaluation Metrics

Conclusion and Future Work

We have performed a wide range of experiments to identify which features are most informative in

Acknowledgments

This study was supported by NSF grant 0347799.

Topics

feature sets

Appears in 37 sentences as: feature set (18) feature sets (20)
In Which Are the Best Features for Automatic Verb Classification
  1. We develop feature sets that combine syntactic and lexical information, which are in principle useful for any Levin-style verb classification.
    Page 2, “Introduction”
  2. We test the general applicability and scalability of each feature set to the distinctions among 48 verb classes involving 1,300 verbs, which is, to our knowledge, the largest investigation on English verb classification by far.
    Page 2, “Introduction”
  3. To preview our results, a feature set that combines both syntactic information and lexical information works much better than either of them used alone.
    Page 2, “Introduction” (a sketch of one way such features can be combined follows this list)
  4. In addition, mixed feature sets also show potential for scaling well when dealing with larger number of verbs and verb classes.
    Page 2, “Introduction”
  5. The deeper linguistic analysis allows their feature set to cover a variety of indicators of verb semantics, beyond that of frame information.
    Page 2, “Related Work”
  6. We evaluate six different feature sets for their effectiveness in AVC: SCF, DR, CO, ACO, SCF+CO, and JOANIS07.
    Page 4, “Experiment Setup 4.1 Corpus”
  7. The other four feature sets include both syntactic and lexical information.
    Page 4, “Experiment Setup 4.1 Corpus”
  8. JOANIS07: We use the feature set proposed in Joanis et al.
    Page 4, “Experiment Setup 4.1 Corpus”
  9. In our automatic verb classification experiment, we test the applicability of each feature set to distinctions among up to 48 classes.
    Page 5, “Experiment Setup 4.1 Corpus”
  10. We construct a semantic space with each feature set.
    Page 5, “Machine Learning Method”
  11. Except for JOANIS07, which contains only 224 features, all the other feature sets lead to a very high-dimensional space.
    Page 5, “Machine Learning Method”
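
The sentences above report that feature sets mixing syntactic and lexical information, such as SCF+CO, outperform either kind of information used alone. These snippets do not show how the two spaces are combined, so the following is a minimal sketch of the simplest construction one might assume: prefix the feature names and concatenate the per-verb counts. The feature names and counts are invented for illustration and are not taken from the paper.

    from collections import Counter

    def combine_feature_sets(scf_counts, co_counts):
        """Concatenate two count-based feature spaces (e.g. SCF and CO)
        into one mixed space by prefixing feature names. A sketch, not
        the authors' exact construction."""
        mixed = Counter()
        for feat, n in scf_counts.items():
            mixed["SCF:" + feat] += n
        for feat, n in co_counts.items():
            mixed["CO:" + feat] += n
        return mixed

    # Hypothetical counts for the verb "break"
    scf = Counter({"NP1-NP2-PPwith": 12, "NP1-NP2": 40})
    co = Counter({"door": 25, "hammer": 7})
    print(combine_feature_sets(scf, co))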

classification tasks

Appears in 9 sentences as: classification task (1) classification tasks (8)
In Which Are the Best Features for Automatic Verb Classification
  1. (2007) demonstrates that the general feature space they devise achieves a rate of error reduction ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty.
    Page 3, “Related Work”
  2. (2007), we adopt a single evaluation measure - macro-averaged recall - for all of our classification tasks.
    Page 6, “Results and Discussion” (a short sketch of this metric follows this list)
  3. (2007) conducts 11 classification tasks including six 2-way classifications, two 3-way classifications, one 6-way classification, one 8-way classification, and one 14-way classification.
    Page 6, “Results and Discussion”
  4. In our experiments, we replicate these 11 classification tasks using the proposed six different feature sets.
    Page 6, “Results and Discussion”
  5. For each classification task in this task set, we randomly select 20 verbs from each class as the training set.
    Page 6, “Results and Discussion”
  6. The results for the 11 classification tasks are summarized in table 5.
    Page 6, “Results and Discussion”
  7. If we could exhaust all the possible n-way (2 ≤ n ≤ 48) classification tasks with the 48 Levin classes we investigate, it would allow us to draw a firmer conclusion about the general applicability and scalability of a particular feature set.
    Page 7, “Results and Discussion”
  8. However, the number of classification tasks grows extremely large when n takes on certain values (e.g. n = 20).
    Page 7, “Results and Discussion”
  9. For each n = 2, 5, 10, 20, 30, 40, 48, we will perform a certain number of n-way classification tasks.
    Page 7, “Results and Discussion”
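
Sentence 2 above notes that macro-averaged recall is the single evaluation measure used for all classification tasks. As a reminder of what that metric computes (per-class recall averaged over classes, so every class counts equally regardless of how many test verbs it contains), here is a minimal sketch; the class labels and predictions are invented.

    def macro_averaged_recall(gold, predicted):
        """Average of per-class recall: each class contributes equally,
        regardless of how many test verbs it contains."""
        recalls = []
        for c in set(gold):
            gold_in_class = sum(1 for g in gold if g == c)
            hits = sum(1 for g, p in zip(gold, predicted) if g == c and p == c)
            recalls.append(hits / gold_in_class)
        return sum(recalls) / len(recalls)

    # Hypothetical gold and predicted Levin class labels for six test verbs
    gold = ["9.1", "9.1", "10.1", "51.3", "51.3", "51.3"]
    pred = ["9.1", "10.1", "10.1", "51.3", "51.3", "9.1"]
    print(macro_averaged_recall(gold, pred))  # (0.5 + 1.0 + 0.667) / 3 ~ 0.72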

feature space

Appears in 8 sentences as: feature space (6) feature spaces (2)
In Which Are the Best Features for Automatic Verb Classification
  1. In this work, we develop and evaluate a wide range of feature spaces for deriving Levin-style verb classifications (Levin, 1993).
    Page 1, “Abstract”
  2. We perform the classification experiments using Bayesian Multinomial Regression (an efficient log-linear modeling framework which we found to outperform SVMs for this task) with the proposed feature spaces.
    Page 1, “Abstract” (a sketch of a log-linear stand-in follows this list)
  3. They define a general feature space that is supposed to be applicable to all Levin classes.
    Page 2, “Related Work”
  4. (2007) demonstrates that the general feature space they devise achieves a rate of error reduction ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty.
    Page 3, “Related Work”
  5. However, they also show that their general feature space does not generally improve the classification accuracy over subcategorization frames (see table 1).
    Page 3, “Related Work”
  6. ACO features integrate at least some degree of syntactic information into the feature space.
    Page 3, “Integration of Syntactic and Lexical Information”
  7. Since one of our primary goals is to identify a general feature space that is not specific to any class distinctions, it is of great importance to understand how the classification accuracy is affected when attempting to classify more verbs into a larger number of classes.
    Page 4, “Experiment Setup 4.1 Corpus”
  8. Another feature set that combines syntactic and lexical information, ACO, which keeps function words in the feature space to preserve syntactic information, outperforms the conventional CO on the majority of tasks.
    Page 6, “Results and Discussion”
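
Sentence 2 above states that the classification experiments use Bayesian Multinomial Regression, an efficient log-linear model, over the proposed feature spaces. That exact tool is not shown in these snippets, so the sketch below substitutes scikit-learn's multinomial logistic regression (also a log-linear classifier) purely as an illustrative stand-in; the tiny verb-by-feature matrix and class labels are invented.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical verb-by-feature frequency matrix (rows = verbs,
    # columns = features such as SCFs or co-occurring words).
    X = np.array([[12, 0, 3],
                  [10, 1, 4],
                  [0, 9, 7],
                  [1, 8, 6]])
    y = ["change_of_state", "change_of_state",
         "manner_of_motion", "manner_of_motion"]

    # A multinomial (softmax) logistic regression: a log-linear classifier
    # standing in for the Bayesian Multinomial Regression used in the paper.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    print(clf.predict([[11, 0, 2]]))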

data sparsity

Appears in 6 sentences as: data sparsity (6)
In Which Are the Best Features for Automatic Verb Classification
  1. When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
    Page 1, “Introduction”
  2. Trying to overcome the problem of data sparsity , Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
    Page 2, “Related Work”
  3. Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
    Page 2, “Related Work”
  4. Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a.
    Page 3, “Integration of Syntactic and Lexical Information”
  5. to data sparsity.
    Page 3, “Integration of Syntactic and Lexical Information”
  6. Our experiments on verb classification have offered a class-based approach to alleviating the data sparsity problem in parsing.
    Page 8, “Conclusion and Future Work”

dependency relations

Appears in 5 sentences as: Dependency relation (1) Dependency relations (1) dependency relations (3)
In Which Are the Best Features for Automatic Verb Classification
  1. Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a.
    Page 3, “Integration of Syntactic and Lexical Information”
  2. dependency relations).
    Page 3, “Integration of Syntactic and Lexical Information”
  3. Dependency relations contain both syntactic and lexical information (4).
    Page 3, “Integration of Syntactic and Lexical Information”
  4. We therefore decide to break it into two individual dependency relations: PP(with), PP-fork.
    Page 3, “Integration of Syntactic and Lexical Information” (a sketch of this splitting follows this list)
  5. Although dependency relations have been widely used in automatic acquisition of lexical information, such as detection of polysemy (Lin, 1998) and WSD (McCarthy et al., 2004), their utility in AVC still remains untested.
    Page 3, “Integration of Syntactic and Lexical Information”
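
Sentences 1-4 above describe breaking a lexicalized frame into individual lexicalized slots (dependency relations), for example splitting a with-PP into PP(with) and PP-fork. The sketch below illustrates that splitting, assuming a frame string in the format of the paper's break example; the authors' exact feature naming may differ.

    def frame_to_dependency_relations(frame):
        """Split a lexicalized frame such as
        'NP1(he)-V-NP2(door)-PP(with:hammer)' into lexicalized slots.
        A PP slot yields one relation for the preposition and one for
        its head noun, mirroring the PP(with) / PP-fork example."""
        relations = []
        for slot in frame.split("-"):
            if slot == "V":
                continue
            if slot.startswith("PP(") and ":" in slot:
                prep, noun = slot[3:-1].split(":")
                relations.append("PP(%s)" % prep)
                relations.append("PP-%s" % noun)
            else:
                label, noun = slot.split("(")
                relations.append("%s-%s" % (label, noun.rstrip(")")))
        return relations

    print(frame_to_dependency_relations("NP1(he)-V-NP2(door)-PP(with:hammer)"))
    # ['NP1-he', 'NP2-door', 'PP(with)', 'PP-hammer']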

im

Appears in 5 sentences as: im (7)
In Which Are the Best Features for Automatic Verb Classification
  1. It is therefore unsurprising that much work on verb classification has adopted them as features (Schulte im Walde, 2000; Brew and Schulte im Walde, 2002; Korhonen et al., 2003).
    Page 2, “Related Work”
  2. Trying to overcome the problem of data sparsity, Schulte im Walde (2000) explores the additional use of selectional preference features by augmenting each syntactic slot with the concept to which its head noun belongs in an ontology (e.g.
    Page 2, “Related Work”
  3. Although the problem of data sparsity is alleviated to a certain extent (3), these features do not generally improve classification performance (Schulte im Walde, 2000; Joanis, 2002).
    Page 2, “Related Work”
  4. However, some of the function words, prepositions in particular, are known to carry a great amount of syntactic information that is related to the lexical meanings of verbs (Schulte im Walde, 2003; Brew and Schulte im Walde, 2002; Joanis et al., 2007).
    Page 3, “Integration of Syntactic and Lexical Information”
  5. For example, Schulte im Walde (2000) uses 153 verbs in 30 classes, and Joanis et al.
    Page 4, “Experiment Setup 4.1 Corpus”

lexicalized

Appears in 5 sentences as: Lexicalized (2) lexicalized (4)
In Which Are the Best Features for Automatic Verb Classification
  1. Lexicalized frames are usually obtained
    Page 2, “Related Work”
  2. Dependency relation (DR): Our way to overcome data sparsity is to break lexicalized frames into lexicalized slots (a.k.a.
    Page 3, “Integration of Syntactic and Lexical Information”
  3. We first build a lexicalized frame for the verb break: NP1(he)-V-NP2(door)-PP(with:hammer).
    Page 4, “Experiment Setup 4.1 Corpus”
  4. Based on the lexicalized frame, we construct an SCF NP1-NP2-PPwith for break.
    Page 4, “Experiment Setup 4.1 Corpus” (a sketch of this step follows this list)
  5. Lexicalized PCFGs (where head words annotate phrasal nodes) have proved a key tool for high-performance PCFG parsing; however, their performance is hampered by the sparse lexical dependencies exhibited in the Penn Treebank.
    Page 8, “Conclusion and Future Work”
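
Sentences 3 and 4 above walk through the same example in the other direction: from the lexicalized frame NP1(he)-V-NP2(door)-PP(with:hammer) to the SCF NP1-NP2-PPwith, i.e. head nouns are dropped while slot labels and prepositions are kept. A minimal sketch of that step, assuming the same frame string format, follows.

    def frame_to_scf(frame):
        """Strip head nouns from a lexicalized frame, keeping slot labels
        and prepositions, to obtain a subcategorization frame (SCF):
        'NP1(he)-V-NP2(door)-PP(with:hammer)' -> 'NP1-NP2-PPwith'."""
        slots = []
        for slot in frame.split("-"):
            if slot == "V":
                continue
            if slot.startswith("PP("):
                prep = slot[3:-1].split(":")[0]
                slots.append("PP" + prep)
            else:
                slots.append(slot.split("(")[0])
        return "-".join(slots)

    print(frame_to_scf("NP1(he)-V-NP2(door)-PP(with:hammer)"))  # NP1-NP2-PPwith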

semantic space

Appears in 5 sentences as: semantic space (4) semantic spaces (1)
In Which Are the Best Features for Automatic Verb Classification
  1. We represent the semantic space for verbs as a matrix of frequencies, where each row corresponds to a Levin verb and each column represents a given feature.
    Page 5, “Machine Learning Method”
  2. We construct a semantic space with each feature set.
    Page 5, “Machine Learning Method”
  3. For instance, the semantic space with CO features contains over one million columns, which is too large and cumbersome to work with.
    Page 5, “Machine Learning Method”
  4. One way to avoid these high-dimensional spaces is to assume that most of the features are irrelevant, an assumption adopted by many of the previous studies working with high-dimensional semantic spaces (Burgess and Lund, 1997; Pado and Lapata, 2007; Rohde et al., 2004).
    Page 5, “Machine Learning Method”
  5. Burgess and Lund (1997) suggests that the semantic space can be reduced by keeping only the k columns (features) with the highest variance.
    Page 5, “Machine Learning Method”
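
Sentences 1-5 above describe the semantic space as a verb-by-feature frequency matrix and cite Burgess and Lund (1997) for reducing it by keeping only the k columns (features) with the highest variance. A minimal NumPy sketch of that reduction follows; the matrix and the value of k are invented.

    import numpy as np

    def keep_top_k_variance_columns(matrix, k):
        """Keep only the k columns (features) of a verb-by-feature
        frequency matrix with the highest variance."""
        variances = matrix.var(axis=0)
        top_cols = np.sort(np.argsort(variances)[::-1][:k])
        return matrix[:, top_cols], top_cols

    # Hypothetical 4-verb x 6-feature frequency matrix
    X = np.array([[12, 0, 3, 1, 1, 1],
                  [10, 1, 4, 1, 1, 0],
                  [ 0, 9, 7, 1, 0, 1],
                  [ 1, 8, 6, 1, 1, 1]])
    reduced, kept = keep_top_k_variance_columns(X, k=3)
    print(kept)     # indices of the three highest-variance features
    print(reduced)  # the reduced 4 x 3 matrix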

CCG

Appears in 4 sentences as: CCG (4)
In Which Are the Best Features for Automatic Verb Classification
  1. SCF and DR: These more linguistically informed features are constructed based on the grammatical relations generated by the C&C CCG parser (Clark and Curran, 2007).
    Page 4, “Experiment Setup 4.1 Corpus”
  2. We extract features on the basis of the output generated by the C&C CCG parser.
    Page 4, “Experiment Setup 4.1 Corpus”
  3. One explanation for the poor performance could be that we use all the frames generated by the CCG parser in our experiment.
    Page 8, “Results and Discussion”
  4. To see if Levin-selected SCFs are more effective for AVC, we match each SCF generated by the C&C CCG parser (CCG-SCF) to one of 78 Levin-defined SCFs, and refer to the resulting SCF set as unfiltered-Levin-SCF.
    Page 8, “Results and Discussion”

co-occurrence

Appears in 4 sentences as: Co-occurrence (1) co-occurrence (3)
In Which Are the Best Features for Automatic Verb Classification
  1. Co-occurrence (CO): CO features mostly convey lexical information only and are generally considered not particularly sensitive to argument structures (Rohde et al., 2004).
    Page 3, “Integration of Syntactic and Lexical Information”
  2. Adapted co-occurrence (ACO): Conventional CO features generally adopt a stop list to filter out function words.
    Page 3, “Integration of Syntactic and Lexical Information” (a sketch contrasting CO and ACO follows this list)
  3. On the other hand, the co-occurrence feature (CO), which is believed to convey only lexical information, outperforms SCF on every n-way classification when n ≥ 10, suggesting that verbs in the same Levin classes tend to share their neighboring words.
    Page 7, “Results and Discussion”
  4. In fact, even the simple co-occurrence feature (CO) yields a better performance (42.4%) than these Levin-selected SCF sets.
    Page 8, “Results and Discussion”
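
Sentences 1 and 2 above contrast conventional co-occurrence features (CO), which use a stop list to filter out function words, with adapted co-occurrence features (ACO), which keep function words such as prepositions so that some syntactic information is preserved. The sketch below shows the two extraction styles over a small context window; the window size, stop list, and example sentence are invented, and the paper's actual ACO construction may involve more than simply keeping function words.

    # Hypothetical stop list of function words and a window of two words
    # on each side of the target verb.
    STOP_WORDS = {"he", "the", "a", "with", "of"}
    WINDOW = 2

    def co_features(tokens, verb_index, keep_function_words=False):
        """Collect the words neighboring the target verb within the window.
        CO-style: filter function words with the stop list.
        ACO-style (roughly): keep them to retain some syntactic signal."""
        lo = max(0, verb_index - WINDOW)
        hi = min(len(tokens), verb_index + WINDOW + 1)
        neighbors = tokens[lo:verb_index] + tokens[verb_index + 1:hi]
        if keep_function_words:
            return neighbors
        return [w for w in neighbors if w not in STOP_WORDS]

    sent = ["he", "broke", "the", "door", "with", "a", "hammer"]
    print(co_features(sent, 1))                            # CO: ['door']
    print(co_features(sent, 1, keep_function_words=True))  # ACO-style: ['he', 'the', 'door']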

semantic role

Appears in 3 sentences as: semantic role (2) semantic roles (1)
In Which Are the Best Features for Automatic Verb Classification
  1. Many scholars hypothesize that the behavior of a verb, particularly with respect to the expression of arguments and the assignment of semantic roles, is to a large extent driven by deep semantic regularities (Dowty, 1991; Green, 1974; Goldberg, 1995; Levin, 1993).
    Page 1, “Introduction”
  2. When the information about a verb type is not available or sufficient for us to draw firm conclusions about its usage, the information about the class to which the verb type belongs can compensate for it, addressing the pervasive problem of data sparsity in a wide range of NLP tasks, such as automatic extraction of subcategorization frames (Korhonen, 2002), semantic role labeling (Swier and Stevenson, 2004; Gildea and Jurafsky, 2002), natural language generation for machine translation (Habash et al., 2003), and deriving predominant verb senses from unlabeled data (Lapata and Brew, 2004).
    Page 1, “Introduction”
  3. Animacy of NPs: The animacy of the semantic role corresponding to the head noun in each syntactic slot can also distinguish classes of verbs.
    Page 3, “Related Work”
