Improving pairwise coreference models through feature space hierarchy learning
Lassalle, Emmanuel and Denis, Pascal

Article Structure

Abstract

This paper proposes a new method for significantly improving the performance of pairwise coreference models.

Introduction

Coreference resolution is the problem of partitioning a sequence of noun phrases (or mentions), as they occur in a natural language text, into a set of referential entities.

Modeling pairs

Pairwise models basically employ one local classifier to decide whether two mentions are coreferential or not.
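
A rough illustration of such a local pairwise classifier is sketched below. It is not the authors' implementation: the feature names, the data layout and the use of scikit-learn components are illustrative assumptions; only the choice of a linear passive-aggressive (PA) learner echoes the system description. The point is simply that every decision is local to one pair of mentions, made over a feature vector computed from that pair.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

def pair_features(m1, m2):
    # Illustrative subset of base features (string match, grammatical types,
    # same-sentence flag, mention distance); the paper lists its full feature
    # set in the System description.
    return {
        "string_match": m1["text"].lower() == m2["text"].lower(),
        "gram_types": m1["gram_type"] + "_" + m2["gram_type"],
        "same_sentence": m1["sent_id"] == m2["sent_id"],
        "mention_distance": min(m2["index"] - m1["index"], 10),
    }

def train_pairwise(train_pairs):
    # train_pairs: list of ((m1, m2), label) with label in {+1, -1}
    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform([pair_features(m1, m2) for (m1, m2), _ in train_pairs])
    y = [label for _, label in train_pairs]
    classifier = PassiveAggressiveClassifier(max_iter=10)
    classifier.fit(X, y)
    return vectorizer, classifier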

Hierarchizing feature spaces

In this section, we have to keep in mind that separating the pairs into different models is the same as building a large feature space in which the parameter w can be learned piecewise, in independent subspaces.
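
A minimal way to see this equivalence, with notation assumed here for illustration rather than taken verbatim from the paper: if the indicators partition the pairs into categories 1, ..., K, each with its own feature map and weight vector, then the K separate models are exactly one linear model on the direct sum of the subspaces.

% A pair x falling in category k is embedded into the k-th block of the
% direct sum F = F_1 \oplus \dots \oplus F_K, with zeros elsewhere:
\[
\Phi(x) = \bigl(0, \dots, 0, \underbrace{\phi_k(x)}_{k\text{-th block}}, 0, \dots, 0\bigr),
\qquad
w = (w_1, \dots, w_K),
\]
\[
\langle w, \Phi(x) \rangle = \langle w_k, \phi_k(x) \rangle,
\]
% so a single weight vector w on the large space makes the same decisions as
% the K separate classifiers, and each block w_k can be learned independently.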

System description

Our system consists of the pairwise model obtained by cutting a hierarchy (a PA classifier with the selected feature space) and a greedy decoder that creates clusters from its output.
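
As a hedged sketch of the decoding step (function and variable names are illustrative, not the authors' code; the paper tests several greedy strategies, among them Closest-First from Soon et al., 2001), a Closest-First decoder links each mention to the nearest preceding mention the pairwise classifier accepts, then reads the clusters off a union-find structure:

def closest_first_decode(mentions, is_coreferent):
    # mentions are assumed to be given in textual (left-to-right) order;
    # is_coreferent(antecedent, mention) is the trained pairwise decision.
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for j in range(1, len(mentions)):
        # Scan candidate antecedents from right to left; the closest mention
        # the classifier accepts wins, then we stop.
        for i in range(j - 1, -1, -1):
            if is_coreferent(mentions[i], mentions[j]):
                parent[find(j)] = find(i)  # merge the two clusters
                break

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    return list(clusters.values())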

Experiments

5.1 Data

Conclusion and perspectives

In this paper, we described a method for selecting a feature space among a very large number of choices by using linearity and by combining indicators to separate the instances.

Topics

feature space

Appears in 41 sentences as: feature space (28), Feature spaces (1), feature spaces (14)
In Improving pairwise coreference models through feature space hierarchy learning
  1. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs.
    Page 1, “Abstract”
  2. Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators.
    Page 1, “Abstract”
  3. It is worth noting that, from a machine learning point of view, this is related to feature extraction in that both approaches in effect recast the pairwise classification problem in higher-dimensional feature spaces.
    Page 2, “Introduction”
  4. We will see that this is also equivalent to selecting a single large adequate feature space by using the data.
    Page 2, “Introduction”
  5. Given a document, the number of mentions is fixed and each pair of mentions follows a certain distribution (that we partly observe in a feature space).
    Page 2, “Modeling pairs”
  6. 2.2 Feature spaces 2.2.1 Definitions
    Page 3, “Modeling pairs”
  7. that casts pairs into a feature space F through which we observe them.
    Page 3, “Modeling pairs”
  8. For technical coherence, we assume that $\phi_{F_1}$ and $\phi_{F_2}$ have the same values when projected on the feature space $F_1 \cap F_2$: it means that common features from two feature spaces have the same values.
    Page 3, “Modeling pairs”
  9. In particular, when using a mapping to too small a feature space, the classifier cannot capture the distribution very well: the data is too noisy.
    Page 3, “Modeling pairs”
  10. In this way, we may separate positive and negative pairs more easily if we cast each kind of pair into a specific feature space.
    Page 3, “Modeling pairs”
  11. Let us call these feature spaces $F_1$ and $F_2$.
    Page 3, “Modeling pairs”

coreference

Appears in 13 sentences as: Coreference (1), coreference (8), coreferent (1), coreferential (6)
In Improving pairwise coreference models through feature space hierarchy learning
  1. This paper proposes a new method for significantly improving the performance of pairwise coreference models.
    Page 1, “Abstract”
  2. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs.
    Page 1, “Abstract”
  3. Coreference resolution is the problem of partitioning a sequence of noun phrases (or mentions), as they occur in a natural language text, into a set of referential entities.
    Page 1, “Introduction”
  4. A common approach to this problem is to separate it into two modules: on the one hand, one defines a model for evaluating coreference links, in general a discriminative classifier that detects coreferential mention pairs.
    Page 1, “Introduction”
  5. In this kind of architecture, the performance of the entire coreference system strongly depends on the quality of the local pairwise classifier. Consequently, a lot of research effort on coreference resolution has focused on trying to boost the performance of the pairwise classifier.
    Page 1, “Introduction”
  6. The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones.
    Page 2, “Introduction”
  7. Pairwise models basically employ one local classifier to decide whether two mentions are coreferential or not.
    Page 2, “Modeling pairs”
  8. For instance, some coreference resolution systems process different kinds of anaphors separately, which suggests for example that pairs containing an anaphoric pronoun behave differently from pairs with non-
    Page 2, “Modeling pairs”
  9. where $\Omega$ classically represents randomness, $X$ is the space of objects (“mention pairs”) that is not directly observable and $y_{ij}(\omega) \in \mathcal{Y} = \{+1, -1\}$ are the labels indicating whether $m_i$ and $m_j$ are coreferential or not.
    Page 3, “Modeling pairs”
  10. From this formal point of view, the task of coreference resolution consists in fixing $\phi$, observing labeled samples $\{(x, y)_t\}_{t \in \mathrm{TrainSet}}$ and, given partially observed new variables
    Page 3, “Modeling pairs”
  11. We tested 3 classical greedy link selection strategies that form clusters from the classifier decision: Closest-First (merge mentions with their closest coreferent mention on the left) (Soon et al., 2001),
    Page 6, “System description”

coreference resolution

Appears in 6 sentences as: Coreference resolution (1), coreference resolution (5)
In Improving pairwise coreference models through feature space hierarchy learning
  1. Coreference resolution is the problem of partitioning a sequence of noun phrases (or mentions), as they occur in a natural language text, into a set of referential entities.
    Page 1, “Introduction”
  2. In this kind of architecture, the performance of the entire coreference system strongly depends on the quality of the local pairwise classifier. Consequently, a lot of research effort on coreference resolution has focused on trying to boost the performance of the pairwise classifier.
    Page 1, “Introduction”
  3. For instance, some coreference resolution systems process different kinds of anaphors separately, which suggests for example that pairs containing an anaphoric pronoun behave differently from pairs with non-
    Page 2, “Modeling pairs”
  4. From this formal point of view, the task of coreference resolution consists in fixing $\phi$, observing labeled samples $\{(x, y)_t\}_{t \in \mathrm{TrainSet}}$ and, given partially observed new variables
    Page 3, “Modeling pairs”
  5. This is a rather idealized setting but our focus is on comparing various pairwise local models rather than on building a full coreference resolution system.
    Page 7, “Experiments”
  6. method to optimize the pairwise model of a coreference resolution system.
    Page 9, “Conclusion and perspectives”

entity types

Appears in 5 sentences as: entity type (1), entity types (4)
In Improving pairwise coreference models through feature space hierarchy learning
  1. The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones.
    Page 2, “Introduction”
  2. • entity types.
    Page 4, “Hierarchizing feature spaces”
  3. We used classical features that can be found in detail in (Bengtson and Roth, 2008) and (Rahman and Ng, 2011): grammatical type and subtype of mentions, string match and substring, apposition and copula, distance (number of separating mentions/sentences/words), gender/number match, synonymy/hypernym and animacy (using WordNet), family name (based on lists), named entity types, syntactic features (gold parse) and anaphoricity detection.
    Page 6, “System description”
  4. As indicators we used: left and right grammatical types and subtypes, entity types, a boolean indicating if the mentions are in the same sentence, and a very coarse histogram of distance in terms of sentences.
    Page 6, “System description”
  5. Depending on the document category, we found some variations as to which hierarchy was learned in each setting, but we noticed that parameters starting with right and left gramtypes often produced quite good hierarchies: for instance right gramtype → left gramtype → same sentence → right named entity type.
    Page 8, “Experiments”

Shared Task

Appears in 5 sentences as: Shared Task (4), Shared Tasks (1)
In Improving pairwise coreference models through feature space hierarchy learning
  1. Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features.
    Page 1, “Abstract”
  2. As will be shown based on a variety of experiments on the CoNLL-2012 Shared Task English datasets, these improvements are consistent across different evaluation metrics and for the most part independent of the clustering decoder that was used.
    Page 2, “Introduction”
  3. We evaluated the system on the English part of the corpus provided in the CoNLL-2012 Shared Task (Pradhan et al., 2012), referred to as CoNLL-2012 here.
    Page 7, “Experiments”
  4. These metrics were recently used in the CoNLL-2011 and -2012 Shared Tasks.
    Page 7, “Experiments”
  5. The best classifier-decoder combination reaches a score of 67.19, which would place it above the mean score (66.41) of the systems that took part in the CoNLL-2012 Shared Task (gold mentions track).
    Page 8, “Experiments”

data sparsity

Appears in 4 sentences as: data sparseness (1), data sparsity (3)
In Improving pairwise coreference models through feature space hierarchy learning
  1. In effect, we want to learn the “best” subspaces for our different models: that is, subspaces that are neither too coarse (i.e., unlikely to separate the data well) nor too specific (i.e., prone to data sparseness and noise).
    Page 2, “Introduction”
  2. In practice, we have to cope with data sparsity: there will not be enough data to properly train a linear model on such a space.
    Page 3, “Modeling pairs”
  3. The small number of outputs of an indicator is required for practical reasons: if a category of pairs is too refined, the associated feature space will suffer from data sparsity.
    Page 4, “Hierarchizing feature spaces”
  4. We observed that product-hierarchies did not perform well without cutting (especially when using longer sequences of indicators, because of data sparsity) and could obtain scores lower than the single model.
    Page 8, “Experiments”

dynamic programming

Appears in 4 sentences as: dynamic programming (4)
In Improving pairwise coreference models through feature space hierarchy learning
  1. hierarchies with dynamic programming.
    Page 2, “Introduction”
  2. Now selecting the best space for one of these measures can be achieved by using dynamic programming techniques.
    Page 5, “Hierarchizing feature spaces”
  3. We use a dynamic programming technique to compute the best hierarchy by cutting this tree and only keeping the classifiers situated at the leaves.
    Page 5, “Hierarchizing feature spaces”
  4. We employed dynamic programming on hierarchies of indicators to compute the feature space providing the best pairwise classifications efficiently.
    Page 8, “Conclusion and perspectives”
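
The occurrences above describe the core computation; as an illustration only (the node structure, the per-node quality score and its additivity over disjoint categories of pairs are assumptions here, not the authors' code), the tree-cut dynamic program reduces to a recursion that, at every node of the indicator hierarchy, keeps either that node's classifier or the best cuts of its subtrees:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    score: float  # quality of the classifier trained on this category of pairs
    children: List["Node"] = field(default_factory=list)

def best_cut(node: Node):
    """Return (best achievable score, nodes kept as the cut) for this subtree."""
    if not node.children:
        return node.score, [node]
    children_score, children_cut = 0.0, []
    for child in node.children:
        s, c = best_cut(child)
        children_score += s
        children_cut.extend(c)
    # Keep this node's single classifier, or the best cuts of its subtrees,
    # whichever scores higher (scores assumed additive over disjoint categories).
    if node.score >= children_score:
        return node.score, [node]
    return children_score, children_cut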

evaluation metrics

Appears in 3 sentences as: Evaluation metrics (1), evaluation metrics (2)
In Improving pairwise coreference models through feature space hierarchy learning
  1. Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features.
    Page 1, “Abstract”
  2. As will be shown based on a variety of experiments on the CoNLL-2012 Shared Task English datasets, these improvements are consistent across different evaluation metrics and for the most part independent of the clustering decoder that was used.
    Page 2, “Introduction”
  3. 5.3 Evaluation metrics
    Page 7, “Experiments”

named entity

Appears in 3 sentences as: named entity (3)
In Improving pairwise coreference models through feature space hierarchy learning
  1. The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones.
    Page 2, “Introduction”
  2. We used classical features that can be found in detail in (Bengtson and Roth, 2008) and (Rahman and Ng, 2011): grammatical type and subtype of mentions, string match and substring, apposition and copula, distance (number of separating mentions/sentences/words), gender/number match, synonymy/hypernym and animacy (using WordNet), family name (based on lists), named entity types, syntactic features (gold parse) and anaphoricity detection.
    Page 6, “System description”
  3. Depending on the document category, we found some variations as to which hierarchy was learned in each setting, but we noticed that parameters starting with right and left gramtypes often produced quite good hierarchies: for instance right gramtype → left gramtype → same sentence → right named entity type.
    Page 8, “Experiments”

significant improvement

Appears in 3 sentences as: significant improvement (1), significant improvements (1), significantly improving (1)
In Improving pairwise coreference models through feature space hierarchy learning
  1. This paper proposes a new method for significantly improving the performance of pairwise coreference models.
    Page 1, “Abstract”
  2. Instead of imposing a heuristic product of features, we will show that a clever separation of instances leads to significant improvements of the pairwise model.
    Page 3, “Modeling pairs”
  3. Using different kinds of greedy decoders, we showed a significant improvement of the system.
    Page 9, “Conclusion and perspectives”
