Decentralized Entity-Level Modeling for Coreference Resolution
Durrett, Greg and Hall, David and Klein, Dan

Article Structure

Abstract

Efficiently incorporating entity-level information is a challenge for coreference resolution systems due to the difficulty of exact inference over partitions.

Introduction

The inclusion of entity-level features has been a driving force behind the development of many coreference resolution systems (Luo et al., 2004; Rahman and Ng, 2009; Haghighi and Klein, 2010; Lee et al., 2011).

Example

We begin with an example motivating our use of entity-level features.

Models

We will first present our BASIC model (Section 3.1) and describe the features it incorporates (Section 3.2), then explain how to extend it to use transitive features (Sections 3.3 and 3.4).

Learning

Now that we have defined our model, we must decide how to train its weights w. The first issue to address is the form of supervision provided.

Inference

Inference in the BASIC model is straightforward.

Related Work

Our BASIC model is a mention-ranking approach resembling models used by Denis and Baldridge (2008) and Rahman and Ng (2009), though it is trained using a novel parameterized loss function.

Experiments

We use the datasets, experimental setup, and scoring program from the CoNLL 2011 shared task (Pradhan et al., 2011), based on the OntoNotes corpus (Hovy et al., 2006).

Conclusion

In this work, we presented a novel coreference architecture that can both take advantage of standard pairwise features and use transitivity to enforce coherence of decentralized entity-level properties within coreference clusters.

Topics

coreference

Appears in 35 sentences as: coreference (35), coreferent (3)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. Efficiently incorporating entity-level information is a challenge for coreference resolution systems due to the difficulty of exact inference over partitions.
    Page 1, “Abstract”
  2. We describe an end-to-end discriminative probabilistic model for coreference that, along with standard pairwise features, enforces structural agreement constraints between specified properties of coreferent mentions.
    Page 1, “Abstract”
  3. The inclusion of entity-level features has been a driving force behind the development of many coreference resolution systems (Luo et al., 2004; Rahman and Ng, 2009; Haghighi and Klein, 2010; Lee et al., 2011).
    Page 1, “Introduction”
  4. However, such systems may be locked into bad coreference decisions and are difficult to directly optimize for standard evaluation metrics.
    Page 1, “Introduction”
  5. structural agreement factors softly drive properties of coreferent mentions to agree with one another.
    Page 1, “Introduction”
  6. This is a key feature of our model: mentions manage their partial membership in various coreference chains, so that information about entity-level properties is decentralized and propagated across individual mentions, and we never need to explicitly instantiate entities.
    Page 1, “Introduction”
  7. One way is to exploit the correct coreference decision we have already made, they referring to people, since people are not as likely to have a price as art items are.
    Page 2, “Example”
  8. Because even these six mentions have hundreds of potential partitions into coreference chains, we cannot search over partitions exhaustively, and therefore we must design our model to be able to use this information while still admitting an efficient inference scheme.
    Page 2, “Example”
  9. Each mention i has an associated antecedent variable a_i taking a value in {1, ..., i-1, <new>}; this variable specifies mention i's selected antecedent or indicates that it begins a new coreference chain.
    Page 2, “Models”
  10. Note that a set of coreference chains C (the final desired output) can be uniquely determined from a, but a is not uniquely determined by C.
    Page 2, “Models”
  11. Figure 1: Our BASIC coreference model.
    Page 2, “Models”
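
Item 10 above can be made concrete: chains are recovered from antecedent choices deterministically, but two different antecedent vectors can map to the same partition. A minimal sketch, assuming antecedents are given as 0-based indices with None standing in for <new> (names are illustrative, not from the paper):

```python
def chains_from_antecedents(a):
    """Derive coreference chains C(a) from an antecedent vector.

    a[i] is either an earlier mention index (< i) or None, where None
    plays the role of <new>: mention i starts a new chain.
    """
    chain_of = {}   # mention index -> chain id
    chains = []     # chain id -> list of mention indices
    for i, ant in enumerate(a):
        if ant is None:                     # <new>: open a fresh chain
            chain_of[i] = len(chains)
            chains.append([i])
        else:                               # join the antecedent's chain
            chain_of[i] = chain_of[ant]
            chains[chain_of[i]].append(i)
    return chains

# Two different antecedent vectors yield the same partition,
# illustrating that C is determined by a but not vice versa:
print(chains_from_antecedents([None, 0, 1]))  # [[0, 1, 2]]
print(chains_from_antecedents([None, 0, 0]))  # [[0, 1, 2]]
```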


CoNLL

Appears in 9 sentences as: CoNLL (9)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. We evaluate our system on the dataset from the CoNLL 2011 shared task using three different types of properties: synthetic oracle properties, entity phi features (number, gender, animacy, and NER type), and properties derived from unsupervised clusters targeting semantic type information.
    Page 1, “Introduction”
  2. Our final system is competitive with the winner of the CoNLL 2011 shared task (Lee et al., 2011).
    Page 1, “Introduction”
  3. We use the datasets, experimental setup, and scoring program from the CoNLL 2011 shared task (Pradhan et al., 2011), based on the OntoNotes corpus (Hovy et al., 2006).
    Page 6, “Experiments”
  4. Unfortunately, their publicly-available system is closed-source and performs poorly on the CoNLL shared task dataset, so direct comparison is difficult.
    Page 6, “Experiments”
  5. Table 1: CoNLL metric scores for our four different systems incorporating noisy oracle data.
    Page 7, “Experiments”
  6. Table 2: CoNLL metric scores for our systems incorporating phi features.
    Page 8, “Experiments”
  7. Table 3: CoNLL metric scores for our systems incorporating clustering features.
    Page 8, “Experiments”
  8. We also show in Table 5 the results of our system on the CoNLL test set, and see that it performs comparably to the Stanford coreference system (Lee et al., 2011).
    Page 9, “Experiments”
  9. Our transitive system is more effective at using properties than a pairwise system and a previous entity-level system, and it achieves performance comparable to that of the Stanford coreference resolution system, the winner of the CoNLL 2011 shared task.
    Page 9, “Conclusion”


loss function

Appears in 5 sentences as: loss function (5)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. This optimizes for the 0-1 loss; however, we are much more interested in optimizing with respect to a coreference-specific loss function.
    Page 5, “Learning”
  2. We modify Equation 1 to use a new probability distribution P′ instead of P, where P′(a|x) ∝ P(a|x) exp(l(a, C)) and l(a, C) is a loss function.
    Page 5, “Learning”
  3. In order to perform inference efficiently, l(a, C) must decompose linearly across mentions: l(a, C) = Σ_{i=1}^{n} l(a_i, C). Commonly-used coreference metrics such as MUC (Vilain et al., 1995) and B3 (Bagga and Baldwin, 1998) do not have this property, so we instead make use of a parameterized loss function that does and fit the parameters to give good performance.
    Page 5, “Learning”
  4. Our BASIC model is a mention-ranking approach resembling models used by Denis and Baldridge (2008) and Rahman and Ng (2009), though it is trained using a novel parameterized loss function.
    Page 6, “Related Work”
  5. However, we found no simple way to change the relative performance characteristics of our various systems; notably, modifying the parameters of the loss function mentioned in Section 4 or changing it entirely did not trade off these three metrics but merely increased or decreased them in lockstep.
    Page 9, “Experiments”
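
Items 2 and 3 above amount to an ordinary per-mention softmax with the loss added into the scores before normalizing, since multiplying P(a|x) by exp(l(a, C)) is the same as adding l to the log-score. A minimal sketch under that reading (array-valued scores and losses are an assumption for illustration):

```python
import math

def loss_augmented_probs(scores, losses):
    """P'(a|x) ∝ P(a|x) exp(l(a, C)) for one mention, where
    P(a|x) ∝ exp(score). Adding the loss to the score before the
    softmax is equivalent to multiplying P by exp(loss)."""
    z = [s + l for s, l in zip(scores, losses)]
    m = max(z)                                # max-shift for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Two antecedent candidates with equal model scores: the loss term
# shifts probability mass toward the higher-loss (wrong) candidate,
# which is what makes the training objective loss-aware.
p = loss_augmented_probs([1.0, 1.0], [0.0, 1.0])
```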


shared task

Appears in 5 sentences as: shared task (5)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. We evaluate our system on the dataset from the CoNLL 2011 shared task using three different types of properties: synthetic oracle properties, entity phi features (number, gender, animacy, and NER type), and properties derived from unsupervised clusters targeting semantic type information.
    Page 1, “Introduction”
  2. Our final system is competitive with the winner of the CoNLL 2011 shared task (Lee et al., 2011).
    Page 1, “Introduction”
  3. We use the datasets, experimental setup, and scoring program from the CoNLL 2011 shared task (Pradhan et al., 2011), based on the OntoNotes corpus (Hovy et al., 2006).
    Page 6, “Experiments”
  4. Unfortunately, their publicly-available system is closed-source and performs poorly on the CoNLL shared task dataset, so direct comparison is difficult.
    Page 6, “Experiments”
  5. Our transitive system is more effective at using properties than a pairwise system and a previous entity-level system, and it achieves performance comparable to that of the Stanford coreference resolution system, the winner of the CoNLL 2011 shared task.
    Page 9, “Conclusion”


clusterings

Appears in 4 sentences as: clusterings (4)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. Their system could be extended to handle property information like we do, but our system has many other advantages, such as freedom from a pre-specified list of entity types, the ability to use multiple input clusterings, and discriminative projection of clusters.
    Page 6, “Related Work”
  2. Finally, we consider mention properties derived from unsupervised clusterings; these properties are designed to target semantic properties of nominals that should behave more like the oracle features than the phi features do.
    Page 8, “Experiments”
  3. We consider clusterings that take as input pairs (n, r) of a noun head n and a string r which contains the semantic role of n (or some approximation thereof) conjoined with its governor.
    Page 8, “Experiments”
  4. We use four different clusterings in our experiments, each with twenty clusters: dependency-parse-derived NAIVEBAYES clusters, semantic-role-derived CONDITIONAL clusters, SRL-derived NAIVEBAYES clusters generating a NOVERB token when r cannot be determined, and SRL-derived NAIVEBAYES clusters with all pronoun tuples discarded.
    Page 8, “Experiments”
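
The (n, r) pairs in item 3 can be pictured concretely. Below is a hypothetical Naive-Bayes-style scoring sketch in the spirit of the NAIVEBAYES clusters; the pairs, cluster names, and probabilities are all invented for illustration and are not the paper's actual model:

```python
import math
from collections import defaultdict

# (noun head, role-conjoined-with-governor) pairs, as in item 3 above.
pairs = [
    ("painting", "dobj:sell"),
    ("painting", "dobj:buy"),
    ("dealer", "nsubj:sell"),
    ("dealer", "nsubj:buy"),
]

roles_by_head = defaultdict(list)
for n, r in pairs:
    roles_by_head[n].append(r)

# Two toy clusters: one preferring object-of-transaction roles,
# one preferring subject-of-transaction roles.
clusters = {
    "artifact": {"prior": 0.5, "roles": {"dobj:sell": 0.4, "dobj:buy": 0.4}},
    "person": {"prior": 0.5, "roles": {"nsubj:sell": 0.4, "nsubj:buy": 0.4}},
}

def nb_cluster_score(prior, role_probs, roles):
    """log P(c) + sum over r of log P(r|c): the Naive Bayes score of
    assigning a noun head with observed role strings to cluster c."""
    score = math.log(prior)
    for r in roles:
        score += math.log(role_probs.get(r, 1e-6))  # crude smoothing
    return score

def best_cluster(head):
    return max(clusters, key=lambda c: nb_cluster_score(
        clusters[c]["prior"], clusters[c]["roles"], roles_by_head[head]))
```

Under these toy numbers, heads seen in object position land in one cluster and heads seen in subject position in the other, which is the kind of semantic-type signal the paper's clusters are designed to capture.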


log-linear

Appears in 4 sentences as: log-linear (4)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. We use a log-linear model that can be expressed as a factor graph.
    Page 1, “Introduction”
  2. Each unary factor A_i has a log-linear form with features examining mention i, its selected antecedent a_i, and the document context x.
    Page 2, “Models”
  3. The final log-linear model is given by the following formula:
    Page 4, “Models”
  4. We then report the corresponding chains C(a) as the system output. For learning, the gradient takes the standard form of the gradient of a log-linear model, a difference of expected feature counts under the gold annotation and under no annotation.
    Page 5, “Inference”
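
The gradient in item 4 is the usual log-linear one. A minimal per-mention sketch, simplifying the paper's setup by treating a single gold antecedent index as the gold annotation (feature vectors and weights are illustrative):

```python
import math

def softmax(scores):
    m = max(scores)                           # max-shift for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gradient(feature_vectors, w, gold):
    """Gradient of a log-linear model for one mention: gold feature
    counts minus expected feature counts under the model.
    feature_vectors[a] is the feature vector of antecedent choice a;
    gold is the index of the correct antecedent."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feature_vectors]
    p = softmax(scores)
    grad = list(feature_vectors[gold])        # empirical counts
    for pa, f in zip(p, feature_vectors):     # minus model expectation
        grad = [g - pa * fi for g, fi in zip(grad, f)]
    return grad
```

With uniform weights and two one-hot candidates, the gradient pushes weight toward the gold candidate's features and away from the competitor's.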


NER

Appears in 4 sentences as: NER (4)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. We evaluate our system on the dataset from the CoNLL 2011 shared task using three different types of properties: synthetic oracle properties, entity phi features (number, gender, animacy, and NER type), and properties derived from unsupervised clusters targeting semantic type information.
    Page 1, “Introduction”
  2. Agreement features: Gender, number, animacy, and NER type of the current mention and the antecedent (separately and conjoined).
    Page 3, “Models”
  3. Each mention i has been augmented with a single property node p_i. The unary B factors encode prior knowledge about the setting of each p_i; these factors may be hard (I will not refer to a plural entity), soft (such as a distribution over named entity types output by an NER tagger), or practically uniform (e.g. the last name Smith does not specify a particular gender).
    Page 3, “Models”
  4. We use the standard automatic parses and NER tags for each document.
    Page 6, “Experiments”
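
The hard / soft / nearly-uniform priors in item 3 are just distributions over property values. A toy sketch with invented numbers, using NER type as the property (the specific mentions and probabilities are illustrative assumptions, not the paper's data):

```python
# Priors over one property (NER type) for three mentions, mirroring
# the hard / soft / nearly-uniform cases described above. All numbers
# are invented for illustration.
priors = {
    "I":     {"PERSON": 1.0,  "ORG": 0.0,  "GPE": 0.0},    # hard constraint
    "Apple": {"PERSON": 0.05, "ORG": 0.8,  "GPE": 0.15},   # soft (tagger posterior)
    "Smith": {"PERSON": 0.34, "ORG": 0.33, "GPE": 0.33},   # nearly uniform
}

def normalize(dist):
    """Renormalize a factor's scores into a proper distribution."""
    z = sum(dist.values())
    return {k: v / z for k, v in dist.items()}
```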


coreference resolution

Appears in 3 sentences as: coreference resolution (3)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. Efficiently incorporating entity-level information is a challenge for coreference resolution systems due to the difficulty of exact inference over partitions.
    Page 1, “Abstract”
  2. The inclusion of entity-level features has been a driving force behind the development of many coreference resolution systems (Luo et al., 2004; Rahman and Ng, 2009; Haghighi and Klein, 2010; Lee et al., 2011).
    Page 1, “Introduction”
  3. Our transitive system is more effective at using properties than a pairwise system and a previous entity-level system, and it achieves performance comparable to that of the Stanford coreference resolution system, the winner of the CoNLL 2011 shared task.
    Page 9, “Conclusion”


entity types

Appears in 3 sentences as: entity type (1), entity types (2)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. Each mention i has been augmented with a single property node p_i. The unary B factors encode prior knowledge about the setting of each p_i; these factors may be hard (I will not refer to a plural entity), soft (such as a distribution over named entity types output by an NER tagger), or practically uniform (e.g. the last name Smith does not specify a particular gender).
    Page 3, “Models”
  2. Suppose that we are using named entity type as an entity-level property.
    Page 4, “Models”
  3. Their system could be extended to handle property information like we do, but our system has many other advantages, such as freedom from a pre-specified list of entity types, the ability to use multiple input clusterings, and discriminative projection of clusters.
    Page 6, “Related Work”


log-linear model

Appears in 3 sentences as: log-linear model (3)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. We use a log-linear model that can be expressed as a factor graph.
    Page 1, “Introduction”
  2. The final log-linear model is given by the following formula:
    Page 4, “Models”
  3. We then report the corresponding chains C(a) as the system output. For learning, the gradient takes the standard form of the gradient of a log-linear model, a difference of expected feature counts under the gold annotation and under no annotation.
    Page 5, “Inference”


semantic roles

Appears in 3 sentences as: semantic role (1), semantic roles (2)
In Decentralized Entity-Level Modeling for Coreference Resolution
  1. This observation argues for enforcing agreement of entity-level semantic properties during inference, specifically properties relating to permitted semantic roles.
    Page 2, “Example”
  2. Throughout this section, let x be a variable containing the words in a document along with any relevant precomputed annotation (such as parse information, semantic roles, etc.).
    Page 2, “Models”
  3. We consider clusterings that take as input pairs (n, r) of a noun head n and a string r which contains the semantic role of n (or some approximation thereof) conjoined with its governor.
    Page 8, “Experiments”
