Bootstrapping coreference resolution using word associations
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans

Article Structure

Abstract

In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus.
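
The abstract only states that associations are mined from a large unlabeled corpus; the exact association measure is not given in this summary. As an illustration, the sketch below scores head-word pairs with pointwise mutual information (PMI) over document-level co-occurrence. It is a minimal stand-in, not the paper's actual procedure, and all names in it are hypothetical.

    import math
    from collections import Counter
    from itertools import combinations

    def association_scores(documents, min_count=5):
        """PMI-style association between head words, counted once per
        unordered pair per document. `documents` is an iterable of lists
        of markable head words (illustrative sketch only)."""
        word_freq, pair_freq, n_docs = Counter(), Counter(), 0
        for heads in documents:
            n_docs += 1
            uniq = set(heads)
            word_freq.update(uniq)
            pair_freq.update(frozenset(p) for p in combinations(sorted(uniq), 2))
        scores = {}
        for pair, c_xy in pair_freq.items():
            if c_xy < min_count:
                continue  # drop unreliable low-frequency pairs
            x, y = tuple(pair)
            scores[pair] = math.log((c_xy / n_docs) /
                                    ((word_freq[x] / n_docs) * (word_freq[y] / n_docs)))
        return scores

A high score for a pair such as (Obama, President) is the kind of signal the abstract refers to: strongly associated markable heads are candidate coreference links.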

Introduction

Coreference resolution (CoRe) is the process of finding markables (noun phrases) referring to the same real world entity or concept.

Related Work

There are three main approaches to CoRe: supervised, semi-supervised (or weakly supervised) and unsupervised.

System Architecture

Figure 1 illustrates the system architecture of our unsupervised self-trained CoRe system (UNSEL).

Experimental Setup

4.1 Data Sets

Results and Discussion

Table 1 compares our unsupervised self-trained model UNSEL and unsupervised model A-INF to

Conclusion

In this paper, we have demonstrated the utility of association information for coreference resolution.

Topics

coreference

Appears in 31 sentences as: COreference (1) Coreference (2) coreference (18) coreferent (11)
In Bootstrapping coreference resolution using word associations
  1. In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus.
    Page 1, “Abstract”
  2. We show that word associations are useful for CoRe — e.g., the strong association between Obama and President is an indicator of likely coreference.
    Page 1, “Abstract”
  3. Coreference resolution (CoRe) is the process of finding markables (noun phrases) referring to the same real world entity or concept.
    Page 1, “Introduction”
  4. Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data.
    Page 1, “Introduction” (the pairwise formulation is sketched after this list)
  5. Alternatively, a model that determines whether a markable is coreferent with a preceding cluster can be used.
    Page 1, “Introduction”
  6. It is a likely coreference pair, but this information is not accessible to standard CoRe systems because they only use string-based features (often called lexical features), named entity features and semantic word class features (e.g., from WordNet) that do not distinguish,
    Page 1, “Introduction”
  7. Our experiments are conducted using the MCORE system (“Modular COreference REsolution”). MCORE can operate in three different settings: unsupervised (subsystem A-INF), supervised (subsystem SUCRE (Kobdani and Schütze, 2010)), and self-trained (subsystem UNSEL).
    Page 2, “Introduction”
  8. SUCRE (“SUpervised Coreference REsolution”) is trained on a labeled corpus (manually or automatically labeled) similar to standard CoRe systems.
    Page 2, “Introduction”
  9. We demonstrate that word association information can be used to develop an unsupervised model for shallow coreference resolution (subsystem A-INF).
    Page 2, “Introduction”
  10. We use the term semi-supervised for approaches that use some amount of human-labeled coreference pairs.
    Page 2, “Related Work”
  11. (2002) used co-training for coreference resolution, a semi-supervised method.
    Page 2, “Related Work”
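
Sentences 4 and 5 above describe the two standard formulations: binary classification of markable pairs, and linking a markable to a preceding cluster. The sketch below illustrates only the pairwise variant with an off-the-shelf classifier; the features, classifier choice, and threshold are assumptions for illustration, not the system described in the paper.

    from sklearn.linear_model import LogisticRegression

    def train_pairwise_model(pair_features, labels):
        """pair_features: one feature vector per (antecedent, anaphor) pair;
        labels: 1 if the pair is coreferent in the labeled data, else 0."""
        model = LogisticRegression(max_iter=1000)
        model.fit(pair_features, labels)
        return model

    def coreferent_pairs(model, candidate_pairs, candidate_features, threshold=0.5):
        """Keep candidate pairs whose predicted coreference probability clears
        the threshold; a full system would then merge the links into entity
        clusters (e.g., closest-first or best-first linking)."""
        probs = model.predict_proba(candidate_features)[:, 1]
        return [pair for pair, p in zip(candidate_pairs, probs) if p >= threshold]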

coreference resolution

Appears in 8 sentences as: Coreference resolution (1) coreference resolution (5) COreference REsolution (1) Coreference REsolution (1)
In Bootstrapping coreference resolution using word associations
  1. In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus.
    Page 1, “Abstract”
  2. Coreference resolution (CoRe) is the process of finding markables (noun phrases) referring to the same real world entity or concept.
    Page 1, “Introduction”
  3. Our experiments are conducted using the MCORE system (“Modular COreference REsolution”). MCORE can operate in three different settings: unsupervised (subsystem A-INF), supervised (subsystem SUCRE (Kobdani and Schütze, 2010)), and self-trained (subsystem UNSEL).
    Page 2, “Introduction”
  4. SUCRE (“SUpervised Coreference REsolution”) is trained on a labeled corpus (manually or automatically labeled) similar to standard CoRe systems.
    Page 2, “Introduction”
  5. We demonstrate that word association information can be used to develop an unsupervised model for shallow coreference resolution (subsystem A-INF).
    Page 2, “Introduction”
  6. (2002) used co-training for coreference resolution, a semi-supervised method.
    Page 2, “Related Work”
  7. We take a self-training approach to coreference resolution: We first label the corpus using the unsupervised model A-INF and then train the supervised model SUCRE on this automatically labeled training corpus.
    Page 3, “System Architecture”
  8. In this paper, we have demonstrated the utility of association information for coreference resolution.
    Page 9, “Conclusion”
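
Sentence 7 above is the core of the self-training setup: A-INF labels the corpus without supervision, and SUCRE is then trained on that automatically labeled corpus. The sketch below just makes the two steps explicit; `a_inf` and `sucre` are placeholder objects standing in for the two subsystems, not their real interfaces.

    def unsel_pipeline(unlabeled_corpus, a_inf, sucre):
        """Schematic UNSEL-style self-training: unsupervised labeling
        followed by supervised training on the automatic labels."""
        # Step 1: the unsupervised model resolves coreference using only
        # association information mined from unlabeled text.
        auto_labeled_corpus = a_inf.label(unlabeled_corpus)

        # Step 2: the supervised model is trained on the automatically
        # labeled corpus exactly as it would be on manually labeled data,
        # gaining the leveraging effect of its rich feature space.
        sucre.train(auto_labeled_corpus)
        return sucre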

feature space

Appears in 8 sentences as: feature space (6) feature spaces (4)
In Bootstrapping coreference resolution using word associations
  1. Typical systems use a rich feature space based on lexical, syntactic and semantic knowledge.
    Page 1, “Introduction”
  2. We view association information as an example of a shallow feature space which contrasts with the rich feature space that is generally used in CoRe.
    Page 2, “Introduction”
  3. The feature spaces are the shallow and rich feature spaces.
    Page 2, “Introduction”
  4. We show that the performance of UNSEL is better than the performance of other unsupervised systems when it is self-trained on the automatically labeled corpus and uses the leveraging effect of a rich feature space.
    Page 2, “Introduction”
  5. Not only is it able to deal with shallow information spaces (A-INF), but it can also deliver competitive results for rich feature spaces (SUCRE and UNSEL).
    Page 2, “Introduction”
  6. These researchers show that a “deterministic” system (essentially a rule-based system) that uses a rich feature space including lexical, syntactic and semantic features can improve CoRe performance.
    Page 3, “Related Work”
  7. To summarize, the advantages of our self-training approach are: (i) We cover cases that do not occur in the unlabeled corpus (better recall effect); and (ii) we use the leveraging effect of a rich feature space including distance, person, number, gender etc.
    Page 8, “Results and Discussion”
  8. In addition, we showed that our system is a flexible and modular framework that is able to learn from data with different quality (perfect vs noisy markable detection) and domain; and is able to deliver good results for shallow information spaces and competitive results for rich feature spaces.
    Page 9, “Conclusion”

labeled data

Appears in 8 sentences as: Labeled Data (1) labeled data (8)
In Bootstrapping coreference resolution using word associations
  1. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.
    Page 1, “Abstract”
  2. Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data.
    Page 1, “Introduction”
  3. Self-training approaches usually include the use of some manually labeled data.
    Page 1, “Introduction”
  4. In contrast, our self-trained system is not trained on any manually labeled data and is therefore a completely unsupervised system.
    Page 1, “Introduction”
  5. Although training on automatically labeled data can be viewed as a form of supervision, we reserve the term supervised system for systems that are trained on manually labeled data.
    Page 1, “Introduction”
  6. not with approaches that make some limited use of labeled data.
    Page 3, “Related Work”
  7. Automatically Labeled Data
    Page 3, “System Architecture”
  8. Thus, this comparison of ACE-2/Ontonotes results is evidence that in a realistic scenario using association information in an unsupervised self-trained system is almost as good as a system trained on manually labeled data.
    Page 9, “Results and Discussion”

feature vector

Appears in 4 sentences as: feature vector (3) feature vectors (1)
In Bootstrapping coreference resolution using word associations
  1. After filtering, we then calculate a feature vector for each generated pair that survived filters (i)–(iv).
    Page 5, “System Architecture”
  2. Then (Einstein,he), (Hawking,he) and (Novoselov,he) will all be assigned the feature vector <1, No, Proper Noun, Personal Pronoun, Yes>.
    Page 8, “Results and Discussion”
  3. Using the same representation of pairs, suppose that for the sequence of markables Biden, Obama, President the markable pairs (Biden,President) and (Obama,President) are assigned the feature vectors <8, No, Proper Noun, Proper Noun, Yes> and <1, No, Proper Noun, Proper Noun, Yes>, respectively.
    Page 8, “Results and Discussion”
  4. with the second feature vector (distance=1) as coreferent than with the first one (distance=8) in the entire automatically labeled training set.
    Page 8, “Results and Discussion”
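
The tuples quoted above, e.g. <1, No, Proper Noun, Personal Pronoun, Yes>, combine a distance feature with categorical features of the two markables; only the first slot (distance) is identified explicitly in these excerpts. The sketch below reconstructs such a tuple under that assumption, with the remaining slots (string match, markable types, agreement) filled in as plausible but unconfirmed interpretations.

    from dataclasses import dataclass

    @dataclass
    class Markable:
        text: str
        mtype: str      # e.g. "Proper Noun", "Personal Pronoun"
        position: int   # index of the markable in the document

    def pair_feature_vector(antecedent: Markable, anaphor: Markable):
        """Illustrative <distance, string match, antecedent type, anaphor type,
        agreement> vector; slot meanings beyond distance are assumptions."""
        distance = anaphor.position - antecedent.position
        string_match = "Yes" if antecedent.text.lower() == anaphor.text.lower() else "No"
        agreement = "Yes"  # placeholder for e.g. number/gender agreement
        return (distance, string_match, antecedent.mtype, anaphor.mtype, agreement)

    # pair_feature_vector(Markable("Obama", "Proper Noun", 7),
    #                     Markable("President", "Proper Noun", 8))
    # -> (1, 'No', 'Proper Noun', 'Proper Noun', 'Yes'), matching sentence 3 above.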

precision and recall

Appears in 4 sentences as: precision and recall (4)
In Bootstrapping coreference resolution using word associations
  1. Both precision and recall are improved with two exceptions: recall of B3 decreases from line 2 to 3 and from 15 to 16.
    Page 7, “Results and Discussion”
  2. In contrast to F1, there is no consistent trend for precision and recall.
    Page 8, “Results and Discussion”
  3. But this higher variability for precision and recall is to be expected since every system trades the two measures off differently.
    Page 8, “Results and Discussion”
  4. With more training data, UNSEL can correct more of its precision and recall errors.
    Page 9, “Results and Discussion”
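
Sentence 1 above reports B3 scores. For reference, B3 computes precision and recall per mention, comparing the system cluster containing the mention with the gold cluster containing it, and averages over mentions; the sketch below assumes clusters are given as sets of mention identifiers.

    def b_cubed(key_clusters, response_clusters):
        """B3 precision and recall for two clusterings of the same mentions."""
        key_of = {m: c for c in key_clusters for m in c}        # gold cluster per mention
        resp_of = {m: c for c in response_clusters for m in c}  # system cluster per mention
        mentions = [m for m in key_of if m in resp_of]

        precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
        recall    = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
        return precision, recall

    # Example: b_cubed([{1, 2}, {3}], [{1}, {2, 3}]) gives roughly (0.67, 0.67).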

semi-supervised

Appears in 3 sentences as: semi-supervised (3)
In Bootstrapping coreference resolution using word associations
  1. There are three main approaches to CoRe: supervised, semi-supervised (or weakly supervised) and unsupervised.
    Page 2, “Related Work”
  2. We use the term semi-supervised for approaches that use some amount of human-labeled coreference pairs.
    Page 2, “Related Work”
  3. (2002) used co-training for coreference resolution, a semi-supervised method.
    Page 2, “Related Work”

unlabeled data

Appears in 3 sentences as: Unlabeled Data (1) unlabeled data (2)
In Bootstrapping coreference resolution using word associations
  1. Co-training puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance.
    Page 2, “Related Work”
  2. Unlabeled Data
    Page 3, “System Architecture”
  3. For an unsupervised approach, which only needs unlabeled data, there is little cost to creating large training sets.
    Page 9, “Results and Discussion”
