Distributional Identification of Non-Referential Pronouns
Bergsma, Shane and Lin, Dekang and Goebel, Randy

Article Structure

Abstract

We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead non-referential.

Introduction

The goal of coreference resolution is to determine which noun phrases in a document refer to the same real-world entity.

Related Work

The difficulty of non-referential pronouns has been acknowledged since the beginning of computational resolution of anaphora.

Methodology

3.1 Definition

Evaluation

4.1 Labelled It Data

Results

5.1 System Comparison

Conclusion

We have presented an approach to detecting non-referential pronouns in text based on the distribution of the pronoun’s context.

Topics

F-Score

Appears in 13 sentences as: F-Score (10) F-score (4)
In Distributional Identification of Non-Referential Pronouns
  1. F-Score (F) is the harmonic mean of precision and recall; it is the most common non-referential detection metric.
    Page 6, “Evaluation”
  2. Table 4 gives precision, recall, F-score, and accuracy on the Train/Test split.
    Page 6, “Results”
  3. Note that while the LL system has high detection precision, it has very low recall, sharply reducing F-score.
    Page 6, “Results”
  4. The MINIPL approach sacrifices some precision for much higher recall, but again has fairly low F-score.
    Page 6, “Results”
  5. To our knowledge, our COMBO system, with an F-Score of 77.1%, achieves the highest performance of any non-referential system yet implemented.
    Page 6, “Results”
  6. Even more importantly, DISTRIB, which requires only minimal linguistic processing and no encoding of specific indicator patterns, achieves 75.8% F-Score.
    Page 6, “Results”
  7. Using no truncation (Unaltered) drops the F-Score by 4.3%, while truncating the patterns to a length of four only drops the F-Score by 1.4%, a difference which is not statistically significant.
    Page 6, “Results”
  8. Table 5: 10-fold cross validation F-Score (%).
    Page 7, “Results”
  9. Table 5 compares the 10-fold cross-validation F-score of our systems on the four data sets.
    Page 7, “Results”
  10. Furthermore, selecting just the three most useful filler type counts as features (#1, #2, #5) boosts F-Score on Europarl to 86.5%, 10% above the full COMBO system.
    Page 7, “Results”
  11. Surprisingly, the drop in F-Score was only one percent, to 74.8%.
    Page 7, “Results”
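
The excerpts above report F-Score throughout; for concreteness, it is the harmonic mean of precision and recall, which can be computed as in this minimal sketch (the numeric inputs below are invented for illustration, not the paper's reported figures):

```python
def f_score(precision, recall):
    """F-Score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values: the harmonic mean punishes an imbalance
# between precision and recall, which is why the excerpts note that
# high precision with very low recall sharply reduces F-score.
balanced = f_score(0.5, 0.5)    # equals 0.5
skewed = f_score(0.9, 0.2)      # well below the arithmetic mean of 0.55
```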

See all papers in Proc. ACL 2008 that mention F-Score.


coreference

Appears in 11 sentences as: coreference (11)
  1. The goal of coreference resolution is to determine which noun phrases in a document refer to the same real-world entity.
    Page 1, “Introduction”
  2. As part of this task, coreference resolution systems must decide which pronouns refer to preceding noun phrases (called antecedents) and which do not.
    Page 1, “Introduction”
  3. In sentence (1), it is an anaphoric pronoun referring to some previous noun phrase, like “the sauce” or “an appointment.” In sentence (2), it is part of the idiomatic expression “make it” meaning “succeed.” A coreference resolution system should find an antecedent for the first it but not the second.
    Page 1, “Introduction”
  4. First of all, research in coreference resolution has shown the benefits of modules for general noun anaphoricity determination (Ng and Cardie, 2002; Denis and Baldridge, 2007).
    Page 2, “Related Work”
  5. Bergsma and Lin (2006) determine the likelihood of coreference along the syntactic path connecting a pronoun to a possible antecedent, by looking at the distribution of the path in text.
    Page 3, “Related Work”
  6. Although coreference evaluations, such as the MUC (1997) tasks, also make this distinction, it is not necessarily used by all researchers.
    Page 3, “Methodology”
  7. Standard coreference resolution data sets annotate all noun phrases that have an antecedent noun phrase in the text.
    Page 5, “Evaluation”
  8. Notably, the first noun-phrase before the context is the word “software.” There is strong compatibility between the pronoun-parent “install” and the candidate antecedent “software.” In a full coreference resolution system, when the anaphora resolution module has a strong preference to link it to an antecedent (which it should when the pronoun is indeed referential), we can override a weak non-referential probability.
    Page 8, “Results”
  9. A consequence of this research was the creation of It-Bank, a collection of thousands of labelled examples of the pronoun it, which will benefit other coreference resolution researchers.
    Page 8, “Conclusion”
  10. Another avenue of study will look at the interaction between coreference resolution and machine translation.
    Page 8, “Conclusion”
  11. In general, jointly optimizing translation and coreference is an exciting and largely unexplored research area, now partly enabled by our portable non-referential detection methodology.
    Page 8, “Conclusion”


n-gram

Appears in 9 sentences as: N-Gram (1) n-gram (8)
  1. For example, in our n-gram collection (Section 3.4), “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern.
    Page 1, “Introduction”
  2. We gather pattern fillers from a large collection of n-gram frequencies.
    Page 3, “Methodology”
  3. We do the same processing to our n-gram corpus.
    Page 4, “Methodology”
  4. Also, other pronouns in the pattern are allowed to match a corresponding pronoun in an n-gram, regardless of differences in inflection and class.
    Page 4, “Methodology”
  5. However, determining part-of-speech in a large n-gram corpus is not simple, nor would it easily extend to other languages.
    Page 4, “Methodology”
  6. It also gives the n-gram counts of pattern fillers matching the first four filler types (there were no matches of the (UNK) type, #4).
    Page 4, “Methodology”
  7. 3.4 N-Gram Data
    Page 5, “Methodology”
  8. For languages where such an extensive n-gram resource is not available, the n-gram counts could also be taken from the page-counts returned by an Internet search engine.
    Page 5, “Methodology”
  9. We smooth by adding 40 to all counts, equal to the minimum count in the n-gram data.
    Page 6, “Evaluation”
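
Excerpt 1 above illustrates the core signal: comparing how often different fillers occupy the pronoun's slot in a context pattern. A minimal sketch of that lookup, assuming a toy in-memory n-gram table (apart from the two counts quoted in the excerpt, the entries below are invented):

```python
from collections import Counter

# Toy n-gram table for illustration only; the paper's collection is far
# larger and includes only n-grams appearing more than 40 times.
NGRAM_COUNTS = {
    ("make", "it", "in", "advance"): 442,
    ("make", "them", "in", "advance"): 449,
    ("make", "sense", "in", "advance"): 55,
}

def matches(pattern, ngram):
    """True if the n-gram matches the pattern, where '*' matches any token."""
    return len(pattern) == len(ngram) and all(
        p == "*" or p == g for p, g in zip(pattern, ngram))

def filler_counts(pattern):
    """Count how often each filler word fills the '*' slot in the data."""
    slot = pattern.index("*")
    counts = Counter()
    for ngram, c in NGRAM_COUNTS.items():
        if matches(pattern, ngram):
            counts[ngram[slot]] += c
    return counts

fc = filler_counts(("make", "*", "in", "advance"))
# 'it' (442) and 'them' (449) occur roughly equally often here,
# which excerpt 1 describes as indicating a referential pattern.
```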


coreference resolution

Appears in 8 sentences as: coreference resolution (8)
  1. The goal of coreference resolution is to determine which noun phrases in a document refer to the same real-world entity.
    Page 1, “Introduction”
  2. As part of this task, coreference resolution systems must decide which pronouns refer to preceding noun phrases (called antecedents) and which do not.
    Page 1, “Introduction”
  3. In sentence (1), it is an anaphoric pronoun referring to some previous noun phrase, like “the sauce” or “an appointment.” In sentence (2), it is part of the idiomatic expression “make it” meaning “succeed.” A coreference resolution system should find an antecedent for the first it but not the second.
    Page 1, “Introduction”
  4. First of all, research in coreference resolution has shown the benefits of modules for general noun anaphoricity determination (Ng and Cardie, 2002; Denis and Baldridge, 2007).
    Page 2, “Related Work”
  5. Standard coreference resolution data sets annotate all noun phrases that have an antecedent noun phrase in the text.
    Page 5, “Evaluation”
  6. Notably, the first noun-phrase before the context is the word “software.” There is strong compatibility between the pronoun-parent “install” and the candidate antecedent “software.” In a full coreference resolution system, when the anaphora resolution module has a strong preference to link it to an antecedent (which it should when the pronoun is indeed referential), we can override a weak non-referential probability.
    Page 8, “Results”
  7. A consequence of this research was the creation of It-Bank, a collection of thousands of labelled examples of the pronoun it, which will benefit other coreference resolution researchers.
    Page 8, “Conclusion”
  8. Another avenue of study will look at the interaction between coreference resolution and machine translation.
    Page 8, “Conclusion”


n-grams

Appears in 6 sentences as: n-grams (6)
  1. The maximum size of a context pattern depends on the size of n-grams available in the data.
    Page 3, “Methodology”
  2. In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
    Page 3, “Methodology”
  3. Shorter n-grams were not found to improve performance on development data and hence are not extracted.
    Page 3, “Methodology”
  4. We then find all n-grams matching our patterns, allowing any token to match the wildcard in place of it.
    Page 4, “Methodology”
  5. We now describe the collection of n-grams and their counts used in our implementation.
    Page 5, “Methodology”
  6. Also, only n-grams appearing more than 40 times are included.
    Page 5, “Methodology”
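
The excerpts above fix the maximum pattern size at five tokens, with a wildcard standing in for the pronoun, and note that shorter n-grams did not improve performance. A sketch of how such patterns might be enumerated (the function name and tokenization below are my own; the paper's exact extraction may differ):

```python
def context_patterns(tokens, i, n=5):
    """Enumerate the length-n context patterns spanning the pronoun at
    position i, replacing the pronoun with a '*' wildcard."""
    patterns = []
    for start in range(i - n + 1, i + 1):
        if start >= 0 and start + n <= len(tokens):
            pat = list(tokens[start:start + n])
            pat[i - start] = "*"   # any token may match in place of "it"
            patterns.append(tuple(pat))
    return patterns

sent = ["you", "can", "make", "it", "in", "advance", "if", "needed"]
pats = context_patterns(sent, 3)
```

With the pronoun at position 3 in this eight-token example, the sketch yields the four five-token patterns that span it.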


noun phrases

Appears in 6 sentences as: noun phrases (6)
  1. The goal of coreference resolution is to determine which noun phrases in a document refer to the same real-world entity.
    Page 1, “Introduction”
  2. As part of this task, coreference resolution systems must decide which pronouns refer to preceding noun phrases (called antecedents) and which do not.
    Page 1, “Introduction”
  3. Pronouns that do not refer to preceding noun phrases are called non-anaphoric or non-referential pronouns.
    Page 1, “Introduction”
  4. Standard coreference resolution data sets annotate all noun phrases that have an antecedent noun phrase in the text.
    Page 5, “Evaluation”
  5. Of course, full coreference-annotated data is a precious resource, with the pronoun it making up only a small portion of the marked-up noun phrases.
    Page 5, “Evaluation”
  6. We thus provide these same nine-token windows to our human subjects, and ask them to decide whether the pronouns refer to previous noun phrases or not, based on these contexts.
    Page 7, “Results”


maximum entropy

Appears in 5 sentences as: maximum entropy (5)
  1. For classification, we use a maximum entropy model (Berger et al., 1996), from the logistic regression package in Weka (Witten and Frank, 2005), with all default parameter settings.
    Page 6, “Evaluation”
  2. Note that our maximum entropy classifier actually produces a probability of non-referentiality, which is thresholded at 50% to make a classification.
    Page 6, “Evaluation”
  3. When we inspect the probabilities produced by the maximum entropy classifier (Section 4.2), we see only a weak bias for the non-referential class on these examples, reflecting our classifier’s uncertainty.
    Page 8, “Results”
  4. The suitability of this kind of approach to correcting some of our system’s errors is especially obvious when we inspect the probabilities of the maximum entropy model’s output decisions on the Test-200 set.
    Page 8, “Results”
  5. Where the maximum entropy classifier makes mistakes, it does so with less confidence than when it classifies correct examples.
    Page 8, “Results”
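
Excerpts 1 and 2 describe a binary maximum entropy (logistic regression) model whose output probability of non-referentiality is thresholded at 50%. A minimal sketch with invented weights (not the paper's learned parameters):

```python
import math

def nonref_probability(weights, features, bias=0.0):
    """Probability of the non-referential class under a binary maximum
    entropy (logistic regression) model: sigmoid of the weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(weights, features, threshold=0.5):
    """Threshold the probability at 50% to make a classification,
    as described in excerpt 2."""
    p = nonref_probability(weights, features)
    return "non-referential" if p > threshold else "referential"
```

Because the model outputs a probability rather than a hard label, a downstream module can, as excerpt 3 suggests, treat near-threshold probabilities as low-confidence decisions and override them.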


Feature Vector

Appears in 3 sentences as: Feature Vector (1) feature vector (1) feature vectors (1)
  1. 3.3 Feature Vector Representation
    Page 4, “Methodology”
  2. We can achieve these aims by ordering the counts in a feature vector, and using a labelled set of training examples to learn a classifier that optimally weights the counts.
    Page 4, “Methodology”
  3. We represent feature vectors exactly as described in Section 3.3.
    Page 6, “Evaluation”
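
Excerpt 2 describes ordering the counts in a fixed-length feature vector; combined with the smoothing quoted under "n-gram" (adding 40, the minimum count in the data), a sketch might look like this (the specific filler-type ordering and the log transform below are assumptions for illustration, not the paper's exact representation):

```python
import math

# Hypothetical ordering of filler types; the paper's excerpts mention
# types such as it-fillers and (UNK), but this exact list is invented.
FILLER_TYPES = ["it", "they", "noun", "unknown", "other"]

def feature_vector(type_counts, smoothing=40):
    """Order smoothed log counts of each filler type into a fixed-length
    vector; smoothing adds 40 to all counts, the minimum count in the
    n-gram data."""
    return [math.log(type_counts.get(t, 0) + smoothing)
            for t in FILLER_TYPES]

vec = feature_vector({"it": 442, "they": 449})
```

Fixing the order of the counts lets a classifier learn one weight per filler type, which is what allows it to optimally balance the indicators.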


machine learned

Appears in 3 sentences as: machine learned (2) machine learning (1)
  1. Section 3 also shows how to automatically extract and collect counts for context patterns, and how to combine the information using a machine learned classifier.
    Page 2, “Introduction”
  2. In particular, note the pioneering work of Paice and Husk (1987), the inclusion of non-referential it detection in a full anaphora resolution system by Lappin and Leass (1994), and the machine learning approach of Evans (2001).
    Page 2, “Related Work”
  3. Although machine learned systems can flexibly balance the various indicators and contra-indicators of non-referentiality, a particular feature is only useful if it is relevant to an example in limited labelled training data.
    Page 2, “Related Work”


statistically significant

Appears in 3 sentences as: statistically significant (3)
  1. The difference between COMBO and DISTRIB is not statistically significant, while both are significantly better than the rule-based approaches.8 This provides strong motivation for a “lightweight” approach to non-referential it detection: one that does not require parsing or handcrafted rules and is easily ported to new languages and text domains.
    Page 6, “Results”
  2. Using no truncation (Unaltered) drops the F-Score by 4.3%, while truncating the patterns to a length of four only drops the F-Score by 1.4%, a difference which is not statistically significant.
    Page 6, “Results”
  3. Neither is statistically significant; however, there seem to be diminishing returns from longer context patterns.
    Page 7, “Results”
