Abstract | Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering.
Abstract | This paper presents a novel cross document coreference approach that leverages the profiles of entities, which are constructed using information extraction tools and reconciled using a within-document coreference module.
Abstract | We compare our kernelized clustering method with a popular fuzzy relational clustering algorithm (FRC) and show a 5% improvement in coreference performance.
Introduction | Cross document coreference (CDC) is the task of consolidating named entities that appear in multiple documents according to their real referents. |
Introduction | CDC differs from within-document coreference (WDC), which limits the scope of disambiguation to the boundary of a single document.
Introduction | Cross document coreference, on the other hand, is a more challenging task because these linguistic cues and sentence structures no longer apply, given the wide variety of contexts and styles across documents.
Methods 2.1 Document Level and Profile Based CDC | We distinguish between document-level and profile-based cross document coreference.
Methods 2.1 Document Level and Profile Based CDC | Given the entities of interest (persons in this work), a within-document coreference (WDC) module links the entities deemed to refer to the same underlying identity into a WDC chain.
Methods 2.1 Document Level and Profile Based CDC | Therefore, the chained entities placed in the same name cluster are deemed coreferent.
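The profile-based pipeline described above (WDC chains reconciled into entity profiles, then grouped into clusters whose members are deemed coreferent) can be sketched as threshold-based single-link grouping over profiles. The set-of-attributes profile representation, the Jaccard similarity, and the 0.5 threshold are illustrative assumptions for this sketch, not the paper's kernelized method:

```python
def cluster_profiles(profiles, threshold=0.5):
    """Single-link grouping of entity profiles across documents.

    `profiles` maps a chain id (e.g. "doc1.chain3") to a set of attribute
    strings reconciled from one WDC chain -- a hypothetical representation.
    Chains whose profiles land in the same cluster are deemed coreferent.
    """
    def jaccard(a, b):
        union_set = a | b
        return len(a & b) / len(union_set) if union_set else 0.0

    # Union-find over chain ids: similar profiles are merged transitively.
    ids = list(profiles)
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

With three "John Smith" chains, two sharing the attributes "lawyer" and "john smith", the lawyer chains merge into one cluster while the baseball chain stays separate.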
Introduction | As is common for many natural language processing problems, the state-of-the-art in noun phrase (NP) coreference resolution is typically quantified based on system performance on manually annotated text corpora. |
Introduction | MUC-6 (1995), ACE NIST (2004)) and their use in many formal evaluations, as a field we can make surprisingly few conclusive statements about the state-of-the-art in NP coreference resolution. |
Introduction | In particular, it remains difficult to assess the effectiveness of different coreference resolution approaches, even in relative terms.
Background | The Chambers and Jurafsky (2008) model learns chains completely unsupervised (albeit after parsing and resolving coreference in the text) by counting pairs of verbs that share coreferring arguments within documents and computing the pointwise mutual information (PMI) between these verb-argument pairs.
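Assuming parsing and coreference resolution have already produced, for each document, a list of (verb, coreference-chain-id) pairs, the counting-and-PMI step might be sketched as below; the input format and function name are hypothetical, not Chambers and Jurafsky's implementation:

```python
from collections import Counter
from itertools import combinations
import math

def verb_pair_pmi(documents):
    """Compute PMI between verb pairs that share coreferring arguments.

    `documents` is a list of docs; each doc is a list of (verb, chain_id)
    tuples, where chain_id identifies a coreference chain -- an assumed
    pre-processed input format.
    """
    pair_counts = Counter()
    verb_counts = Counter()
    for doc in documents:
        # Group verbs by the coreference chain of their argument.
        by_chain = {}
        for verb, chain in doc:
            by_chain.setdefault(chain, set()).add(verb)
        # Every pair of verbs sharing a chain is one co-occurrence.
        for verbs in by_chain.values():
            for v1, v2 in combinations(sorted(verbs), 2):
                pair_counts[(v1, v2)] += 1
                verb_counts[v1] += 1
                verb_counts[v2] += 1
    total_pairs = sum(pair_counts.values())
    total_verbs = sum(verb_counts.values())
    pmi = {}
    for (v1, v2), c in pair_counts.items():
        p_pair = c / total_pairs
        p1 = verb_counts[v1] / total_verbs
        p2 = verb_counts[v2] / total_verbs
        pmi[(v1, v2)] = math.log(p_pair / (p1 * p2))
    return pmi
```

Verbs that repeatedly share an argument across documents (e.g. "arrest" and "charge" applied to the same person) receive a higher PMI than verbs that co-occur only once.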
Background | Even more telling is that these arguments are jointly shared (the same or coreferent) across all three events.
Evaluation: Cloze | We use the OpenNLP coreference engine to resolve entity mentions.
Narrative Schemas | As mentioned above, narrative chains are learned by parsing the text, resolving coreference, and extracting chains of events that share participants.
Narrative Schemas | In our new model, argument types are learned simultaneously with narrative chains by finding salient words that represent coreferential arguments. |
Narrative Schemas | We record counts of arguments that are observed with each pair of event slots, build the referential set for each word from its coreference chain, and then represent each observed argument by the most frequent head word in its referential set (ignoring pronouns and mapping entity mentions with person pronouns to a constant PERSON identifier). |
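The head-word selection described above (representing an argument by the most frequent head word in its referential set, ignoring pronouns and mapping person-pronoun mentions to a constant PERSON identifier) might be sketched as follows; the pronoun lists and the list-of-head-words input format are simplified assumptions:

```python
from collections import Counter

# Illustrative pronoun inventories, assumed for this sketch.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "their", "its"}
PERSON_PRONOUNS = {"he", "she", "him", "her", "his", "hers"}

def representative_head(chain):
    """Pick a representative head word for a coreference chain.

    `chain` is a list of lowercased mention head words. Returns the most
    frequent non-pronoun head word; if the chain contains only pronouns
    and at least one refers to a person, fall back to "PERSON".
    """
    content = [w for w in chain if w not in PRONOUNS]
    if content:
        return Counter(content).most_common(1)[0][0]
    if any(w in PERSON_PRONOUNS for w in chain):
        return "PERSON"
    return None
```

For a chain like ["lawyer", "he", "lawyer", "attorney"] this yields "lawyer", while a purely pronominal chain such as ["he", "him"] maps to the PERSON constant.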
Sample Narrative Schemas | We parse the text into dependency graphs and resolve coreference.