Abstract | Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data. |
Base Models | Our hierarchical joint model is composed of three separate models, one for just named entity recognition, one for just parsing, and one for joint parsing and named entity recognition. |
Base Models | 4.1 Semi-CRF for Named Entity Recognition |
Hierarchical Joint Learning | Our experiments are on a joint parsing and named entity task, but the technique is more general and only requires that the base models (the joint model and single-task models) share some features. |
Hierarchical Joint Learning | This section covers the general technique; the details of the parsing, named entity, and joint models that we use are given in Section 4.
Hierarchical Joint Learning | Features which don’t apply to a particular model type (e.g., parse features in the named entity model) will always be zero, so their weights have no impact on that model’s likelihood function.
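Hierarchical Joint Learning | A toy sketch of this shared feature space (the feature names below are illustrative, not from the paper): all models score with the same weight vector, so a feature that never fires for a model contributes nothing to its score.

```python
# Sketch of a shared feature space: every model scores with the same
# weights, but features that never fire for a model contribute nothing.
# Feature names here are illustrative, not from the paper.

def score(weights, features):
    """Dot product over sparse feature dicts."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

# A shared weight vector covers both NER and parse features.
weights = {"word=Obama&tag=PERSON": 1.5, "rule=NP->NNP": 0.8}

# The NER-only model extracts no parse features, so the parse
# feature's weight has no effect on its score.
ner_features = {"word=Obama&tag=PERSON": 1.0}
joint_features = {"word=Obama&tag=PERSON": 1.0, "rule=NP->NNP": 1.0}

print(score(weights, ner_features))    # 1.5
print(score(weights, joint_features))  # 2.3
```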
Introduction | These high-level systems typically combine the outputs from many low-level systems, such as parsing, named entity recognition (NER) and coreference resolution. |
Introduction | When trained separately, these single-task models can produce outputs which are inconsistent with one another, such as named entities which do not correspond to any nodes in the parse tree (see Figure 1 for an example).
Introduction | Because a named entity should correspond to a node in the parse tree, strong evidence about either aspect of the model should positively impact the other aspect.
Abstract | The name ambiguity problem has created an urgent demand for efficient, high-quality named entity disambiguation methods.
Abstract | In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms which can best exploit these knowledge sources.
Abstract | This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. |
Introduction | So there is an urgent demand for efficient, high-quality named entity disambiguation methods. |
Introduction | Currently, the common methods for named entity disambiguation include name observation clustering (Bagga and Baldwin, 1998) and entity linking with a knowledge base (McNamee and Dang, 2009).
Introduction | Given a set of observations O = {o1, o2, ..., on} of the target name to be disambiguated, a named entity disambiguation system should group them into a set of clusters C = {c1, c2, ..., cm}, with each resulting cluster corresponding to one specific entity.
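Introduction | A minimal sketch of this clustering formulation (the context-overlap similarity and threshold below are illustrative placeholders, not the paper's SSR method):

```python
# Toy name-observation clustering: group observations of an ambiguous
# name whenever their contexts share enough words. The similarity
# measure and threshold are illustrative placeholders.

def similarity(ctx_a, ctx_b):
    """Jaccard overlap between two context word sets."""
    a, b = set(ctx_a), set(ctx_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(observations, threshold=0.2):
    """Greedy single-link clustering of (name, context_words) pairs."""
    clusters = []
    for obs in observations:
        for c in clusters:
            if any(similarity(obs[1], other[1]) >= threshold for other in c):
                c.append(obs)
                break
        else:
            clusters.append([obs])
    return clusters

obs = [
    ("Jordan", ["basketball", "bulls", "nba"]),
    ("Jordan", ["nba", "chicago", "bulls"]),
    ("Jordan", ["machine", "learning", "berkeley"]),
]
print(len(cluster(obs)))  # 2 clusters: one entity per distinct referent
```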
Cognitively Grounded Cost Modeling | These time tags indicate the time it took to annotate the respective phrase for named entity mentions of the types person, location, and organization. |
Experimental Design | In our study, we applied eye-tracking, to the best of our knowledge for the first time, to study the cognitive processes underlying the annotation of linguistic meta-data, named entities in particular.
Experimental Design | These articles come already annotated with three types of named entities considered important in the newspaper domain, viz. |
Experimental Design | An example consists of a text document having one single annotation phrase highlighted which then had to be semantically annotated with respect to named entity mentions. |
Introduction | We have seen large-scale efforts in syntactic and semantic annotations in the past related to POS tagging and parsing, on the one hand, and named entities and relations (propositions), on the other hand.
Results | In this case, words were searched in the upper context, which according to their orthographic signals might refer to a named entity but which could not completely be resolved only relying on the information given by the annotation phrase itself and its embedding sentence. |
Summary and Conclusions | In this paper, we explored the use of eye-tracking technology to investigate the behavior of human annotators during the assignment of three types of named entities — persons, organizations and locations — based on the eye-mind assumption. |
Summary and Conclusions | of complex cognitive tasks (such as gaze duration and gaze movements for named entity annotation) to explanatory models (in our case, a time-based cost model for annotation) follows a much warranted avenue in research in NLP where feature farming becomes a theory-driven, explanatory process rather than a much deplored theory-blind engineering activity (cf. |
Indicators of linguistic quality | 3.2 Reference form: Named entities |
Indicators of linguistic quality | This set of features examines whether named entities have informative descriptions in the summary. |
Indicators of linguistic quality | We focus on named entities because they appear often in summaries of news documents and are often not known to the reader beforehand. |
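Indicators of linguistic quality | One way such a reference-form feature could be computed, as a sketch (the heuristic below is illustrative, not the authors' feature set): check whether an entity's first mention in the summary appears bare, without any descriptive premodifier.

```python
# Sketch of a reference-form feature: does a named entity's first
# mention in a summary carry any descriptive material before it?
# The heuristic below is a crude illustrative proxy.

def first_mention_is_bare(sentences, entity):
    """True if the entity's first occurrence has no descriptive words
    preceding it in its sentence (starts the sentence, or follows
    only a function word)."""
    for sent in sentences:
        idx = sent.find(entity)
        if idx != -1:
            before = sent[:idx].strip()
            return before == "" or before.split()[-1].lower() in {"by", "of", "and"}
    return False

summary = ["Smith resigned yesterday.",
           "The CEO, John Smith, cited health reasons."]
print(first_mention_is_bare(summary, "Smith"))  # True: no description given first
```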
Results and discussion | The classes of features specific to named entities and noun phrase syntax are the weakest predictors. |
Data Analysis | Not surprisingly, the most oft-repeated named entity in the text is I, referring to the narrator.
Extracting Conversational Networks from Literature | In a conversational network, vertices represent characters (assumed to be named entities) and edges indicate at least one instance of dialogue interaction between two characters over the course of the novel.
Extracting Conversational Networks from Literature | For each named entity, we generate variations on the name that we would expect to see in a coreferent.
Extracting Conversational Networks from Literature | For each named entity, we compile a list of other named entities that may be coreferents, either because they are identical or because one is an expected variation on the other. |
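Extracting Conversational Networks from Literature | A minimal sketch of this variation-and-matching step (the variation rules and the example names are illustrative, not the authors' full rule set):

```python
# Sketch of name-variation generation for linking coreferent character
# names in a novel. The variation rules are illustrative only.

def name_variations(name):
    """Generate surface variants we might expect for a full name."""
    parts = name.split()
    variants = {name}
    if len(parts) >= 2:
        variants.add(parts[0])            # first name alone
        variants.add(parts[-1])           # last name alone
        variants.add("Mr. " + parts[-1])  # title + last name
        variants.add("Miss " + parts[-1])
    return variants

def possible_coreferents(entities):
    """Pairs of entities where one is a variation of the other."""
    pairs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if b in name_variations(a) or a in name_variations(b):
                pairs.append((a, b))
    return pairs

ents = ["Elizabeth Bennet", "Miss Bennet", "Mr. Darcy"]
print(possible_coreferents(ents))  # [('Elizabeth Bennet', 'Miss Bennet')]
```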
BabelNet | We use the general term concept to denote either a concept or a named entity.
Conclusions | Firstly, the two resources contribute different kinds of lexical knowledge: one is concerned mostly with named entities, the other with concepts.
Conclusions | Thus, even when they overlap, the two resources provide complementary information about the same named entities or concepts. |
Introduction | These include, among others, text summarization (Nastase, 2008), Named Entity Recognition (Bunescu and Pasca, 2006), Question Answering (Harabagiu et al., 2000) and text categorization (Gabrilovich and Markovitch, 2006). |
Introduction | Second, such resources are typically lexicographic, and thus contain mainly concepts and only a few named entities.
Introduction | The result is an “encyclopedic dictionary” that provides concepts and named entities lexicalized in many languages and connected by large amounts of semantic relations.
Methodology | A Wikipedia page (henceforth, Wikipage) presents the knowledge about a specific concept (e.g. BALLOON (AIRCRAFT)) or named entity (e.g.
Annotation Proposal and Pilot Study | Phenomenon Occurrence Agreement: Named Entity 91.67% 0.856; locative 17.62% 0.623; Numerical Quantity 14.05% 0.905; temporal 5.48% 0.960; nominalization 4.05% 0.245; implicit relation 1.90% 0.651
Annotation Proposal and Pilot Study | Phenomenon Occurrence Agreement: missing argument 16.19% 0.763; missing relation 14.76% 0.708; excluding argument 10.48% 0.952; Named Entity mismatch 9.29% 0.921; excluding relation 5.00% 0.870; disconnected relation 4.52% 0.580; missing modifier 3.81% 0.465; disconnected argument 3.33% 0.764; Numeric Quant.
Introduction | Tasks such as Named Entity and coreference resolution, syntactic and shallow semantic parsing, and information and relation extraction have been identified as worthwhile tasks and pursued by numerous researchers. |
NLP Insights from Textual Entailment | ported by their designers were the use of structured representations of shallow semantic content (such as augmented dependency parse trees and semantic role labels); the application of NLP resources such as Named Entity recognizers, syntactic and dependency parsers, and coreference resolvers; and the use of special-purpose ad-hoc modules designed to address specific entailment phenomena the researchers had identified, such as the need for numeric reasoning. |
Pilot RTE System Analysis | (b) All top-5 systems make consistent errors in cases where identifying a mismatch in named entities (NE) or numerical quantities (NQ) is important to make the right decision. |
Pilot RTE System Analysis | However, we believe that a system could improve if it recognized key negation phenomena such as Named Entity mismatch, the presence of Excluding arguments, etc.
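Pilot RTE System Analysis | A minimal sketch of a Named Entity mismatch check for entailment (the entity extractor below is a toy capitalization heuristic, and the example pair is invented, not from the paper's data):

```python
# Sketch of a Named Entity mismatch check for entailment: if the
# hypothesis mentions an entity absent from the text, entailment is
# unlikely. Entity extraction here is a toy capitalization heuristic.

def toy_entities(sentence):
    """Very crude NE extraction: capitalized non-initial tokens."""
    tokens = sentence.replace(",", "").replace(".", "").split()
    return {t for t in tokens[1:] if t[0].isupper()}

def ne_mismatch(text, hypothesis):
    """True if the hypothesis mentions an entity the text does not."""
    return bool(toy_entities(hypothesis) - toy_entities(text))

text = "Shares of Google rose after the earnings report."
hyp = "Shares of Microsoft rose."
print(ne_mismatch(text, hyp))  # True: Microsoft never appears in the text
```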
Introduction | Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition (NER) and mention detection. |
Introduction | Named entity recognizers perform semantic tagging on proper name noun phrases, and |
Related Work | Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction. |
Related Work | NER systems (e.g., Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002) identify proper named entities, such as people, organizations, and locations.
Related Work | NER systems, however, do not identify nominal NP instances (e.g., “a software manufacturer” or “the beach”), or handle semantic classes that are not associated with proper named entities (e.g., symptoms).
Introduction | More recently, the OntoNotes project (Pradhan et al., 2007) released a one million word English corpus of newswire, broadcast news, and broadcast conversation that is annotated for Penn Treebank syntax, PropBank predicate argument structures, coreference, and named entities.
MASC Annotations | Annotation Status Texts Words: Token Validated 118 222472; Sentence Validated 118 222472; POS/lemma Validated 118 222472; Noun chunks Validated 118 222472; Verb chunks Validated 118 222472; Named entities Validated 118 222472; FrameNet frames Manual 21 17829; HPSG Validated 40* 30106; Discourse Manual 40* 30106; Penn Treebank Validated 97 87383; PropBank Validated 92 50165; Opinion Manual 97 47583; TimeBank Validated 34 5434; Committed belief Manual 13 4614; Event Manual 13 4614; Coreference Manual 2 1877
MASC Annotations | To derive maximal benefit from the semantic information provided by these resources, the entire corpus is also annotated and manually validated for shallow parses (noun and verb chunks) and named entities (person, location, organization, date and time). |
MASC Annotations | Automatically-produced annotations for sentence, token, part of speech, shallow parses (noun and verb chunks), and named entities (person, location, organization, date and time) are hand-validated by a team of students. |
Experiments | We used a CRF-based Japanese dependency parser (Imamura et al., 2007) and a named entity recognizer (Suzuki et al., 2006) for sentiment extraction and for constructing feature vectors for the readability score, respectively.
Optimizing Sentence Sequence | To this, we add named entity tags (e.g. |
Optimizing Sentence Sequence | We observe that the first sentence of a review of a restaurant frequently contains named entities indicating location. |
Optimizing Sentence Sequence | Note that our method learns w from texts automatically annotated by a POS tagger and a named entity tagger. |
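Optimizing Sentence Sequence | A sketch of how such automatic annotations could abstract a sentence before learning an ordering model (the tag inventory and example are illustrative, not the authors' pipeline): replacing entity tokens with their NE tags turns, e.g., "a LOCATION entity in the first sentence" into a reusable feature.

```python
# Sketch: abstracting a sentence with NE tags before learning a
# sentence-ordering model. Tag inventory and example are illustrative.

def abstract_sentence(tokens, ne_tags):
    """Replace each named-entity token with its NE tag; keep the rest."""
    return [tag if tag != "O" else tok for tok, tag in zip(tokens, ne_tags)]

tokens = ["We", "visited", "Bella", "Napoli", "in", "Boston"]
tags = ["O", "O", "ORGANIZATION", "ORGANIZATION", "O", "LOCATION"]
print(abstract_sentence(tokens, tags))
# ['We', 'visited', 'ORGANIZATION', 'ORGANIZATION', 'in', 'LOCATION']
```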
Conclusion | The major areas of CCGbank’s grammar left to be improved are the analysis of comparatives, and the analysis of named entities . |
Conclusion | Named entities are also difficult to analyse, as many entity types obey their own specific grammars. |
Conclusion | This is another example of a phenomenon that could be analysed much better in CCGbank using an existing resource, the BBN named entity corpus. |