Base Models | Because we use a tree representation, it is easy to ensure that the features used in the NER model are identical to those in the joint parsing and named entity model: the joint model (which we will discuss in Section 4.3) is also based on a tree representation in which each entity corresponds to a single node in the tree.
Base Models | The joint model shares the NER and parse features with the respective single-task models. |
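Base Models | As a minimal illustration of this feature sharing (our own sketch; the class and function names are hypothetical and not from the paper), both base models can call one and the same feature extractor on an entity node:

```python
# Hypothetical sketch: because each entity corresponds to a single node in
# the tree, one feature-extraction routine can serve both the NER-only base
# model and the joint parsing + NER base model, keeping their feature
# spaces identical. All names are illustrative.

class EntityNode:
    def __init__(self, label, words):
        self.label = label   # e.g. "PER" for a person entity
        self.words = words   # tokens spanned by this node

def entity_features(node):
    """Features from an entity node; reused verbatim by both base models."""
    return {
        f"label={node.label}": 1.0,
        f"first={node.words[0]}": 1.0,
        f"last={node.words[-1]}": 1.0,
        f"len={len(node.words)}": 1.0,
    }

node = EntityNode("PER", ["Hans", "Blix"])
print(entity_features(node))  # same feature dict, whichever model consumes it
```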
Experiments and Discussion | We did not run this experiment on the CNN portion of the data, because the CNN data was already being used as the extra NER data. |
Experiments and Discussion | Looking at the smaller corpora (NBC and MNB), we see the largest gains, with both parse and NER performance improving by about 8% F1.
Experiments and Discussion | Our one negative result is in the PRI portion: parsing improves slightly, but NER performance decreases by almost 2%. |
Hierarchical Joint Learning | [Figure: the three base models, PARSE, JOINT, and NER.]
Hierarchical Joint Learning | There are separate base models for just parsing, just NER, and joint parsing and NER.
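Hierarchical Joint Learning | As a hedged sketch of how such a hierarchy can tie the base models together (our notation; the paper defines its own objective), the parameters \(\theta_m\) of each base model can be pulled toward a shared top-level vector \(\theta_*\) by Gaussian priors:

\[
\mathcal{L}_{\text{hier}} \;=\; \sum_{m \in \{\textsc{parse},\,\textsc{ner},\,\textsc{joint}\}}
\left( \mathcal{L}_m(\theta_m) \;-\; \frac{\lVert \theta_m - \theta_* \rVert^2}{2\sigma_m^2} \right)
\;-\; \frac{\lVert \theta_* \rVert^2}{2\sigma_*^2},
\]

Hierarchical Joint Learning | where \(\mathcal{L}_m(\theta_m)\) is the log-likelihood of base model \(m\) on its own training data, and the variances \(\sigma_m^2\) and \(\sigma_*^2\) control how strongly each base model is tied to the shared parameters.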
Introduction | These high-level systems typically combine the outputs from many low-level systems, such as parsing, named entity recognition (NER) and coreference resolution.
Abstract | We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. |
Clustering-based word representations | Brown clusters have been used successfully in a variety of NLP applications: NER (Miller et al., 2004; Liang, 2005; Ratinov & Roth, 2009), PCFG parsing (Candito & Crabbe, 2009), dependency parsing (Koo et al., 2008; Suzuki et al., 2009), and semantic dependency parsing (Zhao et al., 2009). |
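Clustering-based word representations | As an illustration of how such systems typically consume Brown clusters (our sketch; the cluster bit strings and prefix lengths are invented for the example, and each cited system makes its own choices), the hierarchical cluster bit string of a word is cut at several prefix lengths and each prefix becomes a feature:

```python
# Illustrative sketch of Brown-cluster prefix features; the bit strings
# and prefix lengths below are made up for this example.

BROWN_CLUSTER = {
    "Obama":   "111011011",
    "Clinton": "111011010",
    "Monday":  "0101110",
}

PREFIX_LENGTHS = (4, 6, 10)  # several granularities, coarse to fine

def cluster_features(word):
    """Return Brown-cluster prefix features for one word."""
    bits = BROWN_CLUSTER.get(word)
    if bits is None:
        return {}
    return {f"brown_{p}={bits[:p]}": 1.0 for p in PREFIX_LENGTHS}

print(cluster_features("Obama"))
# {'brown_4=1110': 1.0, 'brown_6=111011': 1.0, 'brown_10=111011011': 1.0}
```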
Distributional representations | It is not well-understood what settings are appropriate to induce distributional word representations for structured prediction tasks (like parsing and MT) and sequence labeling tasks (like chunking and NER ). |
Introduction | In this work, we compare different techniques for inducing word representations, evaluating them on the tasks of named entity recognition (NER) and chunking.
Supervised evaluation tasks | Lin and Wu (2009) find that the representations that are good for NER are poor for search query classification, and vice versa.
Supervised evaluation tasks | We apply clustering and distributed representations to NER and chunking, which allows us to compare our semi-supervised models to those of Ando and Zhang (2005) and Suzuki and Isozaki (2008). |
Supervised evaluation tasks | NER is typically treated as a sequence prediction problem. |
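Supervised evaluation tasks | For concreteness, a small illustration of this sequence-prediction view (our example sentence and labels, using the common BIO encoding):

```python
# NER as sequence prediction: one label per token, with entity spans
# encoded in a BIO scheme (B- begins an entity, I- continues it, O is
# outside any entity). Sentence and labels are illustrative.

tokens = ["U.N.",  "official", "Ekeus", "heads", "for", "Baghdad", "."]
labels = ["B-ORG", "O",        "B-PER", "O",     "O",   "B-LOC",   "O"]

for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```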
Unlabeled Data | For this reason, NER results that use RCV1 word representations are a form of transductive learning.
Unlabeled Data | [Figure panel (b): NER results.]
Introduction | Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition (NER) and mention detection.
Related Work | Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction.
Related Work | NER systems (e.g., Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002) identify proper named entities, such as people, organizations, and locations.
Related Work | Several bootstrapping methods for NER have been previously developed (e.g., Collins and Singer, 1999; Niu et al., 2003).
Experimental Evaluation | NER: named-entity recognition for Person, Organization, Location, Address, PhoneNumber, EmailAddress, URL and DateTime.
Experimental Evaluation | We chose NER primarily because named-entity recognition is a well-studied problem and standard datasets are available for evaluation. |
Experimental Evaluation | To the best of our knowledge, ANNIE (Cunningham et al., 2002) is the only publicly available NER library implemented in a grammar-based system (JAPE in GATE).