Abstract | To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. |
Conclusions | We demonstrated the power and generality of this approach on two very different applications: named entity recognition and query classification. |
Discussion and Related Work | The named entity problem in Section 3 and the query classification problem in Section 4 exemplify the two scenarios. |
Distributed K-Means clustering | For example, in the named entity recognition task, categorical clusters are more successful, whereas in query categorization, the topical clusters are much more beneficial. |
Introduction | They have become the workhorse in almost all subareas and components of NLP, including part-of-speech tagging, chunking, named entity recognition and parsing. |
Introduction | This method has been shown to be quite successful in named entity recognition (Miller et al. |
Introduction | We demonstrate the advantages of phrase-based clusters over word-based ones with experimental results from two distinct application domains: named entity recognition and query classification. |
Named Entity Recognition | Named entity recognition (NER) is one of the first steps in many applications of information extraction, information retrieval, question answering and other applications of NLP. |
Named Entity Recognition | Here, s denotes a position in the input sequence; y_s is a label that indicates whether the token at position s is a named entity as well as its type; w“ |
Named Entity Recognition | The top CoNLL 2003 systems all employed gazetteers or other types of specialized resources (e.g., lists of words that tend to co-occur with certain named entity types) in addition to part-of-speech tags. |
Conclusions | KARC partitions named entities based on their profiles constructed by an information extraction tool. |
Experiments | First, the attribute information about the person named entity includes first/middle/last names, gender, mention, etc. |
Experiments | In addition, AeroText extracts relationship information between named entities, such as Family, List, Employment, Ownership, Citizen-Resident-Religion-Ethnicity and so on, as specified in the ACE evaluation. |
Experiments | This permits inexact matching of named entities due to name |
Introduction | A named entity that represents a person, an organization or a geo-location may appear within and across documents in different forms. |
Introduction | Cross document coreference (CDC) is the task of consolidating named entities that appear in multiple documents according to their real referents. |
Methods 2.1 Document Level and Profile Based CDC | Document level CDC makes a simplifying assumption that a named entity (and its variants) in a document has one underlying real identity. |
Methods 2.1 Document Level and Profile Based CDC | The named entity President Bush is extracted from the sentence “President Bush addressed the nation from the Oval Office Monday.” |
Architecture | in sentences using a named entity tagger that labels persons, organizations and locations. |
Architecture | In the testing step, entities are again identified using the named entity tagger. |
Features | 5.3 Named entity tag features |
Features | Every feature contains, in addition to the content described above, named entity tags for the two entities. |
Features | We perform named entity tagging using the Stanford four-class named entity tagger (Finkel et al., 2005). |
Implementation | In preprocessing, consecutive words with the same named entity tag are ‘chunked’, so that Edwin/PERSON Hubble/PERSON becomes [Edwin Hubble]/PERSON. |
Previous work | For example, the DIPRE algorithm by Brin (1998) used string-based regular expressions in order to recognize relations such as author-book, while the SNOWBALL algorithm by Agichtein and Gravano (2000) learned similar regular expression patterns over words and named entity tags. |
Coreference Subtask Analysis | This section examines the role of three such subtasks — named entity recognition, anaphoricity determination, and coreference element detection — in the performance of an end-to-end coreference resolution system. |
Coreference Subtask Analysis | We employ the Stanford CRF-based Named Entity Recognizer (Finkel et al., 2004) for named entity tagging. |
Coreference Subtask Analysis | 3.3 Named Entities |
Coreference Task Definitions | However, the MUC definition excludes (1) “nested” named entities (NEs) (e.g. |
Introduction | With these in mind, Section 3 first examines three subproblems that play an important role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection. |
Introduction | Our results suggest that the availability of accurate detectors for anaphoricity or coreference elements could substantially improve the performance of state-of-the-art resolvers, while improvements to named entity recognition likely offer little gains. |
Resolution Complexity | Proper Names: Three resolution classes cover CEs that are named entities (e.g. |
Abstract | We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. |
Introduction | We evaluate the effectiveness of our method by using linear-chain conditional random fields (CRFs) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging. |
Log-Linear Models | The model is used for a variety of sequence labeling tasks such as POS tagging, chunking, and named entity recognition. |
Log-Linear Models | We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging. |
Log-Linear Models | 4.2 Named Entity Recognition |
QG for Paraphrase Modeling | In addition to assuming that dependency parse trees for s and t are observable, we also assume each word comes with POS and named entity tags. |
QG for Paraphrase Modeling | Dependency label, POS, and named entity class: The newly generated target word’s dependency label, POS, and named entity class are drawn from multinomial distributions p_lab, p_pos, and p_ne that condition, respectively, on the configuration and the POS and named entity class of the aligned source-tree word (lines 9-11). |
QG for Paraphrase Modeling | We estimate the distributions over dependency labels, POS tags, and named entity classes using the transformed treebank (footnote 4). |
Experiment | In general, the NTCIR topics are general descriptive words such as “regenerative medicine”, “American economy after the 911 terrorist attacks”, and “lawsuit brought against Microsoft for monopolistic practices.” The TREC topics are more named-entity-like terms such as “CarmaX”, “Wikipedia primary source”, “Jiffy Lube”, “Starbucks”, and “Windows Vista.” We have experimentally shown that LSA is more suited to finding associations between general terms because its training documents are from a general domain. Our PMI measure utilizes a web search engine, which covers a variety of named entity terms. |
Term Weighting and Sentiment Analysis | The statistical approaches may suffer from data sparseness problems, especially for named entity terms used in the query, and the proximal clues cannot sufficiently cover all term-query associations. |
Term Weighting and Sentiment Analysis | Mullen and Collier (2004) manually annotated named entities in their dataset (i.e. |
Introduction | The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language, which plays an important role in machine translation and cross-language information retrieval (CLIR). |
Introduction | 3) Asymmetric alignment: When we extract the translation equivalent from the web pages, the traditional method should recognize the named entities in the target language sentence first, and then the extracted NEs will be aligned with the source ON. |
Introduction | However, named entity recognition (NER) always introduces some errors. |
Discussion | We once found many words, mostly named entities, were outside the lexicon. |
Discussion | Thus we managed to collect a named entity translation dictionary to enhance the original one. |
Treebank Translation and Dependency Transformation | As a bilingual lexicon is required for our task and none of the existing lexicons is suitable for translating PTB, two lexicons, LDC Chinese-English Translation Lexicon Version 2.0 (LDC2002L27) and an English to Chinese lexicon in StarDict, are conflated, with some necessary manual extensions, to cover 99% of the words appearing in the PTB (most of the untranslated words are named entities). |
An acoustics-based approach | Given two subpaths with nearly identical average similarity scores, we suggest that the longer of the two is more likely to refer to content of interest that is shared between two speech utterances, e.g., named entities. |
Experimental results | Although named entities and domain-specific terms are often highly relevant to the documents in which they are referenced, these types of words are often not included in ASR vocabularies, due to their relative global rarity. |
Introduction | This out-of-vocabulary (OOV) problem is unavoidable in the regular ASR framework, although it is more likely to happen on salient words such as named entities or domain-specific terms. |
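
The preprocessing step quoted in the Implementation row above, where consecutive words with the same named entity tag are chunked so that Edwin/PERSON Hubble/PERSON becomes [Edwin Hubble]/PERSON, can be sketched as follows. This is a minimal illustration, not the cited system's code: the function name `chunk_by_tag`, the (word, tag) pair representation, and the use of "O" as the tag for non-entity tokens are all assumptions made for the example.

```python
from itertools import groupby


def chunk_by_tag(tagged_tokens, outside_tag="O"):
    """Merge runs of consecutive tokens that share the same entity tag.

    tagged_tokens: list of (word, tag) pairs (hypothetical input format).
    outside_tag: tag assumed to mark non-entity tokens; these stay unmerged.
    """
    chunks = []
    # groupby yields one group per run of adjacent pairs with an equal tag
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == outside_tag:
            # keep ordinary words as individual tokens
            chunks.extend((word, tag) for word in words)
        else:
            # collapse the whole run into a single multi-word entity
            chunks.append((" ".join(words), tag))
    return chunks


tagged = [
    ("Edwin", "PERSON"), ("Hubble", "PERSON"),
    ("was", "O"), ("born", "O"), ("in", "O"),
    ("Marshfield", "LOCATION"),
]
chunked = chunk_by_tag(tagged)
```

One limitation of this sketch: two distinct entities of the same type that happen to be adjacent would be merged into one chunk, which is inherent in chunking by tag equality alone.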