Abstract | The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. |
Abstract | In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms that can best exploit these knowledge sources. |
Abstract | This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. |
Introduction | There is thus an urgent demand for efficient, high-quality named entity disambiguation methods.
Introduction | Currently, the common methods for named entity disambiguation include name observation clustering (Bagga and Baldwin, 1998) and entity linking with a knowledge base (McNamee and Dang, 2009).
Introduction | Given a set of observations O = {o1, o2, …, on} of the target name to be disambiguated, a named entity disambiguation system should group them into a set of clusters C = {c1, c2, …, cm}, with each resulting cluster corresponding to one specific entity.
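The clustering formulation above can be sketched as follows; the similarity function and threshold here are illustrative placeholders, not the paper's actual model:

```python
def disambiguate(observations, similar, threshold=0.5):
    """Group observations O = {o1..on} into clusters C = {c1..cm},
    one cluster per underlying entity (greedy single-link sketch)."""
    clusters = []
    for obs in observations:
        target = None
        for cluster in clusters:
            # single-link: join a cluster if any member is similar enough
            if any(similar(obs, member) >= threshold for member in cluster):
                target = cluster
                break
        if target is not None:
            target.append(obs)
        else:
            clusters.append([obs])
    return clusters

# Toy usage: similarity = Jaccard overlap of context words (assumed feature)
def overlap(a, b):
    wa, wb = set(a["context"]), set(b["context"])
    return len(wa & wb) / max(len(wa | wb), 1)

obs = [
    {"name": "Jordan", "context": ["basketball", "bulls"]},
    {"name": "Jordan", "context": ["bulls", "nba"]},
    {"name": "Jordan", "context": ["amman", "river"]},
]
clusters = disambiguate(obs, overlap, threshold=0.2)
print(len(clusters))  # two underlying entities
```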
Abstract | Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data. |
Base Models | Our hierarchical joint model is composed of three separate models, one for just named entity recognition, one for just parsing, and one for joint parsing and named entity recognition. |
Base Models | 4.1 Semi-CRF for Named Entity Recognition |
Hierarchical Joint Learning | Our experiments are on a joint parsing and named entity task, but the technique is more general and only requires that the base models (the joint model and single-task models) share some features. |
Hierarchical Joint Learning | This section covers the general technique, and we will cover the details of the parsing, named entity, and joint models that we use in Section 4.
Hierarchical Joint Learning | Features which don’t apply to a particular model type (e.g., parse features in the named entity model) will always be zero, so their weights have no impact on that model’s likelihood function.
Introduction | These high-level systems typically combine the outputs from many low-level systems, such as parsing, named entity recognition (NER) and coreference resolution. |
Introduction | When trained separately, these single-task models can produce outputs which are inconsistent with one another, such as named entities which do not correspond to any nodes in the parse tree (see Figure 1 for an example).
Introduction | Because a named entity should correspond to a node in the parse tree, strong evidence about either aspect of the model should positively impact the other aspect.
Cognitively Grounded Cost Modeling | These time tags indicate the time it took to annotate the respective phrase for named entity mentions of the types person, location, and organization. |
Experimental Design | In our study, we applied, for the first time to the best of our knowledge, eye-tracking to study the cognitive processes underlying the annotation of linguistic meta-data, named entities in particular.
Experimental Design | These articles come already annotated with three types of named entities considered important in the newspaper domain, viz. |
Experimental Design | An example consists of a text document having one single annotation phrase highlighted which then had to be semantically annotated with respect to named entity mentions. |
Introduction | We have seen large-scale efforts in syntactic and semantic annotations in the past related to POS tagging and parsing, on the one hand, and named entities and relations (propositions), on the other hand.
Results | In this case, words were searched for in the preceding context which, according to their orthographic signals, might refer to a named entity but which could not be completely resolved relying only on the information given by the annotation phrase itself and its embedding sentence.
Summary and Conclusions | In this paper, we explored the use of eye-tracking technology to investigate the behavior of human annotators during the assignment of three types of named entities — persons, organizations and locations — based on the eye-mind assumption. |
Summary and Conclusions | of complex cognitive tasks (such as gaze duration and gaze movements for named entity annotation) to explanatory models (in our case, a time-based cost model for annotation) follows a much warranted avenue in research in NLP where feature farming becomes a theory-driven, explanatory process rather than a much deplored theory-blind engineering activity (cf. |
Introduction | (2012) presented a system called TwiCal which extracts an open-domain calendar of significant events represented by a 4-tuple set including a named entity, event phrase, calendar date, and event type from Twitter.
Introduction | Here, event elements include named entities such as person, company, organization, date/time, location, and the relations among them. |
Methodology | Events extracted in our proposed framework are represented as a 4-tuple (y, d, l, k), where y stands for a non-location named entity, d for a date, l for a location, and k for an event-related keyword.
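A minimal sketch of this 4-tuple event representation, with illustrative field names and values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """The (y, d, l, k) event tuple described above."""
    entity: str    # y: non-location named entity
    date: str      # d: date
    location: str  # l: location
    keyword: str   # k: event-related keyword

# Example values are made up for illustration
e = Event(entity="Apple", date="2010-04-22",
          location="San Francisco", keyword="launch")
print(e.entity, e.date)
```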
Methodology | Tweets are preprocessed by time expression recognition, named entity recognition, POS tagging and stemming. |
Methodology | Named Entity Recognition. |
Abstract | The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data.
Experiments | We use the Twigg SDK to crawl all tweets from April 20th 2010 to April 25th 2010, then drop non-English tweets, leaving about 11,371,389 tweets, from which 15,800 tweets are randomly sampled and labeled by two independent annotators, so that the beginning and the end of each named entity are marked with <TYPE> and </TYPE>, respectively.
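Annotations in this <TYPE>…</TYPE> scheme can be read back with a short parser; the tag names used below (PERSON, ORGANIZATION) are assumed for illustration:

```python
import re

# Matches an opening tag, its span text, and the matching closing tag
TAG = re.compile(r"<(?P<type>[A-Z]+)>(?P<text>.*?)</(?P=type)>")

def extract_entities(labeled_tweet):
    """Return (surface string, type) pairs from a labeled tweet."""
    return [(m.group("text"), m.group("type"))
            for m in TAG.finditer(labeled_tweet)]

tweet = "Watching <PERSON>Steve Jobs</PERSON> at <ORGANIZATION>Apple</ORGANIZATION>"
print(extract_entities(tweet))
# [('Steve Jobs', 'PERSON'), ('Apple', 'ORGANIZATION')]
```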
Introduction | Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as persons, organizations and locations (Nadeau and Sekine, 2007).
Related Work | (2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work | In contrast, our work aims to build a system that can automatically identify named entities in tweets. |
Task Definition | Twitter users are interested in named entities, such as person names, organization names and product names, as evidenced by the abundant named entities in tweets.
Task Definition | Figure 1: Proportion of different types of named entities in tweets.
Distant Supervision | As this classifier uses training data that is biased towards a specialized case (sentences containing the named entity types creators and characters), it does not generalize well to other S-relevance problems and thus yields lower performance on the full dataset. |
Distant Supervision | First, the classifier only sees a subset of examples that contain named entities, making generalization to other types of expressions difficult.
Distant Supervision | Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Features | 5.2 Named Entities |
Features | As standard named entity recognition (NER) systems do not capture categories that are relevant to the movie domain, we opt for a lexicon-based approach similar to (Zhuang et al., 2006). |
Features | If a capitalized word occurs, we check whether it is part of an already recognized named entity.
Related Work | We follow their approach by using IMDb to define named entity features. |
Sentiment Relevance | An error analysis for the classifier trained on P&L shows that many sentences misclassified as S-relevant (fpSR) contain polar words; for example, Then, the situation turns M. In contrast, sentences misclassified as S-nonrelevant (fpSNR) contain named entities or plot and movie business vocabulary; for example, Tim Roth delivers the most impressive MM by getting the M language right.
Comparable Question Mining | As a preprocessing step for detecting comparable relations, our extraction algorithm identifies all the named entities of interest in our corpus, keeping only questions that contain at least two entities. |
Comparable Question Mining | For answers containing two named entities, e.g. “Is #1 dating #2?”, our CRF tagger is trained to detect only comparable relations like “Who is prettier, #1 or #2?”.
Comparable Question Mining | Input: A news article. Output: A sorted list of comparable questions.
1: Identify all target named entities (NEs) in the article
2: Infer the distribution of LDA topics for the article
3: For each comparable relation R in the database, compute its relevance score as the similarity between the topic distributions of R and the article
4: Rank all the relations according to their relevance score and pick the top M as relevant
5: for each relevant relation R in the order of relevance ranking do
6: Filter out all the target NEs that do not pass the single entity classifier for R
7: Generate all possible NE pairs from those that passed the single classifier
8: Filter out all the generated NE pairs that do not pass the entity pair classifier for R
9: Pick the top N pairs with positive classification score to be qualified for generation
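The ranking-and-filtering pipeline above can be sketched as follows; the similarity function, the two classifiers, and the question template are placeholders for the paper's LDA-based and learned components:

```python
def generate_questions(article_nes, article_topics, relations,
                       topic_sim, single_clf, pair_clf, M=5, N=3):
    # Steps 3-4: rank relations by topic similarity, keep top M
    ranked = sorted(relations, key=lambda r: topic_sim(r, article_topics),
                    reverse=True)[:M]
    questions = []
    for rel in ranked:                                          # step 5
        # Step 6: keep NEs passing the single-entity classifier
        nes = [ne for ne in article_nes if single_clf(rel, ne)]
        # Step 7: all unordered NE pairs
        pairs = [(a, b) for i, a in enumerate(nes) for b in nes[i + 1:]]
        # Steps 8-9: keep positively classified pairs, take top N
        scored = [(pair_clf(rel, p), p) for p in pairs]
        scored = [sp for sp in scored if sp[0] > 0]
        scored.sort(key=lambda sp: sp[0], reverse=True)
        for _, (a, b) in scored[:N]:
            questions.append(f"Who is {rel} , {a} or {b} ?")
    return questions

# Toy usage with stub scorers (all names and scores are made up)
qs = generate_questions(
    article_nes=["Alice", "Bob"],
    article_topics=None,
    relations=["prettier", "faster"],
    topic_sim=lambda r, t: 1.0 if r == "prettier" else 0.5,
    single_clf=lambda r, ne: True,
    pair_clf=lambda r, pair: 1.0,
    M=1, N=1)
print(qs)  # ['Who is prettier , Alice or Bob ?']
```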
Motivation and Algorithmic Overview | Looking at the structure of comparable questions, we observed that a specific comparable relation, such as ‘better dad’ and ‘faster’, can usually be combined with named entities in several syntactic ways to construct a concrete question. |
Online Question Generation | For each relevant relation, we then generate concrete questions by picking generic templates that are applicable for this relation and instantiating them with pairs of named entities appearing in the article. |
Online Question Generation | To this end, we utilize two different broad-scale sources of information about named entities.
Online Question Generation | The first is DBPedia, which contains structured information on entries in Wikipedia, many of which are named entities that appear in news articles.
Abstract | In this paper we propose a method to automatically label multilingual data with named entity tags. |
Data and task | Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages. |
Data and task | The named entity annotation scheme followed has the labels GPE (Geopolitical entity), PER (Person), ORG (Organization), and DATE. |
Data and task | The other is that the same information might be expressed using a named entity in one language, and using a non-entity phrase in the other language (e.g. “He is from Bulgaria” versus “He is Bulgarian”).
Introduction | Named Entity Recognition (NER) is a frequently needed technology in NLP applications. |
Abstract | We show that by using a tweet-specific feature (hashtags) and a news-specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus complete the semantic picture of a short text.
Creating Text-to-text Relations via Twitter/News Features | 4.1 Hashtags and Named Entities |
Creating Text-to-text Relations via Twitter/News Features | Named entities are some of the most salient features in a news article. |
Creating Text-to-text Relations via Twitter/News Features | Directly applying Named Entity Recognition (NER) tools on news titles or |
Introduction | such as named entities in a document. |
Introduction | Named entities acquired from a news document, typically with high accuracy using Named Entity Recognition [NER] tools, may be particularly informative. |
Abstract | Some languages lack large knowledge bases and good discriminative features for Named Entity Recognition (NER) that can generalize to previously unseen named entities.
Introduction | Named Entity Recognition (NER) is essential for a variety of Natural Language Processing (NLP) applications such as information extraction. |
Introduction | - Contextual features: Certain words are indicative of the existence of named entities.
Introduction | For example, the word “said” is often preceded by a named entity of type “person” or “organization”. |
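This kind of contextual feature can be sketched as a window-based extractor; the feature-naming convention below is an illustrative assumption, not the paper's:

```python
def context_features(tokens, i, window=1):
    """Binary features naming the words adjacent to position i, so that
    indicator words like "said" can signal a neighboring named entity."""
    feats = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue  # skip the token itself
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"word[{off:+d}]={word}"] = 1.0
    return feats

tokens = ["Obama", "said", "on", "Monday"]
print(context_features(tokens, 0))
# {'word[-1]=<PAD>': 1.0, 'word[+1]=said': 1.0}
```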
Abstract | We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). |
Experiments | We collected named entity mentions from two corpora: political blogs and sports news. |
Experiments | Due to the large size of the corpora, we uniformly sampled a subset of documents for each corpus and ran the Stanford NER tagger (Finkel et al., 2005), which tagged named entity mentions as person, location, and organization.
Experiments | We used named entities of type person from the political blogs corpus, and person and organization entities for the sports news corpus.
Introduction | tributes through transductive learning from named entity mentions with a small number of seeds (see |
Introduction | by a named entity recognizer, along with their con- |
Introduction | As a result, the model discovers parts of names—(Mrs., Michelle, Obama)—while simultaneously performing coreference resolution for named entity mentions. |
Model | In our experiments these are obtained by running a named entity recognizer. |
Abstract | To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. |
Conclusions | We demonstrated the power and generality of this approach on two very different applications: named entity recognition and query classification. |
Discussion and Related Work | The named entity problem in Section 3 and the query classification problem in Section 4 exemplify the two scenarios. |
Distributed K-Means clustering | For example, in the named entity recognition task, categorical clusters are more successful, whereas in query categorization, the topical clusters are much more beneficial. |
Introduction | They have become the workhorse in almost all subareas and components of NLP, including part-of-speech tagging, chunking, named entity recognition and parsing. |
Introduction | This method has been shown to be quite successful in named entity recognition (Miller et al. |
Introduction | We demonstrate the advantages of phrase-based clusters over word-based ones with experimental results from two distinct application domains: named entity recognition and query classification. |
Named Entity Recognition | Named entity recognition (NER) is one of the first steps in many applications of information extraction, information retrieval, question answering and other applications of NLP. |
Named Entity Recognition | Here, s denotes a position in the input sequence; y_s is a label that indicates whether the token at position s is a named entity as well as its type; w
Named Entity Recognition | The top CoNLL 2003 systems all employed gazetteers or other types of specialized resources (e.g., lists of words that tend to co-occur with certain named entity types) in addition to part-of-speech tags.
Graph propagation | Figure 2: A DL from iteration 5 of Yarowsky on the named entity task. |
Graph propagation | The task of Collins and Singer (1999) is named entity classification on data from New York Times text. The data set was preprocessed by a statistical parser (Collins, 1997) and all noun phrases that are potential named entities were extracted from the parse tree.
Graph propagation | The test data additionally contains some noise examples which are not in the three named entity categories. |
Problem Setting | In Figure 2, these relations are denoted by a double-line boundary, including location, person, title, date and time; every tuple of these relations maps to a named entity mention.
Problem Setting | Figure 3 demonstrates the correct mapping of named entity mentions to tuples, as well as tuple unification, for the example shown in Figure 1. |
Related Work | In addition to proper nouns (named entity mentions) that are considered in this work, they also account for nominal and pronominal noun mentions.
Related Work | Interestingly, several researchers have attempted to model label consistency and high-level relational constraints using state-of-the-art sequential models of named entity recognition (NER). |
Related Work | In the proposed framework we take a bottom-up approach to identifying entity mentions in text, where given a noisy set of candidate named entities, described using rich semantic and surface features, discriminative learning is applied to label these mentions.
Seminar Extraction Task | We used a set of rules to extract candidate named entities per the types specified in Figure 2. The rules encode information typically used in NER, including content and contextual patterns, as well as lookups in available dictionaries (Finkel et al., 2005; Minkov et al., 2005).
Seminar Extraction Task | Recall for the named entities of type date and time is near perfect, and is estimated at 96%, 91% and 90% for location, speaker and title, respectively.
Seminar Extraction Task | Another feature encodes the size of the most semantically detailed named entity that maps to a field; for example, the most detailed entity mention of type stime in Figure 1 is “3:30”, comprising two attribute values, namely hour and minutes.
Structured Learning | Named entity recognition. |
Structured Learning | The candidate tuples generated using this procedure are structured entities, constructed using typed named entity recognition, unification, and hierarchical assignment of field values (Figure 3). |
Abstract | Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. |
Abstract | We study the problem of named entity normalization (NEN) for tweets.
Abstract | Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. |
Introduction | As a result, the task of named entity recognition (NER) for tweets, which aims to identify mentions of rigid designators from tweets belonging to named-entity types such as persons, organizations and locations (2007), has attracted increasing research interest. |
Introduction | entities and then uses a distantly supervised approach based on LabeledLDA to classify named entities . |
Introduction | However, named entity normalization (NEN) for tweets, which transforms named entities mentioned in tweets to their unambiguous canonical forms, has not been well studied. |
Related Work | (2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work | (2011) rebuild the NLP pipeline for tweets beginning with POS tagging, through chunking, to NER, which first exploits a CRF model to segment named entities and then uses a distantly supervised approach based on LabeledLDA to classify named entities . |
Related Work | Unlike this work, our work detects the boundary and type of a named entity simultaneously using sequential labeling techniques. |
Indicators of linguistic quality | 3.2 Reference form: Named entities |
Indicators of linguistic quality | This set of features examines whether named entities have informative descriptions in the summary. |
Indicators of linguistic quality | We focus on named entities because they appear often in summaries of news documents and are often not known to the reader beforehand. |
Results and discussion | The classes of features specific to named entities and noun phrase syntax are the weakest predictors. |
Background | (2013) where, in a given corpus, a combination of domain-specific named entity tagging and sentence clustering (based on semantic predicates) was used to generate templates.
Methodology | The DRS consists of semantic predicates and named entity tags. |
Methodology | In parallel, domain-specific named entity tags are identified and, in conjunction with the semantic predicates, are used to create templates.
Methodology | For example, in (2), using the templates in (1e-f), the identified named entities are assigned to a clustered CuId (2ab).
Data | Toponyms were annotated by a semiautomated process: a named entity recognizer identified toponyms, and then coordinates were assigned using simple rules and corrected by hand. |
Evaluation | when a named entity recognizer is used to identify toponyms. |
Evaluation | We primarily present results from experiments with gold toponyms but include an accuracy measure for comparability with results from experiments run on plain text with a named entity recognizer. |
Introduction | (2010) use relationships learned between people, organizations, and locations from Wikipedia to aid in toponym resolution when such named entities are present, but do not exploit any other textual context. |
Introduction | However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer. |
Results | The named entity recognizer is likely better at detecting common toponyms than rare toponyms due to the na- |
Results | We also measured the mean and median error distance for toponyms correctly identified by the named entity recognizer, and found that they tended to be 50-200km worse than for gold toponyms. |
Results | This also makes sense given the named entity recognizer’s tendency to detect common toponyms: common toponyms tend to be more ambiguous than others. |
Toponym Resolvers | To create the indirectly supervised training data for WISTR, the OpenNLP named entity recognizer detects toponyms in GEOWIKI, and candidate locations for each toponym are retrieved from GEONAMES. |
Abstract | We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER).
Experiments | We obtained 11,892 sentences with 18,677 named entities.
Experiments | We define “# e-matches” as the number of matches that also match a boundary of a named entity in the training set, and “# optimal” as the optimal number of “# e-matches” that can be achieved when we know the |
Experiments | These gazetteers cover 40-50% of the named entities, and the cluster gazetteers have relatively wider coverage than the Wikipedia gazetteer.
Gazetteer Induction 2.1 Induction by MN Clustering | Figure 2: Clean MN clusters with named entity entries (Left: car brand names. |
Introduction | Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately. |
Introduction | from a multi-word noun (MN) to named entity categories such as “Tokyo Stock Exchange → {ORGANIZATION}”. However, since the correspondence between the labels and the NE categories can be learned by tagging models, a gazetteer will be useful as long as it returns consistent labels even if those returned are not the NE categories.
Introduction | Therefore, performing the clustering with a vocabulary that is large enough to cover the many named entities required to improve the accuracy of NER is difficult. |
Related Work and Discussion | To cover most of the named entities in the data, we need much larger gazetteers. |
Discussion | Improved named entity translation accuracy as measured by the NEWA metric in general, and a reduction in dropped names in particular, are clearly valuable to the human reader of machine-translated documents as well as to systems using machine translation for further information processing.
End-to-End results | Table 2: Name translation accuracy with respect to BBN and re-annotated Gold Standard on 1730 named entities in 637 sentences. |
Evaluation | General MT metrics such as BLEU, TER, and METEOR are not suitable for evaluating named entity translation and transliteration, because they are not focused on named entities (NEs).
Evaluation | The general idea of the Named Entity Weak Accuracy (NEWA) metric is to |
Evaluation | BBN kindly provided us with an annotated Arabic text corpus, in which named entities were marked up with their type (e.g., GPE for Geopolitical Entity) and one or more English translations.
Introduction | Not all named entities should be transliterated.
Introduction | Many named entities require a mix of transliteration and translation. |
Introduction | We ask: what percentage of source-language named entities are translated correctly? |
Learning what to transliterate | As already mentioned in the introduction, named entity (NE) identification followed by MT is a bad idea. |
Abstract | First, named entities are linked to candidate Wikipedia pages by a generic annotation engine. |
Abstract | Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document.
Experiments and Results | In this task, mentions of entities found in a document collection must be linked to entities in a reference KB, or to new named entities discovered in the collection. |
Experiments and Results | We observe that the complete algorithm (co-references, named entity labels and MDP) provides the best results on PER NE links. |
Introduction | The Entity Linking (EL) task consists in linking name mentions of named entities (NEs) found in a document to their corresponding entities in a reference Knowledge Base (KB). |
Introduction | Various approaches have been proposed to solve the named entity disambiguation (NED) problem. |
Introduction | In the context of the Named Entity Recognition (NER) task, such resources can be generic and generative. |
Proposed Algorithm | 3.2 Named Entity Label Correction |
Abstract | Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. |
Experimental Setup | We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes. |
Experimental Setup | Out of the 18 named entity types that are annotated in OntoNotes, which include person, location, date, money, and so on, we select the four most commonly seen named entity types for evaluation. |
Introduction | We study the problem of Named Entity Recognition (NER) in a bilingual context, where the goal is to annotate parallel bi-texts with named entity tags. |
Introduction | We can also automatically construct a named entity translation lexicon by annotating and extracting entities from bi-texts, and use it to improve MT performance (Huang and Vogel, 2002; Al-Onaizan and Knight, 2002). |
Introduction | As a result, we can find complementary cues in the two languages that help to disambiguate named entity mentions (Brown et al., 1991). |
Related Work | set of heuristic rules to expand a candidate named entity set generated by monolingual taggers, and then rank those candidates using a bilingual named entity dictionary. |
Abstract | We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. |
Conclusions, related & future work | In this work we have introduced hierarchical feature tree priors for use in transfer learning on named entity extraction tasks. |
Introduction | Consider the task of named entity recognition (NER). |
Introduction | Having successfully trained a named entity classifier on this news data, now consider the problem of learning to classify tokens as names in email data. |
Introduction | In particular, we develop a novel prior for named entity recognition that exploits the hierarchical feature space often found in natural language domains (§1.2) and allows for the transfer of information from labeled datasets in other domains (§1.3).
Investigation | The goal of our experiments was to see to what degree named entity recognition problems naturally conformed to hierarchical methods, and not just to achieve the highest performance possible. |
Abstract | In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal human intervention and no linguistic expertise. |
Abstract | We show how the Wikipedia format can be used to identify possible named entities and discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity. |
Conclusions | In conclusion, we have demonstrated that Wikipedia can be used to create a Named Entity Recognition system with performance comparable to one developed from 15-40,000 words of human-annotated newswire, while not requiring any linguistic expertise on the part of the user.
Introduction | Named Entity Recognition (NER) has long been a major task of natural language processing. |
Training Data Generation | We elected to use the ACE Named Entity types PERSON, GPE (GeoPolitical Entities), ORGANIZATION, VEHICLE, WEAPON, LOCATION, FACILITY, DATE, TIME, MONEY, and PERCENT. |
Training Data Generation | Other categories can reliably be used to determine that the article does not refer to a named entity, such as “Category:Endangered species.” We manually derived a relatively small set of key phrases, the most important of which are shown in Table 1.
Wikipedia 2.1 Structure | Toral and Munoz (2006) used Wikipedia to create lists of named entities . |
Wikipedia 2.1 Structure | Cucerzan (2007), by contrast to the above, used Wikipedia primarily for Named Entity Disambiguation, following the path of Bunescu and Pasca (2006). |
Conclusions | KARC partitions named entities based on their profiles constructed by an information extraction tool. |
Experiments | First, the attribute information about the person named entity includes first/middle/last names, gender, mention, etc. |
Experiments | In addition, AeroText extracts relationship information between named entities , such as Family, List, Employment, Ownership, Citizen-Resident-Religion-Ethnicity and so on, as specified in the ACE evaluation. |
Experiments | This permits inexact matching of named entities due to name |
Introduction | A named entity that represents a person, an organization or a geo-location may appear within and across documents in different forms. |
Introduction | Cross document coreference (CDC) is the task of consolidating named entities that appear in multiple documents according to their real referents. |
Methods 2.1 Document Level and Profile Based CDC | Document level CDC makes a simplifying assumption that a named entity (and its variants) in a document has one underlying real identity. |
Methods 2.1 Document Level and Profile Based CDC | The named entity President Bush is extracted from the sentence “President Bush addressed the nation from the Oval Office Monday.” |
Architecture | in sentences using a named entity tagger that labels persons, organizations and locations. |
Architecture | In the testing step, entities are again identified using the named entity tagger. |
Features | 5.3 Named entity tag features |
Features | Every feature contains, in addition to the content described above, named entity tags for the two entities. |
Features | We perform named entity tagging using the Stanford four-class named entity tagger (Finkel et al., 2005). |
Implementation | In preprocessing, consecutive words with the same named entity tag are 'chunked', so that Edwin/PERSON Hubble/PERSON becomes [Edwin Hubble]/PERSON.
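The chunking preprocessing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the token/tag representation is an assumption.

```python
def chunk_entities(tokens, tags):
    """Merge runs of tokens sharing the same non-O NE tag into one chunk."""
    chunks = []
    for token, tag in zip(tokens, tags):
        if chunks and tag != "O" and chunks[-1][1] == tag:
            # Extend the current entity chunk with this token.
            chunks[-1] = (chunks[-1][0] + " " + token, tag)
        else:
            chunks.append((token, tag))
    return chunks

# "Edwin/PERSON Hubble/PERSON" becomes a single [Edwin Hubble]/PERSON chunk.
result = chunk_entities(["Edwin", "Hubble", "was", "born"],
                        ["PERSON", "PERSON", "O", "O"])
```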
Previous work | For example, the DIPRE algorithm by Brin (1998) used string-based regular expressions in order to recognize relations such as author-book, while the SNOWBALL algorithm by Agichtein and Gravano (2000) learned similar regular expression patterns over words and named entity tags. |
Data Analysis | Not surprisingly, the most oft-repeated named entity in the text is I, referring to the narrator.
Extracting Conversational Networks from Literature | In a conversational network, vertices represent characters (assumed to be named entities) and edges indicate at least one instance of dialogue interaction between two characters over the course of the novel.
Extracting Conversational Networks from Literature | For each named entity, we generate variations on the name that we would expect to see in a coreferent.
Extracting Conversational Networks from Literature | For each named entity, we compile a list of other named entities that may be coreferents, either because they are identical or because one is an expected variation on the other. |
Coreference Subtask Analysis | This section examines the role of three such subtasks — named entity recognition, anaphoricity determination, and coreference element detection — in the performance of an end-to-end coreference resolution system. |
Coreference Subtask Analysis | We employ the Stanford CRF-based Named Entity Recognizer (Finkel et al., 2004) for named entity tagging. |
Coreference Subtask Analysis | 3.3 Named Entities |
Coreference Task Definitions | However, the MUC definition excludes (1) "nested" named entities (NEs) (e.g.
Introduction | With these in mind, Section 3 first examines three subproblems that play an important role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection. |
Introduction | Our results suggest that the availability of accurate detectors for anaphoricity or coreference elements could substantially improve the performance of state-of-the-art resolvers, while improvements to named entity recognition likely offer little gains. |
Resolution Complexity | Proper Names: Three resolution classes cover CEs that are named entities (e.g. |
Abstract | We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. |
Introduction | We evaluate the effectiveness of our method by using linear-chain conditional random fields (CRFs) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging. |
Log-Linear Models | The model is used for a variety of sequence labeling tasks such as POS tagging, chunking, and named entity recognition. |
Log-Linear Models | We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging.
Log-Linear Models | 4.2 Named Entity Recognition |
Experiments | All these systems incorporated lexical semantics features derived from WordNet and named entity features. |
Experiments | Features used in the experiments can be categorized into six types: identical word matching (I), lemma matching (L), WordNet (WN), enhanced Lexical Semantics (LS), Named Entity matching (NE) and Answer type checking (Ans). |
Experiments | Named entity matching (NE) checks whether two words are individually part of some named entities with the same type. |
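The NE matching feature described above can be sketched as a check over entity spans. The span representation, (start, end, type) token offsets, is an assumption for illustration.

```python
def ne_type_at(index, entities):
    """Return the NE type covering a token index, or None."""
    for start, end, etype in entities:
        if start <= index < end:
            return etype
    return None

def ne_match(i, j, entities_a, entities_b):
    """True when both words lie inside named entities of the same type."""
    ta, tb = ne_type_at(i, entities_a), ne_type_at(j, entities_b)
    return ta is not None and ta == tb
```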
Lexical Semantic Models | For instance, when a word refers to a named entity, the particular sense and meaning are often not encoded.
Background | (2007) proposed indexing text with their semantic roles and named entities . |
Background | Queries then include constraints of semantic roles and named entities for the predicate and its arguments in the question. |
Experiments | sentence-segmented and word-tokenized by NLTK (Bird and Loper, 2004), dependency-parsed by the Stanford Parser (Klein and Manning, 2003), and NER-tagged by the Illinois Named Entity Tagger (Ratinov and Roth, 2009) with an 18-label type set. |
Introduction | Moreover, this approach is more robust against, e.g., entity recognition errors, because answer typing knowledge is learned from how the data was actually labeled, not from how the data was assumed to be labeled (e.g., manual templates usually assume perfect labeling of named entities, but often that is not the case).
Introduction | This will be our off-the-shelf QA system, which recognizes the association between question type and expected answer types through various features based on e.g., part-of-speech tagging (POS) and named entity recognition (NER). |
Introduction | Moreover, our approach extends easily beyond fixed answer types such as named entities : we are already using POS tags as a demonstration. |
Method | Ogilvie (2010) showed in chapter 4.3 that keyword- and named-entity-based retrieval actually outperformed SRL-based structured retrieval in MAP for the answer-bearing sentence retrieval task in their setting.
Abstract | Named Entity Disambiguation (NED) refers to the task of mapping different named entity mentions in running text to their correct interpretations in a specific knowledge base (KB). |
Introduction | Named entities (NEs) have received much attention over the last two decades (Nadeau and Sekine, 2007), mostly focused on recognizing the boundaries of textual NE mentions and classifying them as, e.g., Person, Organization or Location. |
Introduction | Named Entity Disambiguation |
Related Work | The second line of approach is collective named entity disambiguation (CNED), where all mentions of entities in the document are disambiguated jointly. |
Solution Graph | cos: The cosine similarity between the named entity textual mention and the KB entry title. |
Solution Graph | Named Entity Selection: The simplest approach is to select the highest ranked entity in the list for each mention mi according to equation 5, where R could refer to Rm or Rs.
Solution Graph | Our results show that Page-Rank in conjunction with re-ranking by initial confidence score can be used as an effective approach to collectively disambiguate named entity textual mentions in a document. |
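Two pieces of this section, the cosine similarity between a mention and a KB entry title, and the rule that keeps the highest-scoring candidate per mention, can be sketched as below. Scoring by bag-of-words cosine alone is an illustrative stand-in for the paper's full ranking function.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two strings."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def select_entities(mentions, candidates):
    """For each mention, keep the candidate title with the highest score."""
    return {m: max(candidates[m], key=lambda title: cosine(m, title))
            for m in mentions}
```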
BabelNet | the general term concept to denote either a concept or a named entity.
Conclusions | Firstly, the two resources contribute different kinds of lexical knowledge: one is concerned mostly with named entities, the other with concepts.
Conclusions | Thus, even when they overlap, the two resources provide complementary information about the same named entities or concepts. |
Introduction | These include, among others, text summarization (Nastase, 2008), Named Entity Recognition (Bunescu and Pasca, 2006), Question Answering (Harabagiu et al., 2000) and text categorization (Gabrilovich and Markovitch, 2006). |
Introduction | Second, such resources are typically lexicographic, and thus contain mainly concepts and only a few named entities.
Introduction | The result is an "encyclopedic dictionary" that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations.
Methodology | A Wikipedia page (henceforth, Wikipage) presents the knowledge about a specific concept (e.g. BALLOON (AIRCRAFT)) or named entity (e.g.
Concept Identification | NER: 1 if the named entity tagger marked the span as an entity, 0 otherwise.
Concept Identification | clex also has a set of rules for generating concept fragments for named entities and time expressions. |
Concept Identification | It generates a concept fragment for any entity recognized by the named entity tagger, as well as for any word sequence matching a regular expression for a time expression. |
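The rule-based fragment generation described above can be sketched as follows. The time-expression regex is a simplified stand-in for clex's actual patterns, and the fragment representation is assumed.

```python
import re

# Simplified time-expression pattern, e.g. "9:30" or a four-digit year.
TIME_RE = re.compile(r"^\d{1,2}:\d{2}$|^\d{4}$")

def concept_fragments(tokens, ner_spans):
    """ner_spans: list of (start, end, type) from a named entity tagger."""
    # One fragment per NER-tagged span.
    fragments = [("entity", " ".join(tokens[s:e]), t) for s, e, t in ner_spans]
    # One fragment per token matching the time-expression regex.
    for tok in tokens:
        if TIME_RE.match(tok):
            fragments.append(("time", tok, "DATE/TIME"))
    return fragments
```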
Notation and Overview | Each stage is a discriminatively-trained linear structured predictor with rich features that make use of part-of-speech tagging, named entity tagging, and dependency parsing. |
Training | Input: X, a sentence annotated with named entities (person, organization, location, miscellaneous) from the Illinois Named Entity Tagger (Ratinov and Roth, 2009), and part-of-speech tags and basic dependencies from the Stanford Parser (Klein and Manning, 2003; de Marneffe et al., 2006).
Training | 1. (Named Entity) Applies to name concepts and their opn children.
Training | (Fuzzy Named Entity) Applies to name concepts and their opn children.
Related works | The task of phrase lemmatisation bears a close resemblance to a more popular task, namely lemmatisation of named entities . |
Related works | type of named entities considered, those two may be solved using similar or significantly different methodologies. |
Related works | Hence, the main challenge is to define a similarity metric between named entities (Piskorski et al., 2009; Kocon and Piasecki, 2012), which can be used to match different mentions of the same names. |
Experiments and Results | Other features: 1) Lucene’s best ranking of the date 2) Number of times where the date is absolute in the text 3) Number of times where the date is relative (but normalized) in the text 4) Total number of keywords of the query in the title, sentence and named entities of retrieved documents 5) Number of times where the date modifies a reported speech verb or is extracted from reported speech. |
Related Work | (Swan and Allen, 2000) present an approach to generating graphical timelines that involves extracting clusters of noun phrases and named entities . |
Temporal and Linguistic Processing | It also performs named entity recognition (NER) of the most usual named entity categories and recognizes temporal expressions.
Temporal and Linguistic Processing | 4.2 Named Entity Recognition |
Annotation Proposal and Pilot Study | Phenomenon (Occurrence, Agreement): Named Entity (91.67%, 0.856); locative (17.62%, 0.623); Numerical Quantity (14.05%, 0.905); temporal (5.48%, 0.960); nominalization (4.05%, 0.245); implicit relation (1.90%, 0.651)
Annotation Proposal and Pilot Study | Phenomenon (Occurrence, Agreement): missing argument (16.19%, 0.763); missing relation (14.76%, 0.708); excluding argument (10.48%, 0.952); Named Entity mismatch (9.29%, 0.921); excluding relation (5.00%, 0.870); disconnected relation (4.52%, 0.580); missing modifier (3.81%, 0.465); disconnected argument (3.33%, 0.764); Numeric Quant.
Introduction | Tasks such as Named Entity and coreference resolution, syntactic and shallow semantic parsing, and information and relation extraction have been identified as worthwhile tasks and pursued by numerous researchers. |
NLP Insights from Textual Entailment | ported by their designers were the use of structured representations of shallow semantic content (such as augmented dependency parse trees and semantic role labels); the application of NLP resources such as Named Entity recognizers, syntactic and dependency parsers, and coreference resolvers; and the use of special-purpose ad-hoc modules designed to address specific entailment phenomena the researchers had identified, such as the need for numeric reasoning. |
Pilot RTE System Analysis | (b) All top-5 systems make consistent errors in cases where identifying a mismatch in named entities (NE) or numerical quantities (NQ) is important to make the right decision. |
Pilot RTE System Analysis | However, we believe that if a system could recognize key negation phenomena such as Named Entity mismatch, presence of Excluding arguments, etc. |
Experiments | We assume that attribute values should be either named entities or terms following @ and #.
Experiments | Named entities are extracted using Ritter et al.'s NER system (2011).
Experiments | Consecutive tokens with the same named entity tag are chunked (Mintz et al., 2009). |
Model | Token-level: for each token t, word identity, word shape, part-of-speech tags, named entity tags.
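The token-level features listed above can be sketched as below. The word-shape mapping (uppercase letters to "X", lowercase to "x", digits to "d") is one common convention and is assumed here, not taken from the paper.

```python
def word_shape(token):
    """Map a token to a coarse shape string, e.g. 'iPhone6' -> 'xXxxxxd'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)

def token_features(token, pos_tag, ne_tag):
    """Bundle the token-level features: identity, shape, POS, NE tag."""
    return {"word": token.lower(), "shape": word_shape(token),
            "pos": pos_tag, "ne": ne_tag}
```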
Introduction | Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition (NER) and mention detection. |
Introduction | Named entity recognizers perform semantic tagging on proper name noun phrases, and |
Related Work | Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction. |
Related Work | NER systems (e.g., Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002) identify proper named entities, such as people, organizations, and locations.
Related Work | NER systems, however, do not identify nominal NP instances (e.g., "a software manufacturer" or "the beach"), or handle semantic classes that are not associated with proper named entities (e.g., symptoms). ACE
Discussion | Table 6: Sample Chinese freestyle named entities that are usernames. |
Discussion | Another major group of errors comes from what we term freestyle named entities, as exemplified in Table 6; i.e., person names in the form of user IDs and nicknames, which have fewer constraints on form in terms of length and canonical structure (not surnames with given names, as is standard in Chinese names) and may mix in alphabetic characters.
Discussion | Most of these belong to the category of Person Name (PER), as defined in the CoNLL-2003 Named Entity Recognition shared task.
Methodology | In addition, we employ additional online word lists to distinguish named entities and function words from potential informal words.
Abstract | This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities. |
Introduction | Named entity translation discovery aims at mapping entity names for people, locations, etc. |
Introduction | As many new named entities appear every day in newspapers and web sites, their translations are nontrivial yet essential. |
Introduction | Early efforts in named entity translation have focused on using phonetic features (called PH) to estimate a phonetic similarity between two names (Knight and Graehl, 1998; Li et al., 2004; Virga and Khudanpur, 2003).
Preliminaries | To identify entities, we use a CRF-based named entity tagger (Finkel et al., 2005) and a Chinese word breaker (Gao et al., 2003) for English and Chinese corpora, respectively. |
Experiments | Named entities which co-occur at least 6 times with a morph query in the same topic are selected as its target candidates. |
Related Work | Other similar research lines are the TAC-KBP Entity Linking (EL) (Ji et al., 2010; Ji et al., 2011), which links a named entity in news and web documents to an appropriate knowledge base (KB) entry, the task of mining name translation pairs from comparable corpora (Udupa et al., 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al., 2007) and the link prediction problem (Adamic and Adar, 2001; Liben-Nowell and Kleinberg, 2003; Sun et al., 2011b; |
Target Candidate Identification | However, obviously we cannot consider all of the named entities in these sources as target candidates due to the sheer volume of information. |
Target Candidate Identification | In addition, morphs are not limited to named entity forms. |
Target Candidate Ranking | Then we apply a hierarchical Hidden Markov Model (HMM) based Chinese lexical analyzer, ICTCLAS (Zhang et al., 2003), to extract named entities, noun phrases and events.
Experiments | In Wikipedia articles, named entities were identified by anchor text linking to another article and starting with a capital letter (Yan et al., 2009). |
Experiments | than one named entity.
Knowledge-based Distant Supervision | An entity is mentioned as a named entity in text. |
Knowledge-based Distant Supervision | Since two entities mentioned in a sentence do not always have a relation, we select entity pairs from a corpus when: (i) the path of the dependency parse tree between the corresponding two named entities in the sentence is no longer than 4 and (ii) the path does not contain a sentence-like boundary, such as a relative clause (Banko et al., 2007; Banko and Etzioni, 2008).
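The pair-selection constraints above (dependency path of length at most 4, no sentence-like boundary) can be sketched as below. The tree is represented as a child-to-head map, and the boundary label set is an assumption for illustration.

```python
# Relative-clause-like edge labels treated as boundaries (assumed set).
BOUNDARY_LABELS = {"rcmod", "relcl"}

def path_to_root(node, heads):
    """Collect the chain of ancestors from a node up to the root."""
    path = [node]
    while node in heads:
        node = heads[node]
        path.append(node)
    return path

def keep_pair(a, b, heads, labels):
    """heads: child -> parent; labels: child -> dependency label.
    Assumes both nodes belong to the same parse tree."""
    pa, pb = path_to_root(a, heads), path_to_root(b, heads)
    common = next(n for n in pa if n in pb)  # lowest common ancestor
    path = pa[:pa.index(common)] + pb[:pb.index(common) + 1]
    if len(path) - 1 > 4:          # constraint (i): path no longer than 4
        return False
    # Constraint (ii): no boundary edge anywhere on the path.
    return all(labels.get(n, "") not in BOUNDARY_LABELS for n in path)
```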
Wrong Label Reduction | If we use a standard named entity tagger, the entity types are Person, Location, and Organization.
Information Extraction: Slot Filling | Finally, we filter extractions whose WordNet or named entity label does not match the learned slot’s type (e.g., a Location does not match a Person). |
Learning Templates from Raw Text | We also tag the corpus with an NER system and allow patterns to include named entity types, e.g., 'kidnap PERSON'.
Previous Work | Classifiers rely on the labeled examples’ surrounding context for features such as nearby tokens, document position, syntax, named entities , semantic classes, and discourse relations (Maslennikov and Chua, 2007). |
Previous Work | (2006) integrate named entities into pattern learning (PERSON won) to approximate unknown semantic roles. |
Previous Work | However, the limitations to their approach are that (1) redundant documents about specific events are required, (2) relations are binary, and (3) only slots with named entities are learned.
Abstract | We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries. |
Experimental data | As out-of-domain newswire evaluation data we use the development test data from the NIST 1999 IEER named entity corpus, a dataset of 50,000 tokens of New York Times (NYT) and Associated Press Weekly news. This corpus is annotated with person, location, organization, cardinal, duration, measure, and date labels.
Introduction | For example, a named entity (NE) recognizer trained on news text may tag the NE London in an out-of-domain web query like London Klondike gold rush as a location. |
Piggyback features | The value of the feature URL-MI is the average difference between the MI of PER and the other named entities . |
Piggyback features | The feature LEX-MI interprets words occurring before or after so as indicators of named-entity-hood.
Abstract | Sequential modeling has been widely used in a variety of important applications including named entity recognition and shallow parsing. |
Experiments | We apply 1-best and k-best sequential decoding algorithms to five NLP tagging tasks: Penn TreeBank (PTB) POS tagging, CoNLL 2000 joint POS tagging and chunking, CoNLL 2003 joint POS tagging, chunking and named entity tagging, HPSG supertagging (Matsuzaki et al., 2007), and a search query named entity recognition (NER) dataset.
Experiments | Similarly we combine the POS tags, chunk tags, and named entity tags to form joint tags for CoNLL 2003 dataset, e.g., PRP$|I-NP|O. |
Experiments | Note that by such tag joining, we are able to offer different tag decodings (for example, chunking and named entity tagging) simultaneously. |
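The tag-joining scheme described above can be sketched as follows: per-task tags are concatenated with "|" into one joint tag so that a single sequence model decodes all tasks at once, and the joint output is split back afterwards. Function names here are illustrative.

```python
def join_tags(*tag_seqs):
    """Concatenate aligned tag sequences into joint tags, e.g. 'PRP$|I-NP|O'."""
    return ["|".join(tags) for tags in zip(*tag_seqs)]

def split_tags(joint_seq, n_tasks):
    """Recover the per-task tag sequences from a joint decoding."""
    columns = [tag.split("|") for tag in joint_seq]
    return [[col[i] for col in columns] for i in range(n_tasks)]

joint = join_tags(["PRP$", "NN"], ["I-NP", "I-NP"], ["O", "O"])
# joint == ["PRP$|I-NP|O", "NN|I-NP|O"]
```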
Introduction | For example, to pursue a high recall ratio, a named entity recognition system may have to adopt k-best sequences in case the true entities are not recognized in the best one.
Approach | In addition to this shallow parsing method, we also use named entity recognition (NER) to identify more entities. |
Approach | We use the Stanford Named Entity Recognizer (Finkel et al., 2005) for this purpose. |
Evaluation | 7) We only use named entity recognition to identify entity targets; i.e. |
Evaluation | Although using both named entity recognition (NER) and noun phrase chunking achieves better results, it |
Related Work | In our work, we extract as targets frequent noun phrases and named entities that are used by two or more different discussants. |
Experiments | We used a CRF-based Japanese dependency parser (Imamura et al., 2007) and named entity recognizer (Suzuki et al., 2006) for sentiment extraction and for constructing feature vectors for the readability score, respectively.
Optimizing Sentence Sequence | To this, we add named entity tags (e.g. |
Optimizing Sentence Sequence | We observe that the first sentence of a review of a restaurant frequently contains named entities indicating location. |
Optimizing Sentence Sequence | Note that our method learns w from texts automatically annotated by a POS tagger and a named entity tagger. |
Introduction | More recently, the OntoNotes project (Pradhan et al., 2007) released a one-million-word English corpus of newswire, broadcast news, and broadcast conversation that is annotated for Penn Treebank syntax, PropBank predicate argument structures, coreference, and named entities.
MASC Annotations | Annotation (Method, Texts, Words): Token (Validated, 118, 222472); Sentence (Validated, 118, 222472); POS/lemma (Validated, 118, 222472); Noun chunks (Validated, 118, 222472); Verb chunks (Validated, 118, 222472); Named entities (Validated, 118, 222472); FrameNet frames (Manual, 21, 17829); HPSG (Validated, 40*, 30106); Discourse (Manual, 40*, 30106); Penn Treebank (Validated, 97, 87383); PropBank (Validated, 92, 50165); Opinion (Manual, 97, 47583); TimeBank (Validated, 34, 5434); Committed belief (Manual, 13, 4614); Event (Manual, 13, 4614); Coreference (Manual, 2, 1877)
MASC Annotations | To derive maximal benefit from the semantic information provided by these resources, the entire corpus is also annotated and manually validated for shallow parses (noun and verb chunks) and named entities (person, location, organization, date and time). |
MASC Annotations | Automatically-produced annotations for sentence, token, part of speech, shallow parses (noun and verb chunks), and named entities (person, location, organization, date and time) are hand-validated by a team of students. |
Abstract | Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. |
Abstract | A number of word similarity measures are proposed for clustering words for the Named Entity Recognition task. |
Introduction | Named Entity Recognition (NER) involves locating and classifying the names in a text. |
Maximum Entropy Based Model for Hindi NER | This corpus has been manually annotated and contains about 16,491 Named Entities (NEs). |
NER features | Named entity recognition (NER) provides information that is particularly relevant for NP parsing, simply because entities are nouns. |
NER features | coordinate structure in biological named entities.
NER features | We identify constituents that dominate tokens that all have the same NE tag, as these nodes will not cause a "crossing bracket" with the named entity.
Conclusions | Specifically, we plan to investigate higher coverage inventories such as BabelNet (Navigli and Ponzetto, 2012a), which will handle texts with named entities and rare senses that are not in WordNet, and will also enable cross-lingual semantic similarity. |
Experiment 1: Textual Similarity | The TLsyn system also uses Google Book Ngrams, as well as dependency parsing and named entity recognition. |
Experiment 1: Textual Similarity | Additionally, because the texts often contain named entities which are not present in WordNet, we incorporated the similarity values produced by four string-based measures, which were used by other teams in the STS task: (1) longest common substring which takes into account the length of the longest overlapping contiguous sequence of characters (substring) across two strings (Gusfield, 1997), (2) longest common subsequence which, instead, finds the longest overlapping subsequence of two strings (Allison and Dix, 1986), (3) Greedy String Tiling which allows reordering in strings (Wise, 1993), and (4) the character/word n-gram similarity proposed by Barrón-Cedeño et al.
Experiment 1: Textual Similarity | Named entity features used by the TLsim system could be the reason for its better performance on the MSRpar dataset, which contains a large number of named entities . |
Evaluation | The problem in this example is incorrect segmentation of a named entity . |
Introduction | Known entities are recognized and mapped to the KB using a recent tool for named entity disambiguation (Hoffart 2011). |
Related Work | Tagging mentions of named entities with lexical types has been pursued in previous work. |
Related Work | Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities. |
Introduction | The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language; it plays an important role in machine translation and cross-language information retrieval (CLIR).
Statistical Transliteration Model | Because the process of named entity recognition may lose some NEs, we will reserve all the words in web corpus without any filtering. |
Statistical Transliteration Model | Then for every Tik, we use the named entity recognition (NER) software to determine whether rci is an NE or not.
Statistical Transliteration Model | The training corpus for statistical transliteration model comes from the corpus of Chinese <-> English Name Entity Lists v 1.0 (LDC2005T34). |
Distant Supervised Relation Extraction | We enforce the Named Entity type of the entity and value to match an expected type, predefined for the relation.
Document Representation | There are three families of types: Events (verbs that describe an action, annotated with tense, polarity and aspect); standardized Time Expressions; and Named Entities, with additional annotations such as gender or age.
Document Representation | Most chunks consist of one word; we join words into a chunk (and a node) in two cases: a multi-word named entity, and a verb and its auxiliaries.
Document Representation | The processing includes dependency parsing, named entity recognition and coreference resolution, done with the Stanford CoreNLP software (Klein and Manning, 2003); and events and temporal information extraction, via the TARSQI Toolkit (Verhagen et al., 2005). |
Experimental Evaluation | several potential named entities it could refer to, even if the vast majority of references were to only a single entity. |
Introduction | Several NLP tasks, such as word sense disambiguation, word sense induction, and named entity disambiguation, address this ambiguity problem to varying degrees. |
Related Work | the well studied problems of named entity disambiguation (NED) and word sense disambiguation (WSD). |
Related Work | Both named entity and word sense disambiguation are extensively studied, and surveys on each are available (Nadeau and Sekine, 2007; Navigli, 2009). |
Experiments | For this purpose we used the Freebase Search API (Freebase, 2013a). All named entities in a question were sent to this API, which returned a ranked list of relevant topics.
Experiments | When no named entities are detected, we fall back to noun phrases.
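The retrieval step with its fallback can be sketched as below. `search_kb` is a hypothetical stand-in for the Freebase Search API call, not a real client, and the topic ids in the example are illustrative.

```python
def find_topics(named_entities, noun_phrases, search_kb):
    """Query the KB with named entities; fall back to noun phrases if none."""
    queries = named_entities if named_entities else noun_phrases
    topics = []
    for q in queries:
        topics.extend(search_kb(q))
    return topics

# Hypothetical search function backed by a toy in-memory index.
fake_kb = {"justin bieber": ["/m/topic_a"], "mother": ["/m/topic_b"]}
search = lambda q: fake_kb.get(q, [])
```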
Graph Features | We simply apply a named entity recognizer to find the question topic. |
Graph Features | (special case) if a qtopic node was tagged as a named entity, then replace this node with its named entity form, e.g., bieber → qtopic=person;
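The special case above can be sketched as a small relabeling rule. The label format and the "O" convention for non-entities are assumptions for illustration.

```python
def qtopic_label(word, ne_type):
    """Replace a question-topic word with its NE type when it has one,
    e.g. ("bieber", "PERSON") -> "qtopic=person"."""
    if ne_type and ne_type != "O":
        return f"qtopic={ne_type.lower()}"
    return f"qtopic={word}"
```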
Experiments | We extract dependency paths for each pair of named entities in one sentence. |
Experiments | To determine entity types, we link named entities to Wikipedia pages using the Wikifier (Ratinov et al., 2011) package and extract categories from the Wikipedia page.
Introduction | Many relation discovery methods rely exclusively on the notion of either shallow or syntactic patterns that appear between two named entities (Bollegala et al., 2010; Lin and Pantel, 2001).
Related Work | (2004) cluster pairs of named entities according to the similarity of context words intervening between them. |
Abstract | We conduct experiments on named entity recognition data and find that our approach can provide a significant improvement over the standard model when annotation errors are present. |
Experiments | We now consider the problem of named entity recognition (NER) to evaluate how our model performs in a large-scale prediction task. |
Experiments | In traditional NER, the goal is to determine whether each word is a person, organization, location, or not a named entity (‘other’). |
Experiments | (This task does not trivially reduce to finding the capitalized words, as the model must distinguish between people and other named entities like organizations). |
Evaluation | In the absence of a sizable number of linguistically interesting terms (like wicked) that are known to be geographically variable, we consider the proxy of estimating the named entities evoked by specific terms in different geographical regions. |
Evaluation | As noted above, geographic terms like city provide one such example: in Massachusetts we expect the term city to be more strongly connected to grounded named entities like Boston than to other US cities. |
Introduction | This information enables learning models of word meaning that are sensitive to such factors, allowing us to distinguish, for example, between the usage of wicked in Massachusetts from the usage of that word elsewhere, and letting us better associate geographically grounded named entities (e.g, Boston) with their hypemyms (city) in their respective regions. |
Detailed generative story | (c) If w_dk is a named entity type (PERSON, PLACE, ORG, .
Inference by Block Gibbs Sampling | Each context word and each named entity is associated with a latent topic. |
Inference by Block Gibbs Sampling | This process treats all topics as exchangeable, including those associated with named entities.
Background | It works well for named entities . |
Related Work | Word embeddings have been successfully applied in many applications, such as in sentiment analysis (Socher et al., 2011b), paraphrase detection (Socher et al., 2011a), chunking, and named entity recognition (Turian et al., 2010; Collobert et al., 2011). |
Results and Analysis 5.1 Varying the Amount of Clusters | The red paths refer to the relations between the named entity and its hypernyms extracted using the web mining method (Fu et al., 2013). |
Introduction | We present a new dataset of Twitter messages that use FactBank predicates (e.g., claim, say, insist) to scope the claims of named entity sources. |
Modeling factuality judgments | Source: represented by the named entity or username in the source field (see Figure 4); Journalist: represented by their Twitter ID; Claim: represented by a bag-of-words vector from the claim field (Figure 4)
Text data | Finally, we restrict consideration to tweets in which the source contains a named entity or Twitter username.
Data | We also map all named entities (e.g., “downtown” and “28X”) to their semantic types (resp. |
Data | We map named entities to their semantic types, apply stemming, and remove stop words.3 The corpus we use contains approximately 2, 000 dialogue sessions or 80, 000 conversation utterances. |
Latent Structure in Dialogues | We used regular expressions to map named entities, and the Porter stemmer in NLTK to stem all tokens.
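The preprocessing described above can be sketched as follows. The entity patterns and the tiny suffix-stripping stemmer are illustrative stand-ins (the paper used hand-written regexes and NLTK's Porter stemmer); the semantic type names are assumptions.

```python
import re

# Illustrative patterns mapping mentions like "28X" or "downtown" to types.
ENTITY_PATTERNS = [(re.compile(r"\b\d{1,3}[A-Z]\b"), "BUS_ROUTE"),
                   (re.compile(r"\bdowntown\b", re.I), "PLACE")]

def simple_stem(token):
    """Crude suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def normalize(utterance):
    """Replace entity mentions with semantic types, then stem all tokens."""
    for pattern, semantic_type in ENTITY_PATTERNS:
        utterance = pattern.sub(semantic_type, utterance)
    return [simple_stem(t) for t in utterance.split()]
```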
Baselines | > Basic features: the named entity and its type in the focus candidate; relative position of the focus candidate to the negative expression (before or after). |
Baselines | Along with negation focus annotation, this corpus also contains other annotations, such as POS tag, named entity, chunk, constituent tree, dependency tree, and semantic role.
Baselines | > Named Entity Recognizer: We employ the Stanford NER (Finkel et al., 2005) to obtain named entities.
Experiments | Three kinds of features, namely, lexical, syntactic and named entity tag features, were extracted from relation mentions. |
Introduction | As we cannot tell in advance what kinds of features are effective, we have to use NLP toolkits, such as Stanford CoreNLP, to extract a variety of textual features, e.g., named entity tags, part-of-speech tags, and lexicalized dependency paths.
Related Work | Other studies (Takamatsu et al., 2012; Min et al., 2013; Zhang et al., 2013; Xu et al., 2013) addressed more specific issues, like how to construct the negative class in learning or how to incorporate more information, such as named entity tags, to improve performance.
Abstract | Moreover, the success of joint bilingual learning may lend itself to many inherently multilingual NLP tasks, such as POS tagging (Yarowsky and Ngai, 2001), named entity recognition (Yarowsky et al., 2001), sentiment analysis (Wan, 2009), and semantic role labeling (Sebastian and Lapata, 2009).
Abstract | It has been successfully applied to many NLP applications, such as POS tagging (Engelson and Dagan, 1996; Ringger et al., 2007), word sense disambiguation (Chan and Ng, 2007; Zhu and Hovy, 2007), sentiment detection (Brew et al., 2010; Li et al., 2012), syntactic parsing (Hwa, 2004; Osborne and Baldridge, 2004), and named entity recognition (Shen et al., 2004; Tomanek et al., 2007; Tomanek and Hahn, 2009).
Abstract | named entity and syntactic parse tree). |
Experiments & Results 4.1 Experimental Setup | From the OOVs, we exclude numbers as well as named entities.
Experiments & Results 4.1 Experimental Setup | We apply a simple heuristic to detect named entities: words that are capitalized in the original dev/test set and do not appear at the beginning of a sentence are treated as named entities.
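A minimal sketch of this capitalization heuristic (function and variable names are illustrative, not from the paper):

```python
def detect_named_entities(sentences):
    """Capitalization heuristic: collect words that appear capitalized
    in a non-sentence-initial position (a sketch of the rule above)."""
    entities = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if i > 0 and tok[:1].isupper():
                entities.add(tok)
    return entities

sents = [["The", "president", "visited", "Boston", "yesterday"],
         ["Boston", "is", "a", "city"]]
print(sorted(detect_named_entities(sents)))  # ['Boston']
```

Sentence-initial "The" and "Boston" (second sentence) are skipped; only the mid-sentence capitalized occurrence qualifies.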
Introduction | Although this is helpful for translating a small fraction of OOVs, such as named entities, for languages with the same writing system, it harms translation for other types of OOVs and for distant language pairs.
Introduction | Much tweet-related research has been inspired, from named entity recognition (Liu et al., 2012) and topic detection (Mathioudakis and Koudas, 2010) to clustering (Rosa et al., 2010) and event extraction (Grinev et al., 2009).
Introduction | (2012), on average a named entity has 3.3 different surface forms in tweets. |
Task Definition | First, we assume that mentions are given, e.g., identified by some named entity recognition system. |
Abstract | This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”. |
Abstract | Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case. |
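For reference, mean reciprocal rank (MRR) averages the reciprocal rank of the first correct translation over all queries; a minimal sketch (function name is illustrative):

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries; `ranks` holds the 1-based rank of the first
    correct translation per query, or None if none was found."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Correct translation ranked 1st, 2nd, and missing:
print(mean_reciprocal_rank([1, 2, None]))  # 0.5
```

An improvement of 0.16 on this scale is substantial, e.g., roughly the gap between always ranking the correct translation second versus third.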
Introduction | This task is more challenging in the presence of multilingual text, because translating named entities (NEs), such as persons, locations, or organizations, is a nontrivial task. |
Experiments | Depending on the document category, we found some variations as to which hierarchy was learned in each setting, but we noticed that parameters starting with right and left gramtypes often produced quite good hierarchies: for instance right gramtype → left gramtype → same sentence → right named entity type.
Introduction | The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones. |
System description | We used classical features described in detail in (Bengtson and Roth, 2008) and (Rahman and Ng, 2011): grammatical type and subtype of mentions, string match and substring, apposition and copula, distance (number of separating mentions/sentences/words), gender/number match, synonymy/hypernymy and animacy (using WordNet), family name (based on lists), named entity types, syntactic features (gold parse), and anaphoricity detection.
Timestamp Classifiers | Named entities are important to document dating due to the nature of people and places coming in and out of the news at precise moments in time. |
Timestamp Classifiers | Named Entities: Although not directly related to time expressions, we also include n-grams of tokens that are labeled by an NER system using Person, Organization, or Location.
Timestamp Classifiers | Collecting named entity mentions will differentiate between an article discussing a bill and one discussing the US President, Bill Clinton. |
Disputant relation-based method | A named entity combined with a topic particle or a subject particle is identified as the subject of these quotes. |
Disputant relation-based method | We detect the name of an organization, person, or country using the Korean Named Entity Recognizer (Lee et al. |
Introduction | We observe that the method achieves acceptable performance for practical use with basic language resources and tools, i.e., Named Entity Recognizer (Lee et al. |
Mention Extraction System | NE tags We automatically annotate the sentences with named entity (NE) tags using the named entity tagger of (Ratinov and Roth, 2009). |
Mention Extraction System | From a sentence, we gather the following as candidate mentions: all nouns and possessive pronouns, all named entities annotated by the NE tagger (Ratinov and Roth, 2009), all base noun phrase (NP) chunks, all chunks satisfying the pattern NP (PP NP)+, all NP constituents in the syntactic parse tree, and from each of these constituents, all substrings consisting of two or more words, provided the substrings neither start nor end with a punctuation mark.
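The final substring rule can be sketched as follows (function name is illustrative; punctuation is approximated by single-character punctuation tokens):

```python
import string

def substring_mentions(constituent_tokens):
    """All substrings of two or more words that neither start nor
    end with a punctuation token (a sketch of the rule above)."""
    punct = set(string.punctuation)
    n = len(constituent_tokens)
    out = []
    for i in range(n):
        for j in range(i + 2, n + 1):
            span = constituent_tokens[i:j]
            if span[0] not in punct and span[-1] not in punct:
                out.append(" ".join(span))
    return out

print(substring_mentions(["the", "U.S.", "president", ","]))
# ['the U.S.', 'the U.S. president', 'U.S. president']
```

Spans ending on the comma token are filtered out, while multi-character tokens like "U.S." are kept even though they contain punctuation characters.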
Syntactico-Semantic Structures | lw: last word in the mention; Bc(w): the Brown cluster bit string representing w; NE: named entity
Introduction | Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities, or semantic roles is a common practice in natural language processing (NLP).
Joint Query Annotation | Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging, phrase chunking, named entity recognition, and stopword indicators, to name just a few. |
Related Work | These approaches have been shown to be successful for tasks such as parsing and named entity recognition in newswire data (Finkel and Manning, 2009) or semantic role labeling in the Penn Treebank and Brown corpus (Toutanova et al., 2008). |
Conclusion | The major areas of CCGbank’s grammar left to be improved are the analysis of comparatives and the analysis of named entities.
Conclusion | Named entities are also difficult to analyse, as many entity types obey their own specific grammars. |
Conclusion | This is another example of a phenomenon that could be analysed much better in CCGbank using an existing resource, the BBN named entity corpus. |
An acoustics-based approach | Given two subpaths with nearly identical average similarity scores, we suggest that the longer of the two is more likely to refer to content of interest that is shared between two speech utterances, e.g., named entities . |
Experimental results | Although named entities and domain-specific terms are often highly relevant to the documents in which they are referenced, these types of words are often not included in ASR vocabularies, due to their relative global rarity. |
Introduction | This out-of-vocabulary (OOV) problem is unavoidable in the regular ASR framework, although it is more likely to happen on salient words such as named entities or domain-specific terms. |
Discussion | We found that many words, mostly named entities, were outside the lexicon.
Discussion | Thus we managed to collect a named entity translation dictionary to enhance the original one. |
Treebank Translation and Dependency Transformation | As a bilingual lexicon is required for our task and none of the existing lexicons is suitable for translating the PTB, two lexicons, the LDC Chinese-English Translation Lexicon Version 2.0 (LDC2002L27) and an English-to-Chinese lexicon in StarDict, are conflated, with some necessary manual extensions, to cover 99% of the words appearing in the PTB (most of the untranslated words are named entities).
Introduction | The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language, which plays an important role in machine translation and cross-language information retrieval (CLIR). |
Introduction | 3) Asymmetric alignment: When extracting translation equivalents from web pages, traditional methods must first recognize the named entities in the target-language sentence, and then align the extracted NEs with the source ON.
Introduction | However, named entity recognition (NER) inevitably introduces some mistakes.
Experiment | In general, the NTCIR topics are general descriptive words such as “regenerative medicine”, “American economy after the 911 terrorist attacks”, and “lawsuit brought against Microsoft for monopolistic practices.” The TREC topics are more named-entity-like terms such as “CarMax”, “Wikipedia primary source”, “Jiffy Lube”, “Starbucks”, and “Windows Vista.” We have experimentally shown that LSA is more suited to finding associations between general terms because its training documents are from a general domain. Our PMI measure utilizes a web search engine, which covers a variety of named entity terms.
Term Weighting and Sentiment Analysis | The statistical approaches may suffer from data sparseness problems, especially for named entity terms used in the query, and the proximal clues cannot sufficiently cover all term–query associations.
Term Weighting and Sentiment Analysis | Mullen and Collier (2004) manually annotated named entities in their dataset (i.e. |
QG for Paraphrase Modeling | In addition to assuming that dependency parse trees for s and t are observable, we also assume each word w_i comes with POS and named entity tags.
QG for Paraphrase Modeling | Dependency label, POS, and named entity class The newly generated target word’s dependency label, POS, and named entity class are drawn from multinomial distributions p_lab, p_pos, and p_ne that condition, respectively, on the configuration and on the POS and named entity class of the aligned source-tree word s_a(j) (lines 9–11).
QG for Paraphrase Modeling | We estimate the distributions over dependency labels, POS tags, and named entity classes using the transformed treebank (footnote 4). |
Background: Chinese Abbreviations | While the abbreviations mostly originate from noun phrases (in particular, named entities), other general phrases are also abbreviatable.
Unsupervised Translation Induction for Chinese Abbreviations | One may use a named entity tagger to obtain such a list. |
Unsupervised Translation Induction for Chinese Abbreviations | However, this relies on the existence of a high-precision Chinese named entity tagger.
Conclusion | While annotating with named entity data or a lexical type supertagger was also found to increase coverage, the POS tagger had the greatest effect, with up to a 45% coverage increase on unseen text.
Unknown Word Handling | Since the parser has the means to accept named entity (NE) information in the input, we also experimented with using generic lexical items generated from NE data. |
Unknown Word Handling | It is possible that another named entity tagger would give better results, and this may be looked at in future experiments. |