Abstract | Some languages lack large knowledge bases and good discriminative features for Named Entity Recognition (NER) that can generalize to previously unseen named entities.
Introduction | Named Entity Recognition (NER) is essential for a variety of Natural Language Processing (NLP) applications such as information extraction. |
Introduction | - Contextual features: Certain words are indicative of the existence of named entities.
Introduction | For example, the word “said” is often preceded by a named entity of type “person” or “organization”. |
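Introduction | As a minimal sketch, such contextual cues can be encoded as window features over neighboring words; the feature names, padding token, and window size below are illustrative assumptions, not those of any cited system:

```python
def context_features(tokens, i, window=2):
    """Extract neighboring-word features for token i, a common
    contextual cue for NER (e.g. a following "said" suggests a
    person or organization entity)."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # pad positions outside the sentence with a sentinel token
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"w[{offset}]"] = word.lower()
    return feats
```

For the sentence "Barack Obama said on Tuesday", the features for the token "Obama" include the right-context word "said", which a classifier can learn to associate with person entities.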
Abstract | We show that using a tweet-specific feature (hashtags) and a news-specific feature (named entities), as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text.
Creating Text-to-text Relations via Twitter/News Features | 4.1 Hashtags and Named Entities |
Creating Text-to-text Relations via Twitter/News Features | Named entities are some of the most salient features in a news article. |
Creating Text-to-text Relations via Twitter/News Features | Directly applying Named Entity Recognition (NER) tools on news titles or |
Introduction | such as named entities in a document. |
Introduction | Named entities acquired from a news document, typically with high accuracy using Named Entity Recognition (NER) tools, may be particularly informative.
Distant Supervision | As this classifier uses training data that is biased towards a specialized case (sentences containing the named entity types creators and characters), it does not generalize well to other S-relevance problems and thus yields lower performance on the full dataset. |
Distant Supervision | First, the classifier only sees a subset of examples that contain named entities, making generalization to other types of expressions difficult.
Distant Supervision | Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Features | 5.2 Named Entities |
Features | As standard named entity recognition (NER) systems do not capture categories that are relevant to the movie domain, we opt for a lexicon-based approach similar to (Zhuang et al., 2006). |
Features | If a capitalized word occurs, we check whether it is part of an already recognized named entity.
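Features | A minimal sketch of such a lexicon-based check, assuming the lexicon is a simple set of known domain names; the greedy longest-match strategy here is an illustrative assumption, not the cited method:

```python
def extend_entities(tokens, lexicon):
    """Lexicon-based NE tagging sketch: scan for capitalized words and
    greedily grow each span rightward while the longer capitalized span
    is still a known name in the lexicon."""
    entities, i = [], 0
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i + 1
            # grow the span while the extended capitalized span is known
            while (j < len(tokens) and tokens[j][:1].isupper()
                   and " ".join(tokens[i:j + 1]) in lexicon):
                j += 1
            span = " ".join(tokens[i:j])
            if span in lexicon:
                entities.append(span)
            i = j
        else:
            i += 1
    return entities
```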
Related Work | We follow their approach by using IMDb to define named entity features. |
Sentiment Relevance | An error analysis for the classifier trained on P&L shows that many sentences misclassified as S-relevant (fpSR) contain polar words; for example, Then, the situation turns M. In contrast, sentences misclassified as S-nonrelevant (fpSNR) contain named entities or plot and movie business vocabulary; for example, Tim Roth delivers the most impressive MM by getting the M language right.
Comparable Question Mining | As a preprocessing step for detecting comparable relations, our extraction algorithm identifies all the named entities of interest in our corpus, keeping only questions that contain at least two entities. |
Comparable Question Mining | Even for answers containing two named entities, e.g. “Is #1 dating #2?”, our CRF tagger is trained to detect only comparable relations like “Who is prettier, #1 or #2?”.
Comparable Question Mining | Input: A news article. Output: A sorted list of comparable questions.
1: Identify all target named entities (NEs) in the article.
2: Infer the distribution of LDA topics for the article.
3: For each comparable relation R in the database, compute its relevance score as the similarity between the topic distributions of R and the article.
4: Rank all the relations by relevance score and pick the top M as relevant.
5: for each relevant relation R in the order of relevance ranking do
6: Filter out all the target NEs that do not pass the single-entity classifier for R.
7: Generate all possible NE pairs from those that passed the single classifier.
8: Filter out all the generated NE pairs that do not pass the entity-pair classifier for R.
9: Pick the top N pairs with positive classification scores as qualified for generation.
Motivation and Algorithmic Overview | Looking at the structure of comparable questions, we observed that a specific comparable relation, such as ‘better dad’ and ‘faster’, can usually be combined with named entities in several syntactic ways to construct a concrete question. |
Online Question Generation | For each relevant relation, we then generate concrete questions by picking generic templates that are applicable for this relation and instantiating them with pairs of named entities appearing in the article. |
Online Question Generation | To this end, we utilize two different broad-scale sources of information about named entities.
Online Question Generation | The first is DBpedia, which contains structured information on entries in Wikipedia, many of which are named entities that appear in news articles.
Background | (2013) where, in a given corpus, a combination of domain-specific named entity tagging and sentence clustering (based on semantic predicates) was used to generate templates.
Methodology | The DRS consists of semantic predicates and named entity tags. |
Methodology | In parallel, domain specific named entity tags are identified and, in conjunction with the semantic predicates, are used to create templates. |
Methodology | For example, in (2), using the templates in (1e-f), the identified named entities are assigned to a clustered CuId (2ab).
Data | Toponyms were annotated by a semiautomated process: a named entity recognizer identified toponyms, and then coordinates were assigned using simple rules and corrected by hand. |
Evaluation | when a named entity recognizer is used to identify toponyms. |
Evaluation | We primarily present results from experiments with gold toponyms but include an accuracy measure for comparability with results from experiments run on plain text with a named entity recognizer. |
Introduction | (2010) use relationships learned between people, organizations, and locations from Wikipedia to aid in toponym resolution when such named entities are present, but do not exploit any other textual context. |
Introduction | However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer. |
Results | The named entity recognizer is likely better at detecting common toponyms than rare toponyms due to the na- |
Results | We also measured the mean and median error distance for toponyms correctly identified by the named entity recognizer, and found that they tended to be 50-200km worse than for gold toponyms. |
Results | This also makes sense given the named entity recognizer’s tendency to detect common toponyms: common toponyms tend to be more ambiguous than others. |
Toponym Resolvers | To create the indirectly supervised training data for WISTR, the OpenNLP named entity recognizer detects toponyms in GEOWIKI, and candidate locations for each toponym are retrieved from GEONAMES. |
Abstract | Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. |
Experimental Setup | We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes. |
Experimental Setup | Out of the 18 named entity types that are annotated in OntoNotes, which include person, location, date, money, and so on, we select the four most commonly seen named entity types for evaluation. |
Introduction | We study the problem of Named Entity Recognition (NER) in a bilingual context, where the goal is to annotate parallel bi-texts with named entity tags. |
Introduction | We can also automatically construct a named entity translation lexicon by annotating and extracting entities from bi-texts, and use it to improve MT performance (Huang and Vogel, 2002; Al-Onaizan and Knight, 2002). |
Introduction | As a result, we can find complementary cues in the two languages that help to disambiguate named entity mentions (Brown et al., 1991). |
Related Work | set of heuristic rules to expand a candidate named entity set generated by monolingual taggers, and then rank those candidates using a bilingual named entity dictionary. |
Experiments | All these systems incorporated lexical semantics features derived from WordNet and named entity features. |
Experiments | Features used in the experiments can be categorized into six types: identical word matching (I), lemma matching (L), WordNet (WN), enhanced Lexical Semantics (LS), Named Entity matching (NE) and Answer type checking (Ans). |
Experiments | Named entity matching (NE) checks whether two words are individually part of some named entities with the same type. |
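Experiments | A minimal sketch of this check, assuming each word is mapped to the type of the entity containing it; this dictionary encoding is an illustrative assumption, not the paper's exact representation:

```python
def ne_match_feature(w1, w2, ne_types1, ne_types2):
    """NE-matching feature: fires when each word is part of some named
    entity and the two entity types agree. ne_types1/ne_types2 map a
    token to its entity type (absent means outside any entity)."""
    t1, t2 = ne_types1.get(w1), ne_types2.get(w2)
    return t1 is not None and t1 == t2
```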
Lexical Semantic Models | For instance, when a word refers to a named entity, the particular sense and meaning is often not encoded.
Background | (2007) proposed indexing texts with their semantic roles and named entities.
Background | Queries then include constraints of semantic roles and named entities for the predicate and its arguments in the question. |
Experiments | sentence-segmented and word-tokenized by NLTK (Bird and Loper, 2004), dependency-parsed by the Stanford Parser (Klein and Manning, 2003), and NER-tagged by the Illinois Named Entity Tagger (Ratinov and Roth, 2009) with an 18-label type set. |
Introduction | Moreover, this approach is more robust against, e.g., entity recognition errors, because answer typing knowledge is learned from how the data was actually labeled, not from how the data was assumed to be labeled (e.g., manual templates usually assume perfect labeling of named entities, but often that is not the case).
Introduction | This will be our off-the-shelf QA system, which recognizes the association between question type and expected answer types through various features based on e.g., part-of-speech tagging (POS) and named entity recognition (NER). |
Introduction | Moreover, our approach extends easily beyond fixed answer types such as named entities : we are already using POS tags as a demonstration. |
Method | Ogilvie (2010) showed in chapter 4.3 that keyword- and named-entity-based retrieval actually outperformed SRL-based structured retrieval in MAP for the answer-bearing sentence retrieval task in their setting.
Related works | The task of phrase lemmatisation bears a close resemblance to a more popular task, namely lemmatisation of named entities . |
Related works | type of named entities considered, those two may be solved using similar or significantly different methodologies. |
Related works | Hence, the main challenge is to define a similarity metric between named entities (Piskorski et al., 2009; Kocon and Piasecki, 2012), which can be used to match different mentions of the same names. |
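Related works | One simple instance of such a metric is edit distance normalized to [0, 1], so that inflected mentions of the same name score high; this particular choice is illustrative, not the metric of the cited works:

```python
def name_similarity(a, b):
    """Levenshtein distance between two name mentions, normalized to a
    similarity in [0, 1] by the length of the longer string. Uses a
    rolling one-row dynamic program."""
    m, n = len(a), len(b)
    d = list(range(n + 1))
    for i in range(1, m + 1):
        prev, d[0] = d[0], i          # prev holds D[i-1][j-1]
        for j in range(1, n + 1):
            cur = d[j]                # D[i-1][j] before overwrite
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return 1.0 - d[n] / max(m, n, 1)
```

For example, the Polish inflected form "Kowalskiego" scores close to 1 against its base mention "Kowalski".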
Discussion | Table 6: Sample Chinese freestyle named entities that are usernames. |
Discussion | Another major group of errors comes from what we term freestyle named entities, as exemplified in Table 6; i.e., person names in the form of user IDs and nicknames, which have fewer constraints on form in terms of length and canonical structure (not surnames followed by given names, as is standard in Chinese names) and may mix in alphabetic characters.
Discussion | Most of these belong to the category of Person Name (PER), as defined in the CoNLL-2003 Named Entity Recognition shared task.
Methodology | In addition, we employ additional online word lists to distinguish named entities and function words from potential informal words.
Experiments | Named entities which co-occur at least 6 times with a morph query in the same topic are selected as its target candidates. |
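Experiments | This selection step can be sketched as follows, assuming documents are grouped by topic and an entities_of helper extracts the named entities of a document; both the grouping and the helper are assumptions for illustration:

```python
from collections import Counter

def target_candidates(topic_docs, morph, entities_of, min_cooccur=6):
    """Count how often each named entity co-occurs with the morph query
    across documents of the same topic, and keep entities at or above
    the co-occurrence threshold (6 in the paper)."""
    counts = Counter()
    for doc in topic_docs:
        if morph in doc:
            # count each entity once per co-occurring document
            counts.update(set(entities_of(doc)))
    return {e for e, c in counts.items() if c >= min_cooccur}
```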
Related Work | Other similar research lines are the TAC-KBP Entity Linking (EL) (Ji et al., 2010; Ji et al., 2011), which links a named entity in news and web documents to an appropriate knowledge base (KB) entry, the task of mining name translation pairs from comparable corpora (Udupa et al., 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al., 2007) and the link prediction problem (Adamic and Adar, 2001; Liben-Nowell and Kleinberg, 2003; Sun et al., 2011b; |
Target Candidate Identification | However, obviously we cannot consider all of the named entities in these sources as target candidates due to the sheer volume of information. |
Target Candidate Identification | In addition, morphs are not limited to named entity forms. |
Target Candidate Ranking | Then we apply a hierarchical Hidden Markov Model (HMM) based Chinese lexical analyzer, ICTCLAS (Zhang et al., 2003), to extract named entities, noun phrases and events.
Abstract | This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities. |
Introduction | Named entity translation discovery aims at mapping entity names for people, locations, etc. |
Introduction | As many new named entities appear every day in newspapers and web sites, their translations are nontrivial yet essential. |
Introduction | Early efforts in named entity translation have focused on using a phonetic feature (called PH) to estimate a phonetic similarity between two names (Knight and Graehl, 1998; Li et al., 2004; Virga and Khudanpur, 2003).
Preliminaries | To identify entities, we use a CRF-based named entity tagger (Finkel et al., 2005) and a Chinese word breaker (Gao et al., 2003) for English and Chinese corpora, respectively. |
Experimental Evaluation | several potential named entities it could refer to, even if the vast majority of references were to only a single entity. |
Introduction | Several NLP tasks, such as word sense disambiguation, word sense induction, and named entity disambiguation, address this ambiguity problem to varying degrees. |
Related Work | the well studied problems of named entity disambiguation (NED) and word sense disambiguation (WSD). |
Related Work | Both named entity and word sense disambiguation are extensively studied, and surveys on each are available (Nadeau and Sekine, 2007; Navigli, 2009). |
Conclusions | Specifically, we plan to investigate higher coverage inventories such as BabelNet (Navigli and Ponzetto, 2012a), which will handle texts with named entities and rare senses that are not in WordNet, and will also enable cross-lingual semantic similarity. |
Experiment 1: Textual Similarity | The TLsyn system also uses Google Book Ngrams, as well as dependency parsing and named entity recognition. |
Experiment 1: Textual Similarity | Additionally, because the texts often contain named entities which are not present in WordNet, we incorporated the similarity values produced by four string-based measures, which were used by other teams in the STS task: (1) longest common substring which takes into account the length of the longest overlapping contiguous sequence of characters (substring) across two strings (Gusfield, 1997), (2) longest common subsequence which, instead, finds the longest overlapping subsequence of two strings (Allison and Dix, 1986), (3) Greedy String Tiling which allows reordering in strings (Wise, 1993), and (4) the character/word n-gram similarity proposed by Barrón-Cedeño et al.
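Experiment 1: Textual Similarity | The first two measures can be sketched with standard dynamic programs; only the raw lengths are computed here, and any normalization into a similarity score is omitted:

```python
def lcs_substring(a, b):
    """Length of the longest common contiguous substring (measure 1)."""
    best = 0
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                best = max(best, table[i][j])
    return best

def lcs_subsequence(a, b):
    """Length of the longest common, not necessarily contiguous,
    subsequence (measure 2)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]
```

On "Barack Obama" vs. "Obama said", the longest common substring is the shared name "Obama", which is exactly the kind of out-of-WordNet entity overlap these measures are meant to capture.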
Experiment 1: Textual Similarity | Named entity features used by the TLsim system could be the reason for its better performance on the MSRpar dataset, which contains a large number of named entities.
Evaluation | The problem in this example is incorrect segmentation of a named entity . |
Introduction | Known entities are recognized and mapped to the KB using a recent tool for named entity disambiguation (Hoffart 2011). |
Related Work | Tagging mentions of named entities with lexical types has been pursued in previous work. |
Related Work | Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities. |
Experiments & Results 4.1 Experimental Setup | From the OOVs, we exclude numbers as well as named entities.
Experiments & Results 4.1 Experimental Setup | We apply a simple heuristic to detect named entities: basically, words that are capitalized in the original dev/test set and do not appear at the beginning of a sentence are named entities.
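Experiments & Results 4.1 Experimental Setup | This heuristic can be sketched as follows, with each sentence given as a list of tokens; the per-token formulation is a simplifying assumption:

```python
def heuristic_nes(sentences):
    """Capitalization heuristic for named entity detection: a token
    counts as a named entity if it is capitalized and does not occur
    at the beginning of its sentence."""
    nes = set()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if i > 0 and tok[:1].isupper():
                nes.add(tok)
    return nes
```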
Introduction | Although this is helpful in translating a small fraction of OOVs, such as named entities for languages with the same writing system, it harms the translation of other types of OOVs and of distant language pairs.
Introduction | Many lines of tweet-related research have been inspired, from named entity recognition (Liu et al., 2012), topic detection (Mathioudakis and Koudas, 2010), and clustering (Rosa et al., 2010), to event extraction (Grinev et al., 2009).
Introduction | (2012), on average a named entity has 3.3 different surface forms in tweets. |
Task Definition | First, we assume that mentions are given, e.g., identified by some named entity recognition system. |
Abstract | This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”. |
Abstract | Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case. |
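Abstract | The mean reciprocal rank used in this evaluation can be computed as follows; the dictionary-based input format is an illustrative assumption:

```python
def mean_reciprocal_rank(rank_lists, gold):
    """Mean reciprocal rank over queries: for each source name, take
    the reciprocal of the rank at which its gold translation appears
    in the ranked candidate list (0 if absent), then average."""
    total = 0.0
    for name, ranked in rank_lists.items():
        if gold[name] in ranked:
            total += 1.0 / (ranked.index(gold[name]) + 1)
    return total / len(rank_lists)
```

Under this measure, an improvement of 0.16 roughly corresponds to, e.g., the correct translation moving from rank 3 to rank 2 on average.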
Introduction | This task is more challenging in the presence of multilingual text, because translating named entities (NEs), such as persons, locations, or organizations, is a nontrivial task. |
Experiments | Depending on the document category, we found some variations as to which hierarchy was learned in each setting, but we noticed that parameters starting with right and left gramtypes often produced quite good hierarchies: for instance, right gramtype → left gramtype → same sentence → right named entity type.
Introduction | The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones. |
System description | We used classical features that can be found in detail in (Bengtson and Roth, 2008) and (Rahman and Ng, 2011): grammatical type and subtype of mentions, string match and substring, apposition and copula, distance (number of separating mentions/sentences/words), gender/number match, synonymy/hypernymy and animacy (using WordNet), family name (based on lists), named entity types, syntactic features (gold parse) and anaphoricity detection.