Index of papers in Proc. ACL that mention
  • named entity
Han, Xianpei and Zhao, Jun
Abstract
The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods.
Abstract
In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms that can best exploit these knowledge sources.
Abstract
This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which enhances named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources.
Introduction
So there is an urgent demand for efficient, high-quality named entity disambiguation methods.
Introduction
Currently, the common methods for named entity disambiguation include name observation clustering (Bagga and Baldwin, 1998) and entity linking with a knowledge base (McNamee and Dang, 2009).
Introduction
Given a set of observations O = {o1, o2, ..., on} of the target name to be disambiguated, a named entity disambiguation system should group them into a set of clusters C = {c1, c2, ..., cm}, with each resulting cluster corresponding to one specific entity.
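The snippet above casts disambiguation as clustering of name observations. A minimal Python sketch of that formulation, where the `similar` function is a placeholder for a semantic-relatedness measure (e.g., the paper's SSR) and the greedy strategy is an assumption, not the authors' algorithm:

```python
# Minimal sketch of disambiguation as clustering of name observations.
# `similar(a, b)` is an assumed placeholder similarity function.
def cluster_observations(observations, similar, threshold=0.5):
    """Group observations O = {o1, ..., on} into clusters C = {c1, ..., cm}."""
    clusters = []
    for obs in observations:
        best, best_score = None, threshold
        for cluster in clusters:
            # score the observation against its closest cluster member
            score = max(similar(obs, member) for member in cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None:
            best.append(obs)       # attach to the most similar cluster
        else:
            clusters.append([obs])  # start a new entity cluster
    return clusters
```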
named entity is mentioned in 37 sentences in this paper.
Topics mentioned in this paper:
Finkel, Jenny Rose and Manning, Christopher D.
Abstract
Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data.
Base Models
Our hierarchical joint model is composed of three separate models, one for just named entity recognition, one for just parsing, and one for joint parsing and named entity recognition.
Base Models
4.1 Semi-CRF for Named Entity Recognition
Hierarchical Joint Learning
Our experiments are on a joint parsing and named entity task, but the technique is more general and only requires that the base models (the joint model and single-task models) share some features.
Hierarchical Joint Learning
This section covers the general technique, and we will cover the details of the parsing, named entity, and joint models that we use in Section 4.
Hierarchical Joint Learning
Features which don’t apply to a particular model type (e.g., parse features in the named entity model) will always be zero, so their weights have no impact on that model’s likelihood function.
Introduction
These high-level systems typically combine the outputs from many low-level systems, such as parsing, named entity recognition (NER) and coreference resolution.
Introduction
When trained separately, these single-task models can produce outputs which are inconsistent with one another, such as named entities which do not correspond to any nodes in the parse tree (see Figure 1 for an example).
Introduction
Because a named entity should correspond to a node in the parse tree, strong evidence about either aspect of the model should positively impact the other aspect
named entity is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Tomanek, Katrin and Hahn, Udo and Lohmann, Steffen and Ziegler, Jürgen
Cognitively Grounded Cost Modeling
These time tags indicate the time it took to annotate the respective phrase for named entity mentions of the types person, location, and organization.
Experimental Design
In our study, we applied, for the first time ever to the best of our knowledge, eye-tracking to study the cognitive processes underlying the annotation of linguistic meta-data, named entities in particular.
Experimental Design
These articles come already annotated with three types of named entities considered important in the newspaper domain, viz.
Experimental Design
An example consists of a text document having one single annotation phrase highlighted which then had to be semantically annotated with respect to named entity mentions.
Introduction
We have seen large-scale efforts in syntactic and semantic annotations in the past related to POS tagging and parsing, on the one hand, and named entities and relations (propositions), on the other hand.
Results
In this case, words were searched in the upper context, which according to their orthographic signals might refer to a named entity but which could not completely be resolved only relying on the information given by the annotation phrase itself and its embedding sentence.
Summary and Conclusions
In this paper, we explored the use of eye-tracking technology to investigate the behavior of human annotators during the assignment of three types of named entities — persons, organizations and locations — based on the eye-mind assumption.
Summary and Conclusions
of complex cognitive tasks (such as gaze duration and gaze movements for named entity annotation) to explanatory models (in our case, a time-based cost model for annotation) follows a much warranted avenue in research in NLP where feature farming becomes a theory-driven, explanatory process rather than a much deplored theory-blind engineering activity (cf.
named entity is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Zhou, Deyu and Chen, Liangyu and He, Yulan
Introduction
(2012) presented a system called TwiCal which extracts an open-domain calendar of significant events represented by a 4-tuple set including a named entity, event phrase, calendar date, and event type from Twitter.
Introduction
Here, event elements include named entities such as person, company, organization, date/time, location, and the relations among them.
Methodology
Events extracted in our proposed framework are represented as a 4-tuple (y, d, l, k), where y stands for a non-location named entity, d for a date, l for a location, and k for an event-related keyword.
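A minimal sketch of that 4-tuple event representation; the field names and the example values are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    entity: str    # y: a non-location named entity
    date: str      # d: a (normalized) calendar date
    location: str  # l: a location
    keyword: str   # k: an event-related keyword

# Hypothetical example instance.
example = Event(entity="ACME Corp", date="2013-05-01",
                location="London", keyword="acquisition")
```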
Methodology
Tweets are preprocessed by time expression recognition, named entity recognition, POS tagging and stemming.
Methodology
Named Entity Recognition.
named entity is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming
Abstract
The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data.
Experiments
We use the Twigg SDK to crawl all tweets from April 20th 2010 to April 25th 2010, then drop non-English tweets, yielding about 11,371,389 tweets, from which 15,800 are randomly sampled and labeled by two independent annotators, so that the beginning and the end of each named entity are marked with <TYPE> and </TYPE>, respectively.
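A hedged sketch of producing the inline <TYPE>...</TYPE> markup described above from character-offset annotations; the span representation (start, end, type) is an assumption about how the annotations might be stored.

```python
def mark_entities(text, spans):
    """spans: non-overlapping (start, end, entity_type) character offsets."""
    out, prev = [], 0
    for start, end, etype in sorted(spans):
        out.append(text[prev:start])                       # untagged prefix
        out.append(f"<{etype}>{text[start:end]}</{etype}>")  # tagged entity
        prev = end
    out.append(text[prev:])                                # untagged suffix
    return "".join(out)

# e.g. mark_entities("Lady Gaga sings.", [(0, 9, "PERSON")])
# -> "<PERSON>Lady Gaga</PERSON> sings."
```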
Introduction
Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as persons, organizations and locations (Nadeau and Sekine, 2007).
Related Work
(2010) use Amazon’s Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work
In contrast, our work aims to build a system that can automatically identify named entities in tweets.
Task Definition
Twitter users are interested in named entities, such
Task Definition
Figure 1: Portion of different types of named entities in tweets.
Task Definition
as person names, organization names and product names, as evidenced by the abundant named entities in tweets.
named entity is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Scheible, Christian and Schütze, Hinrich
Distant Supervision
As this classifier uses training data that is biased towards a specialized case (sentences containing the named entity types creators and characters), it does not generalize well to other S-relevance problems and thus yields lower performance on the full dataset.
Distant Supervision
First, the classifier only sees a subset of examples that contain named entities, making generalization to other types of expressions difficult.
Distant Supervision
Conditions 4-8 train supervised classifiers based on the labels from DSlabels+MinCut: (4) MaxEnt with named entities (NE); (5) MaxEnt with NE and semantic (SEM) features; (6) CRF with NE; (7) MaxEnt with NE and sequential (SQ) features; (8) MaxEnt with NE, SQ, and SEM.
Features
5.2 Named Entities
Features
As standard named entity recognition (NER) systems do not capture categories that are relevant to the movie domain, we opt for a lexicon-based approach similar to (Zhuang et al., 2006).
Features
If a capitalized word occurs, we check whether it is part of an already recognized named entity.
Related Work
We follow their approach by using IMDb to define named entity features.
Sentiment Relevance
An error analysis for the classifier trained on P&L shows that many sentences misclassified as S-relevant (fpSR) contain polar words; for example, Then, the situation turns M. In contrast, sentences misclassified as S-nonrelevant (fpSNR) contain named entities or plot and movie business vocabulary; for example, Tim Roth delivers the most impressive MM by getting the M language right.
named entity is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Rokhlenko, Oleg and Szpektor, Idan
Comparable Question Mining
As a preprocessing step for detecting comparable relations, our extraction algorithm identifies all the named entities of interest in our corpus, keeping only questions that contain at least two entities.
Comparable Question Mining
For answers containing two named entities, e.g. “Is #1 dating #2?”, our CRF tagger is trained to detect only comparable relations like “Who is prettier #1 or #2?”.
Comparable Question Mining
Input: A news article
Output: A sorted list of comparable questions
1: Identify all target named entities (NEs) in the article
2: Infer the distribution of LDA topics for the article
3: For each comparable relation R in the database, compute its relevance score to be the similarity between the topic distributions of R and the article
4: Rank all the relations according to their relevance score and pick the top M as relevant
5: for each relevant relation R in the order of relevance ranking do
6:   Filter out all the target NEs that do not pass the single entity classifier for R
7:   Generate all possible NE pairs from those that passed the single classifier
8:   Filter out all the generated NE pairs that do not pass the entity pair classifier for R
9:   Pick the top N pairs with positive classification score to be qualified for generation
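A condensed Python sketch of the ranked generation pipeline listed above. All helper callables (ner, topic_model, the relation database with per-relation topic distributions, the two classifiers, and the similarity function) are assumed placeholders standing in for the components the paper describes.

```python
from itertools import combinations

def comparable_questions(article, ner, topic_model, relations,
                         single_clf, pair_clf, similarity, M=5, N=3):
    entities = ner(article)                               # step 1
    article_topics = topic_model(article)                 # step 2
    relevant = sorted(relations,                          # steps 3-4
                      key=lambda r: similarity(r.topics, article_topics),
                      reverse=True)[:M]
    results = []
    for rel in relevant:                                  # steps 5-9
        kept = [e for e in entities if single_clf(rel, e)]
        pairs = [p for p in combinations(kept, 2) if pair_clf(rel, p) > 0]
        pairs.sort(key=lambda p: pair_clf(rel, p), reverse=True)
        results.append((rel, pairs[:N]))
    return results
```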
Motivation and Algorithmic Overview
Looking at the structure of comparable questions, we observed that a specific comparable relation, such as ‘better dad’ and ‘faster’, can usually be combined with named entities in several syntactic ways to construct a concrete question.
Online Question Generation
For each relevant relation, we then generate concrete questions by picking generic templates that are applicable for this relation and instantiating them with pairs of named entities appearing in the article.
Online Question Generation
To this end, we utilize two different broad-scale sources of information about named entities.
Online Question Generation
The first is DBPedia, which contains structured information on entries in Wikipedia, many of which are named entities that appear in news articles.
named entity is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Kim, Sungchul and Toutanova, Kristina and Yu, Hwanjo
Abstract
In this paper we propose a method to automatically label multilingual data with named entity tags.
Data and task
Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages.
Data and task
The named entity annotation scheme followed has the labels GPE (Geopolitical entity), PER (Person), ORG (Organization), and DATE.
Data and task
The other is that the same information might be expressed using a named entity in one language, and using a non-entity phrase in the other language (e.g. “He is from Bulgaria” versus “He is Bulgarian”).
Introduction
Named Entity Recognition (NER) is a frequently needed technology in NLP applications.
named entity is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Guo, Weiwei and Li, Hao and Ji, Heng and Diab, Mona
Abstract
We show that using a tweet-specific feature (hashtags) and a news-specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus complete the semantic picture of a short text.
Creating Text-to-text Relations via Twitter/News Features
4.1 Hashtags and Named Entities
Creating Text-to-text Relations via Twitter/News Features
Named entities are some of the most salient features in a news article.
Creating Text-to-text Relations via Twitter/News Features
Directly applying Named Entity Recognition (NER) tools on news titles or
Introduction
such as named entities in a document.
Introduction
Named entities acquired from a news document, typically with high accuracy using Named Entity Recognition [NER] tools, may be particularly informative.
named entity is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Darwish, Kareem
Abstract
Some languages lack large knowledge bases and good discriminative features for Named Entity Recognition (NER) that can generalize to previously unseen named entities.
Introduction
Named Entity Recognition (NER) is essential for a variety of Natural Language Processing (NLP) applications such as information extraction.
Introduction
- Contextual features: Certain words are indicative of the existence of named entities.
Introduction
For example, the word “said” is often preceded by a named entity of type “person” or “organization”.
named entity is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Yogatama, Dani and Sim, Yanchuan and Smith, Noah A.
Abstract
We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes).
Experiments
We collected named entity mentions from two corpora: political blogs and sports news.
Experiments
Due to the large size of the corpora, we uniformly sampled a subset of documents for each corpus and ran the Stanford NER tagger (Finkel et al., 2005), which tagged named entity mentions as person, location, and organization.
Experiments
We used named entities of type person from the political blogs corpus, while we are interested in person and organization entities for the sports news corpus.
Introduction
tributes through transductive learning from named entity mentions with a small number of seeds (see
Introduction
by a named entity recognizer, along with their con-
Introduction
As a result, the model discovers parts of names—(Mrs., Michelle, Obama)—while simultaneously performing coreference resolution for named entity mentions.
Model
In our experiments these are obtained by running a named entity recognizer.
named entity is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Abstract
To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification.
Conclusions
We demonstrated the power and generality of this approach on two very different applications: named entity recognition and query classification.
Discussion and Related Work
The named entity problem in Section 3 and the query classification problem in Section 4 exemplify the two scenarios.
Distributed K-Means clustering
For example, in the named entity recognition task, categorical clusters are more successful, whereas in query categorization, the topical clusters are much more beneficial.
Introduction
They have become the workhorse in almost all subareas and components of NLP, including part-of-speech tagging, chunking, named entity recognition and parsing.
Introduction
This method has been shown to be quite successful in named entity recognition (Miller et al.
Introduction
We demonstrate the advantages of phrase-based clusters over word-based ones with experimental results from two distinct application domains: named entity recognition and query classification.
Named Entity Recognition
Named entity recognition (NER) is one of the first steps in many applications of information extraction, information retrieval, question answering and other applications of NLP.
Named Entity Recognition
Here, s denotes a position in the input sequence; ys is a label that indicates whether the token at position s is a named entity as well as its type.
Named Entity Recognition
The top CoNLL 2003 systems all employed gazetteers or other types of specialized resources (e.g., lists of words that tend to co-occur with certain named entity types) in addition to part-of-speech tags.
named entity is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Whitney, Max and Sarkar, Anoop
Graph propagation
Figure 2: A DL from iteration 5 of Yarowsky on the named entity task.
Graph propagation
The task of Collins and Singer (1999) is named entity classification on data from New York Times text. The data set was preprocessed by a statistical parser (Collins, 1997) and all noun phrases that are potential named entities were extracted from the parse tree.
Graph propagation
The test data additionally contains some noise examples which are not in the three named entity categories.
named entity is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Minkov, Einat and Zettlemoyer, Luke
Problem Setting
In Figure 2, these relations are denoted by a double-line boundary, including location, person, title, date and time; every tuple of these relations maps to a named entity mention.
Problem Setting
Figure 3 demonstrates the correct mapping of named entity mentions to tuples, as well as tuple unification, for the example shown in Figure 1.
Related Work
In addition to proper nouns (named entity mentions) that are considered in this work, they also account for nominal and pronominal noun mentions.
Related Work
Interestingly, several researchers have attempted to model label consistency and high-level relational constraints using state-of-the-art sequential models of named entity recognition (NER).
Related Work
In the proposed framework we take a bottom-up approach to identifying entity mentions in text, where given a noisy set of candidate named entities, described using rich semantic and surface features, discriminative learning is applied to label these mentions.
Seminar Extraction Task
We used a set of rules to extract candidate named entities per the types specified in Figure 2. The rules encode information typically used in NER, including content and contextual patterns, as well as lookups in available dictionaries (Finkel et al., 2005; Minkov et al., 2005).
Seminar Extraction Task
recall for the named entities of type date and time is near perfect, and is estimated at 96%, 91% and 90% for location, speaker and title, respectively.
Seminar Extraction Task
Another feature encodes the size of the most semantically detailed named entity that maps to a field; for example, the most detailed entity mention of type stime in Figure 1 is “3:30”, comprising two attribute values, namely hour and minutes.
Structured Learning
Named entity recognition.
Structured Learning
The candidate tuples generated using this procedure are structured entities, constructed using typed named entity recognition, unification, and hierarchical assignment of field values (Figure 3).
named entity is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Liu, Xiaohua and Zhou, Ming and Zhou, Xiangyang and Fu, Zhongyang and Wei, Furu
Abstract
Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations.
Abstract
We study the problem of named entity normalization (NEN) for tweets.
Abstract
Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet.
Introduction
As a result, the task of named entity recognition (NER) for tweets, which aims to identify mentions of rigid designators from tweets belonging to named-entity types such as persons, organizations and locations (2007), has attracted increasing research interest.
Introduction
entities and then uses a distantly supervised approach based on LabeledLDA to classify named entities.
Introduction
However, named entity normalization (NEN) for tweets, which transforms named entities mentioned in tweets to their unambiguous canonical forms, has not been well studied.
Related Work
(2010) use Amazon’s Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
Related Work
(2011) rebuild the NLP pipeline for tweets beginning with POS tagging, through chunking, to NER, which first exploits a CRF model to segment named entities and then uses a distantly supervised approach based on LabeledLDA to classify named entities.
Related Work
Unlike this work, our work detects the boundary and type of a named entity simultaneously using sequential labeling techniques.
named entity is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Pitler, Emily and Louis, Annie and Nenkova, Ani
Indicators of linguistic quality
3.2 Reference form: Named entities
Indicators of linguistic quality
This set of features examines whether named entities have informative descriptions in the summary.
Indicators of linguistic quality
We focus on named entities because they appear often in summaries of news documents and are often not known to the reader beforehand.
Results and discussion
The classes of features specific to named entities and noun phrase syntax are the weakest predictors.
named entity is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Kondadadi, Ravi and Howald, Blake and Schilder, Frank
Background
(2013), where, in a given corpus, a combination of domain-specific named entity tagging and sentence clustering (based on semantic predicates) was used to generate templates.
Methodology
The DRS consists of semantic predicates and named entity tags.
Methodology
In parallel, domain-specific named entity tags are identified and, in conjunction with the semantic predicates, are used to create templates.
Methodology
For example, in (2), using the templates in (1e-f), the identified named entities are assigned to a clustered CuId (2ab).
named entity is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Speriosu, Michael and Baldridge, Jason
Data
Toponyms were annotated by a semiautomated process: a named entity recognizer identified toponyms, and then coordinates were assigned using simple rules and corrected by hand.
Evaluation
when a named entity recognizer is used to identify toponyms.
Evaluation
We primarily present results from experiments with gold toponyms but include an accuracy measure for comparability with results from experiments run on plain text with a named entity recognizer.
Introduction
(2010) use relationships learned between people, organizations, and locations from Wikipedia to aid in toponym resolution when such named entities are present, but do not exploit any other textual context.
Introduction
However, it is important to consider the utility of an end-to-end toponym identification and resolution system, so we also demonstrate that performance is still strong when toponyms are detected with a standard named entity recognizer.
Results
The named entity recognizer is likely better at detecting common toponyms than rare toponyms due to the na-
Results
We also measured the mean and median error distance for toponyms correctly identified by the named entity recognizer, and found that they tended to be 50-200km worse than for gold toponyms.
Results
This also makes sense given the named entity recognizer’s tendency to detect common toponyms: common toponyms tend to be more ambiguous than others.
Toponym Resolvers
To create the indirectly supervised training data for WISTR, the OpenNLP named entity recognizer detects toponyms in GEOWIKI, and candidate locations for each toponym are retrieved from GEONAMES.
named entity is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Kazama, Jun'ichi and Torisawa, Kentaro
Abstract
We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER).
Experiments
We obtained 11,892 sentences with 18,677 named entities.
Experiments
We define “# e-matches” as the number of matches that also match a boundary of a named entity in the training set, and “# optimal” as the optimal number of “# e-matches” that can be achieved when we know the
Experiments
These gazetteers cover 40-50% of the named entities, and the cluster gazetteers have relatively wider coverage than the Wikipedia gazetteer has.
Gazetteer Induction 2.1 Induction by MN Clustering
Figure 2: Clean MN clusters with named entity entries (Left: car brand names.
Introduction
Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately.
Introduction
from a multi-word noun (MN) to named entity categories such as “Tokyo Stock Exchange → {ORGANIZATION}”. However, since the correspondence between the labels and the NE categories can be learned by tagging models, a gazetteer will be useful as long as it returns consistent labels even if those returned are not the NE categories.
Introduction
Therefore, performing the clustering with a vocabulary that is large enough to cover the many named entities required to improve the accuracy of NER is difficult.
Related Work and Discussion
To cover most of the named entities in the data, we need much larger gazetteers.
named entity is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Hermjakob, Ulf and Knight, Kevin and Daumé III, Hal
Discussion
Improved named entity translation accuracy as measured by the NEWA metric in general, and a reduction in dropped names in particular is clearly valuable to the human reader of machine translated documents as well as for systems using machine translation for further information processing.
End-to-End results
Table 2: Name translation accuracy with respect to BBN and re-annotated Gold Standard on 1730 named entities in 637 sentences.
Evaluation
General MT metrics such as BLEU, TER, METEOR are not suitable for evaluating named entity translation and transliteration, because they are not focused on named entities (NEs).
Evaluation
The general idea of the Named Entity Weak Accuracy (NEWA) metric is to
Evaluation
BBN kindly provided us with an annotated Arabic text corpus, in which named entities were marked up with their type (e.g. GPE for Geopolitical Entity) and one or more English translations.
Introduction
• Not all named entities should be transliterated.
Introduction
Many named entities require a mix of transliteration and translation.
Introduction
We ask: what percentage of source-language named entities are translated correctly?
Learning what to transliterate
As already mentioned in the introduction, named entity (NE) identification followed by MT is a bad idea.
named entity is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Charton, Eric and Meurs, Marie-Jean and Jean-Louis, Ludovic and Gagnon, Michel
Abstract
First, named entities are linked to candidate Wikipedia pages by a generic annotation engine.
Abstract
Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document.
Experiments and Results
In this task, mentions of entities found in a document collection must be linked to entities in a reference KB, or to new named entities discovered in the collection.
Experiments and Results
We observe that the complete algorithm (co-references, named entity labels and MDP) provides the best results on PER NE links.
Introduction
The Entity Linking (EL) task consists in linking name mentions of named entities (NEs) found in a document to their corresponding entities in a reference Knowledge Base (KB).
Introduction
Various approaches have been proposed to solve the named entity disambiguation (NED) problem.
Introduction
In the context of the Named Entity Recognition (NER) task, such resources can be generic and generative.
Proposed Algorithm
3.2 Named Entity Label Correction
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Abstract
Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages.
Experimental Setup
We train the two CRF models on all portions of the OntoNotes corpus that are annotated with named entity tags, except the parallel-aligned portion which we reserve for development and test purposes.
Experimental Setup
Out of the 18 named entity types that are annotated in OntoNotes, which include person, location, date, money, and so on, we select the four most commonly seen named entity types for evaluation.
Introduction
We study the problem of Named Entity Recognition (NER) in a bilingual context, where the goal is to annotate parallel bi-texts with named entity tags.
Introduction
We can also automatically construct a named entity translation lexicon by annotating and extracting entities from bi-texts, and use it to improve MT performance (Huang and Vogel, 2002; Al-Onaizan and Knight, 2002).
Introduction
As a result, we can find complementary cues in the two languages that help to disambiguate named entity mentions (Brown et al., 1991).
Related Work
set of heuristic rules to expand a candidate named entity set generated by monolingual taggers, and then rank those candidates using a bilingual named entity dictionary.
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Arnold, Andrew and Nallapati, Ramesh and Cohen, William W.
Abstract
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets.
Conclusions, related & future work
In this work we have introduced hierarchical feature tree priors for use in transfer learning on named entity extraction tasks.
Introduction
Consider the task of named entity recognition (NER).
Introduction
Having successfully trained a named entity classifier on this news data, now consider the problem of learning to classify tokens as names in email data.
Introduction
In particular, we develop a novel prior for named entity recognition that exploits the hierarchical feature space often found in natural language domains (§1.2) and allows for the transfer of information from labeled datasets in other domains (§1.3).
Investigation
The goal of our experiments was to see to what degree named entity recognition problems naturally conformed to hierarchical methods, and not just to achieve the highest performance possible.
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Richman, Alexander E. and Schone, Patrick
Abstract
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal human intervention and no linguistic expertise.
Abstract
We show how the Wikipedia format can be used to identify possible named entities and discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity.
Conclusions
In conclusion, we have demonstrated that Wikipedia can be used to create a Named Entity Recognition system with performance comparable to one developed from 15-40,000 words of human-annotated newswire, while not requiring any linguistic expertise on the part of the user.
Introduction
Named Entity Recognition (NER) has long been a major task of natural language processing.
Training Data Generation
We elected to use the ACE Named Entity types PERSON, GPE (GeoPolitical Entities), ORGANIZATION, VEHICLE, WEAPON, LOCATION, FACILITY, DATE, TIME, MONEY, and PERCENT.
Training Data Generation
Other categories can reliably be used to determine that the article does not refer to a named entity, such as “Category:Endangered species.” We manually derived a relatively small set of key phrases, the most important of which are shown in Table 1.
Wikipedia 2.1 Structure
Toral and Munoz (2006) used Wikipedia to create lists of named entities .
Wikipedia 2.1 Structure
Cucerzan (2007), by contrast to the above, used Wikipedia primarily for Named Entity Disambiguation, following the path of Bunescu and Pasca (2006).
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Huang, Jian and Taylor, Sarah M. and Smith, Jonathan L. and Fotiadis, Konstantinos A. and Giles, C. Lee
Conclusions
KARC partitions named entities based on their profiles constructed by an information extraction tool.
Experiments
First, the attribute information about the person named entity includes first/middle/last names, gender, mention, etc.
Experiments
In addition, AeroText extracts relationship information between named entities , such as Family, List, Employment, Ownership, Citizen-Resident-Religion-Ethnicity and so on, as specified in the ACE evaluation.
Experiments
This permits inexact matching of named entities due to name
Introduction
A named entity that represents a person, an organization or a geo-location may appear within and across documents in different forms.
Introduction
Cross document coreference (CDC) is the task of consolidating named entities that appear in multiple documents according to their real referents.
Methods 2.1 Document Level and Profile Based CDC
Document level CDC makes a simplifying assumption that a named entity (and its variants) in a document has one underlying real identity.
Methods 2.1 Document Level and Profile Based CDC
The named entity President Bush is extracted from the sentence “President Bush addressed the nation from the Oval Office Monday.”
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Mintz, Mike and Bills, Steven and Snow, Rion and Jurafsky, Daniel
Architecture
in sentences using a named entity tagger that labels persons, organizations and locations.
Architecture
In the testing step, entities are again identified using the named entity tagger.
Features
5.3 Named entity tag features
Features
Every feature contains, in addition to the content described above, named entity tags for the two entities.
Features
We perform named entity tagging using the Stanford four-class named entity tagger (Finkel et al., 2005).
Implementation
In preprocessing, consecutive words with the same named entity tag are ‘chunked’, so that Edwin/PERSON Hubble/PERSON becomes [Edwin Hubble]/PERSON.
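A minimal sketch of that chunking step: consecutive tokens sharing a named entity tag are merged into one span. The token/tag list representation and the "O" label for non-entities are assumptions.

```python
def chunk_entities(tokens, tags):
    """Merge consecutive tokens that carry the same (non-O) entity tag."""
    chunks, cur_tok, cur_tag = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag == cur_tag and tag != "O":
            cur_tok.append(tok)               # extend the current entity chunk
        else:
            if cur_tok:
                chunks.append((" ".join(cur_tok), cur_tag))
            cur_tok, cur_tag = [tok], tag     # start a new chunk
    if cur_tok:
        chunks.append((" ".join(cur_tok), cur_tag))
    return chunks

print(chunk_entities(["Edwin", "Hubble", "was", "born"],
                     ["PERSON", "PERSON", "O", "O"]))
# [('Edwin Hubble', 'PERSON'), ('was', 'O'), ('born', 'O')]
```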
Previous work
For example, the DIPRE algorithm by Brin (1998) used string-based regular expressions in order to recognize relations such as author-book, while the SNOWBALL algorithm by Agichtein and Gravano (2000) learned similar regular expression patterns over words and named entity tags.
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Elson, David and Dames, Nicholas and McKeown, Kathleen
Data Analysis
Not surprisingly, the most oft-repeated named entity in the text is I, referring to the narrator.
Extracting Conversational Networks from Literature
In a conversational network, vertices represent characters (assumed to be named entities) and edges indicate at least one instance of dialogue interaction between two characters over the course of the novel.
Extracting Conversational Networks from Literature
For each named entity , we generate variations on the name that we would expect to see in a coreferent.
Extracting Conversational Networks from Literature
For each named entity, we compile a list of other named entities that may be coreferents, either because they are identical or because one is an expected variation on the other.
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Stoyanov, Veselin and Gilbert, Nathan and Cardie, Claire and Riloff, Ellen
Coreference Subtask Analysis
This section examines the role of three such subtasks — named entity recognition, anaphoricity determination, and coreference element detection — in the performance of an end-to-end coreference resolution system.
Coreference Subtask Analysis
We employ the Stanford CRF-based Named Entity Recognizer (Finkel et al., 2004) for named entity tagging.
Coreference Subtask Analysis
3.3 Named Entities
Coreference Task Definitions
However, the MUC definition excludes (1) “nested” named entities (NEs) (e.g.
Introduction
With these in mind, Section 3 first examines three subproblems that play an important role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection.
Introduction
Our results suggest that the availability of accurate detectors for anaphoricity or coreference elements could substantially improve the performance of state-of-the-art resolvers, while improvements to named entity recognition likely offer little gains.
Resolution Complexity
Proper Names: Three resolution classes cover CEs that are named entities (e.g.
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Abstract
We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging.
Introduction
We evaluate the effectiveness of our method by using linear-chain conditional random fields (CRFs) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging.
Log-Linear Models
The model is used for a variety of sequence labeling tasks such as POS tagging, chunking, and named entity recognition.
Log-Linear Models
We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging.
Log-Linear Models
4.2 Named Entity Recognition
named entity is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej
Experiments
All these systems incorporated lexical semantics features derived from WordNet and named entity features.
Experiments
Features used in the experiments can be categorized into six types: identical word matching (I), lemma matching (L), WordNet (WN), enhanced Lexical Semantics (LS), Named Entity matching (NE) and Answer type checking (Ans).
Experiments
Named entity matching (NE) checks whether two words are individually part of some named entities with the same type.
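A hedged sketch of such an NE matching feature; the per-word entity-type lookup used here is an assumed simplification of whatever entity representation the system actually uses.

```python
def ne_match(q_word, a_word, q_types, a_types):
    """q_types/a_types map a word to its named entity type, if it has one."""
    t_q, t_a = q_types.get(q_word), a_types.get(a_word)
    return t_q is not None and t_q == t_a

# Hypothetical question/answer entity-type maps.
q_types = {"Obama": "PERSON", "Hawaii": "GPE"}
a_types = {"Barack": "PERSON", "Obama": "PERSON"}
assert ne_match("Obama", "Barack", q_types, a_types)      # both PERSON -> feature fires
assert not ne_match("Hawaii", "Obama", q_types, a_types)  # GPE vs. PERSON -> no match
```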
Lexical Semantic Models
For instance, when a word refers to a named entity , the particular sense and meaning is often not encoded.
named entity is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Yao, Xuchen and Van Durme, Benjamin and Clark, Peter
Background
(2007) proposed indexing text with their semantic roles and named entities .
Background
Queries then include constraints of semantic roles and named entities for the predicate and its arguments in the question.
Experiments
sentence-segmented and word-tokenized by NLTK (Bird and Loper, 2004), dependency-parsed by the Stanford Parser (Klein and Manning, 2003), and NER-tagged by the Illinois Named Entity Tagger (Ratinov and Roth, 2009) with an 18-label type set.
Introduction
Moreover, this approach is more robust against, e.g., entity recognition errors, because answer typing knowledge is learned from how the data was actually labeled, not from how the data was assumed to be labeled (e.g., manual templates usually assume perfect labeling of named entities, but often it is not the case
Introduction
This will be our off-the-shelf QA system, which recognizes the association between question type and expected answer types through various features based on e.g., part-of-speech tagging (POS) and named entity recognition (NER).
Introduction
Moreover, our approach extends easily beyond fixed answer types such as named entities : we are already using POS tags as a demonstration.
Method
Ogilvie (2010) showed in chapter 4.3 that keyword and named entity based retrieval actually outperformed SRL-based structured retrieval in MAP for the answer-bearing sentence retrieval task in their setting.
named entity is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Alhelbawy, Ayman and Gaizauskas, Robert
Abstract
Named Entity Disambiguation (NED) refers to the task of mapping different named entity mentions in running text to their correct interpretations in a specific knowledge base (KB).
Introduction
Named entities (NEs) have received much attention over the last two decades (Nadeau and Sekine, 2007), mostly focused on recognizing the boundaries of textual NE mentions and classifying them as, e.g., Person, Organization or Location.
Introduction
Named Entity Disambiguation
Related Work
The second line of approach is collective named entity disambiguation (CNED), where all mentions of entities in the document are disambiguated jointly.
Solution Graph
cos: The cosine similarity between the named entity textual mention and the KB entry title.
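A minimal bag-of-words sketch of a cosine feature like the one described above (mention string vs. KB entry title); whitespace tokenization and raw term counts are simplifying assumptions.

```python
import math
from collections import Counter

def cosine(mention, title):
    """Cosine similarity between two strings under a bag-of-words model."""
    a, b = Counter(mention.lower().split()), Counter(title.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine("New York", "New York City"))  # roughly 0.82
```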
Solution Graph
Named Entity Selection: The simplest approach is to select the highest ranked entity in the list for each mention mi according to equation 5, where R could refer to Rm or R5.
Solution Graph
Our results show that Page-Rank in conjunction with re-ranking by initial confidence score can be used as an effective approach to collectively disambiguate named entity textual mentions in a document.
named entity is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Navigli, Roberto and Ponzetto, Simone Paolo
BabelNet
the general term concept to denote either a concept or a named entity.
Conclusions
Firstly, the two resources contribute different kinds of lexical knowledge, one is concerned mostly with named entities , the other with concepts.
Conclusions
Thus, even when they overlap, the two resources provide complementary information about the same named entities or concepts.
Introduction
These include, among others, text summarization (Nastase, 2008), Named Entity Recognition (Bunescu and Pasca, 2006), Question Answering (Harabagiu et al., 2000) and text categorization (Gabrilovich and Markovitch, 2006).
Introduction
Second, such resources are typically lexicographic, and thus contain mainly concepts and only a few named entities .
Introduction
The result is an “encyclopedic dictionary” that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations.
Methodology
A Wikipedia page (henceforth, Wikipage) presents the knowledge about a specific concept (e.g. BALLOON (AIRCRAFT)) or named entity (e.g.
named entity is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Flanigan, Jeffrey and Thomson, Sam and Carbonell, Jaime and Dyer, Chris and Smith, Noah A.
Concept Identification
• NER: 1 if the named entity tagger marked the span as an entity, 0 otherwise.
Concept Identification
clex also has a set of rules for generating concept fragments for named entities and time expressions.
Concept Identification
It generates a concept fragment for any entity recognized by the named entity tagger, as well as for any word sequence matching a regular expression for a time expression.
Notation and Overview
Each stage is a discriminatively-trained linear structured predictor with rich features that make use of part-of-speech tagging, named entity tagging, and dependency parsing.
Training
• Input: X, a sentence annotated with named entities (person, organization, location, miscellaneous) from the Illinois Named Entity Tagger (Ratinov and Roth, 2009), and part-of-speech tags and basic dependencies from the Stanford Parser (Klein and Manning, 2003; de Marneffe et al., 2006).
Training
1. (Named Entity) Applies to name concepts and their opn children.
Training
(Fuzzy Named Entity) Applies to name concepts and their opn children.
named entity is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Radziszewski, Adam
Related works
The task of phrase lemmatisation bears a close resemblance to a more popular task, namely lemmatisation of named entities .
Related works
type of named entities considered, those two may be solved using similar or significantly different methodologies.
Related works
Hence, the main challenge is to define a similarity metric between named entities (Piskorski et al., 2009; Kocon and Piasecki, 2012), which can be used to match different mentions of the same names.
named entity is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kessler, Rémy and Tannier, Xavier and Hagège, Caroline and Moriceau, Véronique and Bittar, André
Experiments and Results
Other features: 1) Lucene’s best ranking of the date; 2) number of times the date is absolute in the text; 3) number of times the date is relative (but normalized) in the text; 4) total number of keywords of the query in the title, sentence and named entities of retrieved documents; 5) number of times the date modifies a reported speech verb or is extracted from reported speech.
Related Work
(Swan and Allen, 2000) present an approach to generating graphical timelines that involves extracting clusters of noun phrases and named entities .
Temporal and Linguistic Processing
It also performs named entity recognition (NER) of the most usual named entity categories and recognizes temporal expressions.
Temporal and Linguistic Processing
4.2 Named Entity Recognition
named entity is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sammons, Mark and Vydiswaran, V.G.Vinod and Roth, Dan
Annotation Proposal and Pilot Study
Phenomenon           Occurrence   Agreement
Named Entity         91.67%       0.856
locative             17.62%       0.623
Numerical Quantity   14.05%       0.905
temporal             5.48%        0.960
nominalization       4.05%        0.245
implicit relation    1.90%        0.651
Annotation Proposal and Pilot Study
Phenomenon              Occurrence   Agreement
missing argument        16.19%       0.763
missing relation        14.76%       0.708
excluding argument      10.48%       0.952
Named Entity mismatch   9.29%        0.921
excluding relation      5.00%        0.870
disconnected relation   4.52%        0.580
missing modifier        3.81%        0.465
disconnected argument   3.33%        0.764
Numeric Quant.
Introduction
Tasks such as Named Entity and coreference resolution, syntactic and shallow semantic parsing, and information and relation extraction have been identified as worthwhile tasks and pursued by numerous researchers.
NLP Insights from Textual Entailment
ported by their designers were the use of structured representations of shallow semantic content (such as augmented dependency parse trees and semantic role labels); the application of NLP resources such as Named Entity recognizers, syntactic and dependency parsers, and coreference resolvers; and the use of special-purpose ad-hoc modules designed to address specific entailment phenomena the researchers had identified, such as the need for numeric reasoning.
Pilot RTE System Analysis
(b) All top-5 systems make consistent errors in cases where identifying a mismatch in named entities (NE) or numerical quantities (NQ) is important to make the right decision.
Pilot RTE System Analysis
However, we believe that if a system could recognize key negation phenomena such as Named Entity mismatch, presence of Excluding arguments, etc.
named entity is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Li, Jiwei and Ritter, Alan and Hovy, Eduard
Experiments
We assume that attribute values should be either named entities or terms following @ and #.
Experiments
Named entities are extracted using Ritter et al.’s NER system (2011).
Experiments
Consecutive tokens with the same named entity tag are chunked (Mintz et al., 2009).
Model
• Token-level: for each token t ∈ e, word identity, word shape, part-of-speech tags, named entity tags.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Ruihong and Riloff, Ellen
Introduction
Semantic class tagging has been the subject of previous research, primarily under the guises of named entity recognition (NER) and mention detection.
Introduction
Named entity recognizers perform semantic tagging on proper name noun phrases, and
Related Work
Semantic class tagging is most closely related to named entity recognition (NER), mention detection, and semantic lexicon induction.
Related Work
NER systems (e.g., Bikel et al., 1997; Collins and Singer, 1999; Cucerzan and Yarowsky, 1999; Fleischman and Hovy, 2002) identify proper named entities, such as people, organizations, and locations.
Related Work
NER systems, however, do not identify nominal NP instances (e.g., “a software manufacturer” or “the beach”), or handle semantic classes that are not associated with proper named entities (e.g., symptoms).
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wang, Aobo and Kan, Min-Yen
Discussion
Table 6: Sample Chinese freestyle named entities that are usernames.
Discussion
Another major group of errors come from what we term freestyle named entities as exemplified in Table 6; i.e., person names in the form of user IDs and nicknames, that have less constraint on form in terms of length, canonical structure (not surnames with given names; as is standard in Chinese names) and may mix alphabetic characters.
Discussion
Most of these belong to the category of Person Name (PER), as defined in the CoNLL-2003 Named Entity Recognition shared task.
Methodology
In addition, we employ additional online word lists to distinguish named entities and function words from potential informal words.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
You, Gae-won and Cha, Young-rok and Kim, Jinhan and Hwang, Seung-won
Abstract
This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities.
Introduction
Named entity translation discovery aims at mapping entity names for people, locations, etc.
Introduction
As many new named entities appear every day in newspapers and web sites, their translations are nontrivial yet essential.
Introduction
Early efforts of named entity translation have focused on using phonetic feature (called PH) to estimate a phonetic similarity between two names (Knight and Graehl, 1998; Li et al., 2004; Virga and Khudanpur, 2003).
Preliminaries
To identify entities, we use a CRF-based named entity tagger (Finkel et al., 2005) and a Chinese word breaker (Gao et al., 2003) for English and Chinese corpora, respectively.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Hongzhao and Wen, Zhen and Yu, Dian and Ji, Heng and Sun, Yizhou and Han, Jiawei and Li, He
Experiments
Named entities which co-occur at least 6 times with a morph query in the same topic are selected as its target candidates.
Related Work
Other similar research lines are the TAC-KBP Entity Linking (EL) (Ji et al., 2010; Ji et al., 2011), which links a named entity in news and web documents to an appropriate knowledge base (KB) entry, the task of mining name translation pairs from comparable corpora (Udupa et al., 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al., 2007) and the link prediction problem (Adamic and Adar, 2001; Liben-Nowell and Kleinberg, 2003; Sun et al., 2011b;
Target Candidate Identification
However, obviously we cannot consider all of the named entities in these sources as target candidates due to the sheer volume of information.
Target Candidate Identification
In addition, morphs are not limited to named entity forms.
Target Candidate Ranking
Then we apply a hierarchical Hidden Markov Model (HMM) based Chinese lexical analyzer ICTCLAS (Zhang et al., 2003) to extract named entities, noun phrases and events.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Takamatsu, Shingo and Sato, Issei and Nakagawa, Hiroshi
Experiments
In Wikipedia articles, named entities were identified by anchor text linking to another article and starting with a capital letter (Yan et al., 2009).
Experiments
than one named entity.
Knowledge-based Distant Supervision
An entity is mentioned as a named entity in text.
Knowledge-based Distant Supervision
Since two entities mentioned in a sentence do not always have a relation, we select entity pairs from a corpus when: (i) the path of the dependency parse tree between the corresponding two named entities in the sentence is no longer than 4 and (ii) the path does not contain a sentence-like boundary, such as a relative clause (Banko et al., 2007; Banko and Etzioni, 2008).
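A hedged sketch of that pair-selection filter, assuming the dependency parse is available as an undirected networkx graph over token positions with edge "label" attributes; the set of "sentence-like boundary" labels is an illustrative assumption, not the authors' list.

```python
import networkx as nx

CLAUSE_BOUNDARIES = {"rcmod", "advcl", "ccomp"}  # assumed boundary-like labels

def keep_pair(dep_graph, head_i, head_j, max_len=4):
    """Keep an entity pair only if its dependency path is short and clause-free."""
    try:
        path = nx.shortest_path(dep_graph, head_i, head_j)
    except nx.NetworkXNoPath:
        return False
    if len(path) - 1 > max_len:                  # condition (i): path length <= 4
        return False
    labels = {dep_graph[u][v].get("label") for u, v in zip(path, path[1:])}
    return not (labels & CLAUSE_BOUNDARIES)      # condition (ii): no clause boundary
```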
Wrong Label Reduction
If we use a standard named entity tagger, the entity types are Person, Location, and Organization.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael and Jurafsky, Dan
Information Extraction: Slot Filling
Finally, we filter extractions whose WordNet or named entity label does not match the learned slot’s type (e.g., a Location does not match a Person).
Learning Templates from Raw Text
We also tag the corpus with an NER system and allow patterns to include named entity types, e.g., ‘kidnap PERSON’.
Previous Work
Classifiers rely on the labeled examples’ surrounding context for features such as nearby tokens, document position, syntax, named entities , semantic classes, and discourse relations (Maslennikov and Chua, 2007).
Previous Work
(2006) integrate named entities into pattern learning (PERSON won) to approximate unknown semantic roles.
Previous Work
However, the limitations to their approach are that (1) redundant documents about specific events are required, (2) relations are binary, and (3) only slots with named entities are learned.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Rüd, Stefan and Ciaramita, Massimiliano and Müller, Jens and Schütze, Hinrich
Abstract
We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries.
Experimental data
As out-of-domain newswire evaluation data we use the development test data from the NIST 1999 IEER named entity corpus, a dataset of 50,000 tokens of New York Times (NYT) and Associated Press Weekly news. This corpus is annotated with person, location, organization, cardinal, duration, measure, and date labels.
Introduction
For example, a named entity (NE) recognizer trained on news text may tag the NE London in an out-of-domain web query like London Klondike gold rush as a location.
Piggyback features
The value of the feature URL-MI is the average difference between the MI of PER and the other named entities.
Piggyback features
The feature LEX-MI interprets words occurring before or after so as indicators of named entitihood.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Abstract
Sequential modeling has been widely used in a variety of important applications including named entity recognition and shallow parsing.
Experiments
We apply 1-best and k-best sequential decoding algorithms to five NLP tagging tasks: Penn TreeBank (PTB) POS tagging, CoNLL 2000 joint POS tagging and chunking, CoNLL 2003 joint POS tagging, chunking and named entity tagging, HPSG supertagging (Matsuzaki et al., 2007) and a search query named entity recognition (NER) dataset.
Experiments
Similarly we combine the POS tags, chunk tags, and named entity tags to form joint tags for CoNLL 2003 dataset, e.g., PRP$|I-NP|O.
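A minimal sketch of this tag joining, assuming parallel per-token lists of POS, chunk, and NE tags (the column names are illustrative):

    def join_tags(pos_tags, chunk_tags, ne_tags):
        """Combine per-token tag columns into joint tags such as 'PRP$|I-NP|O'."""
        return ["|".join(tags) for tags in zip(pos_tags, chunk_tags, ne_tags)]

    print(join_tags(["PRP$", "NNS"], ["I-NP", "I-NP"], ["O", "O"]))
    # ['PRP$|I-NP|O', 'NNS|I-NP|O']

Decoding over the joint tag set then yields POS, chunk, and NE outputs in a single pass, at the price of a larger label space.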
Experiments
Note that by such tag joining, we are able to offer different tag decodings (for example, chunking and named entity tagging) simultaneously.
Introduction
For example, to pursue a high recall ratio, a named entity recognition system may have to adopt k-best sequences in case the true entities are not recognized in the 1-best sequence.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Abu-Jbara, Amjad and Dasigi, Pradeep and Diab, Mona and Radev, Dragomir
Approach
In addition to this shallow parsing method, we also use named entity recognition (NER) to identify more entities.
Approach
We use the Stanford Named Entity Recognizer (Finkel et al., 2005) for this purpose.
Evaluation
7) We only use named entity recognition to identify entity targets; i.e.
Evaluation
Although using both named entity recognition (NER) and noun phrase chunking achieves better results, it
Related Work
In our work, we extract as targets frequent noun phrases and named entities that are used by two or more different discussants.
named entity is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Nishikawa, Hitoshi and Hasegawa, Takaaki and Matsuo, Yoshihiro and Kikui, Genichiro
Experiments
We used a CRF-based Japanese dependency parser (Imamura et al., 2007) and a named entity recognizer (Suzuki et al., 2006) for sentiment extraction and for constructing feature vectors for the readability score, respectively.
Optimizing Sentence Sequence
To this, we add named entity tags (e.g.
Optimizing Sentence Sequence
We observe that the first sentence of a review of a restaurant frequently contains named entities indicating location.
Optimizing Sentence Sequence
Note that our method learns w from texts automatically annotated by a POS tagger and a named entity tagger.
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ide, Nancy and Baker, Collin and Fellbaum, Christiane and Passonneau, Rebecca
Introduction
More recently, the OntoNotes project (Pradhan et al., 2007) released a one million word English corpus of newswire, broadcast news, and broadcast conversation that is annotated for Penn Treebank syntax, PropBank predicate argument structures, coreference, and named entities.
MASC Annotations
Annotation          Method     Texts  Words
Token               Validated   118   222472
Sentence            Validated   118   222472
POS/lemma           Validated   118   222472
Noun chunks         Validated   118   222472
Verb chunks         Validated   118   222472
Named entities      Validated   118   222472
FrameNet frames     Manual       21    17829
HPSG                Validated    40*   30106
Discourse           Manual       40*   30106
Penn Treebank       Validated    97    87383
PropBank            Validated    92    50165
Opinion             Manual       97    47583
TimeBank            Validated    34     5434
Committed belief    Manual       13     4614
Event               Manual       13     4614
Coreference         Manual        2     1877
MASC Annotations
To derive maximal benefit from the semantic information provided by these resources, the entire corpus is also annotated and manually validated for shallow parses (noun and verb chunks) and named entities (person, location, organization, date and time).
MASC Annotations
Automatically-produced annotations for sentence, token, part of speech, shallow parses (noun and verb chunks), and named entities (person, location, organization, date and time) are hand-validated by a team of students.
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Saha, Sujan Kumar and Mitra, Pabitra and Sarkar, Sudeshna
Abstract
Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data.
Abstract
A number of word similarity measures are proposed for clustering words for the Named Entity Recognition task.
Introduction
Named Entity Recognition (NER) involves locating and classifying the names in a text.
Maximum Entropy Based Model for Hindi NER
This corpus has been manually annotated and contains about 16,491 Named Entities (NEs).
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Vadas, David and Curran, James R.
NER features
Named entity recognition (NER) provides information that is particularly relevant for NP parsing, simply because entities are nouns.
NER features
coordinate structure in biological named entities.
NER features
We identify constituents that dominate tokens that all have the same NE tag, as these nodes will not cause a “crossing bracket” with the named entity.
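A small sketch of that check, assuming a constituent is given as a token span and NE tags as a per-token list (plain tags rather than BIO, purely for illustration):

    def dominates_single_entity(ne_tags, start, end):
        """True if every token in the span [start, end) carries the same non-O NE tag,
        so a bracket around the span cannot cross a named entity."""
        span = ne_tags[start:end]
        return len(set(span)) == 1 and span[0] != "O"

    print(dominates_single_entity(["ORG", "ORG", "O"], 0, 2))   # True
    print(dominates_single_entity(["ORG", "ORG", "O"], 1, 3))   # False: span crosses the entity boundary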
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pilehvar, Mohammad Taher and Jurgens, David and Navigli, Roberto
Conclusions
Specifically, we plan to investigate higher coverage inventories such as BabelNet (Navigli and Ponzetto, 2012a), which will handle texts with named entities and rare senses that are not in WordNet, and will also enable cross-lingual semantic similarity.
Experiment 1: Textual Similarity
The TLsyn system also uses Google Book Ngrams, as well as dependency parsing and named entity recognition.
Experiment 1: Textual Similarity
Additionally, because the texts often contain named entities which are not present in WordNet, we incorporated the similarity values produced by four string-based measures, which were used by other teams in the STS task: (1) longest common substring, which takes into account the length of the longest overlapping contiguous sequence of characters (substring) across two strings (Gusfield, 1997), (2) longest common subsequence, which, instead, finds the longest overlapping subsequence of two strings (Allison and Dix, 1986), (3) Greedy String Tiling, which allows reordering in strings (Wise, 1993), and (4) the character/word n-gram similarity proposed by Barrón-Cedeño et al.
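As one concrete instance of these measures, here is a textbook longest-common-subsequence length and a simple character n-gram (Dice) overlap; these are standard formulations, not the authors' exact implementations.

    def lcs_length(a, b):
        """Length of the longest common subsequence of a and b (dynamic programming)."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, ca in enumerate(a, 1):
            for j, cb in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(a)][len(b)]

    def char_ngram_sim(a, b, n=3):
        """Dice overlap of character n-gram sets, one simple variant of n-gram similarity."""
        grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
        ga, gb = grams(a), grams(b)
        return 2 * len(ga & gb) / (len(ga) + len(gb)) if ga and gb else 0.0

    print(lcs_length("Barack Obama", "Barak Obama"))               # 11
    print(round(char_ngram_sim("Barack Obama", "Barak Obama"), 2))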
Experiment 1: Textual Similarity
Named entity features used by the TLsim system could be the reason for its better performance on the MSRpar dataset, which contains a large number of named entities.
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nakashole, Ndapandula and Tylenda, Tomasz and Weikum, Gerhard
Evaluation
The problem in this example is incorrect segmentation of a named entity.
Introduction
Known entities are recognized and mapped to the KB using a recent tool for named entity disambiguation (Hoffart 2011).
Related Work
Tagging mentions of named entities with lexical types has been pursued in previous work.
Related Work
Most well-known is the Stanford named entity recognition (NER) tagger (Finkel 2005) which assigns coarse-grained types like person, organization, location, and other to noun phrases that are likely to denote entities.
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Fan and Zhao, Jun and Zou, Bo and Liu, Kang and Liu, Feifan
Introduction*
The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language, which plays an important role in machine translation and cross-language information retrieval (CLIR).
Statistical Transliteration Model
Because the process of named entity recognition may lose some NEs, we retain all the words in the web corpus without any filtering.
Statistical Transliteration Model
Then, for every Tik, we use the named entity recognition (NER) software to determine whether rci is an NE or not.
Statistical Transliteration Model
The training corpus for statistical transliteration model comes from the corpus of Chinese <-> English Name Entity Lists v 1.0 (LDC2005T34).
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Garrido, Guillermo and Peñas, Anselmo and Cabaleiro, Bernardo and Rodrigo, Álvaro
Distant Supervised Relation Extraction
We enforce the Named Entity type of entity and value to match an expected type, predefined for the relation.
Document Representation
There are three families of types: Events (verbs that describe an action, annotated with tense, polarity and aspect); standardized Time Expressions; and Named Entities, with additional annotations such as gender or age.
Document Representation
Most chunks consist of one word; we join words into a chunk (and a node) in two cases: a multi-word named entity, and a verb and its auxiliaries.
Document Representation
The processing includes dependency parsing, named entity recognition and coreference resolution, done with the Stanford CoreNLP software (Klein and Manning, 2003); and events and temporal information extraction, via the TARSQI Toolkit (Verhagen et al., 2005).
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Baldwin, Tyler and Li, Yunyao and Alexe, Bogdan and Stanoi, Ioana R.
Experimental Evaluation
several potential named entities it could refer to, even if the vast majority of references were to only a single entity.
Introduction
Several NLP tasks, such as word sense disambiguation, word sense induction, and named entity disambiguation, address this ambiguity problem to varying degrees.
Related Work
the well studied problems of named entity disambiguation (NED) and word sense disambiguation (WSD).
Related Work
Both named entity and word sense disambiguation are extensively studied, and surveys on each are available (Nadeau and Sekine, 2007; Navigli, 2009).
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yao, Xuchen and Van Durme, Benjamin
Experiments
For this purpose we used the Freebase Search API (Freebase, 2013a). All named entities in a question were sent to this API, which returned a ranked list of relevant topics.
Experiments
When no named entities are detected, we fall back to noun phrases.
Graph Features
We simply apply a named entity recognizer to find the question topic.
Graph Features
(special case) if a qtopic node was tagged as a named entity, then replace this node with its named entity form, e.g., bieber → qtopic=person;
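A hedged sketch of that node rewriting, assuming the NE tag for the question-topic token is already available from the recognizer (the tag names here are illustrative):

    def qtopic_label(token, ne_tag=None):
        """Replace a question-topic node by its NE type when one is available, e.g. bieber -> qtopic=person."""
        return "qtopic=" + (ne_tag.lower() if ne_tag else token.lower())

    print(qtopic_label("bieber", "PERSON"))   # qtopic=person
    print(qtopic_label("capital"))            # qtopic=capital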
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yao, Limin and Riedel, Sebastian and McCallum, Andrew
Experiments
We extract dependency paths for each pair of named entities in one sentence.
Experiments
To determine entity types, we link named entities to Wikipedia pages using the Wikifier (Ratinov et al., 2011) package and extract categories from the Wikipedia page.
Introduction
Many relation discovery methods rely exclusively on the notion of either shallow or syntactic patterns that appear between two named entities (Bollegala et al., 2010; Lin and Pantel, 2001).
Related Work
(2004) cluster pairs of named entities according to the similarity of context words intervening between them.
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Tibshirani, Julie and Manning, Christopher D.
Abstract
We conduct experiments on named entity recognition data and find that our approach can provide a significant improvement over the standard model when annotation errors are present.
Experiments
We now consider the problem of named entity recognition (NER) to evaluate how our model performs in a large-scale prediction task.
Experiments
In traditional NER, the goal is to determine whether each word is a person, organization, location, or not a named entity (‘other’).
Experiments
(This task does not trivially reduce to finding the capitalized words, as the model must distinguish between people and other named entities like organizations).
named entity is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Dyer, Chris and Smith, Noah A.
Evaluation
In the absence of a sizable number of linguistically interesting terms (like wicked) that are known to be geographically variable, we consider the proxy of estimating the named entities evoked by specific terms in different geographical regions.
Evaluation
As noted above, geographic terms like city provide one such example: in Massachusetts we expect the term city to be more strongly connected to grounded named entities like Boston than to other US cities.
Introduction
This information enables learning models of word meaning that are sensitive to such factors, allowing us to distinguish, for example, the usage of wicked in Massachusetts from the usage of that word elsewhere, and letting us better associate geographically grounded named entities (e.g., Boston) with their hypernyms (city) in their respective regions.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Andrews, Nicholas and Eisner, Jason and Dredze, Mark
Detailed generative story
(c) If wdk is a named entity type (PERSON, PLACE, ORG, ...)
Inference by Block Gibbs Sampling
Each context word and each named entity is associated with a latent topic.
Inference by Block Gibbs Sampling
This process treats all topics as exchangeable, including those associated with named entities.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Background
It works well for named entities .
Related Work
Word embeddings have been successfully applied in many applications, such as in sentiment analysis (Socher et al., 2011b), paraphrase detection (Socher et al., 2011a), chunking, and named entity recognition (Turian et al., 2010; Collobert et al., 2011).
Results and Analysis 5.1 Varying the Amount of Clusters
The red paths refer to the relations between the named entity and its hypernyms extracted using the web mining method (Fu et al., 2013).
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Soni, Sandeep and Mitra, Tanushree and Gilbert, Eric and Eisenstein, Jacob
Introduction
We present a new dataset of Twitter messages that use FactBank predicates (e.g., claim, say, insist) to scope the claims of named entity sources.
Modeling factuality judgments
Source: represented by the named entity or username in the source field (see Figure 4); Journalist: represented by their Twitter ID; Claim: represented by a bag-of-words vector from the claim field (Figure 4).
Text data
Finally, we restrict consideration to tweets in which the source contains a named entity or Twitter username.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhai, Ke and Williams, Jason D
Data
We also map all named entities (e.g., “downtown” and “28X”) to their semantic types (resp.
Data
We map named entities to their semantic types, apply stemming, and remove stop words.3 The corpus we use contains approximately 2, 000 dialogue sessions or 80, 000 conversation utterances.
Latent Structure in Dialogues
We used regular expressions to map named entities, and the Porter stemmer in NLTK to stem all tokens.
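A minimal sketch of this preprocessing, with made-up patterns standing in for the system's actual entity regexes (requires the NLTK stopwords corpus, e.g. via nltk.download('stopwords')):

    import re
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    # Illustrative patterns only; the real system's regexes cover its own entity inventory.
    ENTITY_PATTERNS = [(re.compile(r"\b\d{1,3}[A-Z]\b"), "<bus_route>"),      # e.g. "28X"
                       (re.compile(r"\bdowntown\b", re.I), "<place>")]

    def preprocess(utterance):
        """Map named entities to semantic types, then stem and remove stop words."""
        for pattern, sem_type in ENTITY_PATTERNS:
            utterance = pattern.sub(sem_type, utterance)
        stemmer = PorterStemmer()
        stop = set(stopwords.words("english"))
        return [stemmer.stem(tok) for tok in utterance.lower().split() if tok not in stop]

    print(preprocess("I am going downtown on the 28X"))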
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zou, Bowei and Zhou, Guodong and Zhu, Qiaoming
Baselines
> Basic features: the named entity and its type in the focus candidate; relative position of the focus candidate to the negative expression (before or after).
Baselines
Along with negation focus annotation, this corpus also contains other annotations, such as POS tag, named entity, chunk, constituent tree, dependency tree, and semantic role.
Baselines
> Named Entity Recognizer: We employ the Stanford NER (Finkel et al., 2005) to obtain named entities.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fan, Miao and Zhao, Deli and Zhou, Qiang and Liu, Zhiyuan and Zheng, Thomas Fang and Chang, Edward Y.
Experiments
Three kinds of features, namely, lexical, syntactic and named entity tag features, were extracted from relation mentions.
Introduction
As we cannot tell what kinds of features are effective in advance, we have to use NLP toolkits, such as Stanford CoreNLP, to extract a variety of textual features, e.g., named entity tags, part-of-speech tags and lexicalized dependency paths.
Related Work
Other studies (Takamatsu et al., 2012; Min et al., 2013; Zhang et al., 2013; Xu et al., 2013) addressed more specific issues, like how to construct the negative class in learning or how to incorporate additional information, such as named entity tags, to improve performance.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Moreover, the success of joint bilingual learning may lend itself to many inherent multilingual NLP tasks such as POS tagging (Yarowsky and Ngai, 2001), named entity recognition (Yarowsky et al., 2001), sentiment analysis (Wan, 2009), and semantic role labeling (Sebastian and Lapata, 2009), etc.
Abstract
It has been successfully applied to many NLP applications, such as POS tagging (Engelson and Dagan, 1996; Ringger et al., 2007), word sense disambiguation (Chan and Ng, 2007; Zhu and Hovy, 2007), sentiment detection (Brew et al., 2010; Li et al., 2012), syntactical parsing (Hwa, 2004; Osborne and Baldridge, 2004), and named entity recognition (Shen et al., 2004; Tomanek et al., 2007; Tomanek and Hahn, 2009) etc.
Abstract
named entity and syntactic parse tree).
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Experiments & Results 4.1 Experimental Setup
From the oovs, we exclude numbers as well as named entities.
Experiments & Results 4.1 Experimental Setup
We apply a simple heuristic to detect named entities: basically, words that are capitalized in the original dev/test set that do not appear at the beginning of a sentence are named entities.
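A minimal sketch of this heuristic over tokenized sentences; sentence-initial tokens are skipped so ordinary sentence-initial capitalization does not count (the toy input is illustrative):

    def heuristic_named_entities(sentences):
        """Collect tokens that are capitalized anywhere but sentence-initial position."""
        entities = set()
        for tokens in sentences:
            for tok in tokens[1:]:            # skip the sentence-initial token
                if tok[:1].isupper():
                    entities.add(tok)
        return entities

    print(heuristic_named_entities([["The", "Klondike", "gold", "rush"],
                                    ["London", "is", "in", "England"]]))
    # {'Klondike', 'England'} -- note that sentence-initial 'London' is missed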
Introduction
Although this is helpful in translating a small fraction of oovs such as named entities for languages with same writing systems, it harms the translation in other types of oovs and distant language pairs.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Xiaohua and Li, Yitong and Wu, Haocheng and Zhou, Ming and Wei, Furu and Lu, Yi
Introduction
Many tweet-related studies have been inspired, ranging from named entity recognition (Liu et al., 2012), topic detection (Mathioudakis and Koudas, 2010), and clustering (Rosa et al., 2010) to event extraction (Grinev et al., 2009).
Introduction
(2012), on average a named entity has 3.3 different surface forms in tweets.
Task Definition
First, we assume that mentions are given, e.g., identified by some named entity recognition system.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lee, Taesung and Hwang, Seung-won
Abstract
This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”.
Abstract
Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case.
Introduction
This task is more challenging in the presence of multilingual text, because translating named entities (NEs), such as persons, locations, or organizations, is a nontrivial task.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lassalle, Emmanuel and Denis, Pascal
Experiments
Depending on the document category, we found some variations as to which hierarchy was learned in each setting, but we noticed that parameters starting with right and left gramtypes often produced quite good hierarchies: for instance right gramtype → left gramtype → same sentence → right named entity type.
Introduction
The main question we raise is, given a set of indicators (such as grammatical types, distance between two mentions, or named entity types), how to best partition the pool of mention pair examples in order to best discriminate coreferential pairs from non coreferential ones.
System description
We used classical features that can be found in detail in (Bengtson and Roth, 2008) and (Rahman and Ng, 2011): grammatical type and subtype of mentions, string match and substring, apposition and copula, distance (number of separating mentions/sentences/words), gender/number match, synonymy/hypernymy and animacy (using WordNet), family name (based on lists), named entity types, syntactic features (gold parse), and anaphoricity detection.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael
Timestamp Classifiers
Named entities are important to document dating due to the nature of people and places coming in and out of the news at precise moments in time.
Timestamp Classifiers
Named Entities: Although not directly related to time expressions, we also include n-grams of tokens that are labeled by an NER system using Person, Organization, or Location.
Timestamp Classifiers
Collecting named entity mentions will differentiate between an article discussing a bill and one discussing the US President, Bill Clinton.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Park, Souneil and Lee, Kyung Soon and Song, Junehwa
Disputant relation-based method
A named entity combined with a topic particle or a subject particle is identified as the subject of these quotes.
Disputant relation-based method
We detect the name of an organization, person, or country using the Korean Named Entity Recognizer (Lee et al.
Introduction
We observe that the method achieves acceptable performance for practical use with basic language resources and tools, i.e., Named Entity Recognizer (Lee et al.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chan, Yee Seng and Roth, Dan
Mention Extraction System
NE tags: We automatically annotate the sentences with named entity (NE) tags using the named entity tagger of (Ratinov and Roth, 2009).
Mention Extraction System
From a sentence, we gather the following as candidate mentions: all nouns and possessive pronouns, all named entities annotated by the NE tagger (Ratinov and Roth, 2009), all base noun phrase (NP) chunks, all chunks satisfying the pattern: NP (PP NP)+, all NP constituents in the syntactic parse tree, and from each of these constituents, all substrings consisting of two or more words, provided the substrings do not start or end on punctuation marks.
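A sketch of the last step only, enumerating sub-spans of two or more words that neither start nor end on punctuation; the punctuation test is an assumption for illustration.

    import string

    def candidate_substrings(tokens):
        """All sub-spans of length >= 2 whose boundary tokens are not punctuation."""
        is_punct = lambda tok: all(ch in string.punctuation for ch in tok)
        spans = []
        for i in range(len(tokens)):
            for j in range(i + 2, len(tokens) + 1):
                if not is_punct(tokens[i]) and not is_punct(tokens[j - 1]):
                    spans.append(tokens[i:j])
        return spans

    print(candidate_substrings(["the", "prime", "minister", ","]))
    # [['the', 'prime'], ['the', 'prime', 'minister'], ['prime', 'minister']]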
Syntactico-Semantic Structures
lw: last word in the mention; Bc(w): the brown cluster bit string representing w; NE: named entity
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bendersky, Michael and Croft, W. Bruce and Smith, David A.
Introduction
Automatic markup of textual documents with linguistic annotations such as part-of-speech tags, sentence constituents, named entities, or semantic roles is a common practice in natural language processing (NLP).
Joint Query Annotation
Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging, phrase chunking, named entity recognition, and stopword indicators, to name just a few.
Related Work
These approaches have been shown to be successful for tasks such as parsing and named entity recognition in newswire data (Finkel and Manning, 2009) or semantic role labeling in the Penn Treebank and Brown corpus (Toutanova et al., 2008).
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Honnibal, Matthew and Curran, James R. and Bos, Johan
Conclusion
The major areas of CCGbank’s grammar left to be improved are the analysis of comparatives, and the analysis of named entities.
Conclusion
Named entities are also difficult to analyse, as many entity types obey their own specific grammars.
Conclusion
This is another example of a phenomenon that could be analysed much better in CCGbank using an existing resource, the BBN named entity corpus.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhu, Xiaodan and Penn, Gerald and Rudzicz, Frank
An acoustics-based approach
Given two subpaths with nearly identical average similarity scores, we suggest that the longer of the two is more likely to refer to content of interest that is shared between two speech utterances, e.g., named entities .
Experimental results
Although named entities and domain-specific terms are often highly relevant to the documents in which they are referenced, these types of words are often not included in ASR vocabularies, due to their relative global rarity.
Introduction
This out-of-vocabulary (OOV) problem is unavoidable in the regular ASR framework, although it is more likely to happen on salient words such as named entities or domain-specific terms.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Discussion
We once found that many words, mostly named entities, were outside the lexicon.
Discussion
Thus we managed to collect a named entity translation dictionary to enhance the original one.
Treebank Translation and Dependency Transformation
As a bilingual lexicon is required for our task and none of the existing lexicons are suitable for translating the PTB, two lexicons, the LDC Chinese-English Translation Lexicon Version 2.0 (LDC2002L27) and an English-to-Chinese lexicon from StarDict, are conflated, with some necessary manual extensions, to cover 99% of the words appearing in the PTB (most of the untranslated words are named entities).
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Fan and Zhao, Jun and Liu, Kang
Introduction
The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language, which plays an important role in machine translation and cross-language information retrieval (CLIR).
Introduction
3) Asymmetric alignment: When we extract the translation equivalent from the web pages, the traditional method should recognize the named entities in the target language sentence first, and then the extracted NEs will be aligned with the source ON.
Introduction
However, the named entity recognition (NER) will always introduce some mistakes.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kim, Jungi and Li, Jin-Ji and Lee, Jong-Hyeok
Experiment
In general, the NTCIR topics are general descriptive words such as “regenerative medicine”, “American economy after the 911 terrorist attacks”, and “lawsuit brought against Microsoft for monopolistic practices.” The TREC topics are more named-entity-like terms such as “CarMax”, “Wikipedia primary source”, “Jiffy Lube”, “Starbucks”, and “Windows Vista.” We have experimentally shown that LSA is more suited to finding associations between general terms because its training documents are from a general domain. Our PMI measure utilizes a web search engine, which covers a variety of named entity terms.
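A hedged sketch of a hit-count-based PMI in this spirit, mirroring PMI(t, q) = log( P(t, q) / (P(t) P(q)) ) with probabilities estimated from page counts; the counts are supplied by the caller (no search API is invoked here) and the numbers below are illustrative.

    import math

    def web_pmi(hits_term, hits_query, hits_joint, total_pages):
        """PMI of a term and a query, estimated from (nonzero) web hit counts."""
        p_t = hits_term / total_pages
        p_q = hits_query / total_pages
        p_tq = hits_joint / total_pages
        return math.log(p_tq / (p_t * p_q))

    print(round(web_pmi(2_000_000, 500_000, 120_000, 1_000_000_000), 2))   # ~4.79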
Term Weighting and Sentiment Analysis
The statistical approaches may suffer from data sparseness problems especially for named entity terms used in the query, and the proximal clues cannot sufficiently cover all term—query associations.
Term Weighting and Sentiment Analysis
Mullen and Collier (2004) manually annotated named entities in their dataset (i.e.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Das, Dipanjan and Smith, Noah A.
QG for Paraphrase Modeling
In addition to assuming that dependency parse trees for s and t are observable, we also assume each word comes with POS and named entity tags.
QG for Paraphrase Modeling
Dependency label, POS, and named entity class: The newly generated target word’s dependency label, POS, and named entity class are drawn from multinomial distributions p_lab, p_pos, and p_ne that condition, respectively, on the configuration and the POS and named entity class of the aligned source-tree word (lines 9-11).
QG for Paraphrase Modeling
We estimate the distributions over dependency labels, POS tags, and named entity classes using the transformed treebank (footnote 4).
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Yarowsky, David
Background: Chinese Abbreviations
While the abbreviations mostly originate from noun phrases (in particular, named entities), other general phrases are also abbreviatable.
Unsupervised Translation Induction for Chinese Abbreviations
One may use a named entity tagger to obtain such a list.
Unsupervised Translation Induction for Chinese Abbreviations
However, this relies on the existence of a Chinese named entity tagger with high precision.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Dridan, Rebecca and Kordoni, Valia and Nicholson, Jeremy
Conclusion
While annotating with named entity data or a lexical type supertagger was also found to increase coverage, the POS tagger had the greatest effect, with up to 45% coverage increase on unseen text.
Unknown Word Handling
Since the parser has the means to accept named entity (NE) information in the input, we also experimented with using generic lexical items generated from NE data.
Unknown Word Handling
It is possible that another named entity tagger would give better results, and this may be looked at in future experiments.
named entity is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: