Discussion | Improved named entity translation accuracy, as measured by the NEWA metric in general, and a reduction in dropped names in particular, are clearly valuable to the human reader of machine-translated documents as well as to systems that use machine translation for further information processing. |
End-to-End results | Table 2: Name translation accuracy with respect to BBN and re-annotated Gold Standard on 1730 named entities in 637 sentences. |
Evaluation | General MT metrics such as BLEU, TER, and METEOR are not suitable for evaluating named entity translation and transliteration, because they are not focused on named entities (NEs). |
Evaluation | The general idea of the Named Entity Weak Accuracy (NEWA) metric is to |
Evaluation | BBN kindly provided us with an annotated Arabic text corpus, in which named entities were marked up with their type (e.g. GPE for Geopolitical Entity) and one or more English translations. |
Introduction | Not all named entities should be transliterated. |
Introduction | Many named entities require a mix of transliteration and translation. |
Introduction | We ask: what percentage of source-language named entities are translated correctly? |
Learning what to transliterate | As already mentioned in the introduction, named entity (NE) identification followed by MT is a bad idea. |
Abstract | We propose using large-scale clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER). |
Experiments | We obtained 11,892 sentences with 18,677 named entities. |
Experiments | We define “# e-matches” as the number of matches that also match a boundary of a named entity in the training set, and “# optimal” as the optimal number of “# e-matches” that can be achieved when we know the |
Experiments | These gazetteers cover 40-50% of the named entities, and the cluster gazetteers have relatively wider coverage than the Wikipedia gazetteer. |
Gazetteer Induction 2.1 Induction by MN Clustering | Figure 2: Clean MN clusters with named entity entries (Left: car brand names. |
Introduction | Gazetteers, or entity dictionaries, are important for performing named entity recognition (NER) accurately. |
Introduction | from a multi-word noun (MN) to named entity categories such as “Tokyo Stock Exchange → {ORGANIZATION}”. However, since the correspondence between the labels and the NE categories can be learned by tagging models, a gazetteer will be useful as long as it returns consistent labels even if those returned are not the NE categories. |
Introduction | Therefore, performing the clustering with a vocabulary that is large enough to cover the many named entities required to improve the accuracy of NER is difficult. |
Related Work and Discussion | To cover most of the named entities in the data, we need much larger gazetteers. |
Abstract | We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. |
Conclusions, related & future work | In this work we have introduced hierarchical feature tree priors for use in transfer learning on named entity extraction tasks. |
Introduction | Consider the task of named entity recognition (NER). |
Introduction | Having successfully trained a named entity classifier on this news data, now consider the problem of learning to classify tokens as names in email data. |
Introduction | In particular, we develop a novel prior for named entity recognition that exploits the hierarchical feature space often found in natural language domains (§1.2) and allows for the transfer of information from labeled datasets in other domains (§1.3). |
Investigation | The goal of our experiments was to see to what degree named entity recognition problems naturally conformed to hierarchical methods, and not just to achieve the highest performance possible. |
Abstract | In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags requiring minimal human intervention and no linguistic expertise. |
Abstract | We show how the Wikipedia format can be used to identify possible named entities and discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity. |
Conclusions | In conclusion, we have demonstrated that Wikipedia can be used to create a Named Entity Recognition system with performance comparable to one developed from 15,000-40,000 words of human-annotated newswire, while not requiring any linguistic expertise on the part of the user. |
Introduction | Named Entity Recognition (NER) has long been a major task of natural language processing. |
Training Data Generation | We elected to use the ACE Named Entity types PERSON, GPE (GeoPolitical Entities), ORGANIZATION, VEHICLE, WEAPON, LOCATION, FACILITY, DATE, TIME, MONEY, and PERCENT. |
Training Data Generation | Other categories can reliably be used to determine that the article does not refer to a named entity, such as “Category:Endangered species.” We manually derived a relatively small set of key phrases, the most important of which are shown in Table 1. |
Wikipedia 2.1 Structure | Toral and Munoz (2006) used Wikipedia to create lists of named entities. |
Wikipedia 2.1 Structure | Cucerzan (2007), by contrast to the above, used Wikipedia primarily for Named Entity Disambiguation, following the path of Bunescu and Pasca (2006). |
Abstract | Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. |
Abstract | A number of word similarity measures are proposed for clustering words for the Named Entity Recognition task. |
Introduction | Named Entity Recognition (NER) involves locating and classifying the names in a text. |
Maximum Entropy Based Model for Hindi NER | This corpus has been manually annotated and contains about 16,491 Named Entities (NEs). |
NER features | Named entity recognition (NER) provides information that is particularly relevant for NP parsing, simply because entities are nouns. |
NER features | coordinate structure in biological named entities. |
NER features | We identify constituents that dominate tokens that all have the same NE tag, as these nodes will not cause a “crossing bracket” with the named entity . |
Introduction | The task of named entity (NE) translation is to translate a named entity from a source language to a target language; it plays an important role in machine translation and cross-language information retrieval (CLIR). |
Statistical Transliteration Model | Because the process of named entity recognition may miss some NEs, we retain all the words in the web corpus without any filtering. |
Statistical Transliteration Model | Then for every Tik, we use the named entity recognition (NER) software to determine whether rci is a NE or not. |
Statistical Transliteration Model | The training corpus for the statistical transliteration model comes from the Chinese <-> English Named Entity Lists v 1.0 corpus (LDC2005T34). |
Conclusion | While annotating with named entity data or a lexical type supertagger were also found to increase coverage, the POS tagger had the greatest effect with up to 45% coverage increase on unseen text. |
Unknown Word Handling | Since the parser has the means to accept named entity (NE) information in the input, we also experimented with using generic lexical items generated from NE data. |
Unknown Word Handling | It is possible that another named entity tagger would give better results, and this may be looked at in future experiments. |
Background: Chinese Abbreviations | While the abbreviations mostly originate from noun phrases (in particular, named entities), other general phrases are also abbreviatable. |
Unsupervised Translation Induction for Chinese Abbreviations | One may use a named entity tagger to obtain such a list. |
Unsupervised Translation Induction for Chinese Abbreviations | However, this relies on the existence of a Chinese named entity tagger with high precision. |