Abstract | The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. |
Introduction | Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as persons, organizations and locations (Nadeau and Sekine, 2007). |
Introduction | Proposed solutions to NER fall into three categories: 1) rule-based methods (Krupka and Hausman, 1998); 2) machine learning based methods (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002). |
Introduction | However, current NER research mainly focuses on formal text such as news articles (McCallum and Li, 2003; Etzioni et al., 2005). |
Related Work | Related work can be roughly divided into three categories: NER on tweets, NER on non-tweets (e.g., news, biological medicine, and clinical notes), and semi-supervised learning for NER. |
Related Work | 2.1 NER on Tweets |
Abstract | We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries. |
Abstract | We achieve strong gains in NER performance on news, in-domain and out-of-domain, and on web queries. |
Introduction | In this paper, we use piggyback features to address a particularly hard cross-domain problem, the application of an NER system trained on news to web queries. |
Introduction | Thus, applying NER systems trained on news to web queries requires a robust cross-domain approach. |
Introduction | The lack of context and capitalization, and the noisiness of real-world web queries (tokenization irregularities and misspellings) all make NER hard. |