Recognizing Named Entities in Tweets
LIU, Xiaohua and ZHANG, Shaodian and WEI, Furu and ZHOU, Ming

Article Structure

Abstract

The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data.

Introduction

Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators, belonging to named-entity types such as persons, organizations and locations, in text (Nadeau and Sekine, 2007).

Related Work

Related work can be roughly divided into three categories: NER on tweets, NER on non-tweets (e.g., news, biomedical literature, and clinical notes), and semi-supervised learning for NER.

Task Definition

We first introduce some background about tweets, then give a formal definition of the task.

Our Method

Now we present our solution to the challenging task of NER for tweets.

Experiments

In this section, we evaluate our method on a manually annotated data set and show that our system outperforms the baselines.

Conclusions and Future work

We propose a novel NER system for tweets, which combines a KNN classifier with a CRF labeler under a semi-supervised learning framework.

Topics

CRF

Appears in 35 sentences as: =crf (1) CRF (37) CRF++ (1)
In Recognizing Named Entities in Tweets
  1. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges.
    Page 1, “Abstract”
  2. The KNN-based classifier conducts pre-labeling to collect coarse global evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet.
    Page 1, “Abstract”
  3. Following the two-stage prediction aggregation method (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by state-of-the-art NER systems, are fed into a linear Conditional Random Fields (CRF) model (Lafferty et al., 2001), which conducts fine-grained tweet-level NER.
    Page 2, “Introduction”
  4. Furthermore, the KNN and CRF models are repeatedly retrained with an incrementally augmented training set, into which tweets labeled with high confidence are added.
    Page 2, “Introduction”
  5. Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates our method from existing ones.
    Page 2, “Introduction”
  6. It is also demonstrated that integrating the KNN-classified results into the CRF model and applying semi-supervised learning considerably boost the performance.
    Page 2, “Introduction”
  7. We propose a novel method that combines a KNN classifier with a conventional CRF-based labeler under a semi-supervised learning framework to combat the lack of information in tweets and the unavailability of training data.
    Page 2, “Introduction”
  8. (2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
    Page 2, “Related Work”
  9. To achieve this, a KNN classifier is combined with a CRF model to leverage cross-tweet information, and semi-supervised learning is adopted to leverage unlabeled tweets.
    Page 2, “Related Work”
  10. (2005) use CRF to train a sequential NE labeler, in which the BIO scheme (marking the Beginning, the Inside, and the Outside of a named entity) is adopted.
    Page 2, “Related Work”
  11. A data set is manually annotated and a linear CRF model is trained, which achieves an F-score of 81.48% on their test data set; Downey et al.
    Page 3, “Related Work”
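
To make the division of labor described in these snippets concrete, here is a minimal sketch of the two-stage idea in Python, assuming scikit-learn's KNeighborsClassifier and the sklearn-crfsuite package stand in for the paper's own KNN and CRF implementations; the window size and feature names are illustrative, not the authors' exact feature set.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    import sklearn_crfsuite

    def context_bow(tokens, i, window=2):
        # Coarse bag-of-words evidence for token i: the words in a small window around it.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        return {w.lower(): 1 for w in tokens[lo:hi]}

    def crf_features(tokens, i, knn_label):
        # Conventional token features plus the KNN pre-label as one more feature.
        w = tokens[i]
        return {"word": w.lower(), "is_title": w.istitle(),
                "prefix3": w[:3], "knn_pre_label": knn_label}

    def train(train_tweets, train_tags):
        # train_tweets: list of token lists; train_tags: parallel BIO tag lists.
        vec = DictVectorizer()
        word_feats = [context_bow(t, i) for t in train_tweets for i in range(len(t))]
        word_tags = [tag for tags in train_tags for tag in tags]
        knn = KNeighborsClassifier(n_neighbors=5)
        knn.fit(vec.fit_transform(word_feats), word_tags)  # stage 1: coarse pre-labeling
        X = []
        for tokens in train_tweets:
            pre = knn.predict(vec.transform(
                [context_bow(tokens, i) for i in range(len(tokens))]))
            X.append([crf_features(tokens, i, pre[i]) for i in range(len(tokens))])
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, train_tags)  # stage 2: fine-grained sequential labeling
        return vec, knn, crf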

NER

Appears in 32 sentences as: NER (35)
In Recognizing Named Entities in Tweets
  1. The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data.
    Page 1, “Abstract”
  2. Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators, belonging to named-entity types such as persons, organizations and locations, in text (Nadeau and Sekine, 2007).
    Page 1, “Introduction”
  3. Proposed solutions to NER fall into three categories: 1) rule-based methods (Krupka and Hausman, 1998); 2) machine learning based methods (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002).
    Page 1, “Introduction”
  4. However, current NER mainly focuses on formal text such as news articles (McCallum and Li, 2003; Etzioni et al., 2005).
    Page 1, “Introduction”
  5. For example, the average F1 of the Stanford NER (Finkel et al., 2005), which is trained on the CoNLL03 shared task data set and achieves state-of-the-art performance on that task, drops from 90.8% (Ratinov and Roth, 2009) to 45.8% on tweets.
    Page 1, “Introduction”
  6. Thus, building a domain-specific NER system for tweets is necessary, which in turn requires a large number of annotated tweets or handcrafted rules.
    Page 1, “Introduction”
  7. (2010), which introduces a high-level rule language, called NERL, to build general and domain-specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data.
    Page 1, “Introduction”
  8. We propose a novel NER system to address these challenges.
    Page 2, “Introduction”
  9. Following the two-stage prediction aggregation method (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by state-of-the-art NER systems, are fed into a linear Conditional Random Fields (CRF) model (Lafferty et al., 2001), which conducts fine-grained tweet-level NER.
    Page 2, “Introduction”
  10. Related work can be roughly divided into three categories: NER on tweets, NER on non-tweets (e.g., news, biomedical literature, and clinical notes), and semi-supervised learning for NER.
    Page 2, “Related Work”
  11. 2.1 NER on Tweets
    Page 2, “Related Work”

semi-supervised

Appears in 24 sentences as: Semi-supervised (3) semi-supervised (22)
In Recognizing Named Entities in Tweets
  1. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges.
    Page 1, “Abstract”
  2. Semi-supervised learning, together with the gazetteers, alleviates the lack of training data.
    Page 1, “Abstract”
  3. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
    Page 1, “Abstract”
  4. (2010), which introduces a high-level rule language, called NERL, to build general and domain-specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data.
    Page 1, “Introduction”
  5. Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates our method from existing ones.
    Page 2, “Introduction”
  6. It is also demonstrated that integrating the KNN-classified results into the CRF model and applying semi-supervised learning considerably boost the performance.
    Page 2, “Introduction”
  7. We propose a novel method that combines a KNN classifier with a conventional CRF-based labeler under a semi-supervised learning framework to combat the lack of information in tweets and the unavailability of training data.
    Page 2, “Introduction”
  8. We evaluate our method on a human annotated data set, and show that our method outperforms the baselines and that both the combination with KNN and the semi-supervised learning strategy are effective.
    Page 2, “Introduction”
  9. Related work can be roughly divided into three categories: NER on tweets, NER on non-tweets (e.g., news, biomedical literature, and clinical notes), and semi-supervised learning for NER.
    Page 2, “Related Work”
  10. To achieve this, a KNN classifier is combined with a CRF model to leverage cross-tweet information, and semi-supervised learning is adopted to leverage unlabeled tweets.
    Page 2, “Related Work”
  11. 2.3 Semi-supervised Learning for NER
    Page 3, “Related Work”
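
The gazetteers mentioned in these snippets are easy to picture as extra CRF features. A small sketch under assumed conventions (the paper does not spell out its gazetteer matching; the set-of-lowercased-entries representation is an assumption):

    def gazetteer_features(tokens, i, gazetteer):
        # gazetteer: a set of lowercased entries, e.g. {"new york", "lady gaga"}.
        # Flags unigram and bigram matches starting at position i.
        uni = tokens[i].lower()
        bi = " ".join(t.lower() for t in tokens[i:i + 2])
        return {"in_gaz_uni": uni in gazetteer, "in_gaz_bi": bi in gazetteer}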

named entities

Appears in 15 sentences as: name entity (1) Named Entities (2) named entities (8) named entity (4)
In Recognizing Named Entities in Tweets
  1. The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data.
    Page 1, “Abstract”
  2. Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators, belonging to named-entity types such as persons, organizations and locations, in text (Nadeau and Sekine, 2007).
    Page 1, “Introduction”
  3. (2010) use Amazon's Mechanical Turk service and CrowdFlower to annotate named entities in tweets and train a CRF model to evaluate the effectiveness of human labeling.
    Page 2, “Related Work”
  4. In contrast, our work aims to build a system that can automatically identify named entities in tweets.
    Page 2, “Related Work”
  5. Twitter users are interested in named entities, such
    Page 3, “Task Definition”
  6. Figure 1: Proportions of the different types of named entities in tweets.
    Page 4, “Task Definition”
  7. as person names, organization names and product names, as evidenced by the abundant named entities in tweets.
    Page 4, “Task Definition”
  8. According to our investigation on 12,245 randomly sampled tweets that are manually labeled, about 46.8% have at least one named entity.
    Page 4, “Task Definition”
  9. Figure 1 shows the proportion of named entities of different types.
    Page 4, “Task Definition”
  10. We focus on four types of entities in our study, i.e., persons, organizations, products, and locations, which, according to our investigation as shown in Figure 1, account for 89.0% of all the named entities.
    Page 4, “Task Definition”
  11. We use the Twigg SDK to crawl all tweets from April 20th to April 25th, 2010, then drop non-English tweets, leaving about 11,371,389 tweets, from which 15,800 are randomly sampled and labeled by two independent annotators, so that the beginning and the end of each named entity are marked with <TYPE> and </TYPE>, respectively.
    Page 6, “Experiments”
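
The <TYPE> and </TYPE> markers map directly onto the BIO labels used by the sequential labeler. A sketch of that conversion, assuming the four concrete type strings below (the snippet only guarantees the generic markers):

    import re

    MARKUP = re.compile(r"<(PERSON|ORGANIZATION|PRODUCT|LOCATION)>(.*?)</\1>")

    def markup_to_bio(text):
        # Convert "<TYPE>...</TYPE>" annotations into (token, BIO tag) pairs.
        tokens, tags, pos = [], [], 0
        for m in MARKUP.finditer(text):
            for w in text[pos:m.start()].split():
                tokens.append(w); tags.append("O")
            for j, w in enumerate(m.group(2).split()):
                tokens.append(w)
                tags.append(("B-" if j == 0 else "I-") + m.group(1))
            pos = m.end()
        for w in text[pos:].split():
            tokens.append(w); tags.append("O")
        return list(zip(tokens, tags))

    # markup_to_bio("Follow <PERSON>Lady Gaga</PERSON> on Twitter") ->
    # [("Follow","O"), ("Lady","B-PERSON"), ("Gaga","I-PERSON"), ("on","O"), ("Twitter","O")]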

fine-grained

Appears in 4 sentences as: fine-grained (4)
In Recognizing Named Entities in Tweets
  1. The KNN-based classifier conducts pre-labeling to collect coarse global evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet.
    Page 1, “Abstract”
  2. Following the two-stage prediction aggregation method (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by state-of-the-art NER systems, are fed into a linear Conditional Random Fields (CRF) model (Lafferty et al., 2001), which conducts fine-grained tweet-level NER.
    Page 2, “Introduction”
  3. Our model is hybrid in the sense that a KNN classifier and a CRF model are sequentially applied to the target tweet, with the goal that the KNN classifier captures global coarse evidence while the CRF model captures fine-grained information encoded in a single tweet and in the gazetteers.
    Page 4, “Our Method”
  4. The CRF model, which is good at encoding the subtle interactions between words and their labels, compensates for KNN's incapability to capture fine-grained evidence involving multiple decision points.
    Page 5, “Our Method”

unlabeled data

Appears in 4 sentences as: unlabeled data (4)
In Recognizing Named Entities in Tweets
  1. (2010), which introduces a high-level rule language, called NERL, to build general and domain-specific NER systems; and 2) semi-supervised learning, which aims to use the abundant unlabeled data to compensate for the lack of annotated data.
    Page 1, “Introduction”
  2. Semi-supervised learning exploits both labeled and unlabeled data.
    Page 3, “Related Work”
  3. It proves useful when labeled data is scarce and hard to construct while unlabeled data is abundant and easy to access.
    Page 3, “Related Work”
  4. Another representative semi-supervised approach is learning a robust representation of the input from unlabeled data.
    Page 3, “Related Work”

entity type

Appears in 3 sentences as: entity type (3) entity types (1)
In Recognizing Named Entities in Tweets
  1. Following common practice, we adopt a sequential labeling approach to jointly resolve these subtasks, i.e., each word in the input tweet is assigned a label indicating both the entity boundary and the entity type.
    Page 4, “Our Method”
  2. For the overall performance, we use the average Precision, Recall and F1, where the weight of each named entity type is proportional to the number of entities of that type.
    Page 6, “Experiments”
  3. Tables 2-5 report the results on each entity type, indicating that our method consistently yields better results on all entity types.
    Page 7, “Experiments”
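
A small illustration of that weighting: each type's precision, recall and F1 are averaged with weights proportional to its gold entity count (the per-type figures in the usage comment are hypothetical):

    def weighted_overall(per_type):
        # per_type maps an entity type to (precision, recall, f1, n_gold_entities).
        total = sum(n for _, _, _, n in per_type.values())
        p = sum(pr * n for pr, _, _, n in per_type.values()) / total
        r = sum(rc * n for _, rc, _, n in per_type.values()) / total
        f1 = sum(f * n for _, _, f, n in per_type.values()) / total
        return p, r, f1

    # e.g. weighted_overall({"PERSON": (0.85, 0.80, 0.82, 500),
    #                        "LOCATION": (0.78, 0.70, 0.74, 300)})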

feature vector

Appears in 3 sentences as: feature vector (3)
In Recognizing Named Entities in Tweets
  1. 5: for each word w ∈ t do  6: Get the feature vector w⃗ = repr_w(w, t).
    Page 5, “Our Method”
  2. 10: end if  11: end for  12: Get the feature vector t⃗ = repr_t(t).
    Page 5, “Our Method”
  3. 4: Get the feature vector w⃗ = repr_w(w, t).
    Page 5, “Our Method”
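
Read together, these snippets say that repr_w builds, for each word w in tweet t, the bag-of-words vector the KNN classifier consumes. A guess at its shape (the window size and lowercasing are assumptions; the exact featurization is not shown in these excerpts):

    def repr_w(w, t, window=3):
        # Bag-of-words vector for word w in tweet t (a token list), built from
        # the words inside a fixed window around the first occurrence of w.
        i = t.index(w)
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        return {tok.lower(): 1 for tok in t[lo:hi]}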

gold-standard

Appears in 3 sentences as: gold-standard (3)
In Recognizing Named Entities in Tweets
  1. Finally we get 12,245 tweets, forming the gold-standard data set.
    Page 6, “Experiments”
  2. The gold-standard data set is evenly split into two parts: One for training and the other for testing.
    Page 6, “Experiments”
  3. Precision measures the percentage of output labels that are correct, recall measures the percentage of labels in the gold-standard data set that are correctly recovered, and F1 is the harmonic mean of precision and recall.
    Page 6, “Experiments”
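
In code, those definitions reduce to a few lines over entity-level counts (a plain sketch, not tied to the paper's evaluation script):

    def prf1(tp, fp, fn):
        # tp: entities labeled correctly; fp: spurious output entities;
        # fn: gold entities the system missed.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f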

machine learning

Appears in 3 sentences as: Machine learning (1) machine learning (2)
In Recognizing Named Entities in Tweets
  1. Proposed solutions to NER fall into three categories: 1) rule-based methods (Krupka and Hausman, 1998); 2) machine learning based methods (Finkel and Manning, 2009; Singh et al., 2010); and 3) hybrid methods (Jansche and Abney, 2002).
    Page 1, “Introduction”
  2. Machine learning based systems are commonly used and outperform rule-based systems.
    Page 3, “Related Work”
  3. Algorithm 1 outlines our method, where: train_c and train_k denote the two machine learning processes that produce the CRF labeler and the KNN classifier, respectively; repr_w converts a word in a tweet into a bag-of-words vector; the repr_t function transforms a tweet into a feature matrix that is later fed into the CRF model; the knn function predicts the class of a word; the update function applies the class predicted by KNN to the input tweet; the crf function conducts word-level NE labeling; τ and γ represent the minimum labeling confidence of KNN and CRF, respectively, which are experimentally set to 0.1 and 0.001; and N (1,000 in our work) denotes the maximum number of newly accumulated training data.
    Page 4, “Our Method”
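
A rough Python rendering of that loop may help. Everything except the thresholds is a stub: train_k, train_c, update, repr_t and the labeler's label call stand in for the functions named above, while tau = 0.1, gamma = 0.001 and N = 1,000 are taken from the text.

    TAU, GAMMA, N = 0.1, 0.001, 1000  # KNN conf., CRF conf., max new training data

    def semi_supervised_train(labeled, unlabeled, rounds, train_k, train_c,
                              update, repr_t):
        # labeled: seed (tweet, labels) pairs; unlabeled: raw tweets.
        for _ in range(rounds):
            knn = train_k(labeled)                   # retrain the KNN pre-labeler
            crf = train_c(labeled)                   # retrain the CRF labeler
            new_data = []
            for t in unlabeled:
                t = update(t, knn, min_conf=TAU)     # apply confident KNN pre-labels
                labels, conf = crf.label(repr_t(t))  # word-level NE labeling (stub API)
                if conf >= GAMMA:                    # keep confidently labeled tweets
                    new_data.append((t, labels))
                if len(new_data) >= N:               # cap newly accumulated data
                    break
            if not new_data:
                break
            labeled = labeled + new_data             # augment the training set
        return knn, crf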

manually annotated

Appears in 3 sentences as: manually annotated (3)
In Recognizing Named Entities in Tweets
  1. 12,245 tweets are manually annotated as the test data set.
    Page 2, “Introduction”
  2. A data set is manually annotated and a linear CRF model is trained, which achieves an F-score of 81.48% on their test data set; Downey et al.
    Page 3, “Related Work”
  3. In this section, we evaluate our method on a manually annotated data set and show that our system outperforms the baselines.
    Page 6, “Experiments”

Maximum Entropy

Appears in 3 sentences as: Maximum Entropy (2) maximum entropy (1)
In Recognizing Named Entities in Tweets
  1. Other methods, such as classification based on Maximum Entropy models and sequential application of Perceptron or Winnow (Collins, 2002), are also practiced.
    Page 3, “Related Work”
  2. We have replaced KNN with other classifiers, such as those based on Maximum Entropy and Support Vector Machines.
    Page 6, “Our Method”
  3. Similarly, to study the effectiveness of the CRF model, it is replaced by alternatives, such as an HMM labeler and a beam search plus a maximum entropy based classifier.
    Page 6, “Our Method”

randomly sampled

Appears in 3 sentences as: randomly sampled (3)
In Recognizing Named Entities in Tweets
  1. This is based on an investigation of 12,245 randomly sampled tweets, which are manually labeled.
    Page 4, “Task Definition”
  2. According to our investigation on 12,245 randomly sampled tweets that are manually labeled, about 46.8% have at least one named entity.
    Page 4, “Task Definition”
  3. We use the Twigg SDK to crawl all tweets from April 20th to April 25th, 2010, then drop non-English tweets, leaving about 11,371,389 tweets, from which 15,800 are randomly sampled and labeled by two independent annotators, so that the beginning and the end of each named entity are marked with <TYPE> and </TYPE>, respectively.
    Page 6, “Experiments”
