Collective Tweet Wikification based on Semi-supervised Graph Regularization
Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew

Article Structure

Abstract

Wikification for tweets aims to automatically identify each concept mention in a tweet and link it to a concept referent in a knowledge base (e.g., Wikipedia).

Introduction

With millions of tweets posted daily, Twitter enables both individuals and organizations to disseminate information, from current affairs to breaking news in a timely fashion.

Preliminaries

Concept and Concept Mention: We define a concept c as a Wikipedia article (e.g., Atlanta Hawks), and a concept mention m as an n-gram from a specific tweet.

Principles and Approach Overview

Labeled and Unlabeled Tweets

Relational Graph Construction

We first construct the relational graph G = ⟨V, E⟩, where V = {v_1, ..., v_n} is a set of nodes and E = {e_1, ..., e_m} is a set of edges.

Semi-supervised Graph Regularization

Given the constructed relational graph with the weighted matrix W and the label vector Y of all nodes, we assume the first l nodes are labeled as Y_l and the remaining u nodes (u = n - l) are initialized with labels Y_u^0.
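To make this setup concrete, here is a minimal sketch of label propagation over a weighted graph with the labeled nodes listed first. It uses the iterative harmonic-function update from Zhu et al. (2003), the algorithm the paper says its framework builds on; it is not the paper's own objective, which is not reproduced on this page. The function name `propagate_labels`, the fixed iteration count, and the initial score of 0.0 for unlabeled nodes are assumptions made for illustration.

```python
def propagate_labels(W, y_labeled, iterations=200):
    """Iterative harmonic label propagation in the style of Zhu et al. (2003).

    W         : n x n symmetric, non-negative weight matrix (list of lists),
                with the l labeled nodes listed first.
    y_labeled : labels of the first l nodes (e.g. 1.0 or 0.0).
    Returns the scores of all n nodes; labeled nodes stay clamped.
    """
    n, l = len(W), len(y_labeled)
    # Assumption: unlabeled nodes start from an initial score of 0.0.
    f = list(y_labeled) + [0.0] * (n - l)
    for _ in range(iterations):
        for i in range(l, n):
            degree = sum(W[i])
            if degree > 0:  # isolated nodes keep their initial score
                # Each unlabeled node takes the weighted average of its neighbors.
                f[i] = sum(W[i][j] * f[j] for j in range(n)) / degree
    return f


# Toy chain: node 0 (label 1) - node 2 - node 3 - node 1 (label 0).
W_toy = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
scores = propagate_labels(W_toy, [1.0, 0.0])
```

On the toy chain the two unlabeled nodes settle near 2/3 and 1/3, reflecting their distance from the positively labeled node.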

Experiments

In this section we compare our approach with state-of-the-art methods as shown in Table 1.

Related Work

The task of linking concept mentions to a knowledge base has received increased attention over the past several years, from the linking of concept mentions in a single text (Mihalcea and Csomai, 2007; Milne and Witten, 2008b; Milne and Witten, 2008a; Kulkarni et al., 2009; He et al., 2011; Ratinov et al., 2011; Cassidy et al., 2012; Cheng and Roth, 2013), to the linking of a cluster of corefer-

Conclusions

We have introduced a novel semi-supervised graph regularization framework for wikification to simultaneously tackle the unique challenges of annotation and information shortage in short tweets.

Topics

semantic relatedness

Appears in 16 sentences as: Semantic Relatedness (2) semantIc relatedness (1) semantic relatedness (10) semantic relation (1) semantic relations (2)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. In order to identify semantically-related mentions for collective inference, we detect meta path-based semantic relations through social networks.
    Page 1, “Abstract”
  2. (Figure label) semantic relatedness; similarity
    Page 3, “Principles and Approach Overview”
  3. Principle 3 (Semantic Relatedness): Two highly semantically-related mentions are more likely to be linked to two highly semantically-related concepts.
    Page 3, “Principles and Approach Overview”
  4. The label assignment is obtained by our semi-supervised graph regularization framework based on a relational graph, which is constructed from local compatibility, coreference, and semantic relatedness relations.
    Page 3, “Principles and Approach Overview”
  5. In this subsection, we introduce the concept meta path which will be used to detect coreference (section 4.3) and semantic relatedness relations (section 4.4).
    Page 4, “Relational Graph Construction”
  6. Each meta path represents one particular semantic relation.
    Page 5, “Relational Graph Construction”
  7. 4.4 Semantic Relatedness
    Page 5, “Relational Graph Construction”
  8. Therefore, it is important to extend this evidence to capture semantic relatedness information from multiple tweets.
    Page 5, “Relational Graph Construction”
  9. We define the semantic relatedness score between two mentions as SR(m_i, m_j) = 1.0 if at least one meta path exists between m_i and m_j, otherwise SR(m_i, m_j) = 0.
    Page 5, “Relational Graph Construction”
  10. In order to compute the semantic relatedness of two concepts c_i and c_j, we adopt the approach proposed by (Milne and
    Page 5, “Relational Graph Construction”
  11. Then we compute a weight matrix representing the semantic relatedness relation as:
    Page 5, “Relational Graph Construction”
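
The mention-level score quoted in item 9 can be restated as a piecewise definition. This only rewrites that sentence; the symbol \mathcal{P}(m_i, m_j), standing for the set of meta paths connecting the two mentions, is notation introduced here for illustration and does not come from the paper.

```latex
SR(m_i, m_j) =
\begin{cases}
1.0 & \text{if } \mathcal{P}(m_i, m_j) \neq \emptyset, \\
0   & \text{otherwise.}
\end{cases}
```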

See all papers in Proc. ACL 2014 that mention semantic relatedness.

semi-supervised

Appears in 16 sentences as: semi-supervised (17)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. To tackle these challenges, we propose a novel semi-supervised graph regularization model to incorporate both local and global evidence from multiple tweets through three fine-grained relations.
    Page 1, “Abstract”
  2. Therefore, it is challenging to create sufficient high quality labeled tweets for supervised models and worth considering semi-supervised learning with the exploration of unlabeled data.
    Page 1, “Introduction”
  3. However, when selecting semi-supervised learning frameworks, we noticed another unique challenge that tweets pose to wikification due to their informal writing style, shortness and noisiness.
    Page 2, “Introduction”
  4. Therefore, a collective inference model over multiple tweets in the semi-supervised setting is desirable.
    Page 2, “Introduction”
  5. In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
    Page 2, “Introduction”
  6. effort to explore graph-based semi-supervised learning algorithms for the wikification task.
    Page 2, “Introduction”
  7. We propose a novel semi-supervised graph regularization model performing collective inference for joint mention detection and disambiguation.
    Page 2, “Introduction”
  8. The label assignment is obtained by our semi-supervised graph regularization framework based on a relational graph, which is constructed from local compatibility, coreference, and semantic relatedness relations.
    Page 3, “Principles and Approach Overview”
  9. They control the contributions of these three relations in our semi-supervised graph regularization model.
    Page 5, “Relational Graph Construction”
  10. (ii) It is more appropriate for our graph-based semi-supervised model since it is difficult to assign labels to a pair of mention and concept in the referent graph.
    Page 6, “Relational Graph Construction”
  11. We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
    Page 6, “Semi-supervised Graph Regularization”

coreferential

Appears in 15 sentences as: Coreference (3) coreference (5) coreferential (7) coreferential, (1)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. From a system-to-system perspective, wikification has demonstrated its usefulness in a variety of applications, including coreference resolution (Ratinov and Roth, 2012) and classification (Vitale et al., 2012).
    Page 1, “Introduction”
  2. (Figure label) Coreference
    Page 3, “Principles and Approach Overview”
  3. Principle 2 (Coreference): Two coreferential mentions should be linked to the same concept.
    Page 3, “Principles and Approach Overview”
  4. For example, if we know “nc” and “North Carolina” are coreferential, then they should both be linked to North Carolina.
    Page 3, “Principles and Approach Overview”
  5. The label assignment is obtained by our semi-supervised graph regularization framework based on a relational graph, which is constructed from local compatibility, coreference, and semantic relatedness relations.
    Page 3, “Principles and Approach Overview”
  6. In this subsection, we introduce the concept meta path which will be used to detect coreference (section 4.3) and semantic relatedness relations (section 4.4).
    Page 4, “Relational Graph Construction”
  7. 4.3 Coreference
    Page 5, “Relational Graph Construction”
  8. A coreference relation (Principle 2) usually occurs across multiple tweets due to the highly redundant information in Twitter.
    Page 5, “Relational Graph Construction”
  9. We consider two mentions m_i and m_j coreferential if m_i and m_j share the same surface form or one is an abbreviation of the other, and at least one meta path exists between m_i and m_j.
    Page 5, “Relational Graph Construction”
  10. Then we define the weight matrix representing the coreferential relation as:
    Page 5, “Relational Graph Construction”
  11. 1.0 if m_i and m_j are coreferential,
    Page 5, “Relational Graph Construction”
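
The coreference rule in items 9 to 11 can be sketched as follows. This is an illustrative reading, not the authors' implementation. The initials-based abbreviation test, the case-insensitive surface comparison, the representation of meta paths as a precomputed set of index pairs, and the weight of 0.0 for non-coreferential pairs (the quoted definition is cut off after the first case) are all assumptions; every function name is hypothetical.

```python
def is_abbreviation(short, full):
    """Hypothetical test: `short` equals the initials of the words in `full`."""
    initials = "".join(word[0] for word in full.lower().split())
    return len(short) > 1 and short.lower() == initials


def are_coreferential(m_i, m_j, has_meta_path):
    """Same surface form or abbreviation, and at least one meta path."""
    same_surface = m_i.lower() == m_j.lower()  # assumed case-insensitive
    abbreviated = is_abbreviation(m_i, m_j) or is_abbreviation(m_j, m_i)
    return bool(has_meta_path and (same_surface or abbreviated))


def coreference_weights(mentions, meta_path_pairs):
    """Build the weight matrix: 1.0 for coreferential pairs, else 0.0 (assumed).

    meta_path_pairs is a set of (i, j) index pairs known to be connected by
    at least one meta path; how such paths are detected is out of scope here.
    """
    n = len(mentions)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            linked = (i, j) in meta_path_pairs or (j, i) in meta_path_pairs
            if are_coreferential(mentions[i], mentions[j], linked):
                W[i][j] = 1.0
    return W


# The "nc" / "North Carolina" pair from the text, assumed to share a meta path.
W_coref = coreference_weights(["nc", "North Carolina", "Hawks"], {(0, 1)})
```

Only the entries for the "nc" and "North Carolina" pair are set to 1.0; the unrelated mention keeps zero weights.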

labeled data

Appears in 11 sentences as: Labeled Data (1) labeled data (13)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. In addition, it is challenging to generate sufficient high quality labeled data for supervised models with low cost.
    Page 1, “Abstract”
  2. Compared to the state-of-the-art supervised model trained from 100% labeled data, our proposed approach achieves comparable performance with 31% labeled data and obtains 5% absolute F1 gain with 50% labeled data.
    Page 1, “Abstract”
  3. Sufficient labeled data is crucial for supervised models.
    Page 1, “Introduction”
  4. In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
    Page 2, “Introduction”
  5. In comparison with the supervised baseline proposed by Meij et al. (2012), our model SSRegu1, relying on local compatibility, already achieves comparable performance with 50% of labeled data.
    Page 7, “Experiments”
  6. We can easily see that our proposed approach using 50% labeled data achieves similar performance with the state-of-the-art supervised model with 100% labeled data.
    Page 7, “Experiments”
  7. 6.4 Effect of Labeled Data Size
    Page 8, “Experiments”
  8. In this subsection, we study the effect of labeled data size on our full model.
    Page 8, “Experiments”
  9. We randomly sample 100 tweets as testing data, and randomly select 50, 100, 150, 200, 250, and 300 tweets as labeled data.
    Page 8, “Experiments”
  10. We find that as the size of the labeled data increases, our proposed model achieves better performance.
    Page 8, “Experiments”
  11. By studying three novel fine-grained relations, detecting semantically-related information with semantic meta paths, and exploiting the data manifolds in both unlabeled and labeled data for collective inference, our work can dramatically save annotation cost and achieve better performance, thus shedding light on the challenging wikification task for tweets.
    Page 9, “Conclusions”

graph-based

Appears in 7 sentences as: graph-based (7)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
    Page 2, “Introduction”
  2. effort to explore graph-based semi-supervised learning algorithms for the wikification task.
    Page 2, “Introduction”
  3. Compared to the referent graph which considers each mention or concept as a node in previous graph-based re-ranking approaches (Han et al., 2011; Shen et al., 2013), our
    Page 5, “Relational Graph Construction”
  4. (ii) It is more appropriate for our graph-based semi-supervised model since it is difficult to assign labels to a pair of mention and concept in the referent graph.
    Page 6, “Relational Graph Construction”
  5. We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
    Page 6, “Semi-supervised Graph Regularization”
  6. Non-collective methods usually rely on prior popularity and context similarity with supervised models (Mihalcea and Csomai, 2007; Milne and Witten, 2008b; Han and Sun, 2011), while collective approaches further leverage the global coherence between concepts normally through supervised or graph-based re-ranking models (Cucerzan, 2007; Milne and Witten, 2008b; Han and Zhao, 2009; Kulkarni et al., 2009; Pennacchiotti and Pantel, 2009; Ferragina and Scaiella, 2010; Fernandez et al., 2010; Radford et al., 2010; Cucerzan, 2011; Guo et al., 2011; Han and Sun, 2011; Han et al., 2011; Ratinov et al., 2011; Chen and Ji, 2011; Kozareva et al., 2011; Cassidy et al., 2012; Shen et al., 2013; Liu et al., 2013).
    Page 9, “Related Work”
  7. This work is also related to graph-based semi-supervised learning (Zhu et al., 2003; Smola and Kondor, 2003; Zhou et al., 2004; Talukdar and Crammer, 2009), which has been successfully applied in many Natural Language Processing tasks (Niu et al., 2005; Chen et al., 2006).
    Page 9, “Related Work”

fine-grained

Appears in 6 sentences as: fine-grained (6)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. To tackle these challenges, we propose a novel semi-supervised graph regularization model to incorporate both local and global evidence from multiple tweets through three fine-grained relations.
    Page 1, “Abstract”
  2. In order to construct a semantic-rich graph capturing the similarity between mentions and concepts for the model, we introduce three novel fine-grained relations based on a set of local features, social networks and meta paths.
    Page 2, “Introduction”
  3. Our full model SSRegu123 achieves significant improvement over the supervised baseline (5% absolute F1 gain with 95.0% confidence level by the Wilcoxon Matched-Pairs Signed-Ranks Test), showing that incorporating global evidence from multiple tweets with fine-grained relations is beneficial.
    Page 7, “Experiments”
  4. Our method is a collective approach with the following novel advancements: (i) A novel graph representation with fine-grained relations, (ii) A unified framework based on meta paths to explore richer relevant context, (iii) Joint identification and linking of mentions under semi-supervised setting.
    Page 9, “Related Work”
  5. We introduce a novel graph that incorporates three fine-grained relations.
    Page 9, “Related Work”
  6. By studying three novel fine-grained relations, detecting semantically-related information with semantic meta paths, and exploiting the data manifolds in both unlabeled and labeled data for collective inference, our work can dramatically save annotation cost and achieve better performance, thus shedding light on the challenging wikification task for tweets.
    Page 9, “Conclusions”

knowledge base

Appears in 5 sentences as: Knowledge Base (1) knowledge base (4)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. Wikification for tweets aims to automatically identify each concept mention in a tweet and link it to a concept referent in a knowledge base (e.g., Wikipedia).
    Page 1, “Abstract”
  2. concept referent in a knowledge base (KB) (e.g., Wikipedia).
    Page 1, “Introduction”
  3. Knowledge Base (Wikipedia)
    Page 3, “Principles and Approach Overview”
  4. We use a Wikipedia dump on May 3, 2013 as our knowledge base, which includes 30 million pages.
    Page 6, “Experiments”
  5. The task of linking concept mentions to a knowledge base has received increased attention over the past several years, from the linking of concept mentions in a single text (Mihalcea and Csomai, 2007; Milne and Witten, 2008b; Milne and Witten, 2008a; Kulkarni et al., 2009; He et al., 2011; Ratinov et al., 2011; Cassidy et al., 2012; Cheng and Roth, 2013), to the linking of a cluster of corefer-
    Page 8, “Related Work”

end-to-end

Appears in 3 sentences as: end-to-end (3)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. An end-to-end wikification system needs to solve two sub-problems: (i) concept mention detection, (ii) concept mention disambiguation.
    Page 1, “Introduction”
  2. The relatively low performance of the baseline system TagMe demonstrates that only relying on prior popularity and topical information within a single tweet is not enough for an end-to-end wikification system for the short tweets.
    Page 7, “Experiments”
  3. However, when μ ≥ 0.4, the system performance dramatically decreases, showing that prior popularity is not enough for an end-to-end wikification system.
    Page 8, “Experiments”

learning algorithms

Appears in 3 sentences as: learning algorithm (1) learning algorithms (2)
In Collective Tweet Wikification based on Semi-supervised Graph Regularization
  1. In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
    Page 2, “Introduction”
  2. effort to explore graph-based semi-supervised learning algorithms for the wikification task.
    Page 2, “Introduction”
  3. We propose a novel semi-supervised graph regularization framework based on the graph-based semi-supervised learning algorithm (Zhu et al., 2003):
    Page 6, “Semi-supervised Graph Regularization”
