Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
Guo, Weiwei and Li, Hao and Ji, Heng and Diab, Mona

Article Structure

Abstract

Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data.

Introduction

Recently there has been an increasing interest in language understanding of Twitter messages.

Task and Data

The task is: given the text of a tweet, a system aims to find the most relevant news article.

Weighted Textual Matrix Factorization

The WTMF model (Guo and Diab, 2012a) has been successfully applied to the short text similarity task, achieving state-of-the-art unsupervised performance.

Creating Text-to-text Relations via Twitter/News Features

WTMF exploits the text-to-word information in a very nuanced way, while ignoring the dependency between texts.

WTMF on Graphs

We propose a novel model to incorporate the links generated as described in the previous section.

Experiments

6.1 Experiment Setting

Related Work

Short Text Semantics: The field of short text semantics has progressed immensely in recent years.

Conclusion

We propose a Linking-Tweets-to-News task, which potentially benefits many NLP applications where off-the-shelf NLP tools can be applied to the most relevant news.

Topics

news article

Appears in 39 sentences as: news article (31) news articles (15)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. To enable the NLP tools to better understand Twitter feeds, we propose the task of linking a tweet to a news article that is relevant to the tweet, thereby augmenting the context of the tweet.
    Page 1, “Introduction”
  2. For example, we want to supplement the implicit context of the above tweet with a news article such as the following entitled:
    Page 1, “Introduction”
  3. To create a gold standard dataset, we download tweets spanning over 18 days, each with a url linking to a news article of CNN or NYTIMES, as well as all the news of CNN and NYTIMES published during the period.
    Page 1, “Introduction”
  4. The goal is to predict the url-referred news article based on the text in each tweet.
    Page 1, “Introduction”
  5. Given the small number of words in a tweet (14 words on average in our dataset), traditional high-dimensional surface word matching is lossy and fails to pinpoint the news article.
    Page 2, “Introduction”
  6. Accordingly, we apply a latent variable model, namely, the Weighted Textual Matrix Factorization [WTMF] (Guo and Diab, 2012b; Guo and Diab, 2012c) to both the tweets and the news articles.
    Page 2, “Introduction”
  7. If two texts mention the same entities then they might describe the same event; (3) The temporal information in both genres (tweets and news articles).
    Page 2, “Introduction”
  8. The task is: given the text of a tweet, a system aims to find the most relevant news article.
    Page 2, “Task and Data”
  9. For gold standard data, we harvest all the tweets that have a single url link to a CNN or NYTIMES news article, dated from the 11th of Jan to the 27th of Jan, 2013.
    Page 2, “Task and Data”
  10. In evaluation, we consider this url-referred news article as the gold standard, i.e., the most relevant document for the tweet, and remove the url from the text of the tweet (see the sketch after this list).
    Page 2, “Task and Data”
  11. We also collect all the news articles from both CNN and NYTIMES from RSS feeds during the same timeframe.
    Page 2, “Task and Data”
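
A minimal sketch of the gold-standard construction described in items 9 and 10, assuming plain tweet strings as input; the helper names and the url pattern are illustrative, not from the authors' code:

```python
import re

URL_RE = re.compile(r"https?://\S+")

def to_gold_pair(tweet_text):
    """Return (tweet_without_url, gold_url), or None if the tweet does not qualify."""
    urls = URL_RE.findall(tweet_text)
    if len(urls) != 1:  # keep only tweets with a single url link (item 9)
        return None
    url = urls[0]
    if "cnn.com" not in url and "nytimes.com" not in url:
        return None
    # The url-referred article is the gold standard; the url itself is
    # removed from the tweet text before evaluation (item 10).
    text = " ".join(URL_RE.sub(" ", tweet_text).split())
    return text, url
```

Real tweets typically carry shortened urls, so in practice the link would need to be resolved before the domain check above.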

named entities

Appears in 19 sentences as: Named Entities (1) Named entities (2) named entities (12) Named Entity (2) named entity (5)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. We show that by using a tweet-specific feature (hashtags) and a news-specific feature (named entities), as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text.
    Page 1, “Abstract”
  2. such as named entities in a document.
    Page 2, “Introduction”
  3. Named entities acquired from a news document, typically with high accuracy using Named Entity Recognition [NER] tools, may be particularly informative.
    Page 2, “Introduction”
  4. 4.1 Hashtags and Named Entities
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  5. Named entities are some of the most salient features in a news article.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  6. Directly applying Named Entity Recognition (NER) tools on news titles or …
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  7. Accordingly, we first apply the NER tool on news summaries, then label named entities in the tweets in the same way as we label hashtags: if a string in the tweet matches a named entity from the summaries, it is labeled as a named entity in the tweet (see the sketch after this list).
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  8. 25,132 tweets are assigned at least one named entity. To create the similar tweet set, we find k tweets that also contain the named entity.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  9. We extract the 3 subgraphs (based on hashtags, named entities, and temporal information) on news articles.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  10. However, automatically tagging hashtags or named entities leads to much worse performance (around 93% ATOP values, a 3% decrease from baseline WTMF).
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  11. When a hashtag-matched word appears in a tweet, it is often related to the central meaning of the tweet; however, news articles are generally much longer than tweets, resulting in many more hashtag/named entity matches even though these named entities may not be closely related.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
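
A sketch of the procedure in items 7 and 8: named entities come from an NER tool run over the news summaries (not shown), a tweet is labeled with an entity when the entity string occurs in it, and tweets sharing an entity are linked. All names are illustrative, and the choice of which k tweets to link is simplified here:

```python
from collections import defaultdict

def link_tweets_by_entity(tweets, summary_entities, k=4):
    """Return a set of (i, j) index pairs linking tweets that share an entity."""
    entity_to_tweets = defaultdict(list)
    for i, tweet in enumerate(tweets):
        text = tweet.lower()
        for ent in summary_entities:
            if ent.lower() in text:  # string match against the NER output (item 7)
                entity_to_tweets[ent].append(i)
    links = set()
    for ids in entity_to_tweets.values():
        for pos, i in enumerate(ids):
            # link each tweet to up to k other tweets containing the entity (item 8)
            for j in ids[pos + 1 : pos + 1 + k]:
                links.add((i, j))
    return links
```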

latent variable

Appears in 7 sentences as: Latent variable (1) latent variable (6)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. The contribution of the paper is twofold: 1. we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information).
    Page 1, “Abstract”
  2. Latent variable models are powerful because they go beyond the surface word level and map short texts into a low dimensional dense vector (Socher et al., 2011; Guo and Diab, 2012b).
    Page 2, “Introduction”
  3. Accordingly, we apply a latent variable model, namely, the Weighted Textual Matrix Factorization [WTMF] (Guo and Diab, 2012b; Guo and Diab, 2012c) to both the tweets and the news articles.
    Page 2, “Introduction”
  4. Our proposed latent variable model not only models text-to-word information, but also is aware of the text-to-text information (illustrated in Figure 1): two linked texts should have similar latent vectors, accordingly the semantic picture of a tweet is completed by receiving semantics from its related tweets.
    Page 2, “Introduction”
  5. As a latent variable model, it is able to capture global topics (+1.89% ATOP over LDA-wvec); moreover, by explicitly modeling missing words, the existence of a word is also encoded in the latent vector (+2.31% TOP10 and -0.011% RR over the IR model).
    Page 6, “Experiments”
  6. The only evidence the latent variable models rely on is lexical items (WTMF-G extracts additional text-to-text correlations by word matching).
    Page 8, “Experiments”
  7. We formalize the linking task as a short text modeling problem, and exploit Twitter/news specific features to extract text-to-text relations, which are incorporated into a latent variable model.
    Page 9, “Conclusion”

LDA

Appears in 7 sentences as: LDA (7)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. WTMF is a state-of-the-art unsupervised model that was tested on two short text similarity datasets: (Li et al., 2006) and (Agirre et al., 2012), where it outperforms Latent Semantic Analysis [LSA] (Landauer et al., 1998) and Latent Dirichlet Allocation [LDA] (Blei et al., 2003) by a large margin.
    Page 2, “Introduction”
  2. We employ it as a strong baseline in this task as it exploits and effectively models the missing words in a tweet, in practice adding thousands more features for the tweet; by contrast, LDA, for example, only leverages observed words (14 features) to infer the latent vector for a tweet.
    Page 2, “Introduction”
  3. For LDA-θ and LDA-wvec, we run Gibbs sampling based LDA for 2000 iterations and average the model over the last 10 iterations (a sketch of this setup appears after this list).
    Page 5, “Experiments”
  4. For LDA we tune the hyperparameters α (Dirichlet prior for the topic distribution of a document) and β (Dirichlet prior for the word distribution given a topic).
    Page 6, “Experiments”
  5. (2010) also use hashtags to improve the latent representation of tweets in an LDA framework, Labeled-LDA (Ramage et al., 2009), treating each hashtag as a label.
    Page 8, “Related Work”
  6. Similar to the experiments presented in this paper, the result of using Labeled-LDA alone is worse than the IR model, due to the sparseness in the induced LDA latent vector.
    Page 8, “Related Work”
  7. (2011) apply an LDA-based model to clustering by incorporating url-referred documents.
    Page 8, “Related Work”
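
A minimal sketch of the Gibbs-sampling setup quoted in items 3 and 4, using the PyPI lda package (collapsed Gibbs sampling). The corpus, topic count, and prior values are placeholders; the paper's averaging over the last 10 iterations is omitted, since the package exposes only the final sample:

```python
import numpy as np
import lda  # PyPI "lda" package: LDA via collapsed Gibbs sampling

# Placeholder document-term count matrix (docs x vocab, non-negative ints).
X = np.random.randint(0, 3, size=(100, 500))

model = lda.LDA(
    n_topics=100,    # illustrative dimensionality
    n_iter=2000,     # 2000 Gibbs iterations, as in item 3
    alpha=0.05,      # Dirichlet prior on per-document topic distributions (item 4)
    eta=0.05,        # Dirichlet prior on per-topic word distributions, i.e. beta
    random_state=1,
)
model.fit(X)

theta = model.doc_topic_  # P(z|d): the LDA-theta text representation
phi = model.topic_word_   # P(w|z): reused for the LDA-wvec sketch below
```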

topic distribution

Appears in 7 sentences as: topic distribution (7)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. We use the inferred topic distribution θ as a latent vector to represent the tweet/news.
    Page 5, “Experiments”
  2. The problem with LDA-θ is that the inferred topic distribution latent vector is very sparse, with only a few nonzero values, resulting in many tweet/news pairs receiving a high similarity value as long as they are in the same topic domain.
    Page 5, “Experiments”
  3. Hence, following (Guo and Diab, 2012b), we first compute the latent vector of a word from P(z|w) (the topic distribution per word), then average the word latent vectors weighted by TF-IDF values to represent the short text, which yields much better results (see the sketch after this list).
    Page 5, “Experiments”
  4. For LDA we tune the hyperparameters α (Dirichlet prior for the topic distribution of a document) and β (Dirichlet prior for the word distribution given a topic).
    Page 6, “Experiments”
  5. the inferred topic distribution of a text, θ.
    Page 6, “Experiments”
  6. LDA-wvec preserves more information by creating a dense latent vector from the topic distribution of a word P(z|w), and thus does much better in ATOP.
    Page 6, “Experiments”
  7. The semantics of long documents are transferred to the topic distribution of tweets.
    Page 8, “Related Work”
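
Following items 3 and 6, a sketch of the LDA-wvec representation, assuming phi (topics x vocab, P(w|z)) and theta (docs x topics, P(z|d)) come from a trained model such as the one sketched above, and tfidf is a vocabulary-sized weight vector; names are illustrative:

```python
import numpy as np

def lda_wvec(word_ids, phi, theta, tfidf):
    """Dense latent vector for a short text: TF-IDF weighted average of P(z|w)."""
    p_z = theta.mean(axis=0)                  # crude estimate of the topic prior P(z)
    joint = phi * p_z[:, None]                # P(w|z) * P(z)
    p_z_given_w = joint / joint.sum(axis=0)   # P(z|w): a dense vector per word (item 6)
    vecs = p_z_given_w[:, word_ids]           # topics x words-in-text
    v = (vecs * tfidf[word_ids]).sum(axis=1)  # TF-IDF weighted average (item 3)
    return v / np.linalg.norm(v)
```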

gold standard

Appears in 5 sentences as: gold standard (5)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. To create a gold standard dataset, we download tweets spanning over 18 days, each with a url linking to a news article of CNN or NYTIMES, as well as all the news of CNN and NYTIMES published during the period.
    Page 1, “Introduction”
  2. For gold standard data, we harvest all the tweets that have a single url link to a CNN or NYTIMES news article, dated from the 11th of Jan to the 27th of Jan, 2013.
    Page 2, “Task and Data”
  3. In evaluation, we consider this url-referred news article as the gold standard, i.e., the most relevant document for the tweet, and remove the url from the text of the tweet.
    Page 2, “Task and Data”
  4. For our task evaluation, ideally, we would like the system to be able to identify the news article specifically referred to by the url within each tweet in the gold standard.
    Page 3, “Task and Data”
  5. We also collect a gold standard dataset by crawling tweets each with a url referring to a news article.
    Page 9, “Conclusion”

NER

Appears in 4 sentences as: NER (4)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. Named entities acquired from a news document, typically with high accuracy using Named Entity Recognition [NER] tools, may be particularly informative.
    Page 2, “Introduction”
  2. Directly applying Named Entity Recognition (NER) tools on news titles or …
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  3. Accordingly, we first apply the NER tool on news summaries, then label named entities in the tweets in the same way as we label hashtags: if a string in the tweet matches a named entity from the summaries, it is labeled as a named entity in the tweet.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”
  4. The noise introduced during automatic NER accumulates much faster given the large number of named entities in news data.
    Page 4, “Creating Text-to-text Relations via Twitter/News Features”

significant improvement

Appears in 4 sentences as: significant improvement (3) significantly improve (1)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. Our experiments show significant improvements of our new model over baselines on three evaluation metrics in the new task.
    Page 1, “Abstract”
  2. With WTMF being a very challenging baseline, WTMF-G still significantly improves on all 3 metrics.
    Page 7, “Experiments”
  3. In the case k = 4, δ = 3, compared to WTMF, WTMF-G achieves +1.371% TOP10, +2.727% RR, and +0.518% ATOP (a significant improvement in ATOP considering that it is averaged over 30,000 data points at an already high level of 96%, reducing the error rate by 13%; see the worked check after this list).
    Page 7, “Experiments”
  4. We achieve significant improvement over baselines.
    Page 9, “Conclusion”
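
As a quick check of the arithmetic in item 3: with ATOP already around 96%, the remaining error mass is roughly 4 points, so the +0.518% gain matches the stated 13% relative error reduction:

```latex
\frac{\Delta\,\mathrm{ATOP}}{100 - \mathrm{ATOP}} \approx \frac{0.518}{100 - 96} \approx 0.13
```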

cosine similarity

Appears in 3 sentences as: cosine similarity (3)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. In the WTMF model, we would like the latent vectors of two text nodes Q·,j1, Q·,j2 to be as similar as possible, namely that their cosine similarity be close to 1.
    Page 5, “WTMF on Graphs”
  2. Evaluation: The similarity between a tweet and a news article is measured by cosine similarity (see the sketch after this list).
    Page 5, “Experiments”
  3. The cosine similarity …
    Page 5, “Experiments”
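
A minimal sketch of the evaluation step in item 2, ranking news articles for a tweet by cosine similarity of latent vectors; variable names are illustrative:

```python
import numpy as np

def rank_news(tweet_vec, news_vecs):
    """Return news indices sorted from most to least similar to the tweet."""
    # news_vecs: (n_articles, dim); tweet_vec: (dim,)
    norms = np.linalg.norm(news_vecs, axis=1) * np.linalg.norm(tweet_vec)
    sims = (news_vecs @ tweet_vec) / norms  # cosine similarity per article
    return np.argsort(-sims)
```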

objective function

Appears in 3 sentences as: objective function (2) objective function: (1)
In Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
  1. P and Q are optimized by minimizing the objective function (reconstructed after this list):
    Page 3, “Weighted Textual Matrix Factorization”
  2. To implement this, we add a regularization term in the objective function of WTMF (equation 2) for each linked pair Q·,j1, Q·,j2:
    Page 5, “WTMF on Graphs”
  3. Therefore we approximate the objective function by treating the vector lengths ||Q·,j|| as fixed values during the ALS iterations:
    Page 5, “WTMF on Graphs”
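
For reference, the WTMF objective in item 1 can be reconstructed from Guo and Diab (2012): X is the term-by-text TF-IDF matrix, P and Q hold the word and text latent vectors, and the weight matrix W down-weights missing words:

```latex
\min_{P,Q}\; \sum_{i,j} W_{ij}\left(P_{\cdot,i}^{\top} Q_{\cdot,j} - X_{ij}\right)^{2}
 + \lambda\lVert P\rVert_2^2 + \lambda\lVert Q\rVert_2^2,
\qquad
W_{ij} = \begin{cases} 1 & \text{if } X_{ij} \neq 0\\ w_m & \text{otherwise}\end{cases}
```

The link term from items 2 and 3 is sketched here in one plausible form (with link weight δ), not as the paper's exact equation: for each linked pair of texts, a penalty pushes their cosine similarity toward 1, the vector lengths being treated as fixed during the ALS iterations:

```latex
\delta \sum_{(j_1,j_2)\,\in\,\text{links}}
\left(\frac{Q_{\cdot,j_1}^{\top} Q_{\cdot,j_2}}
{\lVert Q_{\cdot,j_1}\rVert\,\lVert Q_{\cdot,j_2}\rVert} - 1\right)^{2}
```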
