Article Structure
Abstract
We present a novel method for record extraction from social streams such as Twitter.
Introduction
We propose a method for discovering event records from social media feeds such as Twitter.
Related Work
A large number of information extraction approaches exploit redundancy in text collections to improve their accuracy and reduce the need for manually annotated data (Agichtein and Gravano, 2000; Yangarber et al., 2000; Zhu et al., 2009; Mintz et al., 2009a; Yao et al., 2010b; Hasegawa et al., 2004; Shinyama and Sekine, 2006).
Problem Formulation
Here we describe the key latent and observed random variables of our problem.
Model
Our model can be represented as a factor graph which takes the form,
Inference
Our goal is to predict a set of records R. Ideally we would like to compute P(R|x), marginalizing out the nuisance variables A and y.
Evaluation Setup
Data We apply our approach to construct a database of concerts in New York City.
Evaluation
The evaluation of record construction is challenging because many induced music events discussed ...
Conclusion
We presented a novel model for record extraction from social media streams such as Twitter.
Topics
CRF
Appears in 11 sentences as: +CRF (2) CRF (11)
In Event Discovery in Social Media Feeds
- We bias local decisions made by the CRF to be consistent with canonical record values, thereby facilitating consistency within an event cluster.
Page 2, “Introduction”
- The sequence labeling factor is similar to a standard sequence CRF (Lafferty et al., 2001), where the potential over a message label sequence decomposes
Page 3, “Model”
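The linear-chain decomposition referenced in the snippet above can be written, in generic notation (a sketch of the standard Lafferty et al. (2001) form, not the paper's exact equation or feature set):

```latex
P(\mathbf{y} \mid \mathbf{x}) \;\propto\; \exp\Big( \sum_{t=1}^{T} \theta \cdot f(y_{t-1},\, y_t,\, \mathbf{x},\, t) \Big)
```

where $f$ is a feature function over adjacent labels and the observed message, and $\theta$ are the learned weights.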
- The weights of the CRF component of our model, θSEQ, are the only weights learned at training time, using a distant supervision process described in Section 6.
Page 5, “Model”
- Since a uniform initialization of all factors is a saddle-point of the objective, we opt to initialize the q(y) factors with the marginals obtained using just the CRF parameters, accomplished by running forward-backward on all messages using only the
Page 6, “Inference”
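The initialization described above needs token-level marginals from the CRF alone, which forward-backward computes. Below is a minimal, self-contained sketch of forward-backward for a generic linear-chain model in log space (illustrative only; the potential layout and function names are our assumptions, not the paper's code):

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(log_pots):
    """Token-level label marginals for a linear chain.

    log_pots: list of T arrays. log_pots[0] has shape (K,) (scores for the
    first token); log_pots[t] for t > 0 has shape (K, K), folding the
    transition into state j together with token t's unary score.
    Returns a (T, K) array with P(y_t = j | x).
    """
    T, K = len(log_pots), log_pots[0].shape[0]
    # Forward pass: alpha[t, j] = log-sum over all prefixes ending in j.
    alpha = np.zeros((T, K))
    alpha[0] = log_pots[0]
    for t in range(1, T):
        alpha[t] = _logsumexp(alpha[t - 1][:, None] + log_pots[t], axis=0)
    # Backward pass: beta[t, i] = log-sum over all suffixes leaving i.
    beta = np.zeros((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = _logsumexp(log_pots[t + 1] + beta[t + 1][None, :], axis=1)
    log_z = _logsumexp(alpha[-1], axis=0)  # log partition function
    return np.exp(alpha + beta - log_z)
```

Each row of the result sums to one, giving exactly the per-token distributions the paper uses to seed q(y).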
- To do so, we run the CRF component of our model (φSEQ) over the corpus and extract, for each label ℓ, all spans that have a token-level probability of being labeled ℓ greater than λ = 0.1.
Page 7, “Inference”
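The thresholded span extraction described above can be sketched as follows: take maximal runs of tokens whose marginal probability for a given label exceeds λ. This is our own minimal reading of the snippet (function and label names are illustrative; λ = 0.1 is from the text):

```python
def extract_spans(tokens, marginals, labels, lam=0.1):
    """For each non-NONE label, return maximal token spans in which every
    token's marginal probability of that label exceeds lam.

    marginals: sequence of per-token probability vectors aligned with labels.
    """
    spans = {lab: [] for lab in labels if lab != "NONE"}
    for k, lab in enumerate(labels):
        if lab == "NONE":
            continue
        start = None
        for t in range(len(tokens)):
            if marginals[t][k] > lam:
                if start is None:
                    start = t  # open a new span
            elif start is not None:
                spans[lab].append(" ".join(tokens[start:t]))
                start = None
        if start is not None:  # span runs to the end of the message
            spans[lab].append(" ".join(tokens[start:]))
    return spans
```

A low λ deliberately over-generates candidate spans, leaving the record-level model to reconcile them.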
- [Figure legend: +LowThresh, +CRF, +List, Our Work]
Page 8, “Evaluation Setup”
- The CRF lines terminate because of low record yield.
Page 8, “Evaluation Setup”
- Our List Baseline labels messages by finding string overlaps against a list of musical artists and venues scraped from web data (the same lists used as features in our CRF component).
Page 8, “Evaluation Setup”
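The List Baseline above amounts to gazetteer matching. A minimal greedy longest-match sketch, assuming such artist/venue lists are available (names and matching policy are our assumptions, not the paper's implementation):

```python
def list_baseline_label(tokens, artists, venues):
    """Label each token ARTIST, VENUE, or NONE by greedy longest string
    overlap against known artist and venue lists (case-insensitive)."""
    artist_set = {tuple(a.lower().split()) for a in artists}
    venue_set = {tuple(v.lower().split()) for v in venues}
    labels = ["NONE"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at i first.
        for j in range(len(tokens), i, -1):
            span = tuple(t.lower() for t in tokens[i:j])
            if span in artist_set:
                labels[i:j] = ["ARTIST"] * (j - i)
                i, matched = j, True
                break
            if span in venue_set:
                labels[i:j] = ["VENUE"] * (j - i)
                i, matched = j, True
                break
        if not matched:
            i += 1
    return labels
```

Exact string overlap is brittle on colloquial Twitter text, which is why the paper uses these lists only as features in the CRF rather than as the final labeler.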
- The CRF Baseline is most similar to Mann and Yarowsky (2005)’s CRF Voting method and uses the maximum likelihood CRF labeling of each message.
Page 8, “Evaluation Setup”
- [Figure legend: +LowThresh, +CRF, +List, Our Work, Our Work+Con]
Page 9, “Evaluation”
- The CRF and hard-constrained consensus lines terminate because of low record yield.
Page 9, “Evaluation”
See all papers in Proc. ACL 2011 that mention CRF.
social media
Appears in 7 sentences as: Social Media (1) Social media (2) social media (4)
In Event Discovery in Social Media Feeds
- We propose a method for discovering event records from social media feeds such as Twitter.
Page 1, “Introduction”
- Social media messages are often short, make heavy use of colloquial language, and require situational context for interpretation (see examples in Figure 1).
Page 1, “Introduction”
- Social Media Feeds
Page 1, “Introduction”
- These properties of social media streams make existing extraction techniques significantly less effective.
Page 1, “Introduction”
- Social media is a natural place to discover new events missed by curation, but mentioned online by someone planning to attend.
Page 1, “Introduction”
- While our experiments utilized binary relations, we believe our general approach should be useful for n-ary relation recovery in the social media domain.
Page 9, “Evaluation”
- We presented a novel model for record extraction from social media streams such as Twitter.
Page 9, “Conclusion”
sequence labeling
Appears in 4 sentences as: Sequence Labeling (1) sequence labeling (3)
In Event Discovery in Social Media Feeds
- Message Labels (y): We assume that each message has a sequence labeling, where the labels consist of the record fields (e.g., ARTIST and VENUE) as well as a NONE label denoting that the token does not correspond to any domain field.
Page 3, “Problem Formulation”
- 4.1 Sequence Labeling Factor
Page 3, “Model”
- The sequence labeling factor is similar to a standard sequence CRF (Lafferty et al., 2001), where the potential over a message label sequence decomposes
Page 3, “Model”
- As mentioned in Section 4, the only learned parameters in our model are those associated with the sequence labeling factor φSEQ.
Page 7, “Evaluation Setup”
distant supervision
Appears in 3 sentences as: distant supervision (2) “distant supervision” (1)
In Event Discovery in Social Media Feeds
- Our work also relates to recent approaches for relation extraction with distant supervision (Mintz et al., 2009b; Bunescu and Mooney, 2007; Yao et al., 2010a).
Page 2, “Related Work”
- The weights of the CRF component of our model, QSEQ, are the only weights learned at training time, using a distant supervision process described in Section 6.
Page 5, “Model”
- While it is possible to train these parameters via direct annotation of messages with label sequences, we opted instead to use a simple approach where message tokens from the training weekend are labeled via their intersection with gold records, often called “distant supervision” (Mintz et al., 2009b).
Page 7, “Evaluation Setup”
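The distant-supervision labeling described above, projecting gold record values onto message tokens by string intersection, can be sketched as follows (a minimal illustration under our own assumptions about tokenization and matching, not the paper's code):

```python
def distant_labels(tokens, gold_record):
    """Label message tokens by exact overlap with a gold record's field
    values, e.g. gold_record = {"ARTIST": "The XX", "VENUE": "Mercury Lounge"}.
    Unmatched tokens get NONE."""
    labels = ["NONE"] * len(tokens)
    lower = [t.lower() for t in tokens]
    for field, value in gold_record.items():
        val = value.lower().split()
        n = len(val)
        # Mark every occurrence of the field value in the token stream.
        for i in range(len(tokens) - n + 1):
            if lower[i:i + n] == val:
                labels[i:i + n] = [field] * n
    return labels
```

These projected labels then stand in for manual annotation when training the CRF weights.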