Abstract | While user attribute extraction on social media has received considerable attention, existing approaches, mostly supervised, encounter great difficulty in obtaining gold-standard data and are therefore limited to predicting unary predicates (e.g., gender). |
Abstract | Users’ profiles from social media websites such as Facebook or Google Plus are used as a distant source of supervision for extraction of their attributes from user-generated text. |
Abstract | In addition to traditional linguistic features used in distant supervision for information extraction, our approach also takes into account network information, a unique opportunity offered by social media. |
Dataset Creation | Spouse: Facebook is the only type of social media where spouse information is commonly displayed. |
Introduction | The overwhelming popularity of online social media creates an opportunity to display given aspects of oneself. |
Introduction | We are optimistic that our approach can easily be applied to further user attributes such as HOBBIES and INTERESTS (MOVIES, BOOKS, SPORTS or STARS), RELIGION, HOMETOWN, LIVING LOCATION, FAMILY MEMBERS and so on, where training data can be obtained by matching ground truth retrieved from multiple types of online social media such as Facebook, Google Plus, or LinkedIn. |
Introduction | We present a large-scale dataset for this task gathered from various structured and unstructured social media sources. |
Related Work | While user profile inference from social media has received considerable attention (Al Zamal et al., 2012; Rao and Yarowsky, 2010; Rao et al., 2010; Rao et al., 2011), most previous work has treated this as a classification task where the goal is to predict unary predicates describing attributes of the user. |
Related Work | Homophily: Online social media offers a rich source of network information. |
Related Work | (2001) discovered that people sharing more attributes, such as background or hobbies, have a higher chance of becoming friends in social media. |
Abstract | Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. |
Code-Switching | Code-switching specifically in social media has also received some recent attention. |
Introduction | Topic models (Blei et al., 2003) have become standard tools for analyzing document collections, and topic analyses are quite common for social media (Paul and Dredze, 2011; Zhao et al., 2011; Hong and Davison, 2010; Ramage et al., 2010; Eisenstein et al., 2010). |
Introduction | In social media especially, there is great diversity in both topic and language, necessitating the modeling of multiple languages simultaneously. |
Introduction | However, the ever-changing vocabulary and topics of social media (Eisenstein, 2013) make finding suitable comparable corpora difficult. |
Abstract | Existing models for social media personal analytics assume access to thousands of messages per user, even though most users author content only sporadically over time. |
Batch Models | The proposed baseline model follows the same trends as the existing state-of-the-art approaches for user attribute classification in social media as described in Section 8. |
Batch Models | We use log-linear models in preference to reasonable alternatives such as the perceptron or SVMs, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009), including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013). |
Batch Models | Given the streaming nature of social media, we treat the number of requests allowed per day to the Twitter API as the scarce resource. |
Conclusions and Future Work | This may also be an effect of data heterogeneity in social media compared to, e.g., political debate text (Thomas et al., 2006). |
Introduction | Inferring latent user attributes such as gender, age, and political preferences (Rao et al., 2011; Zamal et al., 2012; Cohen and Ruths, 2013) automatically from personal communications and social media, including emails, blog posts, or public discussions, has become increasingly popular as the web has become more social and the volume of available data has grown. |
Introduction | In this paper we analyze static models and go beyond them by formulating personal analytics in social media as a streaming task. |
Related Work | Supervised Batch Approaches: The vast majority of work on predicting latent user attributes in social media applies supervised static SVM models for discrete categorical attributes (e.g., gender) and regression models for continuous attributes (e.g., age), using lexical bag-of-words features to predict user gender (Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012b), age (Rao et al., 2010; Nguyen et al., 2011; Nguyen et al., 2013), or political orientation. |
Related Work | Additionally, using social media for mining political opinions (O’Connor et al., 2010a; Maynard and Funk, 2012) or understanding sociopolitical trends and voting outcomes (Tumasjan et al., 2010; Gayo-Avello, 2012; Lampos et al., 2013) is becoming a common practice. |
Related Work | (2013) propose a bilinear user-centric model for predicting voting intentions in the UK and Australia from social media data. |
Abstract | With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events. |
Abstract | However, it is not straightforward to adapt existing event extraction systems, since texts in social media are fragmented and noisy. |
Abstract | In this paper we propose a simple yet effective Bayesian model, called the Latent Event Model (LEM), to extract structured representations of events from social media. |
Conclusions and Future Work | In this paper we have proposed an unsupervised Bayesian model, called the Latent Event Model (LEM), to extract the structured representation of events from social media data. |
Introduction | With the increasing popularity of social media, social networking sites such as Twitter have become an important source of event information. |
Introduction | Social media messages are often short and evolve rapidly over time. |
Introduction | In our work here, we observe an important property of social media data: the same event may be referenced by a high volume of messages. |
Introduction | Contemporary journalism is increasingly conducted through social media services like Twitter (Lotan et al., 2011; Hermida et al., 2012). |
Introduction | However, less is known about this phenomenon in social media — a domain whose endemic uncertainty makes proper treatment of factuality even more crucial (Morris et al., 2012). |
Related work | search, which focuses on quoted statements in social media text. |
Related work | Credibility in social media: Recent work in the area of computational social science focuses on understanding credibility cues on Twitter. |
Related work | The search for reliable signals of information credibility in social media has led to the construction of automatic classifiers to identify credible tweets (Castillo et al., 2011). |
Annotator disagreements across domains and languages | Besides these English data sets, we also obtained doubly-annotated POS data from the French Social Media Bank project (Seddah et al., 2012). All data sets, except the French one, are publicly available at http://lowlands. |
Annotator disagreements across domains and languages | Lastly, we compare the disagreements of annotators on a French social media data set (Seddah et al., 2012), which we mapped to the universal POS tag set. |
Hard cases and annotation errors | Figure 3: Disagreement on French social media |
Introduction | NOUN VERB ADP/PRT ADV/NOUN (2) Noam likes social media |
Approach | To address the large scale and complexity of language use in social media, we modify the way in which text is presented to ADIOS by focusing separately on text around key terms of interest, rather than processing all sentences en masse. |
Introduction | There is an obvious need for text mining techniques that can deal with large volumes of very diverse material, especially since the advent of social media and user-generated content, which includes dynamic discussions of wide-ranging and controversial topics. |
Introduction | We see one particular area of application in elucidating the semantic content of social media debates about controversial topics, like climate change, both for casual users, and for social scientists studying online discourses. |
Introduction | Social media such as Twitter, Facebook or YouTube contain rapidly changing information generated by millions of users that can dramatically affect the reputation of a person or an organization. |
Introduction | This raises the importance of automatically extracting the sentiments and opinions expressed in social media. |
Related work | The aforementioned corpora are, however, only partially suitable for developing models for social media, since informal text poses additional challenges for Information Extraction and Natural Language Processing. |