Huang, Hongzhao and Wen, Zhen and Yu, Dian and Ji, Heng and Sun, Yizhou and Han, Jiawei and Li, He
Article Structure
Abstract
In some societies, internet users have to create information morphs (e.g.
Introduction
Language constantly evolves to maximize communicative success and expressive power in daily social interactions.
Approach Overview
Morph Query
Target Candidate Identification
The general goal of the first step is to identify a list of target candidates for each morph query from comparable corpora, including Sina Weibo, Chinese news websites and English Twitter.
Target Candidate Ranking
Next, we propose a learning-to-rank framework that ranks target candidates using various levels of novel features derived from surface, semantic and social analysis.
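The ranking step can be illustrated with a small pairwise learning-to-rank sketch. This excerpt does not name the learner the authors used, so the sketch assumes a linear scorer trained with a pairwise logistic loss. The function names `train_pairwise_ranker` and `rank` are hypothetical, as is the representation of each candidate as a fixed-length feature vector (for example surface, meta-path and social scores).

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# A candidate is represented by a hypothetical feature vector,
# e.g. [surface_score, meta_path_score, social_score].
Features = Sequence[float]


def train_pairwise_ranker(
    queries: Dict[str, List[Tuple[Features, int]]],
    epochs: int = 200,
    lr: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Fit linear weights so that correct targets outscore wrong candidates.

    Each (correct, wrong) pair within one morph query becomes a difference
    vector d = x_pos - x_neg; SGD minimizes log(1 + exp(-w . d)).
    """
    pairs: List[List[float]] = []
    for candidates in queries.values():
        positives = [x for x, label in candidates if label == 1]
        negatives = [x for x, label in candidates if label == 0]
        for xp in positives:
            for xn in negatives:
                pairs.append([a - b for a, b in zip(xp, xn)])
    if not pairs:
        raise ValueError("need at least one (correct, wrong) candidate pair")

    w = [0.0] * len(pairs[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # -dLoss/dmargin = 1 / (1 + exp(margin)); clamp to avoid overflow.
            step = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w = [wi + lr * step * di for wi, di in zip(w, d)]
    return w


def rank(w: Sequence[float], candidates: Dict[str, Features]) -> List[str]:
    """Return candidate names ordered by descending linear score."""

    def score(name: str) -> float:
        return sum(wi * xi for wi, xi in zip(w, candidates[name]))

    return sorted(candidates, key=score, reverse=True)


queries = {
    "morph_a": [([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.4, 0.3], 0)],
    "morph_b": [([0.7, 0.9], 1), ([0.6, 0.2], 0)],
}
weights = train_pairwise_ranker(queries)
print(rank(weights, {"e1": [0.1, 0.2], "e2": [0.8, 0.7]}))  # ['e2', 'e1']
```

Training on feature differences means only the relative order of candidates within one morph query matters. This matches the setting, where candidates are ranked per query rather than scored against a global threshold.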
Experiments
Next, we present experiments under the various settings shown in Table 3, and analyze the impact of cross-source and cross-genre information.
Related Work
To analyze social media behavior under active censorship, Bamman et al. (2012) automatically discovered politically sensitive terms from Chinese tweets based on message deletion analysis.
Conclusion and Future Work
To the best of our knowledge, this is the first work on resolving implicit information morphs in data under active censorship.
Topics
similarity measures
Appears in 14 sentences as: Similarity Measurements (1) similarity measurements (2) similarity measures (11)
In Resolving Entity Morphs in Censored Data
- Then we propose various novel similarity measurements, including surface features, meta-path-based semantic features and social correlation features, and combine them in a learning-to-rank framework.
Page 1, “Abstract”
- We propose two new similarity measures, and integrate temporal information into the similarity measures to generate global semantic features.
Page 2, “Introduction”
- We first extract surface features between the morph and the candidate using orthographic similarity measures commonly used in entity coreference resolution (e.g.
Page 3, “Target Candidate Ranking”
- 4.2.3 Meta-Path-Based Semantic Similarity Measurements
Page 4, “Target Candidate Ranking”
- To extract semantic features, we then adopt meta-path-based similarity measures (Sun et al., 2011a; Sun et al., 2011b), which are defined over heterogeneous networks.
Page 4, “Target Candidate Ranking”
- For the determined meta-paths, we extract semantic features using the similarity measures proposed in (Sun et al., 2011a; Hsiung et al., 2005).
Page 4, “Target Candidate Ranking”
- We now list several meta-path-based similarity measures below.
Page 4, “Target Candidate Ranking”
- Beyond the above similarity measures, we also propose a cosine-similarity-style normalization method to modify the common neighbor and pairwise random walk measures, ensuring that the morph node and the target candidate node are strongly connected and also have similar popularity.
Page 5, “Target Candidate Ranking”
- The above similarity measures can also be applied to homogeneous networks that do not differentiate the neighbor types.
Page 5, “Target Candidate Ranking”
- into similarity measures to generate global semantic features.
Page 5, “Target Candidate Ranking”
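The meta-path measures listed above can be sketched on a single meta-path step. The sketch assumes each morph or candidate node is summarized by a weighted neighbor profile, for example its co-occurrence counts with noun-phrase nodes, so that comparing a morph with a candidate corresponds to the path Morph-NP-Entity. `path_count` computes the commuting-matrix entry and `path_sim` follows the PathSim definition of Sun et al. (2011a). `cosine_normalized` is one plausible reading of the cosine-style normalization of common neighbor counts mentioned above. The pairwise random walk measure is not reproduced because its exact form is not given in this excerpt. All function names are hypothetical.

```python
import math
from typing import Dict

# Hypothetical neighbor profile of one node along a meta-path step:
# neighbor node id -> co-occurrence weight.
Profile = Dict[str, float]


def path_count(x: Profile, y: Profile) -> float:
    """Weighted number of path instances x -> z -> y through shared neighbors z."""
    return sum(w * y[z] for z, w in x.items() if z in y)


def common_neighbors(x: Profile, y: Profile) -> int:
    """Number of distinct shared neighbors, ignoring weights."""
    return len(x.keys() & y.keys())


def cosine_normalized(x: Profile, y: Profile) -> float:
    """Path count divided by the geometric mean of the self path counts."""
    denom = math.sqrt(path_count(x, x) * path_count(y, y))
    return path_count(x, y) / denom if denom else 0.0


def path_sim(x: Profile, y: Profile) -> float:
    """PathSim: 2 * M[x, y] / (M[x, x] + M[y, y]) for a symmetric meta-path."""
    denom = path_count(x, x) + path_count(y, y)
    return 2.0 * path_count(x, y) / denom if denom else 0.0


morph = {"np1": 2, "np2": 1}
candidate = {"np1": 1, "np3": 3}
print(common_neighbors(morph, candidate), round(path_sim(morph, candidate), 3))  # 1 0.267
```

Both normalizations penalize a candidate that is connected to the morph only because it is extremely popular. PathSim does this through the sum of self path counts, and the cosine variant through their geometric mean.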
See all papers in Proc. ACL 2013 that mention similarity measures.
social media
Appears in 11 sentences as: social media (13)
- The proliferation of online social media significantly expedites this evolution, as new phrases triggered by social events may be disseminated rapidly in social media.
Page 1, “Introduction”
- To automatically analyze such fast-evolving language in social media, new computational models are needed.
Page 1, “Introduction”
- We believe that successful resolution of morphs is a crucial step for automated understanding of fast-evolving social media language, which is important for social media marketing (Barwise and Meehan, 2010).
Page 1, “Introduction”
- However, morph resolution in social media is challenging due to the following reasons.
Page 2, “Introduction”
- Thus, the co-occurrence of a morph and its target is quite low in the vast amount of information in social media.
Page 2, “Introduction”
- We detect target candidates by exploiting the dynamics of social media to extract the temporal distribution of entities, based on the assumption that the popularity of an individual is correlated between censored and uncensored text within a certain time window.
Page 2, “Introduction”
- Unfortunately, state-of-the-art techniques for these tasks still perform poorly on social media in terms of both accuracy and coverage of important information; as a result, these sophisticated semantic links all had a negative impact on target ranking performance.
Page 4, “Target Candidate Ranking”
- In contrast, users are less restricted in some other uncensored social media such as Twitter.
Page 5, “Target Candidate Ranking”
- Because of such social correlation, close social neighbors in social media such as Twitter and Weibo may post similar information or share similar opinions.
Page 6, “Target Candidate Ranking”
- As shown in Table 9 (K is the number of predefined topics), PLSA is not very effective, mainly because traditional topic modeling approaches do not perform well on short texts from social media.
Page 8, “Experiments”
- To analyze social media behavior under active censorship, Bamman et al. (2012) automatically discovered politically sensitive terms from Chinese tweets based on message deletion analysis.
Page 9, “Related Work”
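The temporal assumption quoted above, that an individual's popularity is correlated between censored and uncensored text within a time window, can be illustrated by comparing daily mention counts. This is a minimal sketch. Pearson correlation, the window length `days` and the threshold `min_corr` are assumptions rather than values from the paper, and the function names are hypothetical.

```python
import math
from datetime import date
from typing import Dict, List, Sequence


def daily_counts(mentions: Sequence[date], start: date, days: int) -> List[int]:
    """Histogram of mention dates over the window [start, start + days)."""
    counts = [0] * days
    for day in mentions:
        offset = (day - start).days
        if 0 <= offset < days:
            counts[offset] += 1
    return counts


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def temporal_candidates(
    morph_dates: Sequence[date],
    entity_dates: Dict[str, Sequence[date]],
    start: date,
    days: int = 7,
    min_corr: float = 0.5,
) -> List[str]:
    """Keep entities whose daily popularity tracks the morph's within the window."""
    morph_series = daily_counts(morph_dates, start, days)
    return [
        name
        for name, dates in entity_dates.items()
        if pearson(morph_series, daily_counts(dates, start, days)) >= min_corr
    ]
```

Correlating whole daily series, rather than comparing raw totals, favors entities whose spikes coincide with the morph's. That is the intuition behind the temporal assumption.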
named entities
Appears in 5 sentences as: Named entities (1) named entities (2) named entity (2)
- However, we obviously cannot consider all of the named entities in these sources as target candidates due to the sheer volume of information.
Page 3, “Target Candidate Identification”
- In addition, morphs are not limited to named entity forms.
Page 3, “Target Candidate Identification”
- Then we apply ICTCLAS (Zhang et al., 2003), a hierarchical Hidden Markov Model (HMM)-based Chinese lexical analyzer, to extract named entities, noun phrases and events.
Page 4, “Target Candidate Ranking”
- Named entities which co-occur at least 6 times with a morph query in the same topic are selected as its target candidates.
Page 8, “Experiments”
- Other similar research lines are the TAC-KBP Entity Linking (EL) (Ji et al., 2010; Ji et al., 2011), which links a named entity in news and web documents to an appropriate knowledge base (KB) entry, the task of mining name translation pairs from comparable corpora (Udupa et al., 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al., 2007) and the link prediction problem (Adamic and Adar, 2001; Liben-Nowell and Kleinberg, 2003; Sun et al., 2011b;
Page 9, “Related Work”
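The candidate-selection rule quoted above, at least 6 co-occurrences with the morph query in the same topic, might be implemented as follows. The sketch assumes documents have already been assigned topic ids by some topic model. It reads "co-occur in the same topic" as counting entity mentions in the documents of any topic that also contains the morph. That reading, like the function name `select_by_topic_cooccurrence`, is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def select_by_topic_cooccurrence(
    docs: Iterable[Tuple[int, Set[str]]],
    morph: str,
    entities: Set[str],
    min_count: int = 6,
) -> List[str]:
    """Return entities seen at least `min_count` times in the morph's topics.

    `docs` holds (topic_id, mentions) pairs; topic ids are assumed to come
    from an upstream topic model.
    """
    docs = list(docs)
    # Topics that contain at least one document mentioning the morph.
    morph_topics = {topic for topic, mentions in docs if morph in mentions}
    counts: Counter = Counter()
    for topic, mentions in docs:
        if topic in morph_topics:
            # Each document contributes at most one count per entity.
            counts.update(mentions & entities)
    return sorted(e for e, c in counts.items() if c >= min_count)
```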
noun phrases
Appears in 4 sentences as: Noun Phrases (1) noun phrases (3)
- Then we apply ICTCLAS (Zhang et al., 2003), a hierarchical Hidden Markov Model (HMM)-based Chinese lexical analyzer, to extract named entities, noun phrases and events.
Page 4, “Target Candidate Ranking”
- Therefore, we limited the types of vertices to Morph (M), Entity (E), which includes target candidates, Event (EV), and Non-Entity Noun Phrases (NP), and used co-occurrence as the edge type.
Page 4, “Target Candidate Ranking”
- We extract entities, events, and nonentity noun phrases that occur in more than one tweet as neighbors.
Page 4, “Target Candidate Ranking”
- entities, events, nonentity noun phrases).
Page 4, “Target Candidate Ranking”
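The typed co-occurrence network described above can be sketched as follows. The sketch assumes each tweet has already been annotated (for example by a lexical analyzer) into a set of `(type, string)` vertices with types `M`, `E`, `EV` and `NP`. Non-morph vertices that appear in only one tweet are dropped, mirroring the more-than-one-tweet filter. Keeping morph vertices unconditionally is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (vertex type, surface string); type is one of "M", "E", "EV", "NP".
Node = Tuple[str, str]


def build_network(tweets: List[Set[Node]]) -> Dict[Node, Counter]:
    """Undirected co-occurrence network over typed vertices, weighted by tweet count."""
    # Number of tweets each vertex appears in (each tweet is a set).
    tweet_freq = Counter(node for tweet in tweets for node in tweet)
    keep = {n for n, c in tweet_freq.items() if c > 1 or n[0] == "M"}
    graph: Dict[Node, Counter] = defaultdict(Counter)
    for tweet in tweets:
        nodes = sorted(n for n in tweet if n in keep)
        for a, b in combinations(nodes, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return dict(graph)


def typed_neighbors(graph: Dict[Node, Counter], node: Node, vtype: str) -> Counter:
    """Neighbors of `node` restricted to one vertex type, with edge weights."""
    return Counter({n: w for n, w in graph.get(node, {}).items() if n[0] == vtype})
```

Restricting neighbors by type is what lets a meta-path such as Morph-NP-Entity be followed one typed step at a time.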
semantic similarity
Appears in 4 sentences as: semantic similarities (1) Semantic Similarity (1) semantic similarity (2)
- We model social user behaviors and use social correlation to assist in measuring semantic similarities because the users who posted a morph and its corresponding target tend to share similar interests and opinions.
Page 2, “Introduction”
- 4.2.3 Meta-Path-Based Semantic Similarity Measurements
Page 4, “Target Candidate Ranking”
- In this paper we exploit cross-genre information and social correlation to measure semantic similarity.
Page 9, “Related Work”
- Both the meta-path-based and the social-correlation-based semantic similarity measurements proved powerful and complementary.
Page 9, “Conclusion and Future Work”
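The social-correlation idea above, that users who post a morph and users who post its target tend to share interests, can be approximated by comparing the sets of posting users. This sketch is hypothetical. Jaccard overlap and the one-hop expansion through a `follows` map are illustrative choices, not the paper's stated formula.

```python
from typing import Dict, Set


def user_overlap(posters_a: Set[str], posters_b: Set[str]) -> float:
    """Jaccard overlap of the users who posted each term."""
    union = posters_a | posters_b
    return len(posters_a & posters_b) / len(union) if union else 0.0


def social_overlap(
    posters_a: Set[str], posters_b: Set[str], follows: Dict[str, Set[str]]
) -> float:
    """Jaccard overlap after expanding each poster set with followed accounts.

    The one-hop `follows` expansion is a hypothetical stand-in for a social
    neighborhood; users with no entry contribute only themselves.
    """

    def expand(users: Set[str]) -> Set[str]:
        expanded = set(users)
        for user in users:
            expanded |= follows.get(user, set())
        return expanded

    return user_overlap(expand(posters_a), expand(posters_b))
```

The expansion matters because a morph and its target are rarely posted by exactly the same accounts. Close social neighbors, however, may post similar information.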
co-occurrence
Appears in 3 sentences as: co-occurrence (3)
- Thus, the co-occurrence of a morph and its target is quite low in the vast amount of information in social media.
Page 2, “Introduction”
- After applying the same annotation techniques used for tweets to the uncensored data sets, sentence-level co-occurrence relations are extracted and integrated into the network as shown in Figure 3.
Page 6, “Target Candidate Ranking”
- Retweets and redundant web documents are filtered to ensure more reliable frequency counting of co-occurrence relations.
Page 6, “Experiments”
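The filtering and counting steps described above can be sketched as two small helpers. The excerpt does not say how retweets or redundant documents were detected. Treating a leading `RT @` as a retweet marker and treating documents with identical case- and whitespace-normalized text as redundant are both assumptions. Sentence-level co-occurrence is counted as unordered pairs of mentions within one sentence.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set

# Assumed retweet convention: text starting with "RT @user".
RT_PATTERN = re.compile(r"^\s*RT\s+@", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def filter_documents(docs: Iterable[str]) -> List[str]:
    """Drop retweets and documents whose normalized text was already seen."""
    seen: Set[str] = set()
    kept: List[str] = []
    for doc in docs:
        if RT_PATTERN.match(doc):
            continue
        key = normalize(doc)
        if key in seen:
            continue
        seen.add(key)
        kept.append(doc)
    return kept


def sentence_cooccurrence(sentences: Iterable[Set[str]]) -> Counter:
    """Count unordered mention pairs appearing together in one sentence."""
    counts: Counter = Counter()
    for mentions in sentences:
        for pair in combinations(sorted(mentions), 2):
            counts[pair] += 1
    return counts
```

Filtering before counting keeps one viral post, copied many times, from inflating a single co-occurrence pair.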
topic modeling
Appears in 3 sentences as: topic modeling (3)
- For comparison, we also attempted a topic modeling approach to detect target candidates, as shown in Section 5.3.
Page 3, “Target Candidate Identification”
- We also attempted a topic modeling approach to detect target candidates.
Page 8, “Experiments”
- As shown in Table 9 (K is the number of predefined topics), PLSA is not very effective, mainly because traditional topic modeling approaches do not perform well on short texts from social media.
Page 8, “Experiments”