Abstract | We present a reformulation of the word pair features typically used for the task of disambiguating implicit relations in the Penn Discourse Treebank. |
Abstract | Our word pair features achieve significantly higher performance than the previous formulation when evaluated without additional features. |
Introduction | Without an explicit marker to rely on, work on this task initially focused on using lexical cues in the form of word pairs mined from large corpora where they appear around an explicit marker (Marcu and Echihabi, 2002). |
Introduction | The intuition is that these pairs will tend to represent semantic relationships which are related to the discourse marker (for example, word pairs often appearing around but may tend to be antonyms). |
Introduction | While this approach showed some success and has been used extensively in later work, it has been pointed out by multiple authors that many of the most useful word pairs |
Related Work | This line of research began with Marcu and Echihabi (2002), who used a small number of unambiguous explicit markers and patterns involving them, such as [Arg1, but Arg2], to collect sets of word pairs from a large corpus using the cross-product of the words in Arg1 and Arg2.
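The cross-product extraction described above can be sketched in a few lines of Python (a minimal illustration; the whitespace tokenization, lowercasing, and single-marker split are my simplifying assumptions, not the original setup):

```python
from itertools import product

def extract_word_pairs(sentence, marker=" but "):
    """Split a sentence on an unambiguous discourse marker and return
    the cross-product of the tokens in Arg1 and Arg2."""
    if marker not in sentence:
        return []
    arg1, arg2 = sentence.split(marker, 1)
    return list(product(arg1.lower().split(), arg2.lower().split()))

pairs = extract_word_pairs("the food was cheap but the service was terrible")
# ("cheap", "terrible") is among the 16 collected pairs
```

Pairs like ("cheap", "terrible") collected this way are the lexical cues intended to capture the semantics of the marker (here, contrast).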
Related Work | Second, it is constructed with the same unsupervised method they use to extract the word pairs: by assuming that the patterns correspond to a particular relation and collecting the arguments from an unannotated corpus.
Related Work | They used word pairs as well as additional features to train four binary classifiers, each corresponding to one of the high-level PDTB relation classes. |
DNN for word alignment | In contrast, our model does not maintain separate translation score parameters for every source-target word pair, but computes the score through a multilayer network, which naturally handles contexts on both sides without an explosive growth in the number of parameters.
DNN for word alignment | The example computes the translation score for the word pair (yibula, Yibula) given its surrounding context.
DNN for word alignment | For a word pair (e_i, f_j), we take fixed-length windows surrounding both e_i and f_j as input: (e_{i-w}, ...
Introduction | As shown in example (a) of Figure 1, in the word pair {“juda” => “mammoth”}, the Chinese word “juda” is a common word, but
Introduction | For example (b) in Figure 1, for the word pair {“yibula” => “Yibula”}, both the Chinese word “yibula” and the English word “Yibula” are rare named entities, but the words around them are very common, which are {“nongmin”, “shuo”} on the Chinese side and {“farmer”, “said”} on the English side.
Introduction | The pattern of the context {“nongmin X shuo” => “farmer X said”} may help to align the word pair that fills the variable X, and likewise the pattern {“yixiang X gongcheng” => “a X job”} is helpful for aligning the word pair {“juda” => “mammoth”} in example (a).
Training | max(0, 1 - t_θ((e, f)+ | e, f) + t_θ((e, f)- | e, f)) (10), where (e, f)+ is a correct word pair, (e, f)- is a wrong word pair in the same sentence, and t_θ is as defined in Eq.
Training | This training criterion essentially means our model suffers a loss unless it gives correct word pairs a higher score than random pairs from the same sentence pair, with some margin.
Training | We randomly cycle through all sentence pairs in training data; for each correct word pair (including null alignment), we generate a positive example, and generate two negative examples by randomly corrupting either |
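The margin-based criterion with randomly corrupted negatives can be sketched as follows (the function names and the uniform choice of which side to corrupt are illustrative assumptions; in the excerpted model the scores themselves come from a multilayer network):

```python
import random

def hinge_loss(score_pos, score_neg, margin=1.0):
    """Zero loss only when the correct pair outscores the
    corrupted pair by at least the margin."""
    return max(0.0, margin - score_pos + score_neg)

def corrupt(word_pair, src_vocab, tgt_vocab):
    """Build a negative example by randomly corrupting either
    the source or the target word of a correct pair."""
    e, f = word_pair
    if random.random() < 0.5:
        return (random.choice(src_vocab), f)
    return (e, random.choice(tgt_vocab))
```

With a margin of 1, a correct pair scored 2.0 against a negative scored 0.5 incurs no loss, while equal scores incur the full margin as loss.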
Application to Essay Scoring | We calculated correlations between essay score and the proportion of word pairs in each of the 60 bins of the WAP histogram, separately for each of the prompts p1-p6 in setA. |
Application to Essay Scoring | Next, observe the consistent negative correlations between essay score and the proportion of word pairs in bins PMI=0.833 through PMI=1.5. |
Conclusion | We hypothesize that this pattern is consistent with the better essays demonstrating both a better topic development (hence the higher percentage of highly related pairs) and a more creative use of language resources, as manifested in a higher percentage of word pairs that generally do not tend to appear together. |
Illustration: The shape of the distribution | Yet, the picture at the right tail is remarkably similar to that of the essays, with 9% of word pairs, on average, having PMI>2.17.
Illustration: The shape of the distribution | The right tail, with PMI>2.17, holds 19% of all word pairs in these texts — more than twice the proportion in essays written by college graduates or in texts from the WSJ. |
Introduction | fact that a text segmentation algorithm that uses information about patterns of word co-occurrences can detect subtopic shifts in a text (Riedl and Biemann, 2012; Misra et al., 2009; Eisenstein and Barzilay, 2008) tells us that texts contain some proportion of more highly associated word pairs (those in subsequent sentences within the same topical unit) and of less highly associated pairs (those in sentences from different topical units).1 Yet, does each text have a different distribution of highly associated, mildly associated, unassociated, and disassociated pairs of words, or do texts tend to strike a similar balance of these?
Methodology | The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values. |
Methodology | The lowest bin (shown in Figures 1 and 2 as PMI = -5) contains pairs with PMI ≤ -5; the topmost bin (shown in Figures 1 and 2 as PMI = 4.83) contains pairs with PMI > 4.67, while the rest of the bins contain word pairs (x, y) with -5 < PMI(x, y) ≤ 4.67.
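A minimal sketch of the binning scheme, assuming 60 bins total with a uniform interior bin width of 1/6 (the width is inferred from the stated bounds and bin count, not given explicitly in the excerpt):

```python
import math

def pmi(pair_count, x_count, y_count, n_pairs, n_words):
    """Pointwise mutual information of a word pair (x, y)."""
    p_xy = pair_count / n_pairs
    return math.log2(p_xy / ((x_count / n_words) * (y_count / n_words)))

def pmi_histogram(pmi_values, lo=-5.0, hi=4.67, width=1 / 6):
    """Proportion of word pairs per PMI bin: one bottom bin for
    PMI <= lo, one top bin for PMI > hi, uniform bins between."""
    n_bins = int(round((hi - lo) / width)) + 2   # 58 interior + 2 edge = 60
    hist = [0] * n_bins
    for v in pmi_values:
        if v <= lo:
            hist[0] += 1
        elif v > hi:
            hist[-1] += 1
        else:
            hist[min(1 + int((v - lo) / width), n_bins - 2)] += 1
    return [c / len(pmi_values) for c in hist]
```

The resulting 60 proportions are the co-occurrence profile of a text, the representation correlated with essay score above.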
Methodology | Thus, the text “The dog barked and wagged its tail” is much tighter than the text “Green ideas sleep furiously”, with all six content word pairs scoring above PMI=5.5 in the first and below PMI=2.2 in the second.4
Related Work | Our results suggest that this direction is promising, as merely the proportion of highly associated word pairs is already contributing a clear signal regarding essay quality; it is possible that additional information can be derived from richer representations common in the lexical cohesion literature. |
Abstract | Sentiment Similarity of word pairs reflects the distance between the words regarding their underlying sentiments. |
Abstract | This paper aims to infer the sentiment similarity between word pairs with respect to their senses. |
Abstract | The resultant emotional vectors are then employed to infer the sentiment similarity of word pairs.
Analysis and Discussions | For this purpose, we repeat the experiment for SO prediction by computing sentiment similarity of word pairs with and without using synonyms and antonyms. |
Introduction | In this paper, we show that sentiment similarity between word pairs can be effectively utilized to compute SO of words. |
Introduction | • We propose an effective approach to predict the sentiment similarity between word pairs through hidden emotions at the sense level,
Related Works | Most previous works employed semantic similarity of word pairs to address SO prediction and IQAP inference tasks. |
Sentiment Similarity through Hidden Emotions | As we discussed above, semantic similarity measures are less effective for inferring sentiment similarity between word pairs.
Experiments | In other words, the IDF values help decide the importance of word pairs to the model. |
Experiments | 4 to the word pair and use their estimated degree of synonymy, antonymy, hyponymy and semantic relatedness as features. |
Experiments | 5, the features for the whole question/sentence pair are the average and max of the features of all the word pairs.
Learning QA Matching Models | It then aggregates features extracted from each of these word pairs to represent the whole question/sentence pair.
Learning QA Matching Models | Given a word pair (w_q, w_s), where w_q ∈ V_q and w_s ∈ V_s, feature functions φ_1, ..., φ_d map it to a d-dimensional real-valued feature vector.
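The pair-level feature mapping and the average/max aggregation can be sketched as follows (function names are illustrative; the sketch assumes whitespace-tokenized inputs):

```python
def pair_features(wq, ws, feature_fns):
    """Map a word pair (w_q, w_s) to a d-dimensional feature vector
    by applying each feature function phi_1 ... phi_d."""
    return [phi(wq, ws) for phi in feature_fns]

def aggregate(question, sentence, feature_fns):
    """Average and max of each feature over all word pairs, yielding a
    fixed-length representation of the whole question/sentence pair."""
    vectors = [pair_features(wq, ws, feature_fns)
               for wq in question for ws in sentence]
    d = len(feature_fns)
    avg = [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]
    mx = [max(v[k] for v in vectors) for k in range(d)]
    return avg + mx
```

Because average and max are taken per dimension, the output length (2d) is independent of the lengths of the question and sentence.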
Connotation Induction Algorithms | We experimented with many different variations on the graph structure and edge weights, including ones that include any word pairs that occurred frequently enough together. |
Connotation Induction Algorithms | R_syn: word pairs in a synonym relation.
Connotation Induction Algorithms | R_ant: word pairs in an antonym relation.
Related work | The Word Pair Classification (WPC) method (Jiang and Liu, 2010) modifies the DPA method and makes it more robust.
Unsupervised Dependency Grammar Induction | denotes the word pair dependency relationship (e_i → e_j).
Unsupervised Dependency Grammar Induction | Based on the features around de_ij, we can calculate the probability Pr(y | de_ij) that the word pair de_ij
Unsupervised Dependency Grammar Induction | where y is the category of the relationship of de_ij: y = + means it is the probability that the word pair de_ij can form a dependency arc and y = - means the contrary.
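The excerpt does not specify the model behind Pr(y | de_ij); as an illustrative stand-in only, a logistic model over the context features of a candidate arc would look like this:

```python
import math

def arc_probability(features, weights, bias=0.0):
    """Logistic stand-in for Pr(y = + | de_ij): the probability that
    the word pair (e_i, e_j) forms a dependency arc, computed from
    the features observed around de_ij. Pr(y = -) is its complement."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))
```

With no informative features the model is indifferent (probability 0.5); a strongly weighted feature pushes the arc probability toward 1.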
Background | is one of the few examples where distributional representations are used for word pairs.
Experiments | The task is thus to rank these pairs of word pairs by their semantic similarity. |
Experiments | We assume fixed parse trees for all of the compounds (Figure 6), and use these to compute compound level vectors for all word pairs . |
Joint Model of Extraction and Compression | Although the authors of QSB also provided scores of word pairs to avoid putting excessive penalties |
Joint Model of Extraction and Compression | on word overlaps, we do not score word pairs . |
Joint Model of Extraction and Compression | The score function is supermodular as a score function of subtree extraction3, because the union of two subtrees can have extra word pairs that are not included in either subtree. |
A Unified Semantic Representation | Commonly, semantic comparisons are between word pairs or sentence pairs that do not have their lexical content sense-annotated, despite the potential utility of sense annotation in making semantic comparisons. |
Experiment 2: Word Similarity | The dataset contains 65 word pairs judged by 51 human subjects on a scale of 0 to 4 according to their semantic similarity. |
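Performance on such a dataset is typically reported as the correlation between model similarity scores and the mean human ratings; a self-contained Pearson correlation sketch (in practice `scipy.stats.pearsonr` would be used):

```python
def pearson(xs, ys):
    """Pearson correlation between model similarity scores (xs) and
    mean human ratings (ys) for the same word pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A perfect linear agreement with the human ratings gives r = 1, and a perfectly inverted ranking gives r = -1.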
Introduction | Third, we demonstrate that this single representation can achieve state-of-the-art performance on three similarity tasks, each operating at a different lexical level: (1) surpassing the highest scores on the SemEval-2012 task on textual similarity (Agirre et al., 2012) that compares sentences, (2) achieving a near-perfect performance on the TOEFL synonym selection task proposed by Landauer and Dumais (1997), which measures word pair similarity, and also obtaining state-of-the-art performance in terms of the correlation with human judgments on the RG-65 dataset (Rubenstein and Goodenough, 1965), and finally (3) surpassing the performance of Snow et al. |