Introduction | The standard process for pattern-based relation extraction is to start with hand-selected patterns or word pairs expressing a particular relationship, and iteratively scan the corpus for co-appearances of word pairs in patterns and for patterns that contain known word pairs . |
Introduction | We also propose a way to label each cluster by word pairs that represent it best. |
Pattern Clustering Algorithm | (Alfonseca et al., 2006) for extracting general relations starting from given seed word pairs . |
Pattern Clustering Algorithm | Unlike most previous work, our hook words are not provided in advance but selected randomly; the goal in those papers is to discover relationships between given word pairs , while we use hook words in order to discover relationships that generally occur in the corpus. |
Pattern Clustering Algorithm | To label pattern clusters we define a HITS measure that reflects the affinity of a given word pair to a given cluster. |
Related Work | Several recent papers discovered relations on the web using seed patterns (Pantel et al., 2004), rules (Etzioni et al., 2004), and word pairs (Pasca et al., 2006; Alfonseca et al., 2006). |
SAT-based Evaluation | We addressed the evaluation questions above using a SAT-like analogy test automatically generated from word pairs captured by our clusters (see below in this section). |
SAT-based Evaluation | The header of the question is a word pair that is one of the label pairs of the cluster. |
SAT-based Evaluation | In our sample there were no word pairs assigned as labels to more than one cluster.
Analysis of word pair features | For the analysis of word pair features, we use a large collection of automatically extracted explicit examples from the experiments in Blair-Goldensohn et al. |
Analysis of word pair features | For the complete set of 10,000 examples, word pair features were computed. |
Analysis of word pair features | After removing word pairs that appear fewer than 5 times, the remaining features were ranked by information gain using the MALLET toolkit.
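The ranking step described above can be sketched with a toy information-gain computation over binary word-pair features; the `data` examples and relation labels below are invented for illustration, while MALLET performs the same computation at scale.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, feature):
    """Information gain of a binary word-pair feature.

    examples: list of (set_of_word_pair_features, relation_label)
    """
    labels = [y for _, y in examples]
    base = entropy(labels)
    with_f = [y for feats, y in examples if feature in feats]
    without_f = [y for feats, y in examples if feature not in feats]
    n = len(examples)
    cond = (len(with_f) / n * entropy(with_f) if with_f else 0.0) + \
           (len(without_f) / n * entropy(without_f) if without_f else 0.0)
    return base - cond

# Hypothetical labeled examples: word-pair feature sets with relation labels.
data = [({("good", "bad")}, "contrast"),
        ({("good", "bad")}, "contrast"),
        ({("rain", "wet")}, "cause"),
        ({("rain", "wet")}, "cause")]
print(information_gain(data, ("good", "bad")))  # → 1.0 (perfectly informative)
```

Features are then sorted by this score and the top-ranked word pairs inspected.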
Introduction | We examine the most informative word pair features and find that they are not the semantically-related pairs that researchers had hoped. |
Word pair features in prior work | Indeed, word pairs form the basic feature of most previous work on classifying implicit relations (Marcu and Echihabi, 2002; Blair-Goldensohn et al., 2007; Sporleder and Lascarides, 2008) or the simpler task of predicting which connective should be used to express a relation (Lapata and Lascarides, 2004).
Word pair features in prior work | Semantic relations vs. function word pairs If the hypothesis for word pair triggers of discourse relations were true, the analysis of unambiguous relations could be used to discover pairs of words with causal or contrastive relations holding between them.
Word pair features in prior work | At the same time, feature selection is always necessary for word pairs , which are numerous and lead to data sparsity problems. |
Abstract | Using an ensemble method, the key information extracted from word pairs with dependency relations in the translated text is effectively integrated into the parser for the target language. |
Dependency Parsing: Baseline | In each step, the classifier checks a word pair , namely s, the top of the stack that consists of the processed words, and i, the first word in the (input) unprocessed sequence, to determine if a dependency relation should be established between them.
Exploiting the Translated Treebank | As we cannot expect too much from a word-by-word translation, only word pairs with a dependency relation in the translated text are extracted as useful and reliable information.
Exploiting the Translated Treebank | Then some features based on querying these word pairs according to the current parsing state (namely, words in the current stack and input) will be derived to enhance the Chinese parser.
Exploiting the Translated Treebank | As all the feature values concerned here are calculated from the search results in the translated word pair list according to the current parsing state, and a complete and exact match cannot always be expected, our solution to the above segmentation issue is a partial matching strategy based on the characters that the words include.
Introduction | However, dependency parsing focuses on the relations of word pairs ; this allows us to use a dictionary-based translation without assuming that a parallel corpus is available, so the training stage of translation may be skipped and the decoding will be quite fast in this case.
Abstract | We identify whether a candidate word pair has a hypernym-hyponym relation by using the word-embedding-based semantic projections between words and their hypernyms.
Introduction | Subsequently, we identify whether an unknown word pair holds a hypernym-hyponym relation using the projections (Section 3.4).
Method | Table 1: Embedding offsets on a sample of hypernym-hyponym word pairs . |
Method | The well-known example v(king) − v(queen) ≈ v(man) − v(woman) indicates that the embedding offsets indeed represent the shared semantic relation between the two word pairs . |
Method | As a preliminary experiment, we compute the embedding offsets between some randomly sampled hypernym-hyponym word pairs and measure their similarities.
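The offset comparison can be sketched as follows; the 3-d vectors are made-up illustrative values, not trained embeddings, so only the relative geometry matters.

```python
import math

def offset(v1, v2):
    """Embedding offset v1 - v2."""
    return [a - b for a, b in zip(v1, v2)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

# Hypothetical 3-d embeddings chosen so the two offsets coincide.
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.8, 0.1, 0.6],
    "man":   [0.3, 0.7, 0.2],
    "woman": [0.3, 0.2, 0.7],
}

off_royal = offset(emb["king"], emb["queen"])   # [0.0, 0.5, -0.5]
off_human = offset(emb["man"], emb["woman"])    # [0.0, 0.5, -0.5]
print(cosine(off_royal, off_human))             # ~1.0: identical offsets
```

High cosine similarity between offsets of two word pairs is what signals a shared relation.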
Basics of ITG | From the viewpoint of word alignment, the terminal unary rules provide the links of word pairs , whereas the binary rules represent the reordering factor. |
Basics of ITG | For instance, there are two parses for three consecutive word pairs , viz. |
Basics of ITG Parsing | The base step applies all relevant terminal unary rules to establish the links of word pairs . |
Basics of ITG Parsing | The word pairs are then combined into span pairs in all possible ways. |
Introduction | HMM); Zhang and Gildea (2005) propose Tic-tac-toe pruning, which is based on the Model 1 probabilities of word pairs inside and outside a pair of spans. |
The DITG Models | The following features about alignment links are used in W-DITG: 1) Word pair translation probabilities trained from the HMM model (Vogel et al., 1996) and IBM model 4 (Brown et al., 1993; Och and Ney, 2000).
The DPDI Framework | In the base step, only the word pairs listed in the sentence-level annotation are inserted in the hypergraph, and the recursive steps are just the same as usual.
The DPDI Framework | Zhang and Gildea (2005) show that Model 1 (Brown et al., 1993; Och and Ney, 2000) probabilities of the word pairs inside and outside a span pair ([e_i1, e_i2]/[f_j1, f_j2]) are useful.
The DPDI Framework | probability of word pairs within the span pair):
DNN for word alignment | In contrast, our model does not maintain separate translation score parameters for every source-target word pair , but computes the translation score through a multilayer network, which naturally handles contexts on both sides without explosive growth in the number of parameters.
DNN for word alignment | The example computes translation score for word pair (yibula, yibulayin) given its surrounding context. |
DNN for word alignment | For a word pair (e_i, f_j), we take fixed-length windows surrounding both e_i and f_j as input:
Introduction | As shown in example (a) of Figure 1, in word pair {“juda” => “mammoth”}, the Chinese word “juda” is a common word, but
Introduction | For example (b) in Figure 1, for the word pair {“yibula” => “Yibula”}, both the Chinese word “yibula” and the English word “Yibula” are rare named entities, but the words around them are very common, which are {“nongmin”, “shuo”} on the Chinese side and {“farmer”, “said”} on the English side.
Introduction | The pattern of the context {“nongmin X shuo” => “farmer X said”} may help to align the word pair that fills the variable X, and likewise the pattern {“yixiang X gongcheng” => “a X job”} is helpful to align the word pair {“juda” => “mammoth”} for example (a).
Training | max{0, 1 − tθ((e, f)+|e, f) + tθ((e, f)−|e, f)} (10) where (e, f)+ is a correct word pair, (e, f)− is a wrong word pair in the same sentence, and tθ is as defined in Eq.
Training | This training criterion essentially means our model suffers a loss unless it gives correct word pairs a higher score than random pairs from the same sentence pair by some margin.
Training | We randomly cycle through all sentence pairs in training data; for each correct word pair (including null alignment), we generate a positive example, and generate two negative examples by randomly corrupting either |
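The margin-based criterion above can be sketched as a hinge loss over a correct/corrupted score pair; the scores below are arbitrary illustrative numbers, not network outputs.

```python
def margin_ranking_loss(score_pos, score_neg, margin=1.0):
    """Hinge loss: zero only when the correct word pair outscores the
    corrupted pair by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))

# Correct pair scores 2.0; two corrupted pairs score 0.5 and 1.8.
print(margin_ranking_loss(2.0, 0.5))  # 0.0: margin satisfied, no loss
print(margin_ranking_loss(2.0, 1.8))  # ~0.8: margin violated
```

During training, each positive example contributes such a term for every negative example generated by corrupting one side of the pair.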
Abstract | We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs . |
Extraction of Transliteration Pairs | Initially, we extract a list of word pairs from a word-aligned parallel corpus using GIZA++. |
Extraction of Transliteration Pairs | The extracted word pairs are either transliterations, other kinds of translations, or misalignments. |
Introduction | We first align a bilingual corpus at the word level using GIZA++ and create a list of word pairs containing a mix of non-transliterations and transliterations. |
Introduction | tistical transliterator on the list of word pairs . |
Introduction | We then filter out a few word pairs (those which have the lowest transliteration probabilities according to the trained transliteration system) which are likely to be non-transliterations. |
Models | The training data is a list of word pairs (a source word and its presumed transliteration) extracted from a word-aligned parallel corpus. |
Models | g2p builds a joint sequence model on the character sequences of the word pairs and infers m-to-n alignments between source and target characters with Expectation Maximization (EM) training. |
Models | For training Moses as a transliteration system, we treat each word pair as if it were a parallel sentence, by putting spaces between the characters of each word. |
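This preprocessing step is simple to sketch; the word pair below is a hypothetical transliteration example.

```python
def to_char_sentence(word):
    """Treat a word as a 'sentence' of space-separated characters."""
    return " ".join(word)

def format_pair(src, tgt):
    """One training line pair for a character-level translation model."""
    return to_char_sentence(src), to_char_sentence(tgt)

# Hypothetical transliteration pair.
src_line, tgt_line = format_pair("london", "landan")
print(src_line)  # l o n d o n
print(tgt_line)  # l a n d a n
```

The character "sentences" are then fed to the phrase-based pipeline unchanged, so word-level machinery learns character-level correspondences.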
Experimental Results | 5.1 Word Pair Mining |
Experimental Results | Table 1 shows some examples of the mined word pairs . |
Experimental Results | Table 1: Examples of Word Pairs |
Model for Candidate Generation | Figure 1: Example of rule extraction from word pair |
Model for Candidate Generation | If we can apply a set of rules to transform the misspelled word w_m to a correct word w_c in the vocabulary, then we call the rule set a “transformation” for the word pair w_m and w_c.
Model for Candidate Generation | Note that for a given word pair , it is likely that there are multiple possible transformations for it. |
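One transformation can be sketched as an ordered list of substring rewrite rules; the rules and words below are hypothetical, and a real system would search over many candidate rule sets rather than apply a fixed one.

```python
def apply_transformation(word, rules):
    """Apply an ordered list of (old, new) substring rules, each once.
    A minimal sketch of evaluating one candidate transformation."""
    for old, new in rules:
        word = word.replace(old, new, 1)
    return word

# Two different transformations mapping misspellings to the
# vocabulary word "physics".
print(apply_transformation("fysics", [("f", "ph")]))              # physics
print(apply_transformation("fisics", [("f", "ph"), ("i", "y")]))  # physics
```

That two distinct rule sets reach the same vocabulary word illustrates why a given word pair can have multiple transformations.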
Abstract | We present a reformulation of the word pair features typically used for the task of disambiguating implicit relations in the Penn Discourse Treebank. |
Abstract | Our word pair features achieve significantly higher performance than the previous formulation when evaluated without additional features. |
Introduction | Without an explicit marker to rely on, work on this task initially focused on using lexical cues in the form of word pairs mined from large corpora where they appear around an explicit marker (Marcu and Echihabi, 2002). |
Introduction | The intuition is that these pairs will tend to represent semantic relationships which are related to the discourse marker (for example, word pairs often appearing around but may tend to be antonyms). |
Introduction | While this approach showed some success and has been used extensively in later work, it has been pointed out by multiple authors that many of the most useful word pairs |
Related Work | This line of research began with (Marcu and Echihabi, 2002), who used a small number of unambiguous explicit markers and patterns involving them, such as [Arg1, but Arg2], to collect sets of word pairs from a large corpus using the cross-product of the words in Arg1 and Arg2.
Related Work | Second, it is constructed with the same unsupervised method they use to extract the word pairs : by assuming that the patterns correspond to a particular relation and collecting the arguments from an unannotated corpus.
Related Work | They used word pairs as well as additional features to train four binary classifiers, each corresponding to one of the high-level PDTB relation classes. |
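The Marcu-and-Echihabi-style collection step can be sketched as a cross-product over the two arguments of an unambiguous marker; the sentence below is invented, and a real pipeline would also lowercase, filter stopwords, and aggregate counts over a large corpus.

```python
from itertools import product

def extract_word_pairs(sentence, marker="but"):
    """Cross-product of the words before and after an unambiguous
    discourse marker, as in pattern [Arg1, but Arg2]."""
    words = sentence.lower().split()
    i = words.index(marker)
    arg1, arg2 = words[:i], words[i + 1:]
    return set(product(arg1, arg2))

pairs = extract_word_pairs("the movie was good but the ending disappointed")
print(("good", "disappointed") in pairs)  # True
```

Each such pair then becomes a feature whose counts are accumulated per relation.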
Conclusion and Future Work | We have also provided the first intrinsic evaluation of word translation probabilities with respect to human relatedness rankings for reference word pairs . |
Introduction | To do so, we compare translation probabilities with concept vector based semantic relatedness measures with respect to human relatedness rankings for reference word pairs . |
Related Work | In this study, we use the correlation with human rankings for reference word pairs to investigate how word translation probabilities compare with traditional semantic relatedness measures. |
Semantic Relatedness Experiments | The aim of this first experiment is to perform an intrinsic evaluation of the word translation probabilities obtained by comparing them to traditional semantic relatedness measures on the task of ranking word pairs . |
Semantic Relatedness Experiments | This dataset comprises two subsets, which have been annotated by different annotators: Fin1-153, containing 153 word pairs, and Fin2-200, containing 200 word pairs . |
Semantic Relatedness Experiments | In order to ensure a fair evaluation, we limit the comparison to the word pairs which are contained in all resources and translation tables. |
Application to Essay Scoring | We calculated correlations between essay score and the proportion of word pairs in each of the 60 bins of the WAP histogram, separately for each of the prompts p1-p6 in setA. |
Application to Essay Scoring | Next, observe the consistent negative correlations between essay score and the proportion of word pairs in bins PMI=0.833 through PMI=1.5. |
Conclusion | We hypothesize that this pattern is consistent with the better essays demonstrating both a better topic development (hence the higher percentage of highly related pairs) and a more creative use of language resources, as manifested in a higher percentage of word pairs that generally do not tend to appear together. |
Illustration: The shape of the distribution | Yet, the picture at the right tail is remarkably similar to that of the essays, with 9% of word pairs , on average, having PMI>2.17. |
Illustration: The shape of the distribution | The right tail, with PMI>2.17, holds 19% of all word pairs in these texts — more than twice the proportion in essays written by college graduates or in texts from the WSJ. |
Introduction | fact that a text segmentation algorithm that uses information about patterns of word co-occurrences can detect subtopic shifts in a text (Riedl and Biemann, 2012; Misra et al., 2009; Eisenstein and Barzilay, 2008) tells us that texts contain some proportion of more highly associated word pairs (those in subsequent sentences within the same topical unit) and of less highly associated pairs (those in sentences from different topical units). Yet, does each text have a different distribution of highly associated, mildly associated, unassociated, and disassociated pairs of words, or do texts tend to strike a similar balance of these?
Methodology | The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values. |
Methodology | The lowest bin (shown in Figures 1 and 2 as PMI = −5) contains pairs with PMI ≤ −5; the topmost bin (shown in Figures 1 and 2 as PMI = 4.83) contains pairs with PMI > 4.67, while the rest of the bins contain word pairs (x, y) with −5 < PMI(x, y) ≤ 4.67.
Methodology | Thus, the text “The dog barked and wagged its tail” is much tighter than the text “Green ideas sleep furiously”, with all six content word pairs scoring above PMI=5.5 in the first and below PMI=2.2 in the second.
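The binning scheme described above (lowest bin PMI ≤ −5, topmost bin PMI > 4.67, 58 interior bins, for 60 in total) can be sketched as follows; the four PMI values are invented.

```python
def pmi_histogram(pmi_values, low=-5.0, high=4.67, width=1.0 / 6):
    """Proportion of word pairs per PMI bin: the lowest bin collects
    PMI <= low, the topmost PMI > high, interior bins have fixed width."""
    nbins = int(round((high - low) / width)) + 2
    counts = [0] * nbins
    for v in pmi_values:
        if v <= low:
            counts[0] += 1
        elif v > high:
            counts[-1] += 1
        else:
            # clamp guards against float edge effects at v == high
            counts[min(1 + int((v - low) / width), nbins - 2)] += 1
    total = len(pmi_values)
    return [c / total for c in counts]

hist = pmi_histogram([-6.0, 0.0, 2.5, 5.0])
print(len(hist))  # 60
print(sum(hist))  # 1.0
```

Each bin's proportion then serves as one feature of the text's co-occurrence profile.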
Related Work | Our results suggest that this direction is promising, as merely the proportion of highly associated word pairs is already contributing a clear signal regarding essay quality; it is possible that additional information can be derived from richer representations common in the lexical cohesion literature. |
Abstract | Sentiment Similarity of word pairs reflects the distance between the words regarding their underlying sentiments. |
Abstract | This paper aims to infer the sentiment similarity between word pairs with respect to their senses. |
Abstract | The resultant emotional vectors are then employed to infer the sentiment similarity of word pairs . |
Analysis and Discussions | For this purpose, we repeat the experiment for SO prediction by computing sentiment similarity of word pairs with and without using synonyms and antonyms. |
Introduction | In this paper, we show that sentiment similarity between word pairs can be effectively utilized to compute SO of words. |
Introduction | 0 We propose an effective approach to predict the sentiment similarity between word pairs through hidden emotions at the sense level, |
Related Works | Most previous works employed semantic similarity of word pairs to address SO prediction and IQAP inference tasks. |
Sentiment Similarity through Hidden Emotions | As we discussed above, semantic similarity measures are less effective to infer sentiment similarity between word pairs . |
Experiments | The prior tree has about 1000 word pairs (dict). |
Experiments | We then remove the word pairs appearing more than 50K times or fewer than 500 times and construct a second prior tree with about 2500 word pairs (align). |
Polylingual Tree-based Topic Models | Figure 1: An example of constructing a prior tree from a bilingual dictionary: word pairs with the same meaning but in different languages are concepts; we create a common parent node to group words in a concept, and then connect to the root; un-correlated words are connected to the root directly. |
Polylingual Tree-based Topic Models | The word pairs define concepts for the prior tree (align). |
Topic Models for Machine Translation | The phrase pair probabilities p_w(e|f) are the normalized product of lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).
Topic Models for Machine Translation | where c_d(·) is the number of occurrences of the word pair in document d. The lexical probability conditioned on topic k is the unsmoothed probability estimate of those expected counts
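Koehn-style lexical weighting can be sketched as follows; the tiny phrase pair, alignment, and probability table are invented, and unaligned target words (which Koehn et al. handle via a NULL token) are simply skipped in this sketch.

```python
def lexical_weight(tgt_phrase, src_phrase, align, p_lex):
    """For each target word, average the lexical probabilities over the
    source words it aligns to, then take the product over positions."""
    weight = 1.0
    for i, e in enumerate(tgt_phrase):
        linked = [j for (j, k) in align if k == i]
        if linked:  # NULL-alignment handling omitted in this sketch
            weight *= sum(p_lex[(e, src_phrase[j])]
                          for j in linked) / len(linked)
    return weight

# Hypothetical word-pair lexical probabilities and a 1:1 alignment.
p_lex = {("house", "maison"): 0.8, ("the", "la"): 0.6}
w = lexical_weight(["the", "house"], ["la", "maison"],
                   [(0, 0), (1, 1)], p_lex)
print(w)  # ~0.48 (0.6 * 0.8)
```

The topic-conditioned variant replaces `p_lex` with the per-topic estimates built from the expected counts.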
Topic Models for Machine Translation | While vanilla topic models (LDA) can only be applied to monolingual data, there are a number of topic models for parallel corpora: Zhao and Xing (2006) assume aligned word pairs share same topics; Mimno et al. |
Abstract | And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. |
Conclusion and Future Works | In this paper, we first describe an intuitionistic method for dependency parsing, which resorts to a classifier to determine whether a word pair forms a dependency edge, and then propose an effective strategy for dependency projection, which produces a set of projected classification instances rather than complete projected trees. |
Introduction | Given a word-aligned bilingual corpus with source language sentences parsed, the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language. |
Introduction | A dependency relationship is a boolean value that represents whether this word pair forms a dependency edge. |
Word-Pair Classification Model | The task of the word-pair classification model is to determine whether any candidate word pair , x_i and x_j s.t. 1 ≤ i, j ≤ |x| and i ≠ j, forms a dependency edge.
Word-Pair Classification Model | Ideally, given the classification results for all candidate word pairs , the dependency parse tree can be composed of the candidate edges with higher score (1 for the boolean-valued classifier, and large p for the real-valued classifier). |
Word-Pair Classification Model | Here we give the calculation of the dependency probability C(i, j). We use w to denote the parameter vector of the ME model, and f(i, j, r) to denote the feature vector for the assumption that the word pair i and j has the dependency relationship r.
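A maximum-entropy scorer of this shape can be sketched with sparse indicator features; the feature names and weights below are made up for illustration, not learned values.

```python
import math

def me_probability(w, feats_by_label):
    """Maximum-entropy distribution over labels: exp(w . f(label)),
    normalized over all labels. feats_by_label maps each label to its
    list of active (binary) feature names."""
    scores = {r: math.exp(sum(w.get(f, 0.0) for f in feats))
              for r, feats in feats_by_label.items()}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}

# Hypothetical weights for a word pair at distance 1.
w = {"dist=1&dep": 1.2, "dist=1&nodep": -0.3}
p = me_probability(w, {"dep": ["dist=1&dep"], "nodep": ["dist=1&nodep"]})
print(p["dep"] > p["nodep"])  # True
```

The probability of the "dep" label plays the role of the edge score when assembling the parse.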
Experiments | In other words, the IDF values help decide the importance of word pairs to the model. |
Experiments | 4 to the word pair and use their estimated degree of synonymy, antonymy, hyponymy and semantic relatedness as features. |
Experiments | 5, the features for the whole question/sentence pair are the average and max of the features of all the word pairs . |
Learning QA Matching Models | It then aggregates features extracted from each of these word pairs to represent the whole question/sentence pair.
Learning QA Matching Models | Given a word pair (w_q, w_s), where w_q ∈ V_q and w_s ∈ V_s, feature functions φ_1, …, φ_d map it to a d-dimensional real-valued feature vector.
Experimental Setup | 4435 word pairs constitute the overlap between Nelson et al.’s norms (1998) and McRae et al.’s (2005) nouns. |
Experimental Setup | This resulted in 7,576 word pairs for which we obtained similarity ratings using Amazon Mechanical Turk (AMT). |
Experimental Setup | Word Pairs Semantic Visual |
Introduction | We performed a large-scale evaluation on a new dataset consisting of human similarity judgments for 7,576 word pairs . |
Results | Table 4: Word pairs with highest semantic and visual similarity according to SAE model. |
Results | Table 4 shows examples of word pairs with highest semantic and visual similarity according to the SAE model. |
Alignment Link Confidence Measure | which is defined as the word translation probability of the aligned word pair divided by the sum of the translation probabilities over all the target words in the sentence. |
Alignment Link Confidence Measure | Intuitively, the above link confidence definition compares the lexical translation probability of the aligned word pair with the translation probabilities of all the target words given the source word. |
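That definition translates directly into code; the toy translation table below is invented, and unseen word pairs default to probability zero.

```python
def link_confidence(p_trans, src, tgt, tgt_sentence):
    """Lexical translation probability of the aligned word pair divided
    by the sum of translation probabilities over all target words in
    the sentence, given the source word."""
    denom = sum(p_trans.get((src, t), 0.0) for t in tgt_sentence)
    return p_trans.get((src, tgt), 0.0) / denom

# Hypothetical lexical translation probabilities.
p = {("chat", "cat"): 0.7, ("chat", "the"): 0.1}
print(link_confidence(p, "chat", "cat", ["the", "cat"]))  # ~0.875
```

A value near 1 means the source word strongly prefers this target word over every alternative in the sentence.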
Alignment Link Confidence Measure | On the other hand, additional information (such as the distance of the word pair , the alignment of neighbor words) could indicate higher likelihood for the alignment link. |
Improved MaXEnt Aligner with Confidence-based Link Filtering | Furthermore, it is possible to create new links by relinking unaligned source and target word pairs within the context window if their context-dependent link posterior probability is high. |
Sentence Alignment Confidence Measure | It is the product of lexical translation probabilities for the aligned word pairs . |
Experiments | The dataset has three interesting characteristics: 1) human judgments are on pairs of words presented in sentential context, 2) word pairs and their contexts are chosen to reflect interesting variations in meanings of homonymous and polysemous words, and 3) verbs and adjectives are present in addition to nouns. |
Experiments | We obtained a total of 2,003 word pairs and their sentential contexts. |
Experiments | The word pairs consist of 1,712 unique words. |
Introduction | To capture interesting word pairs , we sample different senses of words using WordNet (Miller, 1995). |
Introduction | First, since a sentence contributes its counts only to the translation table for the source it came from, many word pairs will be unobserved for a given table. |
Model Description | (2011) showed that it is beneficial to condition the lexical weighting features on provenance by assigning each sentence pair a set of features, f_s(e|f), one for each domain s, which compute a new word translation table p_s(e|f) estimated from only those sentences which belong to s: c_s(f, e) / Σ_e c_s(f, e), where c_s(·) is the number of occurrences of the word pair in s.
Model Description | To obtain the lexical probability conditioned on topic distribution, we first compute the expected count e_{z_n}(e, f) of a word pair under topic z_n:
Model Description | where c_j(·) denotes the number of occurrences of the word pair in sentence x_j, and then compute:
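The expected-count computation can be sketched as follows; the per-sentence word-pair counts and topic posteriors are invented, and only a single topic is shown.

```python
def topic_lexical_prob(sent_pair_counts, topic_post, e, f, z):
    """Unsmoothed p(e | f, z): each sentence's word-pair counts are
    weighted by that sentence's posterior for topic z, then normalized
    over all target words e aligned to f."""
    def expected(ee):
        return sum(counts.get((ee, f), 0) * topic_post[j][z]
                   for j, counts in enumerate(sent_pair_counts))
    targets = {ee for counts in sent_pair_counts
               for (ee, ff) in counts if ff == f}
    denom = sum(expected(ee) for ee in targets)
    return expected(e) / denom

# Two sentences with word-pair counts and their topic-0 posteriors.
sent_pair_counts = [{("cat", "chat"): 2}, {("hat", "chat"): 1}]
topic_post = [{0: 0.9}, {0: 0.1}]
p = topic_lexical_prob(sent_pair_counts, topic_post, "cat", "chat", 0)
print(p)  # ~0.947 (= 1.8 / 1.9)
```

Sentences with high posterior for the topic dominate the estimate, which is what lets translation probabilities adapt to topical context.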
Extensions | We use a seed dictionary of 12,630 word pairs to establish node-node correspondences between the two graphs. |
Extensions | As the seed dictionary contains 12,630 word pairs , this means that only every fourth entry of the PPR vector (the German graph has 47,439 nodes) is used for similarity calculation. |
Extensions | synonym extraction (68 word pairs) / lexicon extraction (1000 word pairs )
Connotation Induction Algorithms | We experimented with many different variations on the graph structure and edge weights, including ones that include any word pairs that occurred frequently enough together. |
Connotation Induction Algorithms | R_syn: word pairs in synonym relation.
Connotation Induction Algorithms | R_ant: word pairs in antonym relation.
Experimental Setup | To enrich the set of given word pairs and patterns as described in Section 4.1 and to perform clarifying queries, we utilize the Yahoo API for web queries. |
Experimental Setup | If only a few links were found for a given word pair , we perform local crawling to depth 3 in an attempt to discover more instances.
Introduction | The standard classification process is to find in an auxiliary corpus a set of patterns in which a given training word pair co-appears, and use pattern-word pair co-appearance statistics as features for machine learning algorithms. |
Relationship Classification | Co-appearance of nominal pairs can be very rare (in fact, some word pairs in the Task 4 set co-appear only once in Yahoo web search). |
Introduction | However, the goals of their work are different from ours in that their models mainly focus on mining cross-lingual topics of matching word pairs and discovering the correspondence at the vocabulary level. |
Introduction | Therefore, the topics extracted using their model cannot indicate how a common topic is covered differently in the two languages, because the words in each word pair share the same probability in a common topic.
Introduction | In our model, since we only add a soft constraint on word pairs in the dictionary, their probabilities in common topics are generally different, naturally capturing the different variations of a common topic in different languages.
Probabilistic Cross-Lingual Latent Semantic Analysis | Thus when a cross-lingual topic picks up words that co-occur in monolingual text, it would prefer picking up word pairs whose translations in other languages also co-occur with each other, giving us a coherent multilingual word distribution that characterizes well the content of text in different languages. |
Related work | The Word Pair Classification (WPC) method (Jiang and Liu, 2010) modifies the DPA method and makes it more robust.
Unsupervised Dependency Grammar Induction | denotes the word pair dependency relationship (e_i → e_j).
Unsupervised Dependency Grammar Induction | Based on the features around de_ij, we can calculate the probability Pr(y|de_ij) that the word pair de_ij
Unsupervised Dependency Grammar Induction | where y is the category of the relationship of de_ij: y = + means it is the probability that the word pair de_ij can form a dependency arc and y = − means the contrary.
Distributional semantic models | The weighting parameter a (0 ≤ a ≤ 1) is tuned on the MEN development data (2,000 word pairs ; details on the MEN dataset in the next section).
Textual and visual models as general semantic models | WordSim353 (Finkelstein et al., 2002) is a widely used benchmark constructed by asking 16 subjects to rate a set of 353 word pairs on a 10-point similarity scale and averaging the ratings (dollar/buck receives a high 9.22 average rating, professor/cucumber a low 0.31). |
Textual and visual models as general semantic models | The version used here contained 10 judgements per word pair . |
Textual and visual models as general semantic models | Because of its design, word pairs in MEN can be expected to be more imageable than those in WordSim, so the visual information is more relevant for this dataset. |
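Evaluation against such benchmarks typically reports the rank correlation between model scores and the averaged human ratings; below is a self-contained Spearman sketch, where the four ratings are illustrative (the first echoes dollar/buck, the last professor/cucumber).

```python
def rankdata(xs):
    """Ranks (1 = smallest), averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(human, model):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rh, rm = rankdata(human), rankdata(model)
    n = len(human)
    mh, mm = sum(rh) / n, sum(rm) / n
    cov = sum((a - mh) * (b - mm) for a, b in zip(rh, rm))
    sh = sum((a - mh) ** 2 for a in rh) ** 0.5
    sm = sum((b - mm) ** 2 for b in rm) ** 0.5
    return cov / (sh * sm)

human = [9.22, 7.5, 3.1, 0.31]   # averaged human similarity ratings
model = [0.91, 0.80, 0.35, 0.02] # hypothetical model scores
print(spearman(human, model))    # ~1.0: identical rankings
```

Because only ranks matter, the absolute scale of the model scores is irrelevant, which is why differently calibrated measures can be compared on the same benchmark.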
Conclusions and Future work | One unique characteristic of our model is that a NE normalization variable is introduced to indicate whether a word pair belongs to the mentions of the same entity. |
Experiments | There are two possible ways to fix these errors: 1) extending the scope of z-serial variables to each word pair with a common prefix; and 2) developing advanced normalization components to restore such slang expressions and informal abbreviations into their canonical forms.
Introduction | Hereafter, we use t_m to denote the m-th tweet, t_m^n and y_m^n to denote the n-th word of t_m and its BILOU label, respectively, and f_m^n to denote the factor relating them. Next, for each word pair with the same lemma, denoted by t_{m1}^{n1} and t_{m2}^{n2}, we introduce a binary random variable, denoted by z_{m1,n1}^{m2,n2}, whose value indicates whether t_{m1}^{n1} and t_{m2}^{n2} belong to two mentions of the same entity.
Our Method | We first conduct a simple dictionary-lookup based normalization with the incorrect/correct word pair list provided by Han et al.
Features | For word pairs whose source-side word is a verb, we add a feature marking the number of its subject, with separate features for noun and pronoun subjects. |
Features | For word pairs whose source side is an adjective, we add a feature marking the number of the head of the smallest noun phrase that contains it. |
Modeling unobserved target inflections | For greater speed we estimate the probabilities for the other two models using interpolated Kneser-Ney smoothing (Chen and Goodman, 1998), where the surface form of a rule or an aligned word pair plays the role of a trigram, the pairing of the source surface form with the lemmatized target form plays the role of a bigram, and the source surface form alone plays the role of a unigram.
Joint Model of Extraction and Compression | Although the authors of QSB also provided scores of word pairs to avoid putting excessive penalties |
Joint Model of Extraction and Compression | on word overlaps, we do not score word pairs . |
Joint Model of Extraction and Compression | The score function is supermodular as a score function of subtree extraction, because the union of two subtrees can have extra word pairs that are not included in either subtree.
A Unified Semantic Representation | Commonly, semantic comparisons are between word pairs or sentence pairs that do not have their lexical content sense-annotated, despite the potential utility of sense annotation in making semantic comparisons. |
Experiment 2: Word Similarity | The dataset contains 65 word pairs judged by 51 human subjects on a scale of 0 to 4 according to their semantic similarity. |
Introduction | Third, we demonstrate that this single representation can achieve state-of-the-art performance on three similarity tasks, each operating at a different lexical level: (1) surpassing the highest scores on the SemEval-2012 task on textual similarity (Agirre et al., 2012) that compares sentences, (2) achieving a near-perfect performance on the TOEFL synonym selection task proposed by Landauer and Dumais (1997), which measures word pair similarity, and also obtaining state-of-the-art performance in terms of the correlation with human judgments on the RG-65 dataset (Rubenstein and Goodenough, 1965), and finally (3) surpassing the performance of Snow et al. |
Background | is one of the few examples where distributional representations are used for word pairs . |
Experiments | The task is thus to rank these pairs of word pairs by their semantic similarity. |
Experiments | We assume fixed parse trees for all of the compounds (Figure 6), and use these to compute compound level vectors for all word pairs . |
Cross-Language Structural Correspondence Learning | CL-SCL comprises three steps: In the first step, CL-SCL selects word pairs {w_S, w_T}, called pivots, where w_S ∈ V_S and w_T ∈ V_T.
Cross-Language Structural Correspondence Learning | Considering our sentiment classification example, the word pair {excellent_S, exzellent_T} satisfies both conditions: (1) the words are strong indicators of positive sentiment,
Cross-Language Structural Correspondence Learning | Second, for each word w_S ∈ V_P we find its translation in the target vocabulary V_T by querying the translation oracle; we refer to the resulting set of word pairs as the candidate pivots, P′:
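Pivot selection can be loosely sketched as below; the counts, dictionary, and association scores are invented, the frequency thresholds are assumptions, and a real implementation would derive the class-association scores (e.g. mutual information) from labeled source data rather than take them as given.

```python
def select_pivots(source_counts, target_counts, translate, assoc_scores,
                  min_count=10, top_k=2):
    """Candidate pivots: source words whose oracle translation is
    sufficiently frequent in both corpora, ranked by association with
    the class label (scores assumed precomputed here)."""
    candidates = []
    for ws, score in assoc_scores.items():
        wt = translate.get(ws)  # translation oracle lookup
        if wt and source_counts.get(ws, 0) >= min_count \
              and target_counts.get(wt, 0) >= min_count:
            candidates.append(((ws, wt), score))
    candidates.sort(key=lambda x: -x[1])
    return [pair for pair, _ in candidates[:top_k]]

pivots = select_pivots(
    {"excellent": 50, "boring": 40, "rare": 2},
    {"exzellent": 30, "langweilig": 25},
    {"excellent": "exzellent", "boring": "langweilig", "rare": "selten"},
    {"excellent": 0.9, "boring": 0.7, "rare": 0.95})
print(pivots)  # [('excellent', 'exzellent'), ('boring', 'langweilig')]
```

Note how "rare" is discarded despite its high score: both frequency conditions must hold, mirroring the two pivot conditions in the text.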
Collocation Model | Figure 1 shows an example of the potentially collocated word pairs aligned by the MWA method. |
Collocation Model | Then the probability for each aligned word pair is estimated as follows: |
Improving Phrase Table | word pair calculated according to Eq. |
Experiments | Web page hits for word pairs and trigrams are obtained using a simple heuristic query to the search engine Google. Inflected queries are performed by expanding a bigram or trigram into all its morphological forms.
Introduction | The idea is very simple: web-scale data have large coverage for word pair acquisition. |
Related Work | Several previous studies have exploited the web-scale data for word pair acquisition. |
Phrase Pair Embedding | Word: 1G / 500K / 20 × 500K; Word Pair: 7M / (500K)^2 / 20 × (500K)^2; Phrase Pair: 7M / (500K)^4 / 20 × (500K)^4
Phrase Pair Embedding | For word pair and phrase pair embedding, the numbers are calculated on IWSLT 2009 dialog training set. |
Phrase Pair Embedding | But for source-target word pair , we may only have 7M bilingual corpus for training (taking the IWSLT data set as an example), and there are 20 × (500K)^2 parameters to be tuned.