Experiments | To calculate the similarity score of a path alignment, we sum the word vectors of the words on each path and compute the cosine similarity of the resulting vectors. |
Experiments | For example, the similarity score of the path alignment “OBJ(blame) I OBJ-ARG(death) m SUBJ(cause) OBJ-ARG(loss) MOD-ARG(life)” is calculated as the cosine similarity of the vectors blame+death and cause+loss+life. |
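The computation described above can be sketched as follows. The 3-dimensional toy vectors and the helper names (`cosine`, `path_similarity`) are illustrative assumptions, not the paper's implementation; a real system would use trained word embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def path_similarity(words_a, words_b, vectors):
    """Similarity of two paths: cosine of the sums of their word vectors."""
    a = np.sum([vectors[w] for w in words_a], axis=0)
    b = np.sum([vectors[w] for w in words_b], axis=0)
    return cosine(a, b)

# Toy 3-d embeddings (hypothetical values, for illustration only).
vectors = {
    "blame": np.array([0.9, 0.1, 0.0]),
    "death": np.array([0.2, 0.8, 0.1]),
    "cause": np.array([0.8, 0.2, 0.1]),
    "loss":  np.array([0.1, 0.7, 0.2]),
    "life":  np.array([0.3, 0.3, 0.4]),
}
score = path_similarity(["blame", "death"], ["cause", "loss", "life"], vectors)
```

Summing the word vectors before taking the cosine treats each path as an unordered bag of words, which is what makes this a rough estimate of paraphrase likelihood.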
Experiments | 60%, which is fairly high, given our rough estimation of the similarity score. |
Generating On-the-fly Knowledge | Aligned paths are evaluated by a similarity score to estimate their likelihood of being paraphrases. |
Generating On-the-fly Knowledge | Aligned paths are evaluated by a similarity score, for which we use the distributional similarity of the words that appear in the paths (§4.1). |
Generating On-the-fly Knowledge | Only path alignments with high similarity scores are accepted. |
Abstract | Although soft-matching approaches can improve the similarity scores, they are corpus-dependent and match relaxations may be task-specific. |
Abstract | We propose an alternative approach, called the descending path kernel, which gives intuitive similarity scores on comparable structures. |
Background | For example, the similarity score between the NPs in Figure 1(b) would be zero since the production rules differ (the overall similarity score is above zero because of matching pre-terminals). |
Conclusion | This kernel uses a descending path representation in trees to allow higher similarity scores on partially matching structures, while being simpler and faster than other methods for doing the same. |
Evaluation | For the tree kernel KT, the subset tree (SST) kernel was applied to each tree representation p. The final similarity score between two instances is the T-weighted sum of the similarities of all representations, combined with the flat feature (FF) similarity as measured by a feature kernel KF (linear or polynomial). |
Introduction | This approach assigns more robust similarity scores (e.g., 78% similarity in the above example) than other soft matching tree kernels, is faster than the partial tree kernel (Moschitti, 2006), and is less ad hoc than the grammar-based convolution kernel (Zhang et al., 2007). |
Methods | Unlike SST and PTK, once the root category comparison is successfully completed, DPK looks at all paths that go through it and accumulates their similarity scores independent of ordering — in other words, it will ignore the ordering of the children in its pro- |
Methods | This means, for example, that if the production NP -> NN JJ DT were ever found in a tree, to DPK it would be indistinguishable from the common production NP -> DT JJ NN, despite having inverted word order, and thus would have a maximal similarity score. |
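A minimal sketch of why DPK cannot distinguish such productions: representing a tree by the bag of its root-descending label paths discards child order. The `(label, children)` tree encoding and helper names are hypothetical; the real kernel accumulates per-path similarity scores rather than simply comparing path bags.

```python
from collections import Counter

def descending_paths(tree):
    """All descending paths from the root of a (label, children) tree,
    each path recorded as a tuple of node labels."""
    label, children = tree
    paths = [(label,)]
    for child in children:
        paths += [(label,) + p for p in descending_paths(child)]
    return paths

def path_multiset(tree):
    """Order-insensitive bag of descending paths."""
    return Counter(descending_paths(tree))

# Hypothetical productions NP -> DT JJ NN and NP -> NN JJ DT.
leaf = lambda lab: (lab, [])
np_a = ("NP", [leaf("DT"), leaf("JJ"), leaf("NN")])
np_b = ("NP", [leaf("NN"), leaf("JJ"), leaf("DT")])
```

Both trees yield the same bag of paths {(NP,), (NP, DT), (NP, JJ), (NP, NN)}, so any score computed from these bags is identical for the two orderings.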
Experiments | Because topic-specific rules usually have a larger sensitivity score, they can beat general rules when they obtain the same similarity score against the input sentence. |
Experiments | The similarity scores indicate that “deliver X” and “distribute X” are more appropriate to translate the sentence. |
Topic Similarity Model with Neural Network | The similarity scores are integrated into the standard log-linear model for making translation decisions. |
Topic Similarity Model with Neural Network | The similarity score of the representation pair (z_f, z_e) is defined as the cosine similarity of the two vectors: |
Topic Similarity Model with Neural Network | Since a parallel sentence pair should have the same topic, our goal is to maximize the similarity score between the source sentence and target sentence. |
Background | Given a similarity score for all pairs of sentences in the dataset, a regressor is trained on the training set to map the system’s output to the gold standard scores. |
Evaluation | Then, for STS 2012, 1,500 pairs were selected and annotated with similarity scores. |
Evaluation | • Pearson correlation: the Pearson correlation between the system’s similarity scores and the human gold standards. |
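The evaluation metric above is straightforward to compute; the toy score lists below are illustrative, not STS data.

```python
import numpy as np

def pearson(system_scores, gold_scores):
    """Pearson correlation between system similarity scores and gold scores."""
    x = np.asarray(system_scores, float)
    y = np.asarray(gold_scores, float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical system outputs vs. human gold-standard similarities.
r = pearson([0.1, 0.4, 0.35, 0.8], [1.0, 2.5, 2.0, 4.5])
```

Because Pearson correlation is invariant to linear rescaling, a system need not match the gold scale exactly, only its relative ordering and spacing.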
PSL for STS | KB: The knowledge base is a set of lexical and phrasal rules generated from distributional semantics, along with a similarity score for each rule (section 2.6). |
PSL for STS | where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates. |
PSL for STS | To produce a final similarity score, we train a regressor to learn the mapping between the two PSL scores and the overall similarity score. |
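The paper does not specify the regressor here, so the following is a plain least-squares sketch of the mapping step; the function name and training pairs are hypothetical.

```python
import numpy as np

def fit_score_mapper(psl_scores, gold):
    """Fit a least-squares linear map from two PSL scores (plus a bias
    term) to an overall similarity score; returns a predictor."""
    X = np.hstack([np.asarray(psl_scores, float),
                   np.ones((len(psl_scores), 1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(gold, float), rcond=None)
    return lambda pair: float(np.append(np.asarray(pair, float), 1.0) @ w)

# Hypothetical training data: pairs of PSL scores with gold similarities.
predict = fit_score_mapper([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                           [1.0, 3.0, 4.0, 6.0])
```

Any regressor with two input features would fit the same role; least squares is used only because it has a closed form.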
Approach | The candidate answers are scored using a linear interpolation of two cosine similarity scores: one between the entire parent document and the question (to model global context), and a second between the answer candidate and the question (for local context). Because the number of answer candidates is typically large (e.g., equal to the number of paragraphs in the textbook), we return the N top candidates with the highest scores. |
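The interpolation step can be sketched as below; the `alpha` weight, the bag-of-words count vectors, and the helper names are illustrative assumptions rather than the paper's actual representation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def candidate_score(question, candidate, parent_doc, alpha=0.5):
    """Interpolate a global (document vs. question) and a local
    (candidate vs. question) cosine similarity; `alpha` is a
    hypothetical interpolation weight."""
    global_sim = cosine(parent_doc, question)
    local_sim = cosine(candidate, question)
    return alpha * global_sim + (1 - alpha) * local_sim

# Toy bag-of-words count vectors over a shared 4-term vocabulary.
q = np.array([1.0, 1.0, 0.0, 0.0])
doc = np.array([2.0, 1.0, 3.0, 1.0])
cand = np.array([1.0, 0.0, 1.0, 0.0])
score = candidate_score(q, cand, doc)
```

Ranking candidates then amounts to computing this score per candidate and keeping the N largest.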
Models and Features | If text before or after a marker out to a given sentence range matches the entire text of the question (with a cosine similarity score larger than a threshold), that argument takes on the label QSEG, or OTHER otherwise. |
Models and Features | The values of the discourse features are the mean of the similarity scores (e.g., cosine similarity using tf-idf weighting) of the two marker arguments and the corresponding question. |
Models and Features | Both this overall similarity score and the average pairwise cosine similarity between each word in the question and the answer candidate serve as features. |
Background and Related Work | The distributional similarity scores of the nearest neighbours are associated with the respective target word senses using a WordNet similarity measure, such as those proposed by Jiang and Conrath (1997) and Banerjee and Pedersen (2002). |
Background and Related Work | The word senses are ranked based on these similarity scores, and the most frequent sense is selected for the corpus that the distributional similarity thesaurus was trained over. |
Methodology | To compute the similarity between a sense and a topic, we first convert the words in the gloss/definition into a multinomial distribution over words, based on simple maximum likelihood estimation. We then calculate the Jensen-Shannon divergence between the multinomial distribution (over words) of the gloss and that of the topic, and convert the divergence value into a similarity score by subtracting it from 1. |
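The divergence-to-similarity conversion works as sketched below. Using base-2 logarithms bounds the Jensen-Shannon divergence by 1, which is what makes the "subtract from 1" step yield a score in [0, 1]; the function names are illustrative.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so bounded by 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def gloss_topic_similarity(gloss_dist, topic_dist):
    """Convert the divergence into a similarity score by subtracting from 1."""
    return 1.0 - jsd(gloss_dist, topic_dist)
```

Identical distributions score 1, and distributions with disjoint support score 0.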
Methodology | The prevalence score for a sense is computed by summing the product of its similarity scores with each topic (i.e. |
Experiments | In this case, both the definitional and structural similarity scores are treated as equally important and two concepts are aligned if their overall similarity exceeds the middle point of the similarity scale. |
Resource Alignment | If their similarity score exceeds a certain value denoted by θ |
Resource Alignment | Each of these components gets, as its input, a pair of concepts belonging to two different semantic networks and produces a similarity score . |
Introduction | To address this issue, we develop a manifold model (Belkin et al., 2006) that encourages examples (including both labeled and unlabeled examples) with similar contents to be assigned similar scores. |
Relation Extraction with Manifold Models | Scores are fit so that examples (both labeled and unlabeled) with similar content get similar scores, and scores of labeled examples are close to their labels. |
Relation Extraction with Manifold Models | In addition, we also want f to preserve the manifold topology of the dataset, such that similar examples (both labeled and unlabeled) get similar scores. |
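The manifold idea above, fitting f so labeled points match their labels while a graph affinity ties similar points' scores together, can be sketched as Laplacian-regularized least squares. The Gaussian affinity, the hyperparameters `sigma` and `lam`, and the closed-form solve are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def manifold_scores(X, labeled_idx, y_labeled, sigma=1.0, lam=0.1):
    """Solve min_f ||J(f - y)||^2 + lam * f^T L f, where J selects the
    labeled examples and L is the graph Laplacian of a Gaussian
    affinity on X. Similar examples are pulled toward similar scores."""
    X = np.asarray(X, float)
    n = len(X)
    # Pairwise squared distances -> Gaussian affinity matrix W.
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    J = np.zeros((n, n))
    J[labeled_idx, labeled_idx] = 1.0       # diagonal indicator of labels
    y = np.zeros(n)
    y[labeled_idx] = y_labeled
    # Setting the gradient to zero gives (J + lam * L) f = J y.
    return np.linalg.solve(J + lam * L, J @ y)
```

With two well-separated clusters and one labeled point per cluster, the unlabeled points inherit scores close to their cluster's label, which is exactly the propagation behavior the manifold term is meant to produce.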