Background and Related Work | Most approaches to the task used distributional similarity as a major component of their systems.
Background and Related Work | (2006) presented a system for learning inference rules between nouns, using distributional similarity and pattern-based features. |
Background and Related Work | (2011) used distributional similarity between predicates to weight the edges of an entailment graph. |
Discussion | The distributional similarity between p_L and p_R under this model is Sim(p_L, p_R) = Σ_i sim(w_i, w'_i), where sim(w_i, w'_i) is the dot product between v_i and v'_i.
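The summed-dot-product similarity described above can be sketched as follows; a minimal illustration, where the toy vectors `p_left` and `p_right` and the helper names are assumptions, not values from the paper:

```python
import numpy as np

def word_sim(v_i, v_j):
    """Similarity of two words as the dot product of their vectors."""
    return float(np.dot(v_i, v_j))

def predicate_sim(vecs_left, vecs_right):
    """Sim(p_L, p_R): sum of word similarities over aligned word
    positions of the left and right predicates."""
    return sum(word_sim(v_l, v_r) for v_l, v_r in zip(vecs_left, vecs_right))

# Toy 2-dimensional word vectors for two-word predicates
p_left = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
p_right = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(predicate_sim(p_left, p_right))  # 1.0 + 0.5 = 1.5
```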
Introduction | Most work on this task uses distributional similarity, either as the main component (Szpektor and Dagan, 2008; Melamud et al., 2013b) or as part of a more comprehensive system (Berant et al., 2011; Lewis and Steedman, 2013).
Our Proposal: A Latent LC Approach | Distributional Similarity Features. |
Our Proposal: A Latent LC Approach | The distributional similarity features are based on the DIRT system (Lin and Pantel, 2001). |
Background | Distributional similarity between pairs of words is converted into weighted inference rules that are added to the logical representation, and Markov Logic Networks are used to perform probabilistic logical inference. |
Introduction | deep representation of sentence meaning, expressed in first-order logic, to capture sentence structure, but combine it with distributional similarity ratings at the word and phrase level. |
Introduction | This approach is interesting in that it uses a very deep and precise representation of meaning, which can then be relaxed in a controlled fashion using distributional similarity.
PSL for STS | where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates. |
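A minimal sketch of such a `vs_sim` function, assuming cosine similarity over sparse context-count vectors; the snippet does not specify which similarity function the system uses, so this particular choice (and the toy vectors) is an assumption:

```python
import math
from collections import Counter

def vs_sim(vec_a, vec_b):
    """Cosine similarity between two sparse context vectors,
    one plausible instantiation of a distributional similarity score."""
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[c] * vec_b[c] for c in common)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Identical context distributions give similarity 1.0
print(vs_sim(Counter(run=2, fast=1), Counter(run=2, fast=1)))  # 1.0
```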
Evaluation | The fifth Arabic-English example demonstrates the pitfalls of over-reliance on the distributional hypothesis: the source bigram corresponding to the name “abd almahmood” is distributionally similar to another named entity, “mahmood”, and the English equivalent is offered as a translation.
Generation & Propagation | Co-occurrence counts for each feature (context word) are accumulated over the monolingual corpus, and these counts are converted to pointwise mutual information (PMI) values, as is standard practice when computing distributional similarities.
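The counts-to-PMI conversion can be sketched as follows; a minimal illustration where the toy counts and the function name `pmi_vectors` are assumptions, with PMI(w, c) = log(P(w, c) / (P(w) P(c))):

```python
import math
from collections import Counter

def pmi_vectors(cooc, word_counts, ctx_counts, total):
    """Convert (word, context) co-occurrence counts into PMI context
    vectors: PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )."""
    vectors = {}
    for (w, c), n in cooc.items():
        p_wc = n / total
        p_w = word_counts[w] / total
        p_c = ctx_counts[c] / total
        vectors.setdefault(w, {})[c] = math.log(p_wc / (p_w * p_c))
    return vectors

# Toy co-occurrence table: 8 (word, context) observations in total
cooc = Counter({("cat", "purr"): 2, ("cat", "sleep"): 2, ("dog", "sleep"): 4})
words = Counter(cat=4, dog=4)
ctxs = Counter(purr=2, sleep=6)
vecs = pmi_vectors(cooc, words, ctxs, total=8)
print(round(vecs["cat"]["purr"], 3))  # log 2 ≈ 0.693: "purr" is informative for "cat"
```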
Related Work | The idea presented in this paper is similar in spirit to bilingual lexicon induction (BLI), where a seed lexicon in two different languages is expanded with the help of monolingual corpora, primarily by extracting distributional similarities from the data using word context. |
Related Work | Paraphrases extracted by “pivoting” via a third language (Callison-Burch et al., 2006) can be derived solely from monolingual corpora using distributional similarity (Marton et al., 2009). |
Background | To compute distributional similarity, each word is represented as a semantic vector composed of the pointwise mutual information (PMI) values with its contexts.
Introduction | In addition, distributional similarity methods (Kotlerman et al., 2010; Lenci and Benotto, 2012) are based on the assumption that a term can only be used in contexts where its hypernyms can be used, and that a term might be used in any context where its hyponyms are used.
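The context-inclusion assumption above motivates directional (asymmetric) measures. A minimal sketch in the spirit of such measures, scoring how much of the narrower term's context weight is covered by the broader term's contexts; the specific function and the toy weights are illustrative assumptions, not a formula from the cited papers:

```python
def inclusion_score(narrow_vec, broad_vec):
    """Directional similarity: the proportion of the narrower term's
    context weight covered by the broader term's contexts. Intended
    to be high for (hyponym -> hypernym) and lower in reverse."""
    covered = sum(w for c, w in narrow_vec.items() if c in broad_vec)
    total = sum(narrow_vec.values())
    return covered / total if total else 0.0

# Hypothetical context weights: "dog" (hyponym) vs "animal" (hypernym)
dog = {"bark": 3.0, "pet": 2.0, "eat": 1.0}
animal = {"pet": 1.0, "eat": 2.0, "wild": 4.0}
print(inclusion_score(dog, animal))  # 3/6 = 0.5
print(inclusion_score(animal, dog))  # 3/7 ≈ 0.429 (asymmetric, as desired)
```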
Related Work | (2010) and Lenci and Benotto (2012), other researchers also propose directional distributional similarity methods (Weeds et al., 2004; Geffet and Dagan, 2005; Bhagat et al., 2007; Szpektor et al., 2007; Clarke, 2009). |
Background and Related Work | The distributional similarity scores of the nearest neighbours are associated with the respective target word senses using a WordNet similarity measure, such as those proposed by Jiang and Conrath (1997) and Banerjee and Pedersen (2002).
Background and Related Work | The word senses are ranked based on these similarity scores, and the most frequent sense is selected for the corpus that the distributional similarity thesaurus was trained over. |
WordNet Experiments | It is important to bear in mind that MKWC in these experiments makes use of full-text parsing in calculating the distributional similarity thesaurus, and the WordNet graph structure in calculating the similarity between associated words and different senses. |
Experiments: predicting relevance in context | For each pair neighbour_a/neighbour_b, we computed a set of features from Wikipedia (the corpus used to derive the distributional similarity): We first computed the frequencies of each item in the corpus, freq_a and freq_b, from which we derive
Introduction | They are not suitable for the evaluation of the whole range of semantic relatedness that is exhibited by distributional similarities, which exceeds the limits of classical lexical relations, even though researchers have tried to collect equivalent resources manually, to be used as a gold standard (Weeds, 2003; Bordag, 2008; Anguiano et al., 2011).
Introduction | One advantage of distributional similarities is that they exhibit many different semantic relations, not only standard lexical relations.