Index of papers in Proc. ACL that mention
  • distributional similarity
McIntosh, Tara and Curran, James R.
Abstract
We propose an integrated distributional similarity filter to identify and censor potential semantic drifts, ensuring over 10% higher precision when extracting large semantic lexicons.
Background
2.2 Distributional Similarity
Background
Distributional similarity has been used to extract semantic lexicons (Grefenstette, 1994), based on the distributional hypothesis that semantically similar words appear in similar contexts (Harris, 1954).
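The distributional hypothesis above can be made concrete with a small sketch (not from the paper; the corpus, window size, and function names are illustrative): represent each word by counts of the words around it and compare those vectors, for example with cosine similarity.

```python
# Illustrative only: compare words by the contexts they occur in (distributional hypothesis).
from collections import Counter
from math import sqrt

def context_vector(word, sentences, window=2):
    """Count the words appearing within `window` tokens of `word`."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

sentences = [["drink", "hot", "coffee"], ["drink", "hot", "tea"], ["write", "some", "code"]]
print(cosine(context_vector("coffee", sentences), context_vector("tea", sentences)))   # high
print(cosine(context_vector("coffee", sentences), context_vector("code", sentences)))  # low (no shared contexts)
```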
Background
(2006) used 11 patterns, and the distributional similarity score of each pair of terms, to construct features for lexical entailment.
Conclusion
In this paper, we have proposed unsupervised bagging and integrated distributional similarity to minimise the problem of semantic drift in iterative bootstrapping algorithms, particularly when extracting large semantic lexicons.
Detecting semantic drift
In this section, we propose distributional similarity measurements over the extracted lexicon to detect semantic drift during the bootstrapping process.
Detecting semantic drift
We calculate the average distributional similarity (Sim) of t with all terms in L1..n and those in L(N-m)..N, and call the ratio the drift for term t:
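A minimal sketch of the drift ratio described above, assuming sim() is any pairwise distributional similarity function and that drift is the average similarity to the first n lexicon terms divided by the average similarity to the last m (names and the filtering threshold are illustrative, not taken from the paper):

```python
def drift(candidate, lexicon, n, m, sim):
    """Illustrative drift ratio: average distributional similarity of `candidate`
    to the first n extracted terms, divided by its average similarity to the
    last m terms (function and argument names are not the paper's)."""
    first, last = lexicon[:n], lexicon[-m:]
    avg_first = sum(sim(candidate, t) for t in first) / len(first)
    avg_last = sum(sim(candidate, t) for t in last) / len(last)
    return avg_first / avg_last if avg_last else float("inf")

# A candidate whose drift falls below some threshold (i.e. it resembles the most
# recently added terms far more than the earliest, most reliable ones) would be
# filtered as a likely semantic drift.
```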
Detecting semantic drift
For calculating drift we use the distributional similarity approach described in Curran (2004).
Introduction
We integrate a distributional similarity filter directly into WMEB (McIntosh and Curran, 2008).
Introduction
Our distributional similarity filter gives a similar performance improvement.
distributional similarity is mentioned in 12 sentences in this paper.
Zapirain, Beñat and Agirre, Eneko and Màrquez, Llu'is
Abstract
The best results are obtained with a novel second-order distributional similarity measure, and the positive effect is especially relevant for out-of-domain data.
Related Work
Distributional similarity has also been used to tackle syntactic ambiguity.
Related Work
Pantel and Lin (2000) obtained very good results using the distributional similarity measure defined by Lin (1998).
Related Work
The results over 100 frame-specific roles showed that distributional similarities get smaller error rates than Resnik and EM, with Lin’s formula having the smallest error rate.
Results and Discussion
Regarding the selectional preference variants, WordNet-based and first-order distributional similarity models attain similar levels of precision, but the former are clearly worse on recall and F1.
Results and Discussion
The second-order distributional similarity measures perform best overall, both in precision and recall.
Results and Discussion
Regarding the similarity metrics, the cosine seems to perform consistently better for first-order distributional similarity, while Jaccard provided slightly better results for second-order similarity.
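As a generic illustration of the two notions (not the paper's implementation): first-order similarity compares two words' context vectors directly, here with cosine, while second-order similarity compares the words through their sets of distributionally nearest neighbours, here with Jaccard as in the snippet above.

```python
def first_order_sim(ctx_a, ctx_b):
    """First-order: cosine over two context-count dictionaries."""
    dot = sum(ctx_a[k] * ctx_b[k] for k in ctx_a if k in ctx_b)
    na = sum(v * v for v in ctx_a.values()) ** 0.5
    nb = sum(v * v for v in ctx_b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def second_order_sim(neighbours_a, neighbours_b):
    """Second-order: Jaccard overlap of the two words' nearest-neighbour sets
    (e.g. neighbour lists taken from a Lin-style thesaurus)."""
    union = neighbours_a | neighbours_b
    return len(neighbours_a & neighbours_b) / len(union) if union else 0.0
```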
Selectional Preference Models
Distributional SP models: Given the availability of publicly available resources for distributional similarity, we used 1) a ready-made thesaurus (Lin, 1998), and 2) software (Pado and Lapata, 2007) which we run on the British National Corpus (BNC).
distributional similarity is mentioned in 13 sentences in this paper.
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Background
Entailment learning Two information types have primarily been utilized to learn entailment rules between predicates: lexicographic resources and distributional similarity resources.
Background
Therefore, distributional similarity is used to learn broad-scale resources.
Background
Distributional similarity algorithms predict a semantic relation between two predicates by comparing the arguments with which they occur.
Experimental Evaluation
When computing distributional similarity scores, a template is represented as a feature vector of the CUIs that instantiate its arguments.
Experimental Evaluation
Local algorithms We described 12 distributional similarity measures computed over our corpus (Section 5.1).
Experimental Evaluation
For each distributional similarity measure (altogether 16 measures), we learned a graph by inserting any edge (u, v), when u is in the top K templates most similar to v.
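A sketch of this graph construction step, assuming sim() is one of the learned similarity measures; the function and variable names are illustrative:

```python
def build_graph_edges(templates, sim, k):
    """Illustrative sketch: insert an edge (u, v) whenever u is among the
    top-k templates most similar to v."""
    edges = set()
    for v in templates:
        ranked = sorted((t for t in templates if t != v),
                        key=lambda t: sim(t, v), reverse=True)
        edges.update((u, v) for u in ranked[:k])
    return edges
```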
Learning Entailment Graph Edges
Next, we represent each pair of propositional templates with a feature vector of various distributional similarity scores.
Learning Entailment Graph Edges
Distributional similarity representation We aim to train a classifier that for an input template pair (t1, t2) determines whether t1 entails t2.
Learning Entailment Graph Edges
A template pair is represented by a feature vector where each coordinate is a different distributional similarity score.
distributional similarity is mentioned in 15 sentences in this paper.
Tanigaki, Koichi and Shiba, Mitsuteru and Munaka, Tatsuji and Sagisaka, Yoshinori
Evaluation
The background documents consist of 2.7M running words, which were used to compute distributional similarity.
Evaluation
The context metric space was composed by k-nearest neighbor words of distributional similarity (Lin, 1998), as is described in Section 4.
Evaluation
(2004), which determines the word sense based on sense similarity and distributional similarity to the k-nearest neighbor words of a target word, selected by distributional similarity.
Introduction
(2004) proposed a method to combine sense similarity with distributional similarity to compute a predominant sense score.
Introduction
Distributional similarity was used to weight the influence of context words, based on large-scale statistics.
Introduction
(2009) used the k-nearest words by distributional similarity as context words.
Metric Space Implementation
Distributional similarity (Lin, 1998) was computed among target words, based on the statistics of the test set and the background text provided as the official dataset of the SemEval-2 English all-words task (Agirre et al., 2010).
Metric Space Implementation
Those texts were parsed using RASP parser (Briscoe et al., 2006) version 3.1, to obtain grammatical relations for the distributional similarity, as well as to obtain lemmata and part-of-speech (POS) tags which are required to look up the sense inventory of WordNet.
Metric Space Implementation
Based on the distributional similarity , we just used k-nearest neighbor words as the context of each target word.
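A possible sketch of this context selection, assuming a precomputed pairwise similarity function; the names are illustrative:

```python
def knn_context(target, vocabulary, sim, k):
    """Illustrative sketch: take the k words most distributionally similar to
    `target` as its context."""
    others = [w for w in vocabulary if w != target]
    return sorted(others, key=lambda w: sim(target, w), reverse=True)[:k]
```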
distributional similarity is mentioned in 12 sentences in this paper.
Melamud, Oren and Berant, Jonathan and Dagan, Ido and Goldberger, Jacob and Szpektor, Idan
Abstract
Automatic acquisition of inference rules for predicates has been commonly addressed by computing distributional similarity between vectors of argument words, operating at the word space level.
Background and Model Setting
learning, based on distributional similarity at the word level, and then context-sensitive scoring for rule applications, based on topic-level similarity.
Background and Model Setting
The DIRT algorithm (Lin and Pantel, 2001) follows the distributional similarity paradigm to learn predicate inference rules.
Discussion and Future Work
In particular, we proposed a novel scheme that applies over any base distributional similarity measure which operates at the word level, and computes a single context-insensitive score for a rule.
Discussion and Future Work
We therefore focused on comparing the performance of our two-level scheme with state-of-the-art prior topic-level and word-level models of distributional similarity, over a random sample of inference rule applications.
Experimental Settings
Since our model can contextualize various distributional similarity measures, we evaluated the performance of all the above methods on several base similarity measures and their learned rule-
Experimental Settings
Whenever we evaluated a distributional similarity measure (namely Lin, BInc, or Cosine), we discarded instances from Zeichner et al.’s dataset in which the assessed rule is not in the context-insensitive rule-set learned for this measure or the argument instantiation of the rule is not in the LDA lexicon.
Results
Specifically, topics are leveraged for high-level domain disambiguation, while fine grained word-level distributional similarity is computed for each rule under each such domain.
Results
Indeed, on test-setvc, in which context mismatches are rare, our algorithm is still better than the original measure, indicating that WT can be safely applied to distributional similarity measures without concerns of reduced performance in different context scenarios.
Two-level Context-sensitive Inference
On the other hand, the topic-biased similarity for t1 is substantially lower, since prominent words in this topic are likely to occur with ‘acquire’ but not with ‘learn’, yielding low distributional similarity.
distributional similarity is mentioned in 10 sentences in this paper.
Pereira, Lis and Manguilimotan, Erlyn and Matsumoto, Yuji
Related Work
rity: 1) thesaurus-based word similarity, 2) distributional similarity and 3) confusion set derived from learner corpus.
Related Work
Distributional Similarity: Thesaurus-based methods produce weak recall since many words, phrases and semantic connections are not covered by hand-built thesauri, especially for verbs and adjectives.
Related Work
As an alternative, distributional similarity models are often used since they give higher recall.
distributional similarity is mentioned in 9 sentences in this paper.
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Background and Related Work
Most approaches to the task used distributional similarity as a major component within their system.
Background and Related Work
(2006) presented a system for learning inference rules between nouns, using distributional similarity and pattern-based features.
Background and Related Work
(2011) used distributional similarity between predicates to weight the edges of an entailment graph.
Discussion
The distributional similarity between pL and pR under this model is Sim(pL, pR) = Σ_i sim(w_i, w'_i), where sim(w_i, w'_i) is the dot product between v_i and v'_i.
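Under the reconstruction above, and assuming a simple positional alignment between the words of the two predicates (an assumption for illustration, not necessarily the paper's alignment), the score could be computed as:

```python
import numpy as np

def predicate_sim(left_vectors, right_vectors):
    """Illustrative compositional score: sum the dot products of positionally
    aligned word vectors of the two predicates."""
    return sum(float(np.dot(u, v)) for u, v in zip(left_vectors, right_vectors))

p_left = [np.array([0.2, 0.5, 0.1]), np.array([0.7, 0.0, 0.3])]
p_right = [np.array([0.1, 0.4, 0.2]), np.array([0.6, 0.1, 0.2])]
print(predicate_sim(p_left, p_right))
```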
Introduction
Most works on this task use distributional similarity, either as their main component (Szpektor and Dagan, 2008; Melamud et al., 2013b), or as part of a more comprehensive system (Berant et al., 2011; Lewis and Steedman, 2013).
Our Proposal: A Latent LC Approach
Distributional Similarity Features.
Our Proposal: A Latent LC Approach
The distributional similarity features are based on the DIRT system (Lin and Pantel, 2001).
distributional similarity is mentioned in 7 sentences in this paper.
Berant, Jonathan and Dagan, Ido and Goldberger, Jacob
Background
Most work on learning entailment rules between predicates considered each rule independently of others, using two sources of information: lexicographic resources and distributional similarity .
Background
Distributional similarity algorithms use large corpora to learn broader resources by assuming that semantically similar predicates appear with similar arguments.
Background
Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009).
Experimental Evaluation
Second, to distributional similarity algorithms: (a) SR: the score used by Schoenmackers et al.
Experimental Evaluation
Third, we compared to the entailment classifier with no transitivity constraints (clsf) to see if combining distributional similarity scores improves performance over single measures.
Learning Typed Entailment Graphs
We compute 11 distributional similarity scores for each pair of predicates based on the arguments appearing in the extracted arguments.
distributional similarity is mentioned in 7 sentences in this paper.
Qazvinian, Vahed and Radev, Dragomir R.
Abstract
Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network.
Conclusion and Future Work
Finally, we proposed a ranking system that employs word distributional similarities to identify semantically equivalent words, and compared it with a wide
Diversity-based Ranking
5.1 Distributional Similarity
Diversity-based Ranking
In order to capture the nuggets of equivalent semantic classes, we use a distributional similarity of
Diversity-based Ranking
The method based on the distributional similarities of words outperforms other methods in the citations category.
distributional similarity is mentioned in 7 sentences in this paper.
Kotlerman, Lili and Dagan, Ido and Szpektor, Idan and Zhitomirsky-Geffet, Maayan
Background
To date, most distributional similarity research concentrated on symmetric measures, such as the widely cited and competitive (as shown in (Weeds and Weir, 2003)) LIN measure (Lin, 1998):
Evaluation and Results
In this setting, category names were taken as seeds and expanded by distributional similarity , further measuring cosine similarity with categorized documents similarly to IR query expansion.
Introduction
Much work on automatic identification of semantically similar terms exploits Distributional Similarity, assuming that such terms appear in similar contexts.
Introduction
This paper is motivated by one of the prominent applications of distributional similarity, namely identifying lexical expansions.
Introduction
Often, distributional similarity measures are used to identify expanding terms (e.g.
distributional similarity is mentioned in 6 sentences in this paper.
Bhagat, Rahul and Ravichandran, Deepak
Acquiring Paraphrases
3.1 Distributional Similarity
Conclusion
We have shown that high precision surface paraphrases can be obtained by using distributional similarity on a large corpus.
Introduction
A popular method, the so-called distributional similarity , is based on the dictum of Zelig Harris “you shall know the words by the company they keep”: given highly discriminating left and right contexts, only words with very similar meaning will be found to fit in between them.
Related Work
Our method however, pre-computes paraphrases for a large set of surface patterns using distributional similarity over a large corpus and then obtains patterns for a relation by simply finding paraphrases (offline) for a few seed patterns.
Related Work
Using distributional similarity avoids the problem of obtaining overly general patterns and the pre-computation of paraphrases means that we can obtain the set of patterns for any relation instantaneously.
distributional similarity is mentioned in 5 sentences in this paper.
Feng, Song and Kang, Jun Seok and Kuznetsova, Polina and Choi, Yejin
Abstract
The focus of this paper is drawing nuanced, connotative sentiments from even those words that are objective on the surface, such as “intelligence”, “human”, and “cheesecake”. We propose induction algorithms encoding a diverse set of linguistic insights (semantic prosody, distributional similarity, semantic parallelism of coordination) and prior knowledge drawn from lexical resources, resulting in the first broad-coverage connotation lexicon.
Connotation Induction Algorithms
The second subgraph is based on the distributional similarities among the arguments.
Connotation Induction Algorithms
One possible way of constructing such a graph is simply connecting all nodes and assigning edge weights proportionate to the word association scores, such as PMI, or distributional similarity.
Connotation Induction Algorithms
where Φ^prosody is the score based on semantic prosody, Φ^coord captures the distributional similarity over coordination, and Φ^neutral controls the sensitivity of connotation detection between positive (negative) and neutral.
Introduction
Therefore, in order to attain a broad coverage lexicon while maintaining good precision, we guide the induction algorithm with multiple, carefully selected linguistic insights: [1] distributional similarity, [2] semantic parallelism of coordination, [3] selectional preference, and [4] semantic prosody (e.g., Sinclair (1991), Louw (1993), Stubbs (1995), Stefanowitsch and Gries (2003)), and also exploit existing lexical resources as an additional inductive bias.
distributional similarity is mentioned in 5 sentences in this paper.
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
The fifth Arabic-English example demonstrates the pitfalls of over-reliance on the distributional hypothesis: the source bigram corresponding to the name “abd almahmood” is distributionally similar to another named entity “mahmood” and the English equivalent is offered as a translation.
Generation & Propagation
Co-occurrence counts for each feature (context word) are accumulated over the monolingual corpus, and these counts are converted to pointwise mutual information (PMI) values, as is standard practice when computing distributional similarities.
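A sketch of this counts-to-PMI conversion, assuming co-occurrence counts are stored as nested dictionaries and that negative values are clipped to zero (positive PMI), which is a common but here assumed choice:

```python
import math
from collections import defaultdict

def pmi_weights(cooc):
    """Convert co-occurrence counts cooc[word][context] into PMI values:
    PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ).
    Negative values are clipped to 0 here (positive PMI), an assumed choice."""
    total = float(sum(sum(ctxs.values()) for ctxs in cooc.values()))
    w_tot = {w: sum(ctxs.values()) for w, ctxs in cooc.items()}
    c_tot = defaultdict(float)
    for ctxs in cooc.values():
        for c, n in ctxs.items():
            c_tot[c] += n
    pmi = {}
    for w, ctxs in cooc.items():
        for c, n in ctxs.items():
            val = math.log((n / total) / ((w_tot[w] / total) * (c_tot[c] / total)))
            pmi[(w, c)] = max(val, 0.0)
    return pmi
```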
Related Work
The idea presented in this paper is similar in spirit to bilingual lexicon induction (BLI), where a seed lexicon in two different languages is expanded with the help of monolingual corpora, primarily by extracting distributional similarities from the data using word context.
Related Work
Paraphrases extracted by “pivoting” via a third language (Callison-Burch et al., 2006) can be derived solely from monolingual corpora using distributional similarity (Marton et al., 2009).
distributional similarity is mentioned in 4 sentences in this paper.
Beltagy, Islam and Erk, Katrin and Mooney, Raymond
Background
Distributional similarity between pairs of words is converted into weighted inference rules that are added to the logical representation, and Markov Logic Networks are used to perform probabilistic logical inference.
Introduction
deep representation of sentence meaning, expressed in first-order logic, to capture sentence structure, but combine it with distributional similarity ratings at the word and phrase level.
Introduction
This approach is interesting in that it uses a very deep and precise representation of meaning, which can then be relaxed in a controlled fashion using distributional similarity.
PSL for STS
where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates.
distributional similarity is mentioned in 4 sentences in this paper.
Korkontzelos, Ioannis and Manandhar, Suresh
Introduction and related work
In this paper, we propose a novel unsupervised approach that compares the major senses of a MWE and its semantic head using distributional similarity measures to test the compositionality of the MWE.
Proposed approach
We used two techniques to measure the distributional similarity of major uses of the MWE and its semantic head, both based on the Jaccard coefficient (J).
Proposed approach
Given the major uses of a MWE and its semantic head, the MWE is considered as compositional, when the corresponding distributional similarity measure (Jc or 197,) value is above a parameter threshold, sim.
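A sketch of this decision rule, assuming each major use is approximated by a set of context words; the Jaccard variant shown is a generic one, not necessarily either of the paper's two Jaccard-based measures:

```python
def jaccard(a, b):
    """Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def is_compositional(mwe_context_words, head_context_words, sim_threshold):
    """The MWE counts as compositional when the similarity of its major use and
    its head's major use (here: Jaccard over context-word sets) exceeds the
    threshold `sim_threshold`."""
    return jaccard(set(mwe_context_words), set(head_context_words)) >= sim_threshold
```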
Unsupervised parameter tuning
The best performing distributional similarity measure is an.
distributional similarity is mentioned in 4 sentences in this paper.
Muller, Philippe and Fabre, Cécile and Adam, Clémentine
Experiments: predicting relevance in context
For each pair neighbour_a/neighbour_b, we computed a set of features from Wikipedia (the corpus used to derive the distributional similarity): We first computed the frequencies of each item in the corpus, freq_a and freq_b, from which we derive
Introduction
They are not suitable for the evaluation of the whole range of semantic relatedness that is exhibited by distributional similarities, which exceeds the limits of classical lexical relations, even though researchers have tried to collect equivalent resources manually, to be used as a gold standard (Weeds, 2003; Bordag, 2008; Anguiano et al., 2011).
Introduction
One advantage of distributional similarities is to exhibit a lot of different semantic relations, not necessarily standard lexical relations.
distributional similarity is mentioned in 3 sentences in this paper.
Berant, Jonathan and Dagan, Ido and Adler, Meni and Goldberger, Jacob
Background
Most methods utilized the distributional similarity hypothesis that states that semantically similar predicates occur with similar arguments (Lin and Pantel, 2001; Szpektor et al., 2004; Yates and Etzioni, 2009; Schoenmackers et al., 2010).
Background
For every pair of predicates i, j, an entailment score wij was learned by training a classifier over distributional similarity features.
Experiments and Results
The data set also contains, for every pair of predicates i, j in every graph, a local score s_ij, which is the output of a classifier trained over distributional similarity features.
distributional similarity is mentioned in 3 sentences in this paper.
Lau, Jey Han and Cook, Paul and McCarthy, Diana and Gella, Spandana and Baldwin, Timothy
Background and Related Work
The distributional similarity scores of the nearest neighbours are associated with the respective target word senses using a WordNet similarity measure, such as those proposed by Jiang and Conrath (1997) and Banerjee and Pedersen (2002).
Background and Related Work
The word senses are ranked based on these similarity scores, and the most frequent sense is selected for the corpus that the distributional similarity thesaurus was trained over.
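A sketch in the spirit of this ranking, where each distributional neighbour votes for the senses it is close to under a WordNet similarity measure; the normalisation used here is an assumption for illustration, not the exact formula of the cited work:

```python
def sense_scores(senses, neighbour_sims, wn_sim):
    """Illustrative prevalence-style scoring: each distributional neighbour n,
    with similarity score neighbour_sims[n] to the target, contributes to sense
    s in proportion to how close s is to n under the WordNet measure wn_sim."""
    scores = {}
    for s in senses:
        total = 0.0
        for n, d in neighbour_sims.items():
            denom = sum(wn_sim(s2, n) for s2 in senses)
            if denom > 0:
                total += d * wn_sim(s, n) / denom
        scores[s] = total
    return scores

# The highest-scoring sense is then taken as the predominant (most frequent) sense
# for the corpus the similarity thesaurus was trained on.
```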
WordNet Experiments
It is important to bear in mind that MKWC in these experiments makes use of full-text parsing in calculating the distributional similarity thesaurus, and the WordNet graph structure in calculating the similarity between associated words and different senses.
distributional similarity is mentioned in 3 sentences in this paper.
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Background
For distributional similarity computing, each word is represented as a semantic vector composed of the pointwise mutual information (PMI) with its contexts.
Introduction
Besides, distributional similarity methods (Kotlerman et al., 2010; Lenci and Benotto, 2012) are based on the assumption that a term can only be used in contexts where its hypernyms can be used and that a term might be used in any contexts where its hyponyms are used.
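A generic directional score that reflects this assumption (an illustrative inclusion measure, not the specific formulas of the cited papers): the larger the share of a term's context weight that is also observed with a candidate hypernym, the more the pair looks like hyponym and hypernym.

```python
def inclusion_score(hyponym_contexts, hypernym_contexts):
    """Share of the candidate hyponym's context weight that also occurs with the
    candidate hypernym (a generic directional inclusion measure)."""
    shared = sum(w for c, w in hyponym_contexts.items() if c in hypernym_contexts)
    total = sum(hyponym_contexts.values())
    return shared / total if total else 0.0

# If the assumption holds, inclusion_score(contexts("dog"), contexts("animal"))
# should exceed inclusion_score(contexts("animal"), contexts("dog")),
# where contexts() is a hypothetical lookup of a word's weighted contexts.
```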
Related Work
(2010) and Lenci and Benotto (2012), other researchers also propose directional distributional similarity methods (Weeds et al., 2004; Geffet and Dagan, 2005; Bhagat et al., 2007; Szpektor et al., 2007; Clarke, 2009).
distributional similarity is mentioned in 3 sentences in this paper.
Vickrey, David and Kipersztok, Oscar and Koller, Daphne
Set Expansion
We consider three similarity data sources: the Moby thesaurus, WordNet (Fellbaum, 1998), and distributional similarity based on a large corpus of text (Lin, 1998).
Set Expansion
Distributional similarity.
Set Expansion
Second, the data sources used: each source separately (M for Moby, W for WordNet, D for distributional similarity), and all three in combination (MWD).
distributional similarity is mentioned in 3 sentences in this paper.
Biemann, Chris and Choudhury, Monojit and Mukherjee, Animesh
Abstract
We study the global topology of the syntactic and semantic distributional similarity networks for English through the technique of spectral analysis.
Introduction
An alternative, but equally popular, visualization of distributional similarity is through graphs or networks, where each word is represented as a node and weighted edges indicate the extent of distributional similarity between them.
Introduction
intriguing question, whereby we construct the syntactic and semantic distributional similarity networks (DSNs) and analyze their spectra to understand their global topology.
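A sketch of such an analysis, assuming pairwise distributional similarities are available; the thresholding of weak edges and the use of the plain adjacency matrix (rather than a Laplacian) are illustrative choices:

```python
import numpy as np

def dsn_spectrum(words, sim, threshold=0.0):
    """Build a distributional similarity network as a symmetric weighted
    adjacency matrix (edges below `threshold` dropped) and return its
    eigenvalues in descending order."""
    n = len(words)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = sim(words[i], words[j])
            if w >= threshold:
                adj[i, j] = adj[j, i] = w
    return np.sort(np.linalg.eigvalsh(adj))[::-1]
```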
distributional similarity is mentioned in 3 sentences in this paper.