Abstract | We propose an integrated distributional similarity filter to identify and censor potential semantic drifts, ensuring over 10% higher precision when extracting large semantic lexicons.
Background | 2.2 Distributional Similarity |
Background | Distributional similarity has been used to extract semantic lexicons (Grefenstette, 1994), based on the distributional hypothesis that semantically similar words appear in similar contexts (Harris, 1954). |
Background | (2006) used 11 patterns, and the distributional similarity score of each pair of terms, to construct features for lexical entailment. |
Conclusion | In this paper, we have proposed unsupervised bagging and integrated distributional similarity to minimise the problem of semantic drift in iterative bootstrapping algorithms, particularly when extracting large semantic lexicons. |
Detecting semantic drift | In this section, we propose distributional similarity measurements over the extracted lexicon to detect semantic drift during the bootstrapping process. |
Detecting semantic drift | We calculate the average distributional similarity (sim) of t with all terms in L_{1..n}, and those in L_{(N-m)..N}, and call the ratio the drift for term t:
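The ratio just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pairwise similarity function `sim` and the ordering of the lexicon by extraction iteration are assumptions.

```python
# Sketch of the drift ratio: average similarity with the first n
# extracted terms divided by average similarity with the last m.
# `sim` and the lexicon ordering are illustrative assumptions.

def avg_sim(term, terms, sim):
    """Average distributional similarity of `term` with a list of terms."""
    return sum(sim(term, t) for t in terms) / len(terms)

def drift(term, lexicon, n, m, sim):
    """Ratio of `term`'s average similarity with the first n terms
    (the most reliable) to that with the last m terms added."""
    head = lexicon[:n]    # L_{1..n}
    tail = lexicon[-m:]   # L_{(N-m)..N}
    return avg_sim(term, head, sim) / avg_sim(term, tail, sim)
```

A drift value well below 1 suggests the candidate term is closer to the recently added (possibly drifted) terms than to the trusted seed-like terms.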
Detecting semantic drift | For calculating drift we use the distributional similarity approach described in Curran (2004). |
Introduction | We integrate a distributional similarity filter directly into WMEB (McIntosh and Curran, 2008). |
Introduction | Our distributional similarity filter gives a similar performance improvement. |
Abstract | The best results are obtained with a novel second-order distributional similarity measure, and the positive effect is especially relevant for out-of-domain data.
Related Work | Distributional similarity has also been used to tackle syntactic ambiguity. |
Related Work | Pantel and Lin (2000) obtained very good results using the distributional similarity measure defined by Lin (1998). |
Related Work | The results over 100 frame-specific roles showed that distributional similarities get smaller error rates than Resnik and EM, with Lin’s formula having the smallest error rate. |
Results and Discussion | Regarding the selectional preference variants, WordNet-based and first-order distributional similarity models attain similar levels of precision, but the former are clearly worse on recall and F1.
Results and Discussion | The second-order distributional similarity measures perform best overall, both in precision and recall. |
Results and Discussion | Regarding the similarity metrics, the cosine seems to perform consistently better for first-order distributional similarity, while Jaccard provided slightly better results for second-order similarity.
Selectional Preference Models | Distributional SP models: Given the availability of publicly available resources for distributional similarity, we used 1) a ready-made thesaurus (Lin, 1998), and 2) software (Padó and Lapata, 2007) which we ran on the British National Corpus (BNC).
Background | Entailment learning Two information types have primarily been utilized to learn entailment rules between predicates: lexicographic resources and distributional similarity resources. |
Background | Therefore, distributional similarity is used to learn broad-scale resources. |
Background | Distributional similarity algorithms predict a semantic relation between two predicates by comparing the arguments with which they occur. |
Experimental Evaluation | When computing distributional similarity scores, a template is represented as a feature vector of the CUIs that instantiate its arguments. |
Experimental Evaluation | Local algorithms We described 12 distributional similarity measures computed over our corpus (Section 5.1). |
Experimental Evaluation | For each distributional similarity measure (altogether 16 measures), we learned a graph by inserting any edge (u, v) when u is in the top K templates most similar to v.
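The edge-insertion rule above can be sketched as follows. This is a toy reconstruction under stated assumptions: a `score(u, v)` function standing in for one of the distributional similarity measures, and small template identifiers in place of real propositional templates.

```python
# Illustrative graph construction: add a directed edge (u, v) when u
# is among the top K templates most similar to v. The score function
# and template names are toy assumptions, not the paper's data.

def build_graph(templates, score, K):
    """Return the set of edges (u, v) per the top-K rule."""
    edges = set()
    for v in templates:
        ranked = sorted((u for u in templates if u != v),
                        key=lambda u: score(u, v), reverse=True)
        for u in ranked[:K]:
            edges.add((u, v))
    return edges
```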
Learning Entailment Graph Edges | Next, we represent each pair of propositional templates with a feature vector of various distributional similarity scores. |
Learning Entailment Graph Edges | Distributional similarity representation We aim to train a classifier that for an input template pair (t1, t2) determines whether t1 entails t2.
Learning Entailment Graph Edges | A template pair is represented by a feature vector where each coordinate is a different distributional similarity score. |
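The feature representation described above can be sketched very simply. The measure functions below are placeholders for the actual distributional similarity scores, which are not specified here.

```python
# Sketch: a template pair is mapped to a vector whose coordinates are
# different distributional similarity scores. The measures are
# placeholder functions, assumptions for illustration only.

def pair_features(t1, t2, measures):
    """measures: list of functions (t1, t2) -> similarity score."""
    return [m(t1, t2) for m in measures]
```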
Evaluation | The background documents consist of 2.7M running words, which were used to compute distributional similarity.
Evaluation | The context metric space was composed of the k-nearest neighbor words by distributional similarity (Lin, 1998), as described in Section 4.
Evaluation | (2004), which determines the word sense based on sense similarity and distributional similarity to the k-nearest neighbor words of a target word, selected by distributional similarity.
Introduction | (2004) proposed a method to combine sense similarity with distributional similarity to compute a predominant sense score.
Introduction | Distributional similarity was used to weight the influence of context words, based on large-scale statistics. |
Introduction | (2009) used the k-nearest words by distributional similarity as context words.
Metric Space Implementation | Distributional similarity (Lin, 1998) was computed among target words, based on the statistics of the test set and the background text provided as the official dataset of the SemEval-2 English all-words task (Agirre et al., 2010). |
Metric Space Implementation | Those texts were parsed using the RASP parser (Briscoe et al., 2006) version 3.1, to obtain grammatical relations for the distributional similarity, as well as to obtain lemmata and part-of-speech (POS) tags which are required to look up the sense inventory of WordNet.
Metric Space Implementation | Based on the distributional similarity, we simply used the k-nearest neighbor words as the context of each target word.
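The neighbor selection just described can be sketched as below. The vocabulary and the similarity function are invented for illustration; the papers above use Lin (1998) similarity computed from parsed corpora.

```python
# Toy sketch: take the k most similar words to a target under some
# similarity function and use them as its context. The similarity
# function here is an assumption, not Lin's actual measure.

def knn_context(target, vocab, sim, k):
    """Return the k nearest neighbour words of `target` in `vocab`."""
    return sorted((w for w in vocab if w != target),
                  key=lambda w: sim(target, w), reverse=True)[:k]
```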
Abstract | Automatic acquisition of inference rules for predicates has been commonly addressed by computing distributional similarity between vectors of argument words, operating at the word space level. |
Background and Model Setting | learning, based on distributional similarity at the word level, and then context-sensitive scoring for rule applications, based on topic-level similarity. |
Background and Model Setting | The DIRT algorithm (Lin and Pantel, 2001) follows the distributional similarity paradigm to learn predicate inference rules. |
Discussion and Future Work | In particular, we proposed a novel scheme that applies over any base distributional similarity measure which operates at the word level, and computes a single context-insensitive score for a rule. |
Discussion and Future Work | We therefore focused on comparing the performance of our two-level scheme with state-of-the-art prior topic-level and word-level models of distributional similarity, over a random sample of inference rule applications.
Experimental Settings | Since our model can contextualize various distributional similarity measures, we evaluated the performance of all the above methods on several base similarity measures and their learned rule-sets.
Experimental Settings | Whenever we evaluated a distributional similarity measure (namely Lin, BInc, or Cosine), we discarded instances from Zeichner et al.’s dataset in which the assessed rule is not in the context-insensitive rule-set learned for this measure or the argument instantiation of the rule is not in the LDA lexicon. |
Results | Specifically, topics are leveraged for high-level domain disambiguation, while fine-grained word-level distributional similarity is computed for each rule under each such domain.
Results | Indeed, on test-setvc, in which context mismatches are rare, our algorithm is still better than the original measure, indicating that WT can be safely applied to distributional similarity measures without concerns of reduced performance in different context scenarios. |
Two-level Context-sensitive Inference | On the other hand, the topic-biased similarity for t1 is substantially lower, since prominent words in this topic are likely to occur with ‘acquire’ but not with ‘learn’, yielding low distributional similarity.
Related Work | similarity: 1) thesaurus-based word similarity, 2) distributional similarity, and 3) confusion set derived from a learner corpus.
Related Work | Distributional Similarity: Thesaurus-based methods produce weak recall since many words, phrases and semantic connections are not covered by hand-built thesauri, especially for verbs and adjectives.
Related Work | As an alternative, distributional similarity models are often used since they give higher recall.
Background and Related Work | Most approaches to the task used distributional similarity as a major component within their system. |
Background and Related Work | (2006) presented a system for learning inference rules between nouns, using distributional similarity and pattern-based features. |
Background and Related Work | (2011) used distributional similarity between predicates to weight the edges of an entailment graph. |
Discussion | The distributional similarity between p_L and p_R under this model is Sim(p_L, p_R) = \sum_i sim(w_i, w'_i), where sim(w_i, w'_i) is the dot product between v_i and v'_i.
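The aggregation described above can be sketched as a sum of word-level dot products. The vectors below are invented for illustration; how the word pairs are aligned between the two predicates is an assumption.

```python
# Sketch: predicate-level similarity as the sum of word-level
# similarities, each computed as a dot product of word vectors.
# The vectors and the pairing of words are toy assumptions.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def predicate_sim(left_vectors, right_vectors):
    """Sim(pL, pR) = sum over aligned word pairs of their dot product."""
    return sum(dot(u, v) for u, v in zip(left_vectors, right_vectors))
```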
Introduction | Most work on this task uses distributional similarity, either as its main component (Szpektor and Dagan, 2008; Melamud et al., 2013b), or as part of a more comprehensive system (Berant et al., 2011; Lewis and Steedman, 2013).
Our Proposal: A Latent LC Approach | Distributional Similarity Features. |
Our Proposal: A Latent LC Approach | The distributional similarity features are based on the DIRT system (Lin and Pantel, 2001). |
Background | Most work on learning entailment rules between predicates considered each rule independently of others, using two sources of information: lexicographic resources and distributional similarity . |
Background | Distributional similarity algorithms use large corpora to learn broader resources by assuming that semantically similar predicates appear with similar arguments. |
Background | Distributional similarity algorithms differ in their feature representation: Some use a binary representation: each predicate is represented by one feature vector where each feature is a pair of arguments (Szpektor et al., 2004; Yates and Etzioni, 2009). |
Experimental Evaluation | Second, to distributional similarity algorithms: (a) SR: the score used by Schoenmackers et al. |
Experimental Evaluation | Third, we compared to the entailment classifier with no transitivity constraints (clsf) to see if combining distributional similarity scores improves performance over single measures. |
Learning Typed Entailment Graphs | We compute 11 distributional similarity scores for each pair of predicates based on the arguments appearing in the extracted arguments. |
Abstract | Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network. |
Conclusion and Future Work | Finally, we proposed a ranking system that employs word distributional similarities to identify semantically equivalent words, and compared it with a wide |
Diversity-based Ranking | 5.1 Distributional Similarity |
Diversity-based Ranking | In order to capture the nuggets of equivalent semantic classes, we use a distributional similarity of |
Diversity-based Ranking | The method based on the distributional similarities of words outperforms other methods in the citations category. |
Background | To date, most distributional similarity research concentrated on symmetric measures, such as the widely cited and competitive (as shown in (Weeds and Weir, 2003)) LIN measure (Lin, 1998): |
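The LIN measure referenced above compares two words through their weighted feature vectors: shared features contribute the sum of their weights in both vectors, normalized by the total weight mass of each word. The feature dictionaries below are toy assumptions; in Lin (1998) the weights are mutual-information values over grammatical-relation features.

```python
# Sketch of the LIN similarity over weighted feature vectors.
# feats: dict mapping a feature to its positive weight (e.g. PMI).
# The example weights are invented for illustration.

def lin_sim(feats1, feats2):
    shared = set(feats1) & set(feats2)
    num = sum(feats1[f] + feats2[f] for f in shared)
    den = sum(feats1.values()) + sum(feats2.values())
    return num / den if den else 0.0
```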
Evaluation and Results | In this setting, category names were taken as seeds and expanded by distributional similarity , further measuring cosine similarity with categorized documents similarly to IR query expansion. |
Introduction | Much work on automatic identification of semantically similar terms exploits Distributional Similarity, assuming that such terms appear in similar contexts.
Introduction | This paper is motivated by one of the prominent applications of distributional similarity , namely identifying lexical expansions. |
Introduction | Often, distributional similarity measures are used to identify expanding terms (e.g.
Acquiring Paraphrases | 3.1 Distributional Similarity |
Conclusion | We have shown that high precision surface paraphrases can be obtained by using distributional similarity on a large corpus. |
Introduction | A popular method, the so-called distributional similarity , is based on the dictum of Zelig Harris “you shall know the words by the company they keep”: given highly discriminating left and right contexts, only words with very similar meaning will be found to fit in between them. |
Related Work | Our method however, pre-computes paraphrases for a large set of surface patterns using distributional similarity over a large corpus and then obtains patterns for a relation by simply finding paraphrases (offline) for a few seed patterns. |
Related Work | Using distributional similarity avoids the problem of obtaining overly general patterns and the pre-computation of paraphrases means that we can obtain the set of patterns for any relation instantaneously. |
Abstract | The focus of this paper is drawing nuanced, connotative sentiments from even those words that are objective on the surface, such as “intelligence”, “human”, and “cheesecake”. We propose induction algorithms encoding a diverse set of linguistic insights (semantic prosody, distributional similarity, semantic parallelism of coordination) and prior knowledge drawn from lexical resources, resulting in the first broad-coverage connotation lexicon.
Connotation Induction Algorithms | The second subgraph is based on the distributional similarities among the arguments. |
Connotation Induction Algorithms | One possible way of constructing such a graph is simply connecting all nodes and assigning edge weights proportionate to the word association scores, such as PMI, or distributional similarity.
Connotation Induction Algorithms | where Φ^{prosody} is the score based on semantic prosody, Φ^{coord} captures the distributional similarity over coordination, and Φ^{neutral} controls the sensitivity of connotation detection between positive (negative) and neutral.
Introduction | Therefore, in order to attain a broad coverage lexicon while maintaining good precision, we guide the induction algorithm with multiple, carefully selected linguistic insights: [1] distributional similarity, [2] semantic parallelism of coordination, [3] selectional preference, and [4] semantic prosody (e.g., Sinclair (1991), Louw (1993), Stubbs (1995), Stefanowitsch and Gries (2003)), and also exploit existing lexical resources as an additional inductive bias.
Evaluation | The fifth Arabic-English example demonstrates the pitfalls of over-reliance on the distributional hypothesis: the source bigram corresponding to the name “abd almahmood” is distributionally similar to another named entity “mahmood” and the English equivalent is offered as a translation.
Generation & Propagation | Co-occurrence counts for each feature (context word) are accumulated over the monolingual corpus, and these counts are converted to pointwise mutual information (PMI) values, as is standard practice when computing distributional similarities.
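The count-to-PMI conversion mentioned above follows the standard definition PMI(w, c) = log( p(w, c) / (p(w) p(c)) ). Below is a minimal sketch over a toy co-occurrence table; the counts are assumptions for illustration.

```python
# Minimal PMI conversion for a word-by-context co-occurrence table.
# The toy counts are invented; real systems accumulate them over a
# large monolingual corpus.
import math

def pmi_matrix(counts):
    """counts[(word, ctx)] -> co-occurrence count; returns PMI values."""
    total = sum(counts.values())
    word_tot, ctx_tot = {}, {}
    for (w, c), n in counts.items():
        word_tot[w] = word_tot.get(w, 0) + n
        ctx_tot[c] = ctx_tot.get(c, 0) + n
    return {
        (w, c): math.log((n / total) /
                         ((word_tot[w] / total) * (ctx_tot[c] / total)))
        for (w, c), n in counts.items()
    }
```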
Related Work | The idea presented in this paper is similar in spirit to bilingual lexicon induction (BLI), where a seed lexicon in two different languages is expanded with the help of monolingual corpora, primarily by extracting distributional similarities from the data using word context. |
Related Work | Paraphrases extracted by “pivoting” via a third language (Callison-Burch et al., 2006) can be derived solely from monolingual corpora using distributional similarity (Marton et al., 2009). |
Background | Distributional similarity between pairs of words is converted into weighted inference rules that are added to the logical representation, and Markov Logic Networks are used to perform probabilistic logical inference. |
Introduction | deep representation of sentence meaning, expressed in first-order logic, to capture sentence structure, but combine it with distributional similarity ratings at the word and phrase level. |
Introduction | This approach is interesting in that it uses a very deep and precise representation of meaning, which can then be relaxed in a controlled fashion using distributional similarity . |
PSL for STS | where vs_sim is a similarity function that calculates the distributional similarity score between the two lexical predicates. |
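A plausible stand-in for the `vs_sim` function above is cosine similarity between the distributional vectors of the two lexical predicates; the actual function used in that system is not specified here, and the vectors below are invented.

```python
# Hedged sketch of a distributional similarity score between two
# lexical predicates: cosine similarity of their context vectors.
# Whether the original vs_sim is cosine is an assumption.
import math

def vs_sim(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0
```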
Introduction and related work | In this paper, we propose a novel unsupervised approach that compares the major senses of a MWE and its semantic head using distributional similarity measures to test the compositionality of the MWE. |
Proposed approach | We used two techniques to measure the distributional similarity of major uses of the MWE and its semantic head, both based on the Jaccard coefficient (J).
Proposed approach | Given the major uses of a MWE and its semantic head, the MWE is considered compositional when the corresponding distributional similarity measure (Jc or Jn) value is above a parameter threshold, sim.
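The thresholded Jaccard decision described above can be sketched as follows. The context sets and the threshold value are illustrative assumptions, not the paper's actual data or tuned parameter.

```python
# Sketch of a Jaccard-based compositionality test: compare the context
# sets of an MWE and its semantic head; call the MWE compositional when
# the Jaccard coefficient exceeds a threshold `sim` (value assumed).

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def is_compositional(mwe_contexts, head_contexts, sim=0.5):
    return jaccard(mwe_contexts, head_contexts) > sim
```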
Unsupervised parameter tuning | The best performing distributional similarity measure is an. |
Experiments: predicting relevance in context | For each pair neighboura/neighbourb, we computed a set of features from Wikipedia (the corpus used to derive the distributional similarity): We first computed the frequencies of each item in the corpus, freqa and freqb, from which we derive
Introduction | They are not suitable for the evaluation of the whole range of semantic relatedness that is exhibited by distributional similarities, which exceeds the limits of classical lexical relations, even though researchers have tried to collect equivalent resources manually, to be used as a gold standard (Weeds, 2003; Bordag, 2008; Anguiano et al., 2011).
Introduction | One advantage of distributional similarities is to exhibit a lot of different semantic relations, not necessarily standard lexical relations. |
Background | Most methods utilized the distributional similarity hypothesis that states that semantically similar predicates occur with similar arguments (Lin and Pantel, 2001; Szpektor et al., 2004; Yates and Etzioni, 2009; Schoenmackers et al., 2010). |
Background | For every pair of predicates i, j, an entailment score wij was learned by training a classifier over distributional similarity features. |
Experiments and Results | The data set also contains, for every pair of predicates i, j in every graph, a local score sij, which is the output of a classifier trained over distributional similarity features.
Background and Related Work | The distributional similarity scores of the nearest neighbours are associated with the respective target word senses using a WordNet similarity measure, such as those proposed by Jiang and Conrath (1997) and Banerjee and Pedersen (2002).
Background and Related Work | The word senses are ranked based on these similarity scores, and the most frequent sense is selected for the corpus that the distributional similarity thesaurus was trained over. |
WordNet Experiments | It is important to bear in mind that MKWC in these experiments makes use of full-text parsing in calculating the distributional similarity thesaurus, and the WordNet graph structure in calculating the similarity between associated words and different senses. |
Background | For computing distributional similarity, each word is represented as a semantic vector composed of the pointwise mutual information (PMI) values with its contexts.
Introduction | Besides, distributional similarity methods (Kotlerman et al., 2010; Lenci and Benotto, 2012) are based on the assumption that a term can only be used in contexts where its hypernyms can be used and that a term might be used in any contexts where its hyponyms are used.
Related Work | (2010) and Lenci and Benotto (2012), other researchers also propose directional distributional similarity methods (Weeds et al., 2004; Geffet and Dagan, 2005; Bhagat et al., 2007; Szpektor et al., 2007; Clarke, 2009). |
Set Expansion | We consider three similarity data sources: the Moby thesaurus, WordNet (Fellbaum, 1998), and distributional similarity based on a large corpus of text (Lin, 1998).
Set Expansion | Distributional similarity.
Set Expansion | Second, the data sources used: each source separately (M for Moby, W for WordNet, D for distributional similarity), and all three in combination (MWD).
Abstract | We study the global topology of the syntactic and semantic distributional similarity networks for English through the technique of spectral analysis. |
Introduction | An alternative, but equally popular, visualization of distributional similarity is through graphs or networks, where each word is represented as a node and weighted edges indicate the extent of distributional similarity between them.
Introduction | intriguing question, whereby we construct the syntactic and semantic distributional similarity network (DSN) and analyze their spectrum to understand their global topology. |
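A tiny illustration of spectral analysis on such a network: power iteration approximates the largest eigenvalue of the adjacency matrix, one of the quantities a spectral study of global topology examines. The three-node graph is a toy assumption, not the DSNs studied above.

```python
# Power iteration for the dominant adjacency eigenvalue of a small
# undirected similarity network. The graph is a toy assumption.

def largest_eigenvalue(adj, iters=200):
    """adj: square adjacency matrix as nested lists of weights."""
    n = len(adj)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)   # infinity-norm estimate
        v = [x / lam for x in w]
    return lam
```

For the complete graph on three nodes the dominant eigenvalue is 2, which this sketch recovers.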