Index of papers in Proc. ACL that mention
  • co-occurrence
Chong, Tze Yuang and E. Banchs, Rafael and Chng, Eng Siong and Li, Haizhou
Abstract
In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling.
Introduction
the distance is described regardless of the actual frequency of the history-word, while the co-occurrence is described regardless of the actual position of the history-word.
Language Modeling with TD and TO
In Eq. 3, we have decoupled the observation of a word-pair into the events of distance and co-occurrence.
Language Modeling with TD and TO
The TD likelihood for a distance k given the co-occurrence of the word-pair (w_{t-k}, w_t) can be estimated from counts as follows:
Language Modeling with TD and TO
zero co-occurrence, C(w_{t-k} ∈ h_t, w_t) = 0, which results in a division by zero.
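As an editorial sketch only (the word pair, counts, maximum distance, and smoothing constant below are hypothetical, and this is not the paper's exact estimator), a count-based distance likelihood with a small additive constant avoids that division by zero:

    from collections import Counter

    def td_likelihood(pair_distance_counts, history_word, target_word, k, max_dist=10, eps=1e-6):
        """Estimate P(distance = k | co-occurrence of the word pair) from raw counts,
        adding a small constant so that zero co-occurrence never divides by zero."""
        num = pair_distance_counts[(history_word, target_word, k)] + eps
        den = sum(pair_distance_counts[(history_word, target_word, d)]
                  for d in range(1, max_dist + 1)) + eps * max_dist
        return num / den

    counts = Counter({("acid", "rain", 1): 3, ("acid", "rain", 2): 1})
    print(td_likelihood(counts, "acid", "rain", 1))  # approximately 0.75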
Motivation of the Proposed Approach
The attributes of distance and co-occurrence are exploited and modeled differently in each language modeling approach.
Motivation of the Proposed Approach
Both the conventional trigger model and the latent-semantic model capture the co-occurrence information while ignoring the distance information.
Motivation of the Proposed Approach
On the other hand, distant-bigram models and distance-dependent trigger models make use of both distance and co-occurrence information up to window sizes of ten to twenty.
co-occurrence is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Experiments
In this section, we compare the translation accuracies for character-based translation using the phrasal ITG model with and without the proposed improvements of substring co-occurrence priors and lookahead parsing as described in Sections 4 and 5.2.
Experiments
Table 5: METEOR scores for alignment with and without lookahead and co-occurrence priors.
Introduction
This method is attractive, as it is theoretically able to handle all sparsity phenomena in a single unified framework, but has only been shown feasible between similar language pairs such as Spanish-Catalan (Vilar et al., 2007), Swedish-Norwegian (Tiedemann, 2009), and Thai-Lao (Somlertlamvanich et al., 2008), which have a strong co-occurrence between single characters.
Substring Prior Probabilities
In this section, we overview an existing method used to calculate these prior probabilities, and also propose a new way to calculate priors based on substring co-occurrence statistics.
Substring Prior Probabilities
5.2 Substring Co-occurrence Priors
Substring Prior Probabilities
Instead, we propose a method for using raw substring co-occurrence statistics to bias alignments towards substrings that often co-occur in the entire training corpus.
co-occurrence is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Morin, Emmanuel and Hazem, Amir
Bilingual Lexicon Extraction
For each word i of the source and the target languages, we obtain a context vector v_i which gathers the set of co-occurrence words j associated with the number of times that j and i occur together, cooc(i, j).
Bilingual Lexicon Extraction
One way to deal with this problem is to reestimate co-occurrence counts by a prediction function (Hazem and Morin, 2013).
Bilingual Lexicon Extraction
This consists of assigning to each observed co-occurrence count in a small comparable corpus a new value learned beforehand from a large training corpus.
Experiments and Results
The aim of this experiment is twofold: first, we want to evaluate the usefulness of predicting word co-occurrence counts and second, we want to find out whether it is more appropriate to apply prediction to the source side, the target side or both sides of the bilingual comparable corpora.
Experiments and Results
We applied the same regression function to all co-occurrence counts while learning models for low and high frequencies should have been more appropriate.
co-occurrence is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Image Clustering with Annotated Auxiliary Data
Intuitively, our algorithm aPLSA performs PLSA analysis on the target images, which are converted to an image instance-to-feature co-occurrence matrix.
Image Clustering with Annotated Auxiliary Data
At the same time, PLSA is also applied to the annotated image data from social Web, which is converted into a text-to-image-feature co-occurrence matrix.
Image Clustering with Annotated Auxiliary Data
Based on the image data set V, we can estimate an image instance-to-feature co-occurrence matrix A ∈ R^{|V|×|F|}, where each element A_ij (1 ≤ i ≤ |V| and 1 ≤ j ≤ |F|) in the matrix A is the frequency of the feature f_j appearing in the instance v_i.
co-occurrence is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Yang, Hui and Callan, Jamie
Experiments
Co-occurrence 0.47 0.56 0.45 0.41 0.41
Experiments
Co-occurrence 0.34 0.36 0.34 0.31 0.31
Experiments
Both co-occurrence and lexico-syntactic patterns work well for all three types of relations.
Introduction
The common types of features include contextual (Lin, 1998), co-occurrence (Yang and Callan, 2008), and syntactic dependency (Pantel and Lin, 2002; Pantel and Ravichandran, 2004).
Introduction
The framework integrates contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and other features to learn an ontology metric, a score indicating semantic distance, for each pair of terms in a taxonomy; it then incrementally clusters terms based on their ontology metric scores.
Related Work
Inspired by the conjunction and appositive structures, Riloff and Shepherd (1997), Roark and Charniak (1998) used co-occurrence statistics in local context to discover sibling relations.
Related Work
Besides contextual features, the vectors can also be represented by verb-noun relations (Pereira et al., 1993), syntactic dependency (Pantel and Ravichandran, 2004; Snow et al., 2005), co-occurrence (Yang and Callan, 2008), conjunction and appositive features (Caraballo, 1999).
The Features
The features include contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and miscellaneous.
The Features
The second set of features is co-occurrence.
The Features
In our work, co-occurrence is measured by point-wise mutual information between two terms:
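The excerpt is cut off before the formula; the standard definition of point-wise mutual information between two terms t1 and t2 (the paper may estimate it differently) is:

    \mathrm{PMI}(t_1, t_2) = \log \frac{P(t_1, t_2)}{P(t_1)\,P(t_2)}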
co-occurrence is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Bruni, Elia and Boleda, Gemma and Baroni, Marco and Tran, Nam Khanh
Distributional semantic models
In all cases, occurrence and co-occurrence statistics are extracted from the freely available ukWaC and Wackypedia corpora combined (size: 1.9B and 820M tokens, respectively). Moreover, in all models the raw co-occurrence counts are transformed into nonnegative Local Mutual Information (LMI) scores. Finally, in all models we harvest vector representations for the same words (lemmas), namely the top 20K most frequent nouns, 5K most frequent adjectives and 5K most frequent verbs in the combined corpora (for coherence with the vision-based models, that cannot exploit contextual information to distinguish nouns and adjectives, we merge nominal and adjectival usages of the color adjectives in the text-based models as well).
Distributional semantic models
Window2 records sentence-internal co-occurrence with the nearest 2 content words to the left and right of each target concept, a narrow context definition expected to capture taxonomic relations.
Distributional semantic models
We further introduce hybrid models that exploit the patterns of co-occurrence of words as tags of the same images.
Experiment 2
We also experimented with a model based on direct co-occurrence of adjectives and nouns, obtaining promising results in a preliminary version of Exp.
Experiment 2
our results suggest that co-occurrence in an image label can be used as a surrogate of true visual information to some extent, but the behavior of hybrid models depends on ad-hoc aspects of the labeled dataset, and, from an empirical perspective, they are more limited than truly multimodal models, because they require large amounts of rich verbal picture descriptions to reach good coverage.
Introduction
Traditional semantic space models represent meaning on the basis of word co-occurrence statistics in large text corpora (Turney and Pantel, 2010).
Introduction
We also show that “hybrid” models exploiting the patterns of co-occurrence of words as tags of the same images can be a powerful surrogate of visual information under certain circumstances.
Related work
Like us, Ozbal and colleagues use both a textual model and a visual model (as well as Google adjective-noun co-occurrence counts) to find the typical color of an object.
co-occurrence is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Abstract
Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence.
Introduction
For example, we can collect the co-occurrence statistics of an anaphor with various candidate antecedents to judge relative surface affinities (i.e., (Obama, president) versus (Jobs, president)).
Introduction
We can also count co-occurrence statistics of competing antecedents when placed in the context of an anaphoric pronoun (i.e., Obama ’s election campaign versus Jobs’ election campaign).
Introduction
We explore five major categories of semantically informative Web features, based on (1) general lexical affinities (via generic co-occurrence statistics), (2) lexical relations (via Hearst-style hypernymy patterns), (3) similarity of entity-based context (e.g., common values of y for
Semantics via Web Features
The first four types are most intuitive for mention pairs where both members are non-pronominal, but, aside from the general co-occurrence group, helped for all mention pair types.
Semantics via Web Features
3.1 General co-occurrence
Semantics via Web Features
These features capture co-occurrence statistics of the two headwords, i.e., how often h1 and h2 are seen adjacent or nearly adjacent on the Web.
co-occurrence is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Wintrode, Jonathan and Khudanpur, Sanjeev
Abstract
Instead of taking a broad view of topic context in spoken documents, variability of word co-occurrence statistics across corpora leads us to focus instead on the phenomenon of word repetition within single documents.
Introduction
In order to arrive at our eventual solution, we take the BABEL Tagalog corpus and analyze word co-occurrence and repetition statistics in detail.
Introduction
Our observation of the variability in co-occurrence statistics between Tagalog training and development partitions leads us to narrow the scope of document context to same word co-occurrences, i.e.
Motivation
Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation
The difficulty in this approach arises from the variability in word co-occurrence statistics.
Motivation
Unfortunately, estimates of co-occurrence from small corpora are not very consistent, and often over- or underestimate concurrence probabilities needed for term detection.
Term Detection Re-scoring
As with word co-occurrence, we consider if estimates of Padapt(w) from training data are consistent when estimated on development data.
co-occurrence is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Shezaf, Daphna and Rappoport, Ari
Algorithm
where Pr(w1, w2) is the co-occurrence count, and Pr(wi) is the total number of appearances of wi in the corpus (Church and Hanks, 1990).
Conclusion
Our results confirm that alignment is problematic in using co-occurrence methods across languages, at least in our settings.
Introduction
While co-occurrence scores are used to compute signatures, signatures, unlike context vectors, do not contain the score values.
Lexicon Generation Experiments
In the case of context vectors, the vector indices, or keys, are words, and their values are co-occurrence based scores.
Lexicon Generation Experiments
The window size for co-occurrence counting, k, was 4.
Lexicon Generation Experiments
In the three co-occurrence based methods, NAS similarity, cosine distance, and city block distance, the highest ranking translation was selected.
Previous Work
(2009) replaced the traditional window-based co-occurrence counting with dependency-tree based counting, while Pekar et al.
Previous Work
(2006) predicted missing co-occurrence values based on similar words in the same language.
co-occurrence is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Ogura, Yukari and Kobayashi, Ichiro
Experiment
Thus, constructing a graph based on word co-occurrence within every three sentences of a document works well for ranking important words, taking into account the context of the word.
Introduction
In our study, we express the relation of word co-occurrence in the form of a graph.
Related studies
(2011) detected topics in a document by constructing a graph of word co-occurrence and applying the PageRank algorithm to it.
Related studies
The graph used in our method is constructed based on word co-occurrence so that important words which are sensitive to latent information can be extracted by the PageRank algorithm.
Techniques for text classification
According to (Newman et al., 2010), topic coherence is related to word co-occurrence.
Techniques for text classification
The refined documents are composed of the important sentences extracted from a viewpoint of latent information, i.e., word co-occurrence, so they are suitable for classification based on latent information.
Techniques for text classification
In our study, we construct a graph based on word co-occurrence.
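As an editorial illustration of this family of methods (not the authors' implementation; the toy sentences and window size are hypothetical, and the networkx library is assumed to be available), a word co-occurrence graph can be built and its nodes ranked with PageRank as follows:

    import itertools
    import networkx as nx

    def rank_words(sentences, window=3):
        """Link words that co-occur within a sliding window of sentences,
        then score the nodes with PageRank."""
        g = nx.Graph()
        for i in range(max(len(sentences) - window + 1, 1)):
            words = set(itertools.chain.from_iterable(sentences[i:i + window]))
            for w1, w2 in itertools.combinations(sorted(words), 2):
                prev = g.get_edge_data(w1, w2, {"weight": 0})["weight"]
                g.add_edge(w1, w2, weight=prev + 1)
        return nx.pagerank(g, weight="weight")

    sents = [["topic", "graph"], ["graph", "rank"], ["rank", "word"]]
    print(rank_words(sents, window=2))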
co-occurrence is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Maxwell, K. Tamsin and Oberlander, Jon and Croft, W. Bruce
Experimental setup
Co-occurrence features (C)
Experimental setup
Co-occurrence and IR effectiveness prediction features (CI) was the most influential class, and accounted for 70% of all features in the model.
Related work
In ad hoc IR, most models of term dependence use word co-occurrence and proximity (Song and Croft, 1999; Metzler and Croft, 2005; Srikanth and Srihari, 2002; van Rijsbergen, 1993).
Selection method for catenae
Co-occurrence features: A governor wl tends to subcategorize for its dependents wn.
Selection method for catenae
We conclude that co-occurrence is an important feature of dependency relations (Mel’cuk, 2003).
Selection method for catenae
In addition, term frequencies and inverse document frequencies calculated using word co-occurrence measures are commonly used in IR.
co-occurrence is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Beigman Klebanov, Beata and Flor, Michael
Application to Essay Scoring
It is also possible that some of the instances with very high PMI are pairs that contain low frequency words for which the database predicts a spuriously high PMI based on a single (and atypical) co-occurrence that happens to repeat in an essay, similar to the Schwartz eschews example in (Manning and Schütze, 1999, Table 5.16, p. 181).
Introduction
Thus, co-occurrence of words in n-word windows, syntactic structures, sentences, paragraphs, and even whole documents is captured in vector-space models built from text corpora (Turney and Pantel, 2010; Basili and Pennacchiotti, 2010; Erk and Pado, 2008; Mitchell and Lapata, 2008; Bullinaria and Levy, 2007; Jones and Mewhort, 2007; Pado and Lapata, 2007; Lin, 1998; Landauer and Dumais, 1997; Lund and Burgess, 1996; Salton et al., 1975).
Introduction
However, little is known about typical profiles of texts in terms of co-occurrence behavior of their words.
Introduction
The cited approaches use topic models that are in turn estimated using word co-occurrence .
Methodology
The first decision is how to quantify the extent of co-occurrence between two words; we will use point-wise mutual information (PMI) estimated from a large and diverse corpus of texts.
Methodology
The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values.
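As an editorial sketch (the PMI lookup table and the bin edges below are hypothetical, not the paper's), such a co-occurrence profile can be computed as a histogram over word-pair PMI values:

    import itertools
    import numpy as np

    def pmi_profile(words, pmi, bins=(-5, -2, 0, 2, 5, 10)):
        """Histogram (as proportions) of PMI values over all word pairs in a text."""
        vals = [pmi[(w1, w2)] for w1, w2 in itertools.combinations(words, 2)
                if (w1, w2) in pmi]
        hist, _ = np.histogram(vals, bins=bins)
        return hist / max(len(vals), 1)

    pmi = {("acid", "rain"): 6.1, ("acid", "the"): -0.3, ("rain", "the"): 0.2}
    print(pmi_profile(["acid", "rain", "the"], pmi))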
Methodology
To obtain comprehensive information about typical co-occurrence behavior of words of English, we build a first-order co-occurrence word-space model (Turney and Pantel, 2010; Baroni and Lenci, 2010).
co-occurrence is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Han, Xianpei and Zhao, Jun
Experiments
There were three knowledge sources we used for our experiments: the WordNet 3.0; the Sep. 9, 2007 English version of Wikipedia; and the Web pages of each ambiguous name in WePS datasets as the NE Co-occurrence Corpus.
Introduction
This model measures similarity based on only the co-occurrence statistics of terms, without considering all the semantic relations like social relatedness between named entities, associative relatedness between concepts, and lexical relatedness (e.g., acronyms, synonyms) between key terms.
Related Work
(2007) used the co-occurrence statistics between named entities in the Web.
The Structural Semantic Relatedness Measure
We extract three types of semantic relations (semantic relatedness between Wikipedia concepts, lexical relatedness between WordNet concepts and social relatedness between NEs) correspondingly from three knowledge sources: Wikipedia, WordNet and NE Co-occurrence Corpus.
The Structural Semantic Relatedness Measure
NE Co-occurrence Corpus, a corpus of documents for capturing the social relatedness between named entities.
The Structural Semantic Relatedness Measure
According to the fuzzy set theory (Baeza-Yates et al., 1999), the degree of named entities co-occurrence in a corpus is a measure of the relatedness between them.
co-occurrence is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Hassan, Ahmed and Radev, Dragomir R.
Experiments
The spin model approach uses word glosses, WordNet synonym, hypernym, and antonym relations, in addition to co-occurrence statistics extracted from a corpus.
Experiments
Adding co-occurrence statistics slightly improved performance, while using glosses did not help at all.
Experiments
No glosses or co-occurrence statistics are used.
Related Work
To get co-occurrence statistics, they submit several queries to a search engine.
Related Work
They construct a network of words using gloss definitions, thesaurus, and co-occurrence statistics.
Word Polarity
Another source of links between words is co-occurrence statistics from a corpus.
Word Polarity
We study the effect of using co-occurrence statistics to connect words later at the end of our experiments.
co-occurrence is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Thater, Stefan and Fürstenau, Hagen and Pinkal, Manfred
Experiments: Ranking Paraphrases
As for the full model, we use pmi values rather than raw frequency counts as co-occurrence statistics.
Introduction
In the standard approach, word meaning is represented by feature vectors, with large sets of context words as dimensions, and their co-occurrence frequencies as values.
Introduction
This allows us to model the semantic interaction between the meaning of a head word and its dependent at the microlevel of relation-specific co-occurrence frequencies.
Related Work
Figure 1: Co-occurrence graph of a small sample corpus of dependency trees.
The model
The basis for the construction of both kinds of vector representations are co-occurrence graphs.
The model
Figure 1 shows the co-occurrence graph of a small sample corpus of dependency trees: Words are represented as nodes in the graph, possible dependency relations between them are drawn as labeled edges, with weights corresponding to the observed frequencies.
The model
introduce another kind of vectors capturing information about all words that can be reached with two steps in the co-occurrence graph.
co-occurrence is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Jurgens, David and Stevens, Keith
The S-Space Framework
We divide the algorithms into four categories based on their structural similarity: document-based, co-occurrence, approximation, and Word Sense Induction (WSI) models.
The S-Space Framework
Co-occurrence models build the vector space using the distribution of co-occurring words in a context, which is typically defined as a region around a word or paths rooted in a parse tree.
The S-Space Framework
co-occurrence data rather than model it explicitly in order to achieve better scalability for larger data sets.
Word Space Models
Later models have expanded the notion of co-occurrence but retain the premise that distributional similarity can be used to extract meaningful relationships between words.
Word Space Models
Common approaches use a lexical distance, syntactic relation, or document co-occurrence to define the context.
Word Space Models
Co-occurrence Models: HAL (Burgess and Lund, 1997), COALS (Rohde et al., 2009)
co-occurrence is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Liu, Kang and Xu, Liheng and Zhao, Jun
Experiments
But they captured relations only using co-occurrence statistics.
Experiments
Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments, which are more precise than Hai which merely uses co-occurrence information to indicate such relations among words.
Introduction
In traditional extraction strategy, opinion associations are usually computed based on the co-occurrence frequency.
Related Work
They usually captured different relations using co-occurrence information.
The Proposed Method
Each opinion target can find its corresponding modifiers in sentences through alignment, in which multiple factors are considered globally, such as co-occurrence information, word position in sentence, etc.
The Proposed Method
p(v_t, v_o) is the co-occurrence probability of v_t and v_o based on the opinion relation identification results.
co-occurrence is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Yarowsky, David
Related Work
Moreover, the HMM model is computationally-expensive and unable to exploit the data co-occurrence phenomena that we
Unsupervised Translation Induction for Chinese Abbreviations
3.3.1 Data Co-occurrence
Unsupervised Translation Induction for Chinese Abbreviations
In a monolingual corpus, relevant words tend to appear together (i.e., co-occurrence).
Unsupervised Translation Induction for Chinese Abbreviations
The co-occurrence may imply a relationship (e.g., Bill Gates is the founder of Microsoft).
co-occurrence is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
main, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
Distribution Prediction
The value of the element aij in the co-occurrence matrix A is set to the number of sentences in which the i-th and j-th features co-occur.
Distribution Prediction
We apply Positive Pointwise Mutual Information (PPMI) to the co-occurrence matrix A.
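As an editorial sketch of the PPMI step (a small dense numpy matrix is assumed; this is not the authors' code):

    import numpy as np

    def ppmi(counts):
        """Positive pointwise mutual information transform of a co-occurrence count matrix."""
        total = counts.sum()
        row = counts.sum(axis=1, keepdims=True)
        col = counts.sum(axis=0, keepdims=True)
        expected = row * col / total               # expected counts under independence
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(counts / expected)
        pmi[~np.isfinite(pmi)] = 0.0               # cells with zero counts get 0
        return np.maximum(pmi, 0.0)

    A = np.array([[10.0, 0.0], [2.0, 5.0]])
    print(ppmi(A))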
Experiments and Results
For each domain D in the SANCL (POS tagging) and Amazon review (sentiment classification) datasets, we create a PPMI weighted co-occurrence matrix FD.
co-occurrence is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
RSP: A Random Walk Model for SP
We initiate the links E with the raw co-occurrence counts of seen predicate-argument pairs in a given generalization data.
RSP: A Random Walk Model for SP
But in SP, the preferences between the predicates and arguments are implicit: their co-occurrence counts follow the power law distribution and vary greatly.
RSP: A Random Walk Model for SP
investigate the correlations between the co-occurrence counts (CT) C(q, a), or smoothed counts with the human plausibility judgements (Lapata et al., 1999; Lapata et al., 2001).
Related Work 2.1 WordNet-based Approach
(1999) introduce a general similarity-based model for word co-occurrence probabilities, which can be interpreted for SP.
co-occurrence is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Pereira, Lis and Manguilimotan, Erlyn and Matsumoto, Yuji
Related Work
Table 2 Context of a particular noun represented as a co-occurrence vector
Related Work
Context is represented as co-occurrence vectors that are based on syntactic dependencies.
Related Work
Table 3 Context of a particular noun represented as a co-occurrence vector
co-occurrence is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Related Work
Turney (2007) measured the semantic orientation for sentiment classification using co-occurrence statistics obtained from the search engines.
Related Work
Besides, there are some work exploring the word-to-word co-occurrence derived from the web-scale data or a fixed size of corpus (Calvo and Gelbukh, 2004; Calvo and Gelbukh, 2006; Yates et al., 2006; Drabek and Zhou, 2000; van Noord, 2007) for PP attachment ambiguities or shallow parsing.
Related Work
Abekawa and Okumura (2006) improved Japanese dependency parsing by using the co-occurrence information derived from the results of automatic dependency parsing of large-scale corpora.
Web-Derived Selectional Preference Features
Co-occurrence probabilities can be calculated directly from the N-gram counts.
Web-Derived Selectional Preference Features
where p(“X y”) is the co-occurrence probability.
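In its usual maximum-likelihood form (the paper's exact normalisation may differ), such a co-occurrence probability is estimated from the n-gram counts as

    p(\text{"X y"}) = \frac{\mathrm{count}(\text{"X y"})}{N}

where N is the total number of bigram tokens in the collection.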
co-occurrence is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Models of Processing Difficulty
To give a concrete example, Latent Semantic Analysis (LSA, Landauer and Dumais 1997) creates a meaning representation for words by constructing a word-document co-occurrence matrix from a large collection of documents.
Models of Processing Difficulty
Like LSA, ICD is based on word co-occurrence vectors, however it does not employ singular value decomposition, and constructs a word-word rather than a word-document co-occurrence matrix.
Models of Processing Difficulty
Importantly, composition models are not defined with a specific semantic space in mind; they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al.
co-occurrence is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Guo, Weiwei and Diab, Mona
Evaluation for SS
After the SS model learns the co-occurrence of words from WN definitions, in the testing phase, given an ON definition d, the SS algorithm needs to identify the equivalent WN definitions by computing the similarity values between all WN definitions and the ON definition d, then sorting the values in decreasing order.
Experiments and Results
For the Brown corpus, each sentence is treated as a document in order to create more coherent co-occurrence values.
Introduction
2. word co-occurrence information is not sufficiently exploited.
Limitations of Topic Models and LSA for Modeling Sentences
LSA and PLSA/LDA work on a word-sentence co-occurrence matrix.
Limitations of Topic Models and LSA for Modeling Sentences
The yielded M × N co-occurrence matrix X comprises the TF-IDF values in each cell X_ij, namely the TF-IDF value of word w_i in sentence s_j.
co-occurrence is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lazaridou, Angeliki and Marelli, Marco and Zamparelli, Roberto and Baroni, Marco
Composition methods
Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).
Experimental setup
We collect co-occurrence statistics for the top 20K content words (adjectives, adverbs, nouns, verbs)
Experimental setup
Due to differences in co-occurrence weighting schemes (we use a logarithmically scaled measure, they do not), their multiplicative model is closer to our additive one.
Introduction
Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
Introduction
Trying to represent the meaning of arbitrarily long constructions by directly collecting co-occurrence statistics is obviously ineffective and thus methods have been developed to derive the meaning of larger constructions as a function of the meaning of their constituents (Baroni and Zamparelli, 2010; Coecke et al., 2010; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Socher et al., 2012).
co-occurrence is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Kim, Jungi and Li, Jin-Ji and Lee, Jong-Hyeok
Term Weighting and Sentiment Analysis
Statistical measures of associations between terms include estimations by the co-occurrence in the whole collection, such as Point-wise Mutual Information (PMI) and Latent Semantic Analysis (LSA).
Term Weighting and Sentiment Analysis
Another way is to use co-occurrence statistics
Term Weighting and Sentiment Analysis
where K is the maximum window size for the co-occurrence and is arbitrarily set to 3 in our experiments.
co-occurrence is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zou, Bowei and Zhou, Guodong and Zhu, Qiaoming
Baselines
One is word co-occurrence (if word wi and word wj occur in the same sentence or in adjacent sentences, Sim(wi, wj) increases by 1), and the other is WordNet (Miller, 1995) based similarity.
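As an editorial sketch of this sentence-window similarity (the tokenised sentences below are hypothetical, not the authors' data):

    from collections import Counter
    from itertools import combinations

    def cooc_similarity(sentences):
        """Sim(wi, wj) increases by 1 for every pair of sentence positions at distance
        <= 1 (the same sentence or adjacent ones) in which wi and wj both occur."""
        sim = Counter()
        sets = [set(s) for s in sentences]
        for a in range(len(sets)):
            for b in range(a, min(a + 2, len(sets))):
                for wi, wj in combinations(sorted(sets[a] | sets[b]), 2):
                    if (wi in sets[a] and wj in sets[b]) or (wj in sets[a] and wi in sets[b]):
                        sim[(wi, wj)] += 1
        return sim

    sents = [["negation", "focus"], ["focus", "scope"]]
    print(cooc_similarity(sents)[("negation", "scope")])  # 1: they only co-occur across adjacent sentences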
Baselines
1 Total weight of words in the focus candidate using the co-occurrence similarity.
Baselines
2 Max weight of words in the focus candidate using the co-occurrence similarity.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xu, Liheng and Liu, Kang and Lai, Siwei and Zhao, Jun
Experiments
Afterwards, word-syntactic pattern co-occurrence statistics are used as features for a semi-supervised classifier, TSVM (Joachims, 1999), to further refine the results.
Experiments
In contrast, CONT exploits latent semantics of each word in context, and LEX takes advantage of word embedding, which is induced from global word co-occurrence statistics.
Introduction
• It exploits semantic similarity between words to capture lexical clues, which is shown to be more effective than co-occurrence relation between words and syntactic patterns.
Related Work
A recent research (Xu et al., 2013) extracted infrequent product features by a semi-supervised classifier, which used word-syntactic pattern co-occurrence statistics as features for the classifier.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan
Anchor Words: Scalable Topic Models
Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models
The difference between anchor and conventional inference is that while conventional methods take a collection of documents as input, anchor takes word co-occurrence statistics.
Anchor Words: Scalable Topic Models
(3) The anchor method is fast, as it only depends on the size of the vocabulary once the co-occurrence statistics Q are obtained.
Introduction
This approach is fast and effective; because it only uses word co-occurrence information, it can scale to much larger datasets than MCMC or EM alternatives.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Collocational Lexicon Induction
A distributional profile (DP) of a word or phrase type is a co-occurrence vector created by combining all co-occurrence vectors of the tokens of that phrase type.
Collocational Lexicon Induction
These co-occurrence counts are converted to an association measure (Section 2.2) that encodes the relatedness of each pair of words or phrases.
Collocational Lexicon Induction
A(·, ·) is an association measure and can simply be defined as co-occurrence counts within sliding windows.
Related work
They used a graph based on context similarity as well as co-occurrence graph in propagation process.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Burkett, David and de Melo, Gerard and Klein, Dan
Features
Presence and distance: For each potential edge (term pair), we mine patterns from all abstracts in which the two terms co-occur in either order, allowing a maximum term distance of 20 (because beyond that, co-occurrence may not imply a relation).
Introduction
Our model is also the first to directly learn relational patterns as part of the process of training an end-to-end taxonomic induction system, rather than using patterns that were hand-selected or learned via pairwise classifiers on manually annotated co-occurrence patterns.
Related Work
Both of these systems use a process that starts by finding basic level terms (leaves of the final taxonomy tree, typically) and then using relational patterns (hand-selected ones in the case of Kozareva and Hovy (2010), and ones learned separately by a pairwise classifier on manually annotated co-occurrence patterns for Navigli and Velardi (2010), Navigli et al.
Related Work
Our model also automatically learns relational patterns as a part of the taxonomic training phase, instead of relying on handpicked rules or pairwise classifiers on manually annotated co-occurrence patterns, and it is the first end-to-end (i.e., non-incremental) system to include heterogeneous relational information via sibling (e.g., coordination) patterns.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hasan, Kazi Saidul and Ng, Vincent
Analysis
As discussed before, the relationship between two candidates is traditionally established using co-occurrence information.
Analysis
However, using co-occurrence windows has its shortcomings.
Keyphrase Extraction Approaches
Researchers have computed relatedness between candidates using co-occurrence counts (Mihalcea and Tarau, 2004; Matsuo and Ishizuka, 2004) and semantic relatedness (Grineva et al., 2009), and represented the relatedness information collected from a document as a graph (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Wan and Xiao, 2008b; Bougouin et al., 2013).
Keyphrase Extraction Approaches
Finally, an edge weight in a WW graph denotes the co-occurrence or knowledge-based similarity between the two connected words.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Jianguo and Brew, Chris
Integration of Syntactic and Lexical Information
Co-occurrence (CO): CO features mostly convey lexical information only and are generally considered not particularly sensitive to argument structures (Rohde et al., 2004).
Integration of Syntactic and Lexical Information
Adapted co-occurrence (ACO): Conventional CO features generally adopt a stop list to filter out function words.
Results and Discussion
On the other hand, the co-occurrence feature (CO), which is believed to convey only lexical information, outperforms SCF on every n-way classification when n ≥ 10, suggesting that verbs in the same Levin classes tend to share their neighboring words.
Results and Discussion
In fact, even the simple co-occurrence feature (CO) yields a better performance (42.4%) than these Levin-selected SCF sets.
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
McIntyre, Neil and Lapata, Mirella
Experimental Setup
The second one creates a story randomly without taking any co-occurrence frequency into account.
Introduction
Our generator operates over predicate-argument and predicate-predicate co-occurrence statistics gathered from corpora.
Story Ranking
As explained earlier, our generator produces stories stochastically, by relying on co-occurrence frequencies collected from the training corpus.
The Story Generator
A fragment of the action graph is shown in Figure 3 (for simplicity, the edges in the example are weighted with co-occurrence frequencies).
co-occurrence is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lazaridou, Angeliki and Bruni, Elia and Baroni, Marco
Experimental Setup
We apply Local Mutual Information (LMI, (Evert, 2005)) as weighting scheme and reduce the full co-occurrence space to 300 dimensions using the Singular Value Decomposition.
Experimental Setup
For constructing the text-based vectors, we follow a standard pipeline in distributional semantics (Turney and Pantel, 2010) without tuning its parameters and collect co-occurrence statistics from the concatenation of ukWaC and the Wikipedia, amounting to 2.7 billion tokens in total.
Introduction
However, the models induce the meaning of words entirely from their co-occurrence with other words, without links to the external world.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cao, Guihong and Robertson, Stephen and Nie, Jian-Yun
Regression Model for Alteration Selection
For example, for the query “controlling acid rain”, the coherence of the alteration “acidic” is measured by the logarithm of its co-occurrence with the other query terms within a predefined window (90 words) in the corpus.
Regression Model for Alteration Selection
where P(controlling...acidic...rain|window) is the co-occurrence probability of the trigram containing acidic within a predefined window (50 words).
Regression Model for Alteration Selection
On the other hand, the second feature helps because it can capture some co-occurrence information no matter how long the query is.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yao, Xuchen and Van Durme, Benjamin
Relation Mapping
Treating the aligned pairs as observation, the co-occurrence matrix between aligning relations and words was computed.
Relation Mapping
5. From the co-occurrence matrix we computed P(w | R), P(R), P(w | r) and P(r).
Relation Mapping
ReverbMapping does the same, except that we took a uniform distribution on P(w | R) and P(R) since the contributed dataset did not include co-occurrence counts to estimate these probabilities. Note that the median rank from CluewebMapping is only 12, indicating that half of all answer relations are ranked in the top 12.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mitchell, Jeff and Lapata, Mirella
Composition Models
We formulate semantic composition as a function of two vectors, u and v. We assume that individual words are represented by vectors acquired from a corpus following any of the parametrisations that have been suggested in the literature. We briefly note here that a word’s vector typically represents its co-occurrence with neighboring words.
Composition Models
The construction of the semantic space depends on the definition of linguistic context (e.g., neighbouring words can be documents or collocations), the number of components used (e.g., the k most frequent words in a corpus), and their values (e.g., as raw co-occurrence frequencies or ratios of probabilities).
Composition Models
Here, the space has only five dimensions, and the matrix cells denote the co-occurrence of the target words (horse and run) with the context words animal, stable, and so on.
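As an editorial illustration of composition over such co-occurrence vectors (the five-dimensional toy counts below are hypothetical), the additive and component-wise multiplicative models look like this:

    import numpy as np

    # Toy 5-dimensional co-occurrence vectors for the target words "horse" and "run"
    # (hypothetical counts against context words such as animal, stable, ...).
    horse = np.array([2.0, 3.0, 0.0, 1.0, 1.0])
    run = np.array([1.0, 0.0, 1.0, 4.0, 2.0])

    additive = horse + run          # p = u + v
    multiplicative = horse * run    # p = u * v, component-wise
    print(additive, multiplicative)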
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and McDonald, Ryan
Experiments
The first factor expresses a preference for topics likely from the co-occurrence information, whereas the second one favors the choice of topics which are predictive of the observable sentiment ratings.
Introduction
First, ratable aspects normally represent coherent topics which can be potentially discovered from co-occurrence information in the text.
The Model
Importantly, the fact that windows overlap permits the model to exploit a larger co-occurrence domain.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Perek, Florent
Distributional measure of semantic similarity
A wide range of distributional information can be employed in vector-based models; the present study uses the ‘bag of words’ approach, which is based on the frequency of co-occurrence of words within a given context window.
Distributional measure of semantic similarity
The part-of-speech annotated lemma of each collocate within a 5-word window was extracted from the COCA data to build the co-occurrence matrix recording the frequency of co-occurrence of each verb with its collocates.
Distributional measure of semantic similarity
The co-occurrence matrix was transformed by applying a Point-wise Mutual Information weighting scheme, using the DISSECT toolkit (Dinu et al., 2013), to turn the raw frequencies into weights that reflect how distinctive a collocate is for a given target word with respect to the other target words under consideration.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Jian and Taylor, Sarah M. and Smith, Jonathan L. and Fotiadis, Konstantinos A. and Giles, C. Lee
Experiments
To decide whether two names in the co-occurrence or family relationship match, we use the SoftTFIDF measure (Cohen et al., 2003), which is a hybrid matching scheme that combines the token-based TFIDF with the Jaro-Winkler string distance metric.
Methods 2.1 Document Level and Profile Based CDC
For instance, the similarity between the occupations ‘President’ and ‘Commander in Chief’ can be computed using the JC semantic distance (Jiang and Conrath, 1997) with WordNet; the similarity of co-occurrence with other people can be measured by the Jaccard coefficient.
Methods 2.1 Document Level and Profile Based CDC
a match in a family relationship is considered more important than in a co-occurrence relationship.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mitra, Sunny and Mitra, Ritwik and Riedl, Martin and Biemann, Chris and Mukherjee, Animesh and Goyal, Pawan
Introduction
In Section 3 we briefly describe the datasets and outline the process of co-occurrence graph construction.
Tracking sense changes
We use the co-occurrence based graph clustering framework introduced in (Biemann, 2006).
Tracking sense changes
Firstly, a co-occurrence graph is created for every target word found in DT.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ó Séaghdha, Diarmuid
Experimental setup
Following their description, we use a 2,000-dimensional space of syntactic co-occurrence features appropriate to the relation being predicted, weight features with the G2 transformation and compute similarity with the cosine measure.
Results
30 predicates were selected for each relation; each predicate was matched with three arguments from different co-occurrence bands in the BNC, e.g., naughty-girl (high frequency), naughty-dog (medium) and naughty-lunch (low).
Three selectional preference models
Further differences are that information about predicate-argument co-occurrence is only shared within a given interaction class rather than across the whole dataset, and that the distribution Φ_z is not specific to the predicate but rather to the relation r.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Hongzhao and Wen, Zhen and Yu, Dian and Ji, Heng and Sun, Yizhou and Han, Jiawei and Li, He
Experiments
Retweets and redundant web documents are filtered to ensure more reliable frequency counting of co-occurrence relations.
Introduction
Thus, the co-occurrence of a morph and its target is quite low in the vast amount of information in social media.
Target Candidate Ranking
After applying the same annotation techniques as tweets for uncensored data sets, sentence-level co-occurrence relations are extracted and integrated into the network as shown in Figure 3.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hovy, Dirk
Abstract
To constrain training, it extracts co-occurrence dictionaries of entities and common nouns from the data.
Introduction
(2011) proposed an approach that uses co-occurrence patterns to find entity type candidates, and then learns their applicability to relation arguments by using them as latent variables in a first-order HMM.
Model
To restrict the search space and improve learning, we first have to learn which types modify entities and record their co-occurrence, and use this as a dictionary.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pantel, Patrick and Fuxman, Ariel
Entity Recommendation
Step 2 - Session Analysis: We build a query-entity frequency co-occurrence matrix, A, consisting of |Q| rows and |E| columns, where each row corresponds to a query and each column to an entity.
Experimental Results
Note that this co-occurrence occurs because q′ was annotated with entity e in the same session as q occurred.
Experimental Results
5.3.1 Experimental Setup We instantiate our recommendation algorithm from Section 4.2 using session co-occurrence frequencies
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fyshe, Alona and Talukdar, Partha P. and Murphy, Brian and Mitchell, Tom M.
Data
Document statistics are word-document co-occurrence counts.
Introduction
The basic assumption is that semantics drives a person’s language production behavior, and as a result co-occurrence patterns in written text indirectly encode word meaning.
Introduction
The raw co-occurrence statistics are unwieldy, but in the compressed
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, David
Online Lexicon Learning Algorithm
for connected subgraph g of p_i such that the size of g is less than or equal to m do
    Increase the co-occurrence count of g and w by 1
end for
end for
Online Lexicon Learning Algorithm
From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g.
Online Lexicon Learning Algorithm
This allows us to determine if two graphs are identical in constant time and also lets us use a hash table to quickly update the co-occurrence and subgraph counts.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kuznetsova, Polina and Ordonez, Vicente and Berg, Alexander and Berg, Tamara and Choi, Yejin
Surface Realization
NPMI(ngr) is given in Eq. 21, where NPMI is the normalized point-wise mutual information. Co-occurrence Cohesion Score: To capture long-distance cohesion, we introduce a co-occurrence-based score, which measures order-preserved co-occurrence statistics between the head words h_sij and h_skl.
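For reference, the standard definition of normalized point-wise mutual information (the paper's order-preserving variant may differ in detail) is:

    \mathrm{NPMI}(x, y) = \frac{\mathrm{PMI}(x, y)}{-\log P(x, y)}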
Surface Realization
co-occurrence cohesion is computed as:
Surface Realization
Final Cohesion Score: Finally, the pairwise phrase cohesion score F_sij,skl is a weighted sum of the n-gram and co-occurrence cohesion scores (Eq. 23): F_sij,skl = (α · F^NGRAM_sij,skl + β · F^CO_sij,skl) / (α + β).
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Using subcategorization information
In contrast, noun modification in noun-nounGen construction is represented by co-occurrence frequencies.
Using subcategorization information
and also f = 0: this representation allows for a more fine-grained distinction in the low-to-mid frequency range, providing a good basis for the decision of whether a given noun-noun pair is a true noun-nounGen structure or just a random co-occurrence of two nouns.
Using subcategorization information
The word Technologie (technology) has been marked as a candidate for a genitive in a noun-nounGen construction; the co-occurrence frequency of the tuple Einführung-Technologie (introduction - technology) lies in the bucket 11.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mukherjee, Arjun and Liu, Bing
Experiments
Thus, having only one seed per seed set will result in sampling that single word whenever that seed set is chosen, which will not have the effect of correlating seed words so as to pull other words based on co-occurrence with constrained seed words.
Proposed Seeded Models
The standard LDA and existing aspect and sentiment models (ASMs) are mostly governed by the phenomenon called “higher-order co-occurrence” (Heinrich, 2009), i.e., based on how often terms co-occur in different contexts.
Proposed Seeded Models
w1 co-occurring with w2, which in turn co-occurs with w3, denotes a second-order co-occurrence between w1 and w3.
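As an editorial illustration (toy matrix, not the authors' model), second-order co-occurrence can be read off the square of a first-order co-occurrence matrix:

    import numpy as np

    # First-order co-occurrence: w1 co-occurs with w2 and w2 with w3,
    # but w1 and w3 never co-occur directly.
    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
    second_order = A @ A
    print(second_order[0, 2])  # 1: w1 and w3 are linked through w2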
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Estimation
In the first step, we estimate the correspondence probability by the co-occurrence of the source-side and the target-side topic assignment of the word-aligned corpus.
Estimation
Thus, the co-occurrence of a source-side topic with index kf and a target-side
Estimation
We then compute the probability P(z = k_f | z = k_e) by normalizing the co-occurrence count.
co-occurrence is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: