Abstract | In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling.
Introduction | The distance is described regardless of the actual frequency of the history-word, while the co-occurrence is described regardless of the actual position of the history-word.
Language Modeling with TD and TO | In Eq. 3, we have decoupled the observation of a word-pair into the events of distance and co-occurrence.
Language Modeling with TD and TO | The TD likelihood for a distance k given the co-occurrence of the word-pair (w_{t-k}, w_t) can be estimated from counts as follows:
Language Modeling with TD and TO | zero co-occurrence count C(w_{t-k} ∈ h_t, w_t) = 0, which results in a division by zero.
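A minimal sketch of this kind of count-based distance-likelihood estimate, with additive smoothing to sidestep the zero-count division mentioned above; the function names, the smoothing constant, and the use of add-alpha smoothing are illustrative assumptions, not the paper's actual formulation.

```python
from collections import defaultdict

def train_distance_likelihood(corpus, max_dist=10, alpha=1.0):
    """Estimate P(distance = k | the word pair co-occurs) from a tokenized corpus.

    corpus: list of token lists; alpha: additive smoothing constant (an assumption
    here, used so that unseen pairs do not lead to a division by zero).
    """
    dist_counts = defaultdict(lambda: defaultdict(float))  # (w_hist, w_cur) -> {k: count}
    for sent in corpus:
        for t, w_t in enumerate(sent):
            for k in range(1, min(max_dist, t) + 1):
                dist_counts[(sent[t - k], w_t)][k] += 1.0

    def likelihood(w_hist, w_cur, k):
        counts = dist_counts.get((w_hist, w_cur), {})
        total = sum(counts.values())
        # additive smoothing over the max_dist possible distances
        return (counts.get(k, 0.0) + alpha) / (total + alpha * max_dist)

    return likelihood

lik = train_distance_likelihood([["the", "cat", "sat", "on", "the", "mat"]])
print(round(lik("cat", "mat", 4), 3))  # smoothed estimate for distance 4
```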
Motivation of the Proposed Approach | The attributes of distance and co-occurrence are exploited and modeled differently in each language modeling approach. |
Motivation of the Proposed Approach | Both the conventional trigger model and the latent-semantic model capture the co-occurrence information while ignoring the distance information.
Motivation of the Proposed Approach | On the other hand, distant-bigram models and distance-dependent trigger models make use of both distance and co-occurrence information, up to window sizes of ten to twenty.
Experiments | In this section, we compare the translation accuracies for character-based translation using the phrasal ITG model with and without the proposed improvements of substring co-occurrence priors and lookahead parsing as described in Sections 4 and 5.2. |
Experiments | Table 5: METEOR scores for alignment with and without lookahead and co-occurrence priors. |
Introduction | This method is attractive, as it is theoretically able to handle all sparsity phenomena in a single unified framework, but has only been shown feasible between similar language pairs such as Spanish-Catalan (Vilar et al., 2007), Swedish-Norwegian (Tiedemann, 2009), and Thai-Lao (Somlertlamvanich et al., 2008), which have a strong co-occurrence between single characters. |
Substring Prior Probabilities | In this section, we overview an existing method used to calculate these prior probabilities, and also propose a new way to calculate priors based on substring co-occurrence statistics. |
Substring Prior Probabilities | 5.2 Substring Co-occurrence Priors |
Substring Prior Probabilities | Instead, we propose a method for using raw substring co-occurrence statistics to bias alignments towards substrings that often co-occur in the entire training corpus. |
Bilingual Lexicon Extraction | For each word i of the source and target languages, we obtain a context vector v_i which gathers the set of co-occurrence words j associated with the number of times that j and i occur together, cooc(i, j).
Bilingual Lexicon Extraction | One way to deal with this problem is to reestimate co-occurrence counts by a prediction function (Hazem and Morin, 2013). |
Bilingual Lexicon Extraction | This consists of assigning to each observed co-occurrence count from a small comparable corpus a new value learned beforehand from a large training corpus.
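A rough sketch of the count-prediction idea: fit a simple regression from counts observed in a small corpus to counts observed for the same pairs in a large training corpus, then use it to re-estimate new small-corpus counts. The log-linear model and the toy data are assumptions for illustration, not Hazem and Morin's exact method.

```python
import numpy as np

def fit_count_predictor(small_counts, large_counts):
    """Fit a log-linear map from co-occurrence counts observed in a small corpus
    to the counts the same pairs receive in a large training corpus."""
    pairs = [p for p in small_counts if p in large_counts]
    x = np.log1p([small_counts[p] for p in pairs])
    y = np.log1p([large_counts[p] for p in pairs])
    a, b = np.polyfit(x, y, 1)                      # least-squares line in log space
    return lambda c: float(np.expm1(a * np.log1p(c) + b))

predict = fit_count_predictor(
    {("rain", "acid"): 2, ("rain", "wet"): 5, ("rain", "dry"): 1},      # small corpus
    {("rain", "acid"): 40, ("rain", "wet"): 120, ("rain", "dry"): 15},  # large corpus
)
print(round(predict(3), 1))   # re-estimated value for an observed count of 3
```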
Experiments and Results | The aim of this experiment is twofold: first, we want to evaluate the usefulness of predicting word co-occurrence counts and second, we want to find out whether it is more appropriate to apply prediction to the source side, the target side or both sides of the bilingual comparable corpora. |
Experiments and Results | We applied the same regression function to all co-occurrence counts, although learning separate models for low and high frequencies would have been more appropriate.
Image Clustering with Annotated Auxiliary Data | Intuitively, our algorithm aPLSA performs PLSA analysis on the target images, which are converted to an image instance-to-feature co-occurrence matrix. |
Image Clustering with Annotated Auxiliary Data | At the same time, PLSA is also applied to the annotated image data from social Web, which is converted into a text-to-image-feature co-occurrence matrix. |
Image Clustering with Annotated Auxiliary Data | Based on the image data set V, we can estimate an image instance-to-feature co-occurrence matrix A ∈ R^{|V|×|F|}, where each element A_ij (1 ≤ i ≤ |V| and 1 ≤ j ≤ |F|) in the matrix A is the frequency of the feature f_j appearing in the instance v_i.
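As a rough illustration of an instance-to-feature co-occurrence matrix of this shape (the feature vocabulary, the counting scheme, and the helper name are assumed for the example):

```python
import numpy as np

def instance_feature_matrix(instances, feature_vocab):
    """A[i, j] = frequency of feature f_j in image instance v_i."""
    index = {f: j for j, f in enumerate(feature_vocab)}
    A = np.zeros((len(instances), len(feature_vocab)))
    for i, feats in enumerate(instances):
        for f in feats:
            if f in index:
                A[i, index[f]] += 1
    return A

A = instance_feature_matrix([["f1", "f2", "f1"], ["f2", "f3"]], ["f1", "f2", "f3"])
print(A)   # [[2. 1. 0.]
           #  [0. 1. 1.]]
```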
Experiments | Co-occurrence 0.47 0.56 0.45 0.41 0.41 |
Experiments | Co-occurrence 0.34 0.36 0.34 0.31 0.31 |
Experiments | Both co-occurrence and lexico-syntactic patterns work well for all three types of relations. |
Introduction | The common types of features include contextual (Lin, 1998), co-occurrence (Yang and Callan, 2008), and syntactic dependency (Pantel and Lin, 2002; Pantel and Ravichandran, 2004). |
Introduction | The framework integrates contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and other features to learn an ontology metric, a score indicating semantic distance, for each pair of terms in a taxonomy; it then incrementally clusters terms based on their ontology metric scores.
Related Work | Inspired by conjunction and appositive structures, Riloff and Shepherd (1997) and Roark and Charniak (1998) used co-occurrence statistics in local context to discover sibling relations.
Related Work | Besides contextual features, the vectors can also be represented by verb-noun relations (Pereira et al., 1993), syntactic dependency (Pantel and Ravichandran, 2004; Snow et al., 2005), co-occurrence (Yang and Callan, 2008), conjunction and appositive features (Caraballo, 1999). |
The Features | The features include contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and miscellaneous.
The Features | The second set of features is co-occurrence.
The Features | In our work, co-occurrence is measured by point-wise mutual information between two terms: |
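A standard point-wise mutual information computation from raw co-occurrence counts could look like the following sketch; the counting scheme and the example numbers are assumptions, not the authors' exact setup.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ), estimated from raw counts."""
    p_xy = count_xy / total
    p_x, p_y = count_x / total, count_y / total
    return math.log(p_xy / (p_x * p_y))

# e.g., two terms seen together 50 times, individually 400 and 300 times,
# out of 100,000 counting contexts
print(round(pmi(50, 400, 300, 100_000), 3))
```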
Distributional semantic models | In all cases, occurrence and co-occurrence statistics are extracted from the freely available ukWaC and Wackypedia corpora combined (size: 1.9B and 820M tokens, respectively).1 Moreover, in all models the raw co-occurrence counts are transformed into nonnegative Local Mutual Information (LMI) scores.2 Finally, in all models we harvest vector representations for the same words (lemmas), namely the top 20K most frequent nouns, 5K most frequent adjectives and 5K most frequent verbs in the combined corpora (for coherence with the vision-based models, that cannot exploit contextual information to distinguish nouns and adjectives, we merge nominal and adjectival usages of the color adjectives in the text-based models as well). |
Distributional semantic models | Window2 records sentence-internal co-occurrence with the nearest 2 content words to the left and right of each target concept, a narrow context definition expected to capture taxonomic relations. |
Distributional semantic models | We further introduce hybrid models that exploit the patterns of co-occurrence of words as tags of the same images. |
Experiment 2 | We also experimented with a model based on direct co-occurrence of adjectives and nouns, obtaining promising results in a preliminary version of Exp.
Experiment 2 | Our results suggest that co-occurrence in an image label can be used as a surrogate of true visual information to some extent, but the behavior of hybrid models depends on ad-hoc aspects of the labeled dataset, and, from an empirical perspective, they are more limited than truly multimodal models, because they require large amounts of rich verbal picture descriptions to reach good coverage.
Introduction | Traditional semantic space models represent meaning on the basis of word co-occurrence statistics in large text corpora (Turney and Pantel, 2010).
Introduction | We also show that “hybrid” models exploiting the patterns of co-occurrence of words as tags of the same images can be a powerful surrogate of visual information under certain circumstances. |
Related work | Like us, Ozbal and colleagues use both a textual model and a visual model (as well as Google adjective-noun co-occurrence counts) to find the typical color of an object. |
Abstract | Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence.
Introduction | For example, we can collect the co-occurrence statistics of an anaphor with various candidate antecedents to judge relative surface affinities (i.e., (Obama, president) versus (Jobs, president)). |
Introduction | We can also count co-occurrence statistics of competing antecedents when placed in the context of an anaphoric pronoun (i.e., Obama's election campaign versus Jobs' election campaign).
Introduction | We explore five major categories of semantically informative Web features, based on (1) general lexical affinities (via generic co-occurrence statistics), (2) lexical relations (via Hearst-style hypernymy patterns), (3) similarity of entity-based context (e.g., common values of y for |
Semantics via Web Features | The first four types are most intuitive for mention pairs where both members are non-pronominal, but, aside from the general co-occurrence group, helped for all mention pair types. |
Semantics via Web Features | 3.1 General co-occurrence |
Semantics via Web Features | These features capture co-occurrence statistics of the two headwords, i.e., how often h1 and h2 are seen adjacent or nearly adjacent on the Web.
Abstract | Instead of taking a broad view of topic context in spoken documents, variability of word co-occurrence statistics across corpora leads us to focus instead on the phenomenon of word repetition within single documents.
Introduction | In order to arrive at our eventual solution, we take the BABEL Tagalog corpus and analyze word co-occurrence and repetition statistics in detail. |
Introduction | Our observation of the variability in co-occurrence statistics between Tagalog training and development partitions leads us to narrow the scope of document context to same word co-occurrences, i.e. |
Motivation | Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation | The difficulty in this approach arises from the variability in word co-occurrence statistics. |
Motivation | Unfortunately, estimates of co-occurrence from small corpora are not very consistent, and often over- or underestimate the co-occurrence probabilities needed for term detection.
Term Detection Re-scoring | As with word co-occurrence, we consider whether estimates of P_adapt(w) from training data are consistent when estimated on development data.
Algorithm | where Pr(w1, w2) is the co-occurrence count, and Pr(w_i) is the total number of appearances of w_i in the corpus (Church and Hanks, 1990).
Conclusion | Our results confirm that alignment is problematic in using co-occurrence methods across languages, at least in our settings. |
Introduction | While co-occurrence scores are used to compute signatures, signatures, unlike context vectors, do not contain the score values. |
Lexicon Generation Experiments | In the case of context vectors, the vector indices, or keys, are words, and their values are co-occurrence based scores. |
Lexicon Generation Experiments | The window size for co-occurrence counting, k, was 4. |
Lexicon Generation Experiments | In the three co-occurrence based methods, NAS similarity, cosine distance, and city block distance, the highest ranking translation was selected.
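A minimal sketch of window-based co-occurrence counting (window size k = 4, as above) together with the cosine and city-block comparisons mentioned; the tokenization and helper names are assumptions.

```python
from collections import Counter
import math

def context_vector(tokens, target, k=4):
    """Count words within a +/- k token window of each occurrence of `target`."""
    vec = Counter()
    for i, w in enumerate(tokens):
        if w == target:
            for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
                if j != i:
                    vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(x, 0) * v.get(x, 0) for x in keys)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def city_block(u, v):
    keys = set(u) | set(v)
    return sum(abs(u.get(x, 0) - v.get(x, 0)) for x in keys)

v1 = context_vector("the cat sat on the mat the cat ran".split(), "cat")
v2 = context_vector("a dog sat on a log".split(), "dog")
print(round(cosine(v1, v2), 3), city_block(v1, v2))
```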
Previous Work | (2009) replaced the traditional window-based co-occurrence counting with dependency-tree based counting, while Pekar et al. |
Previous Work | (2006) predicted missing co-occurrence values based on similar words in the same language. |
Experiment | Thus, constructing a graph based on word co-occurrence within every three sentences of a document works well for ranking important words, taking the context of each word into account.
Introduction | In our study, we express the relation of word co-occurrence in the form of a graph. |
Related studies | (2011) detected topics in a document by constructing a graph of word co-occurrence and applying the PageRank algorithm to it.
Related studies | The graph used in our method is constructed based on word co-occurrence so that important words which are sensitive to latent information can be extracted by the PageRank algorithm. |
Techniques for text classification | According to (Newman et al., 2010), topic coherence is related to word co-occurrence.
Techniques for text classification | The refined documents are composed of the important sentences extracted from the viewpoint of latent information, i.e., word co-occurrence, so they are well suited to classification based on latent information.
Techniques for text classification | In our study, we construct a graph based on word co-occurrence.
Experimental setup | Co-occurrence features (C) |
Experimental setup | Co-occurrence and IR effectiveness prediction features (CI) was the most influential class, and accounted for 70% of all features in the model. |
Related work | In ad hoc IR, most models of term dependence use word co-occurrence and proximity (Song and Croft, 1999; Metzler and Croft, 2005; Srikanth and Srihari, 2002; van Rijsbergen, 1993). |
Selection method for catenae | Co-occurrence features: A governor w_1 tends to subcategorize for its dependents w_n.
Selection method for catenae | We conclude that co-occurrence is an important feature of dependency relations (Mel’cuk, 2003). |
Selection method for catenae | In addition, term frequencies and inverse document frequencies calculated using word co-occurrence measures are commonly used in IR. |
Application to Essay Scoring | It is also possible that some of the instances with very high PMI are pairs that contain low frequency words for which the database predicts a spuriously high PMI based on a single (and atypical) co-occurrence that happens to repeat in an essay — similar to the Schwartz eschews example in (Manning and Schütze, 1999, Table 5.16, p. 181).
Introduction | Thus, co-occurrence of words in n-word windows, syntactic structures, sentences, paragraphs, and even whole documents is captured in vector-space models built from text corpora (Turney and Pantel, 2010; Basili and Pennacchiotti, 2010; Erk and Pado, 2008; Mitchell and Lapata, 2008; Bullinaria and Levy, 2007; Jones and Mewhort, 2007; Pado and Lapata, 2007; Lin, 1998; Landauer and Dumais, 1997; Lund and Burgess, 1996; Salton et al., 1975).
Introduction | However, little is known about typical profiles of texts in terms of co-occurrence behavior of their words. |
Introduction | The cited approaches use topic models that are in turn estimated using word co-occurrence . |
Methodology | The first decision is how to quantify the extent of co-occurrence between two words; we will use point-wise mutual information (PMI) estimated from a large and diverse corpus of texts. |
Methodology | The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values. |
Methodology | To obtain comprehensive information about typical co-occurrence behavior of words of English, we build a first-order co-occurrence word-space model (Turney and Pantel, 2010; Baroni and Lenci, 2010). |
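One way to realise the histogram representation of co-occurrence profiles described above; the bin boundaries and the toy PMI values are assumptions.

```python
import numpy as np

def pmi_profile(pmi_values, bins=(-5, -2, 0, 2, 5, 10)):
    """Histogram of word-pair PMI values, normalised to proportions per bin."""
    counts, _ = np.histogram(pmi_values, bins=bins)
    return counts / counts.sum()

# toy PMI values for the word pairs of one text
print(pmi_profile([1.2, -0.3, 3.5, 0.8, 6.1, -2.4]))
```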
Experiments | There were three knowledge sources we used for our experiments: the WordNet 3.0; the Sep. 9, 2007 English version of Wikipedia; and the Web pages of each ambiguous name in WePS datasets as the NE Co-occurrence Corpus. |
Introduction | This model measures similarity based on only the co-occurrence statistics of terms, without considering all the semantic relations like social relatedness between named entities, associative relatedness between concepts, and lexical relatedness (e.g., acronyms, synonyms) between key terms. |
Related Work | (2007) used the co-occurrence statistics between named entities in the Web. |
The Structural Semantic Relatedness Measure | We extract three types of semantic relations (semantic relatedness between Wikipedia concepts, lexical relatedness between WordNet concepts and social relatedness between NEs) correspondingly from three knowledge sources: Wikipedia, WordNet and NE Co-occurrence Corpus. |
The Structural Semantic Relatedness Measure | NE Co-occurrence Corpus, a corpus of documents for capturing the social relatedness between named entities. |
The Structural Semantic Relatedness Measure | According to fuzzy set theory (Baeza-Yates et al., 1999), the degree of co-occurrence of named entities in a corpus is a measure of the relatedness between them.
Experiments | The spin model approach uses word glosses, WordNet synonym, hypernym, and antonym relations, in addition to co-occurrence statistics extracted from a corpus.
Experiments | Adding co-occurrence statistics slightly improved performance, while using glosses did not help at all. |
Experiments | No glosses or co-occurrence statistics are used. |
Related Work | To get co-occurrence statistics, they submit several queries to a search engine. |
Related Work | They construct a network of words using gloss definitions, thesaurus, and co-occurrence statistics. |
Word Polarity | Another source of links between words is co-occurrence statistics from corpus. |
Word Polarity | We study the effect of using co-occurrence statistics to connect words later at the end of our experiments. |
Experiments: Ranking Paraphrases | As for the full model, we use pmi values rather than raw frequency counts as co-occurrence statistics. |
Introduction | In the standard approach, word meaning is represented by feature vectors, with large sets of context words as dimensions, and their co-occurrence frequencies as values. |
Introduction | This allows us to model the semantic interaction between the meaning of a head word and its dependent at the microlevel of relation-specific co-occurrence frequencies. |
Related Work | Figure 1: Co-occurrence graph of a small sample corpus of dependency trees.
The model | The basis for the construction of both kinds of vector representations are co-occurrence graphs. |
The model | Figure 1 shows the co-occurrence graph of a small sample corpus of dependency trees: Words are represented as nodes in the graph, possible dependency relations between them are drawn as labeled edges, with weights corresponding to the observed frequencies. |
The model | introduce another kind of vector capturing information about all words that can be reached within two steps in the co-occurrence graph.
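A small sketch of such a labeled co-occurrence graph and of collecting the words reachable in two steps; the data structure and the toy dependencies are assumptions, and the figure's actual sample corpus is not reproduced.

```python
from collections import defaultdict

# graph[word] -> {(relation, neighbour): frequency}
graph = defaultdict(lambda: defaultdict(int))

def add_dependency(head, rel, dep, freq=1):
    """Add a labeled, weighted edge in both directions."""
    graph[head][(rel, dep)] += freq
    graph[dep][("inv_" + rel, head)] += freq

def two_step_neighbours(word):
    """Words reachable from `word` via two labeled edges."""
    reached = set()
    for (_, mid) in graph[word]:
        for (_, far) in graph[mid]:
            if far != word:
                reached.add(far)
    return reached

add_dependency("eat", "obj", "apple")
add_dependency("buy", "obj", "apple")
print(two_step_neighbours("eat"))   # {'buy'}
```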
The S-Space Framework | We divide the algorithms into four categories based on their structural similarity: document-based, co-occurrence, approximation, and Word Sense Induction (WSI) models.
The S-Space Framework | Co-occurrence models build the vector space using the distribution of co-occurring words in a context, which is typically defined as a region around a word or paths rooted in a parse tree. |
The S-Space Framework | co-occurrence data rather than model it explicitly in order to achieve better scalability for larger data sets. |
Word Space Models | Later models have expanded the notion of co-occurrence but retain the premise that distributional similarity can be used to extract meaningful relationships between words. |
Word Space Models | Common approaches use a lexical distance, syntactic relation, or document co-occurrence to define the context. |
Word Space Models | Co-occurrence Models: HAL (Burgess and Lund, 1997), COALS (Rohde et al., 2009)
Experiments | But they captured relations only using co-occurrence statistics. |
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments, which are more precise than Hai which merely uses co-occurrence information to indicate such relations among words. |
Introduction | In traditional extraction strategy, opinion associations are usually computed based on the co-occurrence frequency. |
Related Work | They usually captured different relations using co-occurrence information. |
The Proposed Method | Each opinion target can find its corresponding modifiers in sentences through alignment, in which multiple factors are considered globally, such as co-occurrence information, word position in sentence, etc. |
The Proposed Method | p(v_t, v_o) is the co-occurrence probability of v_t and v_o based on the opinion relation identification results.
Related Work | Moreover, the HMM model is computationally-expensive and unable to exploit the data co-occurrence phenomena that we |
Unsupervised Translation Induction for Chinese Abbreviations | 3.3.1 Data Co-occurrence |
Unsupervised Translation Induction for Chinese Abbreviations | In a monolingual corpus, relevant words tend to appear together (i.e., co-occurrence ). |
Unsupervised Translation Induction for Chinese Abbreviations | The co-occurrence may imply a relationship (e.g., Bill Gates is the founder of Microsoft). |
Distribution Prediction | domain, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
Distribution Prediction | The value of the element aij in the co-occurrence matrix A is set to the number of sentences in which the i-th and j-th features co-occur. |
Distribution Prediction | We apply Positive Pointwise Mutual Information (PPMI) to the co-occurrence matrix A. |
Experiments and Results | For each domain D in the SANCL (POS tagging) and Amazon review (sentiment classification) datasets, we create a PPMI weighted co-occurrence matrix FD. |
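A compact PPMI transformation of a co-occurrence matrix along these lines; dense NumPy arrays are assumed purely for brevity.

```python
import numpy as np

def ppmi(A):
    """Positive PMI transform of a co-occurrence matrix A (rows x columns)."""
    total = A.sum()
    row = A.sum(axis=1, keepdims=True) / total
    col = A.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((A / total) / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0      # zero counts get PMI 0
    return np.maximum(pmi, 0.0)       # keep only positive associations

A = np.array([[10.0, 0.0, 2.0], [1.0, 5.0, 0.0]])
print(np.round(ppmi(A), 2))
```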
RSP: A Random Walk Model for SP | We initiate the links E with the raw co-occurrence counts of seen predicate-argument pairs in a given generalization data. |
RSP: A Random Walk Model for SP | But in SP, the preferences between the predicates and arguments are implicit: their co-occurrence counts follow the power law distribution and vary greatly. |
RSP: A Random Walk Model for SP | investigate the correlations between the co-occurrence counts (CT) C(q, a), or smoothed counts, and the human plausibility judgements (Lapata et al., 1999; Lapata et al., 2001).
Related Work 2.1 WordNet-based Approach | (1999) introduce a general similarity-based model for word co-occurrence probabilities, which can be interpreted for SP. |
Related Work | Table 2 Context of a particular noun represented as a co-occurrence vector |
Related Work | Context is represented as co-occurrence vectors that are based on syntactic dependencies. |
Related Work | Table 3 Context of a particular noun represented as a co-occurrence vector |
Related Work | Turney (2007) measured the semantic orientation for sentiment classification using co-occurrence statistics obtained from the search engines. |
Related Work | Besides, there are some work exploring the word-to-word co-occurrence derived from the web-scale data or a fixed size of corpus (Calvo and Gelbukh, 2004; Calvo and Gelbukh, 2006; Yates et al., 2006; Drabek and Zhou, 2000; van Noord, 2007) for PP attachment ambiguities or shallow parsing. |
Related Work | Abekawa and Oku-mura (2006) improved Japanese dependency parsing by using the co-occurrence information derived from the results of automatic dependency parsing of large-scale corpora. |
Web-Derived Selectional Preference Features | Co-occurrence probabilities can be calculated directly from the N-gram counts.
Web-Derived Selectional Preference Features | where p(“X y”) is the co-occurrence probability.
Models of Processing Difficulty | To give a concrete example, Latent Semantic Analysis (LSA, Landauer and Dumais 1997) creates a meaning representation for words by constructing a word-document co-occurrence matrix from a large collection of documents. |
Models of Processing Difficulty | Like LSA, ICD is based on word co-occurrence vectors, however it does not employ singular value decomposition, and constructs a word-word rather than a word-document co-occurrence matrix. |
Models of Processing Difficulty | Importantly, composition models are not defined with a specific semantic space in mind, they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al. |
Evaluation for SS | After the SS model learns the co-occurrence of words from WN definitions, in the testing phase, given an ON definition d, the SS algorithm needs to identify the equivalent WN definitions by computing the similarity values between all WN definitions and the ON definition d, then sorting the values in decreasing order. |
Experiments and Results | For the Brown corpus, each sentence is treated as a document in order to create more coherent co-occurrence values. |
Introduction | 2. word co-occurrence information is not sufficiently exploited. |
Limitations of Topic Models and LSA for Modeling Sentences | LSA and PLSA/LDA work on a word-sentence co-occurrence matrix.
Limitations of Topic Models and LSA for Modeling Sentences | The resulting M × N co-occurrence matrix X contains TF-IDF values in each cell X_ij, namely the TF-IDF value of word w_i in sentence s_j.
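For illustration, a word-sentence TF-IDF matrix of that shape could be built as follows; scikit-learn is used here only as a convenient stand-in and is not necessarily what the authors used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
vectorizer = TfidfVectorizer()
# rows = sentences, columns = words; transpose to obtain the M x N word-sentence matrix X
X = vectorizer.fit_transform(sentences).T.toarray()
print(X.shape, len(vectorizer.get_feature_names_out()))
```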
Composition methods | Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words). |
Experimental setup | We collect co-occurrence statistics for the top 20K content words (adjectives, adverbs, nouns, verbs) |
Experimental setup | Due to differences in co-occurrence weighting schemes (we use a logarithmically scaled measure, they do not), their multiplicative model is closer to our additive one. |
Introduction | Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
Introduction | Trying to represent the meaning of arbitrarily long constructions by directly collecting co-occurrence statistics is obviously ineffective and thus methods have been developed to derive the meaning of larger constructions as a function of the meaning of their constituents (Baroni and Zamparelli, 2010; Coecke et al., 2010; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Socher et al., 2012). |
Term Weighting and Sentiment Analysis | Statistical measures of association between terms include estimates based on co-occurrence in the whole collection, such as Point-wise Mutual Information (PMI) and Latent Semantic Analysis (LSA).
Term Weighting and Sentiment Analysis | Another way is to use co-occurrence statistics |
Term Weighting and Sentiment Analysis | where K is the maximum window size for the co-occurrence and is arbitrarily set to 3 in our experiments. |
Baselines | One is word co-occurrence (if word w_i and word w_j occur in the same sentence or in adjacent sentences, Sim(w_i, w_j) increases by 1; a sketch follows below), and the other is WordNet (Miller, 1995) based similarity.
Baselines | 1 Total weight of words in the focus candidate using the co-occurrence similarity. |
Baselines | 2 Max weight of words in the focus candidate using the co-occurrence similarity. |
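A minimal sketch of the sentence-window co-occurrence similarity referenced above; treating each sentence plus its successor as a single counting window is one possible reading of the definition, not necessarily the authors' exact one.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_similarity(sentences):
    """Sim(w_i, w_j) increases by 1 for each window (a sentence plus the following
    sentence) in which the two words co-occur."""
    sim = defaultdict(int)
    for i, sent in enumerate(sentences):
        window = set(sent) | (set(sentences[i + 1]) if i + 1 < len(sentences) else set())
        for wi, wj in combinations(sorted(window), 2):
            sim[(wi, wj)] += 1
    return sim

sim = cooccurrence_similarity([["a", "b"], ["b", "c"], ["d"]])
print(sim[("a", "b")], sim[("a", "c")], sim[("a", "d")])   # 1 1 0
```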
Experiments | Afterwards, word-syntactic pattern co-occurrence statistics are used as features for a semi-supervised classifier, TSVM (Joachims, 1999), to further refine the results.
Experiments | In contrast, CONT exploits latent semantics of each word in context, and LEX takes advantage of word embedding, which is induced from global word co-occurrence statistics.
Introduction | It exploits semantic similarity between words to capture lexical clues, which is shown to be more effective than the co-occurrence relation between words and syntactic patterns.
Related Work | A recent research (Xu et al., 2013) extracted infrequent product features by a semi-supervised classifier, which used word-syntactic pattern co-occurrence statistics as features for the classifier. |
Anchor Words: Scalable Topic Models | Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models | The difference between anchor and conventional inference is that while conventional methods take a collection of documents as input, anchor takes word co-occurrence statistics. |
Anchor Words: Scalable Topic Models | (3) The anchor method is fast, as it only depends on the size of the vocabulary once the co-occurrence statistics Q are obtained. |
Introduction | This approach is fast and effective; because it only uses word co-occurrence information, it can scale to much larger datasets than MCMC or EM alternatives. |
Collocational Lexicon Induction | A distributional profile (DP) of a word or phrase type is a co-occurrence vector created by combining all co-occurrence vectors of the tokens of that phrase type. |
Collocational Lexicon Induction | These co-occurrence counts are converted to an association measure (Section 2.2) that encodes the relatedness of each pair of words or phrases. |
Collocational Lexicon Induction | A(·,·) is an association measure and can simply be defined as co-occurrence counts within sliding windows.
Related work | They used a graph based on context similarity as well as co-occurrence graph in propagation process. |
Features | Presence and distance: For each potential edge between two terms, we mine patterns from all abstracts in which the two terms co-occur in either order, allowing a maximum term distance of 20 (because beyond that, co-occurrence may not imply a relation).
Introduction | Our model is also the first to directly learn relational patterns as part of the process of training an end-to-end taxonomic induction system, rather than using patterns that were hand-selected or learned via pairwise classifiers on manually annotated co-occurrence patterns. |
Related Work | Both of these systems use a process that starts by finding basic level terms (leaves of the final taxonomy tree, typically) and then using relational patterns (hand-selected ones in the case of Kozareva and Hovy (2010), and ones learned separately by a pairwise classifier on manually annotated co-occurrence patterns for Navigli and Velardi (2010), Navigli et al. |
Related Work | Our model also automatically learns relational patterns as a part of the taxonomic training phase, instead of relying on handpicked rules or pairwise classifiers on manually annotated co-occurrence patterns, and it is the first end-to-end (i.e., non-incremental) system to include heterogeneous relational information via sibling (e.g., coordination) patterns. |
Analysis | As discussed before, the relationship between two candidates is traditionally established using co-occurrence information. |
Analysis | However, using co-occurrence windows has its shortcomings. |
Keyphrase Extraction Approaches | Researchers have computed relatedness between candidates using co-occurrence counts (Mihalcea and Tarau, 2004; Matsuo and Ishizuka, 2004) and semantic relatedness (Grineva et al., 2009), and represented the relatedness information collected from a document as a graph (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Wan and Xiao, 2008b; Bougouin et al., 2013). |
Keyphrase Extraction Approaches | Finally, an edge weight in a WW graph denotes the co-occurrence or knowledge-based similarity between the two connected words. |
Integration of Syntactic and Lexical Information | Co-occurrence (CO): CO features mostly convey lexical information only and are generally considered not particularly sensitive to argument structures (Rohde et al., 2004). |
Integration of Syntactic and Lexical Information | Adapted co-occurrence (ACO): Conventional CO features generally adopt a stop list to filter out function words. |
Results and Discussion | On the other hand, the co-occurrence feature (CO), which is believed to convey only lexical information, outperforms SCF on every n-way classification when n ≥ 10, suggesting that verbs in the same Levin classes tend to share their neighboring words.
Results and Discussion | In fact, even the simple co-occurrence feature (CO) yields a better performance (42.4%) than these Levin-selected SCF sets. |
Experimental Setup | The second one creates a story randomly without taking any co-occurrence frequency into account. |
Introduction | Our generator operates over predicate-argument and predicate-predicate co-occurrence statistics gathered from corpora. |
Story Ranking | As explained earlier, our generator produces stories stochastically, by relying on co-occurrence frequencies collected from the training corpus. |
The Story Generator | A fragment of the action graph is shown in Figure 3 (for simplicity, the edges in the example are weighted with co-occurrence frequencies). |
Experimental Setup | We apply Local Mutual Information (LMI, (Evert, 2005)) as weighting scheme and reduce the full co-occurrence space to 300 dimensions using the Singular Value Decomposition. |
Experimental Setup | For constructing the text-based vectors, we follow a standard pipeline in distributional semantics (Turney and Pantel, 2010) without tuning its parameters and collect co-occurrence statistics from the concatenation of ukWaC and Wikipedia, amounting to 2.7 billion tokens in total.
Introduction | However, the models induce the meaning of words entirely from their co-occurrence with other words, without links to the external world. |
Regression Model for Alteration Selection | For example, for the query “controlling acid rain”, the coherence of the alteration “acidic” is measured by the logarithm of its co-occurrence with the other query terms within a predefined window (90 words) in the corpus. |
Regression Model for Alteration Selection | where P(controlling...acidic...rain|window) is the co-occurrence probability of the trigram containing acidic within a predefined window (50 words). |
Regression Model for Alteration Selection | On the other hand, the second feature helps because it can capture some co-occurrence information no matter how long the query is. |
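A rough sketch of a window-based coherence score of this kind; the window size, the smoothing constant, and the toy corpus are assumptions, not the paper's exact feature definition.

```python
import math

def coherence(corpus_tokens, alteration, other_terms, window=90, eps=1.0):
    """Log of how often `alteration` co-occurs with all other query terms within a window."""
    count = 0
    for i, w in enumerate(corpus_tokens):
        if w != alteration:
            continue
        ctx = set(corpus_tokens[max(0, i - window): i + window + 1])
        if all(t in ctx for t in other_terms):
            count += 1
    return math.log(count + eps)   # eps avoids log(0) for unseen alterations

tokens = "controlling acidic rain is hard , acidic rain damages forests".split()
print(round(coherence(tokens, "acidic", ["controlling", "rain"], window=5), 3))
```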
Relation Mapping | Treating the aligned pairs as observations, the co-occurrence matrix between aligning relations and words was computed.
Relation Mapping | 5. From the co-occurrence matrix we computed P(w | R), P(R), P(w | r) and P(r).
Relation Mapping | ReverbMapping does the same, except that we took a uniform distribution on P̂(w | R) and P̂(R) since the contributed dataset did not include co-occurrence counts to estimate these probabilities. Note that the median rank from CluewebMapping is only 12, indicating that half of all answer relations are ranked in the top 12.
Composition Models | We formulate semantic composition as a function of two vectors, u and v. We assume that individual words are represented by vectors acquired from a corpus following any of the parametrisations that have been suggested in the literature. We briefly note here that a word’s vector typically represents its co-occurrence with neighboring words.
Composition Models | The construction of the semantic space depends on the definition of linguistic context (e.g., neighbouring words can be documents or collocations), the number of components used (e.g., the k most frequent words in a corpus), and their values (e.g., as raw co-occurrence frequencies or ratios of probabilities).
Composition Models | Here, the space has only five dimensions, and the matrix cells denote the co-occurrence of the target words (horse and run) with the context words animal, stable, and so on. |
Experiments | The first factor expresses a preference for topics likely given the co-occurrence information, whereas the second favors the choice of topics which are predictive of the observable sentiment ratings.
Introduction | First, ratable aspects normally represent coherent topics which can be potentially discovered from co-occurrence information in the text. |
The Model | Importantly, the fact that windows overlap permits the model to exploit a larger co-occurrence domain. |
Distributional measure of semantic similarity | A wide range of distributional information can be employed in vector-based models; the present study uses the ‘bag of words’ approach, which is based on the frequency of co-occurrence of words within a given context window. |
Distributional measure of semantic similarity | The part-of-speech annotated lemma of each collocate within a 5-word window was extracted from the COCA data to build the co-occurrence matrix recording the frequency of co-occurrence of each verb with its collocates. |
Distributional measure of semantic similarity | The co-occurrence matrix was transformed by applying a Point-wise Mutual Information weighting scheme, using the DISSECT toolkit (Dinu et al., 2013), to turn the raw frequencies into weights that reflect how distinctive a collocate is for a given target word with respect to the other target words under consideration. |
Experiments | To decide whether two names in the co-occurrence or family relationship match, we use the SoftTFIDF measure (Cohen et al., 2003), which is a hybrid matching scheme that combines the token-based TFIDF with the Jaro-Winkler string distance metric.
Methods 2.1 Document Level and Profile Based CDC | For instance, the similarity between the occupations ‘President’ and ‘Commander in Chief’ can be computed using the JC semantic distance (Jiang and Conrath, 1997) with WordNet; the similarity of co-occurrence with other people can be measured by the Jaccard coefficient.
Methods 2.1 Document Level and Profile Based CDC | a match in a family relationship is considered more important than in a co-occurrence relationship. |
Introduction | In Section 3 we briefly describe the datasets and outline the process of co-occurrence graph construction. |
Tracking sense changes | We use the co-occurrence based graph clustering framework introduced in (Biemann, 2006). |
Tracking sense changes | Firstly, a co-occurrence graph is created for every target word found in DT. |
Experimental setup | Following their description, we use a 2,000-dimensional space of syntactic co-occurrence features appropriate to the relation being predicted, weight features with the G2 transformation and compute similarity with the cosine measure. |
Results | 30 predicates were selected for each relation; each predicate was matched with three arguments from different co-occurrence bands in the BNC, e.g., naughty-girl (high frequency), naughty-dog (medium) and naughty-lunch (low). |
Three selectional preference models | Further differences are that information about predicate-argument co-occurrence is only shared within a given interaction class rather than across the whole dataset, and that the distribution is not specific to the predicate but rather to the relation r.
Experiments | Retweets and redundant web documents are filtered to ensure more reliable frequency counting of co-occurrence relations. |
Introduction | Thus, the co-occurrence of a morph and its target is quite low in the vast amount of information in social media. |
Target Candidate Ranking | After applying the same annotation techniques as tweets for uncensored data sets, sentence-level co-occurrence relations are extracted and integrated into the network as shown in Figure 3. |
Abstract | To constrain training, it extracts co-occurrence dictionaries of entities and common nouns from the data. |
Introduction | (2011) proposed an approach that uses co-occurrence patterns to find entity type candidates, and then learns their applicability to relation arguments by using them as latent variables in a first-order HMM. |
Model | To restrict the search space and improve learning, we first have to learn which types modify entities and record their co-occurrence, and use this as a dictionary.
Entity Recommendation | Step 2 — Session Analysis: We build a query-entity frequency co-occurrence matrix, A, consisting of |Q| rows and |E| columns, where each row corresponds to a query and each column to an entity.
Experimental Results | Note that this co-occurrence occurs because q′ was annotated with entity e in the same session as q occurred.
Experimental Results | 5.3.1 Experimental Setup We instantiate our recommendation algorithm from Section 4.2 using session co-occurrence frequencies |
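A toy version of the session-based query-entity frequency matrix described in Step 2; the representation of each session as a (queries, entities) pair is an assumption for the example.

```python
import numpy as np

def query_entity_matrix(sessions, queries, entities):
    """A[q, e] = number of sessions in which query q and entity e co-occur."""
    qi = {q: i for i, q in enumerate(queries)}
    ei = {e: i for i, e in enumerate(entities)}
    A = np.zeros((len(queries), len(entities)))
    for sess_queries, sess_entities in sessions:
        for q in set(sess_queries) & set(qi):
            for e in set(sess_entities) & set(ei):
                A[qi[q], ei[e]] += 1
    return A

sessions = [
    (["weather paris"], ["Paris"]),
    (["weather paris", "eiffel tower"], ["Paris", "Eiffel Tower"]),
]
A = query_entity_matrix(sessions, ["weather paris", "eiffel tower"], ["Paris", "Eiffel Tower"])
print(A)
```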
Data | Document statistics are word-document co-occurrence counts. |
Introduction | The basic assumption is that semantics drives a person’s language production behavior, and as a result co-occurrence patterns in written text indirectly encode word meaning. |
Introduction | The raw co-occurrence statistics are unwieldy, but in the compressed |
Online Lexicon Learning Algorithm | for each connected subgraph g of p_i such that the size of g is less than or equal to m do: increase the co-occurrence count of g and w by 1; end for; end for
Online Lexicon Learning Algorithm | From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g.
Online Lexicon Learning Algorithm | This allows us to determine if two graphs are identical in constant time and also lets us use a hash table to quickly update the co-occurrence and subgraph counts. |
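As a rough illustration of the counting loop above, with each subgraph reduced to a hashable canonical key so the co-occurrence counts can live in a dictionary; the edge-list graph encoding is an assumption, not the paper's actual representation.

```python
from collections import defaultdict

cooc = defaultdict(int)              # (ngram, subgraph_key) -> co-occurrence count
subgraph_count = defaultdict(int)    # subgraph_key -> count

def canonical(subgraph_edges):
    """Hashable, order-independent key for a small subgraph given as an edge list."""
    return tuple(sorted(subgraph_edges))

def update_counts(ngrams, subgraphs):
    """Increment counts for every n-gram / subgraph pair in one training example."""
    for g in subgraphs:
        key = canonical(g)
        subgraph_count[key] += 1
        for w in ngrams:
            cooc[(w, key)] += 1

update_counts(["turn left"], [[("turn", "LEFT")]])
print(cooc[("turn left", (("turn", "LEFT"),))])   # 1
```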
Surface Realization | NPMI(ngr) is given in Eq. (21), where NPMI is the normalized point-wise mutual information. Co-occurrence Cohesion Score: To capture long-distance cohesion, we introduce a co-occurrence-based score, which measures order-preserved co-occurrence statistics between the head words h_{s_i} and h_{s_q}.
Surface Realization | co-occurrence cohesion is computed as: |
Surface Realization | Final Cohesion Score: Finally, the pairwise phrase cohesion score F_{s_i,s_q} is a weighted sum of the n-gram and co-occurrence cohesion scores: F_{s_i,s_q} = (α · F^{NGRAM}_{s_i,s_q} + β · F^{CO}_{s_i,s_q}) / (α + β)   (23)
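As a sanity check on the weighted combination in Eq. (23), the final score is simply a convex combination of the two cohesion scores; the example values are arbitrary.

```python
def final_cohesion(f_ngram, f_co, alpha=1.0, beta=1.0):
    """Weighted combination of n-gram and co-occurrence cohesion scores (cf. Eq. 23)."""
    return (alpha * f_ngram + beta * f_co) / (alpha + beta)

print(final_cohesion(0.8, 0.4, alpha=1.0, beta=3.0))   # 0.5
```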
Using subcategorization information | In contrast, noun modification in the noun-noun_Gen construction is represented by co-occurrence frequencies.
Using subcategorization information | and also f = 0: this representation allows for a more fine-grained distinction in the low-to-mid frequency range, providing a good basis for the decision of whether a given noun-noun pair is a true noun-noun_Gen structure or just a random co-occurrence of two nouns.
Using subcategorization information | The word Technologie (technology) has been marked as a candidate for a genitive in a noun-noun_Gen construction; the co-occurrence frequency of the tuple Einführung-Technologie (introduction - technology) lies in bucket 11.
Experiments | Thus, having only one seed per seed set will result in sampling that single word whenever that seed set is chosen, which will not have the effect of correlating seed words so as to pull other words based on co-occurrence with constrained seed words.
Proposed Seeded Models | The standard LDA and existing aspect and sentiment models (ASMs) are mostly governed by the phenomenon called “higher-order co-occurrence” (Heinrich, 2009), i.e., based on how often terms co-occur in different contexts.
Proposed Seeded Models | w1 co-occurring with w2, which in turn co-occurs with w3, denotes a second-order co-occurrence between w1 and w3.
Estimation | In the first step, we estimate the correspondence probability by the co-occurrence of the source-side and the target-side topic assignment of the word-aligned corpus. |
Estimation | Thus, the co-occurrence of a source-side topic with index kf and a target-side |
Estimation | We then compute the probability P(z = k_f | z = k_e) by normalizing the co-occurrence count.
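A minimal sketch of estimating this correspondence probability by normalizing topic co-occurrence counts over word-aligned pairs; the input layout as a list of (k_e, k_f) assignments is an assumption.

```python
from collections import defaultdict

def topic_correspondence(aligned_topic_pairs):
    """aligned_topic_pairs: iterable of (k_e, k_f) topic assignments of aligned word pairs.
    Returns P(k_f | k_e) estimated by normalizing co-occurrence counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for k_e, k_f in aligned_topic_pairs:
        counts[k_e][k_f] += 1
    return {k_e: {k_f: c / sum(row.values()) for k_f, c in row.items()}
            for k_e, row in counts.items()}

print(topic_correspondence([(0, 1), (0, 1), (0, 2), (1, 0)]))
# {0: {1: 0.666..., 2: 0.333...}, 1: {0: 1.0}}
```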