Bilingual Lexicon Extraction | For each word i of the source and the target languages, we obtain a context vector v_i which gathers the set of co-occurrence words j associated with the number of times that j and i occur together, cooc(i, j).
Bilingual Lexicon Extraction | One way to deal with this problem is to reestimate co-occurrence counts by a prediction function (Hazem and Morin, 2013). |
Bilingual Lexicon Extraction | This consists in assigning to each observed co-occurrence count of a small comparable corpus a new value learned beforehand from a large training corpus.
Experiments and Results | The aim of this experiment is twofold: first, we want to evaluate the usefulness of predicting word co-occurrence counts and second, we want to find out whether it is more appropriate to apply prediction to the source side, the target side or both sides of the bilingual comparable corpora. |
Experiments and Results | We applied the same regression function to all co-occurrence counts, whereas learning separate models for low and high frequencies would have been more appropriate.
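Experiments and Results | A minimal sketch of the count re-estimation idea above, assuming a simple linear regression in log space learned from counts observed in a small and a large training corpus; the actual prediction function of Hazem and Morin (2013) may differ.

```python
import numpy as np

# Sketch only: learn a mapping from co-occurrence counts observed in a
# small training corpus to the counts observed in a large training
# corpus, then use it to re-estimate counts from a new small corpus.
# The linear fit in log space is an assumption, not the paper's model.

def fit_count_predictor(small_counts, large_counts):
    x = np.log1p(np.asarray(small_counts, dtype=float))
    y = np.log1p(np.asarray(large_counts, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def predict_counts(counts, slope, intercept):
    x = np.log1p(np.asarray(counts, dtype=float))
    return np.expm1(slope * x + intercept)

# Toy usage: counts for the same word pairs in the small and large corpus.
slope, intercept = fit_count_predictor([1, 2, 3, 5, 8], [4, 9, 14, 27, 45])
print(predict_counts([1, 4, 10], slope, intercept))
```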
Abstract | Instead of taking a broad view of topic context in spoken documents, variability of word co-occurrence statistics across corpora leads us to focus instead on the phenomenon of word repetition within single documents.
Introduction | In order to arrive at our eventual solution, we take the BABEL Tagalog corpus and analyze word co-occurrence and repetition statistics in detail. |
Introduction | Our observation of the variability in co-occurrence statistics between Tagalog training and development partitions leads us to narrow the scope of document context to same-word co-occurrences, i.e. word repetition within a single document.
Motivation | Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation | The difficulty in this approach arises from the variability in word co-occurrence statistics. |
Motivation | Unfortunately, estimates of co-occurrence from small corpora are not very consistent, and often over- or underestimate the co-occurrence probabilities needed for term detection.
Term Detection Re-scoring | As with word co-occurrence, we consider whether estimates of P_adapt(w) from training data are consistent with those estimated on development data.
Distribution Prediction | For each domain, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
Distribution Prediction | The value of the element aij in the co-occurrence matrix A is set to the number of sentences in which the i-th and j-th features co-occur. |
Distribution Prediction | We apply Positive Pointwise Mutual Information (PPMI) to the co-occurrence matrix A. |
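Distribution Prediction | A minimal sketch of the construction described above: a_ij counts the sentences in which features i and j co-occur, and the matrix is then reweighted with PPMI. Every token is treated as a unigram feature here; the unigram/bigram row distinction is omitted.

```python
import numpy as np

def cooccurrence_matrix(sentences):
    """a_ij = number of sentences in which features i and j co-occur."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        feats = sorted({index[w] for w in sent})
        for i in feats:
            for j in feats:
                if i != j:
                    A[i, j] += 1
    return vocab, A

def ppmi(A):
    """Positive Pointwise Mutual Information weighting of a count matrix."""
    total = A.sum()
    p_ij = A / total
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

sentences = [["great", "battery", "life"], ["battery", "drains", "fast"]]
vocab, A = cooccurrence_matrix(sentences)
print(vocab)
print(ppmi(A))
```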
Experiments and Results | For each domain D in the SANCL (POS tagging) and Amazon review (sentiment classification) datasets, we create a PPMI weighted co-occurrence matrix FD. |
Experiments | But they captured relations only using co-occurrence statistics. |
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments; this is more precise than Hai, which merely uses co-occurrence information to indicate such relations among words.
Introduction | In traditional extraction strategies, opinion associations are usually computed based on co-occurrence frequency.
Related Work | They usually captured different relations using co-occurrence information. |
The Proposed Method | Each opinion target can find its corresponding modifiers in sentences through alignment, in which multiple factors are considered globally, such as co-occurrence information, word position in sentence, etc. |
The Proposed Method | p(v_t, v_o) is the co-occurrence probability of v_t and v_o based on the opinion relation identification results.
Features | Presence and distance: For each potential edge, we mine patterns from all abstracts in which the two terms co-occur in either order, allowing a maximum term distance of 20 (because beyond that, co-occurrence may not imply a relation).
Introduction | Our model is also the first to directly learn relational patterns as part of the process of training an end-to-end taxonomic induction system, rather than using patterns that were hand-selected or learned via pairwise classifiers on manually annotated co-occurrence patterns. |
Related Work | Both of these systems use a process that starts by finding basic level terms (leaves of the final taxonomy tree, typically) and then using relational patterns (hand-selected ones in the case of Kozareva and Hovy (2010), and ones learned separately by a pairwise classifier on manually annotated co-occurrence patterns for Navigli and Velardi (2010), Navigli et al. |
Related Work | Our model also automatically learns relational patterns as a part of the taxonomic training phase, instead of relying on handpicked rules or pairwise classifiers on manually annotated co-occurrence patterns, and it is the first end-to-end (i.e., non-incremental) system to include heterogeneous relational information via sibling (e.g., coordination) patterns. |
Analysis | As discussed before, the relationship between two candidates is traditionally established using co-occurrence information. |
Analysis | However, using co-occurrence windows has its shortcomings. |
Keyphrase Extraction Approaches | Researchers have computed relatedness between candidates using co-occurrence counts (Mihalcea and Tarau, 2004; Matsuo and Ishizuka, 2004) and semantic relatedness (Grineva et al., 2009), and represented the relatedness information collected from a document as a graph (Mihalcea and Tarau, 2004; Wan and Xiao, 2008a; Wan and Xiao, 2008b; Bougouin et al., 2013). |
Keyphrase Extraction Approaches | Finally, an edge weight in a WW graph denotes the co-occurrence or knowledge-based similarity between the two connected words. |
Anchor Words: Scalable Topic Models | Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models | The difference between anchor and conventional inference is that while conventional methods take a collection of documents as input, anchor takes word co-occurrence statistics. |
Anchor Words: Scalable Topic Models | (3) The anchor method is fast, as it only depends on the size of the vocabulary once the co-occurrence statistics Q are obtained. |
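Anchor Words: Scalable Topic Models | A minimal sketch of assembling the word co-occurrence statistics Q from a document collection; the exact estimator and normalization used by the anchor method are simplified here to row-normalized pairwise counts.

```python
import numpy as np

# Sketch only: build word-word co-occurrence counts Q from documents,
# then row-normalize so each row is a conditional distribution over
# co-occurring words. Once Q is built, its size depends only on the
# vocabulary, not on the number of documents.

def build_Q(docs, vocab):
    index = {w: i for i, w in enumerate(vocab)}
    Q = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = [index[w] for w in doc if w in index]
        for a in ids:
            for b in ids:
                if a != b:
                    Q[a, b] += 1.0
    row_sums = Q.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return Q / row_sums

docs = [["apple", "fruit", "pie"], ["apple", "phone", "screen"]]
vocab = ["apple", "fruit", "pie", "phone", "screen"]
print(build_Q(docs, vocab))
```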
Introduction | This approach is fast and effective; because it only uses word co-occurrence information, it can scale to much larger datasets than MCMC or EM alternatives. |
Experiments | Afterwards, word-syntactic pattern co-occurrence statistics are used as features for a semi-supervised classifier, TSVM (Joachims, 1999), to further refine the results.
Experiments | In contrast, CONT exploits latent semantics of each word in context, and LEX takes advantage of word embeddings, which are induced from global word co-occurrence statistics.
Introduction | It exploits semantic similarity between words to capture lexical clues, which is shown to be more effective than the co-occurrence relation between words and syntactic patterns.
Related Work | A recent research (Xu et al., 2013) extracted infrequent product features by a semi-supervised classifier, which used word-syntactic pattern co-occurrence statistics as features for the classifier. |
Baselines | One is word co-occurrence (if word w_i and word w_j occur in the same sentence or in adjacent sentences, Sim(w_i, w_j) increases by 1; a sketch of this similarity follows the feature list below), and the other is WordNet-based (Miller, 1995) similarity.
Baselines | 1 Total weight of words in the focus candidate using the co-occurrence similarity. |
Baselines | 2 Max weight of words in the focus candidate using the co-occurrence similarity. |
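Baselines | A minimal sketch of the co-occurrence similarity described above and of features 1 and 2, assuming that a candidate's word weights are its summed similarities to the other words of the document; the exact weighting in the original system may differ.

```python
from collections import defaultdict

def cooccurrence_sim(sentences):
    """Sim(wi, wj) increases by 1 when wi and wj occur in the same
    sentence or in adjacent sentences (a simplified reading)."""
    sim = defaultdict(int)
    sets = [set(s) for s in sentences]
    for k, words in enumerate(sets):
        neighborhood = words | (sets[k + 1] if k + 1 < len(sets) else set())
        for wi in words:
            for wj in neighborhood:
                if wi != wj:
                    sim[(wi, wj)] += 1
    return sim

def candidate_weights(candidate_words, doc_words, sim):
    """Feature 1 / feature 2: total and max weight of the candidate's
    words under the co-occurrence similarity (assumed formulation)."""
    weights = [sum(sim.get((w, d), 0) for d in doc_words if d != w)
               for w in candidate_words]
    return sum(weights), (max(weights) if weights else 0)

sentences = [["the", "battery", "life", "is", "great"],
             ["battery", "drains", "fast"],
             ["screen", "is", "bright"]]
sim = cooccurrence_sim(sentences)
doc_words = {w for s in sentences for w in s}
print(candidate_weights(["battery", "life"], doc_words, sim))
```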
Data | Document statistics are word-document co-occurrence counts. |
Introduction | The basic assumption is that semantics drives a person’s language production behavior, and as a result co-occurrence patterns in written text indirectly encode word meaning. |
Introduction | The raw co-occurrence statistics are unwieldy, but in the compressed |
Abstract | To constrain training, it extracts co-occurrence dictionaries of entities and common nouns from the data. |
Introduction | (2011) proposed an approach that uses co-occurrence patterns to find entity type candidates, and then learns their applicability to relation arguments by using them as latent variables in a first-order HMM. |
Model | To restrict the search space and improve learning, we first have to learn which types modify entities and record their co-occurrence, and use this as a dictionary.
Experimental Setup | We apply Local Mutual Information (LMI; Evert, 2005) as a weighting scheme and reduce the full co-occurrence space to 300 dimensions using Singular Value Decomposition.
Experimental Setup | For constructing the text-based vectors, we follow a standard pipeline in distributional semantics (Turney and Pantel, 2010) without tuning its parameters and collect co-occurrence statistics from the concatenation of ukWaC and Wikipedia, amounting to 2.7 billion tokens in total.
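Experimental Setup | A minimal sketch of the weighting-and-reduction step above, assuming the usual definition LMI(w, c) = f(w, c) * PMI(w, c); corpus processing, context selection, and the full 300-dimensional setting are omitted (a small toy matrix and k = 2 are used here).

```python
import numpy as np

def lmi_weight(F):
    """Local Mutual Information weighting of a raw count matrix F,
    assumed here to be observed frequency times PMI."""
    total = F.sum()
    p_wc = F / total
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0
    return F * pmi

def reduce_svd(M, k):
    """Reduce rows of M to k dimensions with a truncated SVD."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * S[:k]

F = np.array([[10.0, 0.0, 2.0],
              [1.0, 8.0, 0.0],
              [0.0, 1.0, 6.0]])
vectors = reduce_svd(lmi_weight(F), k=2)
print(vectors.shape)  # (3, 2)
```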
Introduction | However, the models induce the meaning of words entirely from their co-occurrence with other words, without links to the external world. |
Introduction | In Section 3 we briefly describe the datasets and outline the process of co-occurrence graph construction. |
Tracking sense changes | We use the co-occurrence based graph clustering framework introduced in (Biemann, 2006). |
Tracking sense changes | Firstly, a co-occurrence graph is created for every target word found in DT. |
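Tracking sense changes | A minimal sketch of the graph construction step: for a given target word, its co-occurring words become nodes, and two nodes are linked with a weight counting how often they co-occur with each other in the target's sentences; the significance filtering and the clustering of the framework (Biemann, 2006) are not reproduced.

```python
import networkx as nx

def cooccurrence_graph(target, sentences):
    """Build a co-occurrence graph around one target word (sketch only)."""
    G = nx.Graph()
    for sent in sentences:
        words = set(sent)
        if target not in words:
            continue
        neighbors = sorted(words - {target})
        for i, u in enumerate(neighbors):
            for v in neighbors[i + 1:]:
                # Increment the edge weight, creating the edge if needed.
                w = G.get_edge_data(u, v, {"weight": 0})["weight"]
                G.add_edge(u, v, weight=w + 1)
    return G

sentences = [["tap", "water", "runs", "cold"],
             ["the", "tap", "water", "is", "clean"],
             ["tap", "the", "screen", "twice"]]
G = cooccurrence_graph("tap", sentences)
print(G.edges(data=True))
```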
Distributional measure of semantic similarity | A wide range of distributional information can be employed in vector-based models; the present study uses the ‘bag of words’ approach, which is based on the frequency of co-occurrence of words within a given context window. |
Distributional measure of semantic similarity | The part-of-speech annotated lemma of each collocate within a 5-word window was extracted from the COCA data to build the co-occurrence matrix recording the frequency of co-occurrence of each verb with its collocates. |
Distributional measure of semantic similarity | The co-occurrence matrix was transformed by applying a Point-wise Mutual Information weighting scheme, using the DISSECT toolkit (Dinu et al., 2013), to turn the raw frequencies into weights that reflect how distinctive a collocate is for a given target word with respect to the other target words under consideration. |
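Distributional measure of semantic similarity | A minimal sketch of the windowed counting described above, assuming a symmetric 5-word window around each occurrence of a target verb; POS-annotated lemmatization and the PMI weighting carried out with DISSECT are omitted (a PMI transform like the PPMI sketch earlier could then be applied to the resulting matrix).

```python
from collections import Counter, defaultdict

WINDOW = 5  # assumed symmetric window size

def collocate_counts(tokens, targets):
    """Count how often each target word co-occurs with each collocate
    within WINDOW tokens on either side."""
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tok][tokens[j]] += 1
    return counts

tokens = "she began to eat the apple before going out to eat lunch".split()
print(collocate_counts(tokens, targets={"eat"}))
```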
Relation Mapping | Treating the aligned pairs as observations, the co-occurrence matrix between aligned relations and words was computed.
Relation Mapping | 5. From the co-occurrence matrix we computed P(w | R), P(R), P(w | r) and P(r).
Relation Mapping | ReverbMapping does the same, except that we took a uniform distribution on P(w | R) and P(R) since the contributed dataset did not include co-occurrence counts to estimate these probabilities. Note that the median rank from CluewebMapping is only 12, indicating that half of all answer relations are ranked in the top 12.
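Relation Mapping | A minimal sketch of deriving the probabilities above from a relation-word count matrix (rows = relations, columns = words); smoothing and the distinction between the two relation inventories (R vs. r) are left out, since the same normalization would simply be repeated for each inventory.

```python
import numpy as np

def relation_word_probabilities(C):
    """Given a relation-word co-occurrence count matrix C, return
    P(w | R) as row-normalized counts and P(R) from the row totals."""
    C = np.asarray(C, dtype=float)
    row_totals = C.sum(axis=1, keepdims=True)
    p_w_given_R = C / np.where(row_totals == 0, 1.0, row_totals)
    p_R = row_totals.ravel() / C.sum()
    return p_w_given_R, p_R

# Toy example with two relations and three words.
C = np.array([[12.0, 3.0, 0.0],
              [1.0, 0.0, 9.0]])
p_w_given_R, p_R = relation_word_probabilities(C)
print(p_w_given_R)
print(p_R)
```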