Abstract | In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling.
Introduction | The distance is described regardless of the actual frequency of the history-word, while the co-occurrence is described regardless of the actual position of the history-word.
Language Modeling with TD and TO | In Eq. 3, we have decoupled the observation of a word-pair into the events of distance and co-occurrence.
Language Modeling with TD and TO | The TD likelihood for a distance k given the co-occurrence of the word-pair (w_{t-k}, w_t) can be estimated from counts as follows:
Language Modeling with TD and TO | zero co-occurrence C(w_{t-k} ∈ h_t, w_t) = 0, which results in a division by zero.
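A minimal sketch of this count-based TD estimate in Python. The additive smoothing constant `alpha` is a hypothetical addition that guards against the division by zero that arises when the word-pair never co-occurs; the paper's own estimator and smoothing are not spelled out here.

```python
from collections import Counter

def td_likelihood(pair_dist_counts, k, max_dist, alpha=1e-3):
    """Estimate P(k | co-occurrence) as the fraction of co-occurrences
    of a word pair observed at distance k.

    pair_dist_counts: Counter mapping distance -> count for one word pair.
    alpha: assumed additive smoothing, so a pair with zero co-occurrence
    yields a uniform distribution instead of a division by zero.
    """
    total = sum(pair_dist_counts.values())
    return (pair_dist_counts.get(k, 0) + alpha) / (total + alpha * max_dist)

# Toy counts: the pair co-occurred 3 times at distance 1, once at distance 4.
counts = Counter({1: 3, 4: 1})
p1 = td_likelihood(counts, 1, max_dist=10)
p4 = td_likelihood(counts, 4, max_dist=10)
```

With an empty Counter the estimate degenerates gracefully to the uniform value 1/max_dist rather than raising ZeroDivisionError.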
Motivation of the Proposed Approach | The attributes of distance and co-occurrence are exploited and modeled differently in each language modeling approach. |
Motivation of the Proposed Approach | Both the conventional trigger model and the latent-semantic model capture the co-occurrence information while ignoring the distance information.
Motivation of the Proposed Approach | On the other hand, distant-bigram models and distance-dependent trigger models make use of both distance and co-occurrence information, up to window sizes of ten to twenty.
Application to Essay Scoring | It is also possible that some of the instances with very high PMI are pairs that contain low frequency words for which the database predicts a spuriously high PMI based on a single (and atypical) co-occurrence that happens to repeat in an essay — similar to the Schwartz eschews example in (Manning and Schütze, 1999, Table 5.16, p. 181).
Introduction | Thus, co-occurrence of words in n-word windows, syntactic structures, sentences, paragraphs, and even whole documents is captured in vector-space models built from text corpora (Turney and Pantel, 2010; Basili and Pennacchiotti, 2010; Erk and Padó, 2008; Mitchell and Lapata, 2008; Bullinaria and Levy, 2007; Jones and Mewhort, 2007; Padó and Lapata, 2007; Lin, 1998; Landauer and Dumais, 1997; Lund and Burgess, 1996; Salton et al., 1975).
Introduction | However, little is known about typical profiles of texts in terms of co-occurrence behavior of their words. |
Introduction | The cited approaches use topic models that are in turn estimated using word co-occurrence.
Methodology | The first decision is how to quantify the extent of co-occurrence between two words; we will use point-wise mutual information (PMI) estimated from a large and diverse corpus of texts. |
Methodology | The third decision is how to represent the co-occurrence profiles; we use a histogram where each bin represents the proportion of word pairs in the given interval of PMI values. |
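The methodology decisions above can be sketched end to end: estimate PMI from corpus counts, then summarize a text's word pairs as a histogram over PMI intervals. The bin edges and the toy counts below are illustrative assumptions, not the paper's exact configuration.

```python
import math

def pmi(count_xy, count_x, count_y, total_pairs, total_words):
    """Point-wise mutual information from corpus counts:
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total_pairs
    p_x = count_x / total_words
    p_y = count_y / total_words
    return math.log2(p_xy / (p_x * p_y))

def pmi_profile(pmi_values, edges):
    """Co-occurrence profile: each histogram bin holds the proportion
    of word pairs whose PMI falls in that interval [edges[i], edges[i+1])."""
    bins = [0] * (len(edges) - 1)
    for v in pmi_values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                bins[i] += 1
                break
    n = len(pmi_values)
    return [b / n for b in bins]

# Toy corpus statistics: the pair occurred 10 times among 1000 pair
# observations, each word 100 times among 10000 tokens.
example_pmi = pmi(10, 100, 100, 1000, 10000)
profile = pmi_profile([0.5, 1.5, 2.5, 0.7], [0, 1, 2, 3])
```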
Methodology | To obtain comprehensive information about typical co-occurrence behavior of words of English, we build a first-order co-occurrence word-space model (Turney and Pantel, 2010; Baroni and Lenci, 2010). |
Experimental setup | Co-occurrence features (C) |
Experimental setup | Co-occurrence and IR effectiveness prediction features (CI) formed the most influential class, accounting for 70% of all features in the model.
Related work | In ad hoc IR, most models of term dependence use word co-occurrence and proximity (Song and Croft, 1999; Metzler and Croft, 2005; Srikanth and Srihari, 2002; van Rijsbergen, 1993). |
Selection method for catenae | Co-occurrence features: A governor w1 tends to subcategorize for its dependents wn.
Selection method for catenae | We conclude that co-occurrence is an important feature of dependency relations (Mel'čuk, 2003).
Selection method for catenae | In addition, term frequencies and inverse document frequencies calculated using word co-occurrence measures are commonly used in IR. |
Experiment | Thus, constructing a graph based on word co-occurrence within each three sentences of a document works well for ranking important words, taking the context of each word into account.
Introduction | In our study, we express the relation of word co-occurrence in the form of a graph. |
Related studies | (2011) detected topics in a document by constructing a graph of word co-occurrence and applying the PageRank algorithm to it.
Related studies | The graph used in our method is constructed based on word co-occurrence so that important words which are sensitive to latent information can be extracted by the PageRank algorithm. |
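A minimal sketch of the general approach these lines describe: build a graph over words that co-occur within a window of sentences, then rank nodes with PageRank. The non-overlapping three-sentence windows and the plain power-iteration PageRank below are simplifying assumptions for illustration, not the cited method's exact construction.

```python
import itertools
from collections import defaultdict

def cooccurrence_graph(sentences, window=3):
    """Undirected weighted graph: words are nodes; an edge links two
    words that co-occur in the same window of `window` sentences."""
    edges = defaultdict(float)
    for i in range(0, len(sentences), window):
        words = set(w for s in sentences[i:i + window] for w in s)
        for a, b in itertools.combinations(sorted(words), 2):
            edges[(a, b)] += 1.0
    return edges

def pagerank(edges, d=0.85, iters=50):
    """Plain power-iteration PageRank on the undirected weighted graph."""
    nbrs = defaultdict(dict)
    for (a, b), w in edges.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    nodes = list(nbrs)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Each neighbor m passes rank proportional to the edge weight
            # relative to m's total outgoing weight.
            s = sum(rank[m] * w / sum(nbrs[m].values())
                    for m, w in nbrs[n].items())
            new[n] = (1 - d) / len(nodes) + d * s
        rank = new
    return rank
```

A hub word that co-occurs with many others in the same windows ends up with the highest rank, which is the intuition behind extracting "important words" this way.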
Techniques for text classification | According to (Newman et al., 2010), topic coherence is related to word co-occurrence.
Techniques for text classification | The refined documents are composed of the important sentences extracted from the viewpoint of latent information, i.e., word co-occurrence, so they are well suited to classification based on latent information.
Techniques for text classification | In our study, we construct a graph based on word co-occurrence.
Related Work | Table 2 Context of a particular noun represented as a co-occurrence vector |
Related Work | Context is represented as co-occurrence vectors that are based on syntactic dependencies. |
Related Work | Table 3 Context of a particular noun represented as a co-occurrence vector |
RSP: A Random Walk Model for SP | We initialize the links E with the raw co-occurrence counts of seen predicate-argument pairs in the given generalization data.
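A small sketch of initializing a random walk from such raw co-occurrence counts: row-normalizing the counts of seen predicate-argument pairs yields transition probabilities P(a | q). The function name and toy counts are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def transition_probs(pair_counts):
    """Row-normalize raw predicate-argument co-occurrence counts into
    random-walk transition probabilities P(a | q).

    pair_counts: dict mapping (predicate, argument) -> raw count.
    """
    totals = defaultdict(float)
    for (q, a), c in pair_counts.items():
        totals[q] += c
    return {(q, a): c / totals[q] for (q, a), c in pair_counts.items()}

# Toy counts: "eat" was seen with "pizza" 3 times and "glass" once.
probs = transition_probs({("eat", "pizza"): 3, ("eat", "glass"): 1})
```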
RSP: A Random Walk Model for SP | But in SP, the preferences between the predicates and arguments are implicit: their co-occurrence counts follow a power-law distribution and vary greatly.
RSP: A Random Walk Model for SP | investigate the correlations between the co-occurrence counts (CT) C(q, a), or smoothed counts, and the human plausibility judgements (Lapata et al., 1999; Lapata et al., 2001).
Related Work 2.1 WordNet-based Approach | (1999) introduce a general similarity-based model for word co-occurrence probabilities, which can be interpreted for SP. |
Composition methods | Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words). |
Experimental setup | We collect co-occurrence statistics for the top 20K content words (adjectives, adverbs, nouns, verbs) |
Experimental setup | Due to differences in co-occurrence weighting schemes (we use a logarithmically scaled measure, they do not), their multiplicative model is closer to our additive one. |
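The remark about weighting schemes can be made concrete: over logarithmically scaled co-occurrence weights, additive composition corresponds to multiplicative composition over the raw values, since adding logs multiplies the underlying quantities. A minimal sketch with toy vectors and an assumed log-count weighting:

```python
import math

def compose_add(u, v):
    """Additive composition: element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]

def compose_mult(u, v):
    """Multiplicative composition: element-wise product."""
    return [a * b for a, b in zip(u, v)]

# Toy raw co-occurrence weights for two words across two dimensions.
raw_u, raw_v = [2.0, 8.0], [4.0, 2.0]
log_u = [math.log(x) for x in raw_u]
log_v = [math.log(x) for x in raw_v]

added_logs = compose_add(log_u, log_v)
multiplied_raw = compose_mult(raw_u, raw_v)
# exp(added_logs) equals multiplied_raw element-wise, which is why an
# additive model over log-scaled vectors behaves like a multiplicative
# model over the raw values.
```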
Introduction | Distributional semantic models (DSMs) in particular represent the meaning of a word by a vector, the dimensions of which encode corpus-extracted co-occurrence statistics, under the assumption that words that are semantically similar will occur in similar contexts (Turney and Pantel, 2010).
Introduction | Trying to represent the meaning of arbitrarily long constructions by directly collecting co-occurrence statistics is obviously ineffective and thus methods have been developed to derive the meaning of larger constructions as a function of the meaning of their constituents (Baroni and Zamparelli, 2010; Coecke et al., 2010; Mitchell and Lapata, 2008; Mitchell and Lapata, 2010; Socher et al., 2012). |
Collocational Lexicon Induction | A distributional profile (DP) of a word or phrase type is a co-occurrence vector created by combining all co-occurrence vectors of the tokens of that phrase type. |
Collocational Lexicon Induction | These co-occurrence counts are converted to an association measure (Section 2.2) that encodes the relatedness of each pair of words or phrases. |
Collocational Lexicon Induction | A(·, ·) is an association measure and can simply be defined as co-occurrence counts within sliding windows.
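A minimal sketch of these two definitions, assuming plain sliding-window counts as the association measure and vector addition for combining token vectors; the window size and helper names are illustrative assumptions.

```python
from collections import Counter

def token_cooccurrences(tokens, index, window=2):
    """Co-occurrence vector of a single token occurrence: counts of the
    words within a sliding window around position `index`."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    context = tokens[lo:index] + tokens[index + 1:hi]
    return Counter(context)

def distributional_profile(corpus, target, window=2):
    """Distributional profile (DP) of a word type: the sum of the
    co-occurrence vectors of all of its token occurrences."""
    dp = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok == target:
                dp += token_cooccurrences(tokens, i, window)
    return dp
```

Swapping the raw counts for PMI or another association measure only changes the weighting, not the aggregation over tokens.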
Related work | They used a graph based on context similarity, as well as a co-occurrence graph, in the propagation process.
Experiments | Retweets and redundant web documents are filtered to ensure more reliable frequency counting of co-occurrence relations. |
Introduction | Thus, the co-occurrence frequency of a morph and its target is quite low in the vast amount of information in social media.
Target Candidate Ranking | After applying the same annotation techniques as for tweets to the uncensored data sets, sentence-level co-occurrence relations are extracted and integrated into the network as shown in Figure 3.
Using subcategorization information | In contrast, noun modification in the noun-noun_Gen construction is represented by co-occurrence frequencies.
Using subcategorization information | and also f = 0: this representation allows for a more fine-grained distinction in the low-to-mid frequency range, providing a good basis for deciding whether a given noun-noun pair is a true noun-noun_Gen structure or just a random co-occurrence of two nouns.
Using subcategorization information | The word Technologie (technology) has been marked as a candidate for a genitive in a noun-noun_Gen construction; the co-occurrence frequency of the tuple Einführung-Technologie (introduction - technology) lies in bucket 11.
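The bucketed frequency representation described here can be sketched as follows. The base-2 logarithmic scheme, with bucket 0 reserved for f = 0, is an assumption for illustration and not necessarily the paper's exact bucketing; what it preserves is the fine-grained resolution in the low-to-mid frequency range.

```python
import math

def frequency_bucket(f):
    """Map a co-occurrence frequency to a bucket index.

    Bucket 0 is reserved for f = 0 (the pair was never observed);
    positive frequencies fall into base-2 logarithmic buckets, so the
    low-to-mid range stays fine-grained while high frequencies are
    coarsely grouped.
    """
    if f == 0:
        return 0
    return 1 + int(math.log2(f))

# f = 0 -> bucket 0, f = 1 -> 1, f in {2, 3} -> 2, f in {4..7} -> 3, ...
```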