Image Clustering with Annotated Auxiliary Data | Intuitively, our algorithm aPLSA performs PLSA analysis on the target images, which are converted to an image instance-to-feature co-occurrence matrix. |
Image Clustering with Annotated Auxiliary Data | At the same time, PLSA is also applied to the annotated image data from the social Web, which is converted into a text-to-image-feature co-occurrence matrix. |
Image Clustering with Annotated Auxiliary Data | Based on the image data set V, we can estimate an image instance-to-feature co-occurrence matrix $A_{|V| \times |\mathcal{F}|} \in \mathbb{R}^{|V| \times |\mathcal{F}|}$, where each element $A_{ij}$ ($1 \le i \le |V|$ and $1 \le j \le |\mathcal{F}|$) in the matrix A is the frequency of the feature $f_j$ appearing in the instance $v_i$. |
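The instance-to-feature matrix described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of integer feature ids; the function name `cooccurrence_matrix` and the toy data are hypothetical and not taken from the paper.

```python
def cooccurrence_matrix(instances, num_features):
    """Return A as a list of rows, where A[i][j] is the number of
    times feature j appears in instance i."""
    A = [[0] * num_features for _ in instances]
    for i, feature_ids in enumerate(instances):
        for j in feature_ids:
            A[i][j] += 1
    return A


# Hypothetical toy data: three images, each a list of feature ids in 0..3.
instances = [[0, 2, 2], [1], [0, 0, 3, 1]]
A = cooccurrence_matrix(instances, num_features=4)
for row in A:
    print(row)
```

Each row corresponds to one instance and each column to one feature, so the matrix has |V| rows and |F| columns.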
Experiments | Co-occurrence 0.47 0.56 0.45 0.41 0.41 |
Experiments | Co-occurrence 0.34 0.36 0.34 0.31 0.31 |
Experiments | Both co-occurrence and lexico-syntactic patterns work well for all three types of relations. |
Introduction | The common types of features include contextual (Lin, 1998), co-occurrence (Yang and Callan, 2008), and syntactic dependency (Pantel and Lin, 2002; Pantel and Ravichandran, 2004). |
Introduction | The framework integrates contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and other features to learn an ontology metric, a score indicating semantic distance, for each pair of terms in a taxonomy; it then incrementally clusters terms based on their ontology metric scores. |
Related Work | Inspired by the conjunction and appositive structures, Riloff and Shepherd (1997) and Roark and Charniak (1998) used co-occurrence statistics in local context to discover sibling relations. |
Related Work | Besides contextual features, the vectors can also be represented by verb-noun relations (Pereira et al., 1993), syntactic dependency (Pantel and Ravichandran, 2004; Snow et al., 2005), co-occurrence (Yang and Callan, 2008), and conjunction and appositive features (Caraballo, 1999). |
The Features | The features include contextual, co-occurrence, syntactic dependency, lexical-syntactic patterns, and miscellaneous. |
The Features | The second set of features is co-occurrence. |
The Features | In our work, co-occurrence is measured by point-wise mutual information between two terms: |
Term Weighting and Sentiment Analysis | Statistical measures of associations between terms include estimations by the co-occurrence in the whole collection, such as Point-wise Mutual Information (PMI) and Latent Semantic Analysis (LSA). |
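As a rough illustration of a collection-level association measure, point-wise mutual information can be estimated from document co-occurrence counts. The sketch below assumes the standard definition PMI(x, y) = log(p(x, y) / (p(x) p(y))), with probabilities estimated as document frequencies over the collection; the function name `pmi_table` and the toy corpus are hypothetical, and the estimation details in the excerpted paper may differ.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Estimate PMI for every term pair that co-occurs in at least one document."""
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms
    for doc in documents:
        terms = sorted(set(doc))  # sorted so each pair has one canonical key
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    pmi = {}
    for (x, y), n_xy in pair_df.items():
        p_xy = n_xy / n_docs
        p_x = term_df[x] / n_docs
        p_y = term_df[y] / n_docs
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi


# Hypothetical toy collection of tokenized documents.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["good", "movie"]]
scores = pmi_table(docs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Pairs that never co-occur are simply absent from the result, which avoids taking the logarithm of zero.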
Term Weighting and Sentiment Analysis | Another way is to use co-occurrence statistics |
Term Weighting and Sentiment Analysis | where K is the maximum window size for the co-occurrence and is arbitrarily set to 3 in our experiments. |
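To make the role of the window size concrete, the following sketch counts term pairs that appear within at most K token positions of each other. It is an assumption that the window is measured in token positions and that pairs are unordered; the function name `window_cooccurrence` and the example sentence are hypothetical.

```python
from collections import Counter


def window_cooccurrence(tokens, max_window=3):
    """Count unordered term pairs whose positions differ by 1..max_window."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most max_window tokens from position i.
        for right in tokens[i + 1 : i + 1 + max_window]:
            counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical tokenized sentence; K = 3 as in the excerpt above.
tokens = ["the", "plot", "was", "not", "very", "good"]
counts = window_cooccurrence(tokens, max_window=3)
print(counts[("good", "not")], counts[("good", "plot")])
```

With K = 3, "not" and "good" (two positions apart) are counted as co-occurring, while "plot" and "good" (four positions apart) are not.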
Experimental Setup | The second one creates a story randomly without taking any co-occurrence frequency into account. |
Introduction | Our generator operates over predicate-argument and predicate-predicate co-occurrence statistics gathered from corpora. |
Story Ranking | As explained earlier, our generator produces stories stochastically, by relying on co-occurrence frequencies collected from the training corpus. |
The Story Generator | A fragment of the action graph is shown in Figure 3 (for simplicity, the edges in the example are weighted with co-occurrence frequencies). |
Experiments | To decide whether two names in the co-occurrence or family relationship match, we use the SoftTFIDF measure (Cohen et al., 2003), which is a hybrid matching scheme that combines the token-based TFIDF with the Jaro-Winkler string distance metric. |
Methods 2.1 Document Level and Profile Based CDC | For instance, the similarity between the occupations ‘President’ and ‘Commander in Chief’ can be computed using the JC semantic distance (Jiang and Conrath, 1997) with WordNet; the similarity of co-occurrence with other people can be measured by the Jaccard coefficient. |
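The Jaccard coefficient mentioned above is the size of the intersection of two sets divided by the size of their union. A minimal sketch, assuming each profile's co-occurring people are given as a set of name strings (the names and the handling of two empty sets are illustrative assumptions):

```python
def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty profiles share no evidence
    return len(a & b) / len(a | b)


# Hypothetical sets of people co-occurring with two name mentions.
profile_1 = {"Alice Smith", "Bob Jones", "Carol White"}
profile_2 = {"Bob Jones", "Carol White", "Dan Brown"}
similarity = jaccard(profile_1, profile_2)
print(similarity)
```

Here two of the four distinct names are shared, so the coefficient is 0.5.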
Methods 2.1 Document Level and Profile Based CDC | a match in a family relationship is considered more important than in a co-occurrence relationship. |