Heterogeneous Transfer Learning for Image Clustering via the Social Web

In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data can be in different feature spaces and where no correspondence between data instances in these spaces is provided.

Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space.

2.1 Heterogeneous Transfer Learning Between Languages

In this section, we present our annotation-based probabilistic latent semantic analysis algorithm (aPLSA), which extends the traditional PLSA model by incorporating annotated auxiliary image data.

In this section, we empirically evaluate the aPLSA algorithm together with some state-of-the-art baseline methods on two widely used image corpora to demonstrate its effectiveness.

In this paper, we proposed a new learning scenario called heterogeneous transfer learning and illustrated its application to image clustering.

Appears in 15 sentences as: feature space (10) feature spaces (6)

In *Heterogeneous Transfer Learning for Image Clustering via the Social Web*

- In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data can be in different feature spaces and where no correspondence between data instances in these spaces is provided. (Page 1, “Abstract”)
- Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space. (Page 1, “Introduction”)
- A commonality among these methods is that they all require the training data and test data to be in the same feature space. (Page 1, “Introduction”)
- However, in practice, we often face the problem where the labeled data are scarce in their own feature space, whereas there may be a large amount of labeled heterogeneous data in another feature space. (Page 1, “Introduction”)
- To learn from heterogeneous data, researchers have previously proposed multi-view learning (Blum and Mitchell, 1998; Nigam and Ghani, 2000) in which each instance has multiple views in different feature spaces. (Page 1, “Introduction”)
- Different from previous works, we focus on the problem of heterogeneous transfer learning, which is designed for the situation when the training data are in one feature space (such as text), and the test data are in another (such as images), and there may be no correspondence between instances in these spaces. (Page 1, “Introduction”)
- This example is related to our image clustering problem because they both rely on data from different feature spaces. (Page 2, “Related Works”)
- Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space. (Page 3, “Related Works”)
- For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains. (Page 3, “Related Works”)
- a novel learning paradigm, known as translated learning, to deal with the problem of learning heterogeneous data that belong to quite different feature spaces by using a risk minimization framework. (Page 3, “Related Works”)
- Let $\mathcal{F} = \{f_i\}_{i=1}^{|\mathcal{F}|}$ be an image feature space, and $V = \{v_i\}_{i=1}^{|V|}$ be the image data set. (Page 4, “Image Clustering with Annotated Auxiliary Data”)

See all papers in *Proc. ACL 2009* that mention feature space.

Appears in 13 sentences as: (2) co-occurrence (13)

- Intuitively, our algorithm aPLSA performs PLSA analysis on the target images, which are converted to an image instance-to-feature co-occurrence matrix. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- At the same time, PLSA is also applied to the annotated image data from social Web, which is converted into a text-to-image-feature co-occurrence matrix. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- Based on the image data set $V$, we can estimate an image instance-to-feature co-occurrence matrix $A_{|V| \times |\mathcal{F}|} \in \mathbb{R}^{|V| \times |\mathcal{F}|}$, where each element $A_{ij}$ ($1 \le i \le |V|$ and $1 \le j \le |\mathcal{F}|$) in the matrix $A$ is the frequency of the feature $f_j$ appearing in the instance $v_i$. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- The annotated image data allow us to obtain the co-occurrence information between images $v$ and text features $w \in W$. An example of annotated image data is the Flickr (http: //www . (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- By extracting image features from the annotated images $v$, we can estimate a text-to-image feature co-occurrence matrix $B_{|W| \times |\mathcal{F}|} \in \mathbb{R}^{|W| \times |\mathcal{F}|}$, where each element $B_{ij}$ ($1 \le i \le |W|$ and $1 \le j \le |\mathcal{F}|$) in the matrix $B$ is the frequency of the text feature $w_i$ and the image feature $f_j$ occurring together in the annotated image data set. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- Our objective is to estimate a clustering function $g : V \mapsto Z$ with the help of the two co-occurrence matrices $A$ and $B$ as defined above. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- In our image clustering task, PLSA decomposes the instance-feature co-occurrence matrix $A$ under the assumption of conditional independence of image instances $V$ and image features $\mathcal{F}$, given the latent variables $Z$. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- feature co-occurrence matrix. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- where $A_{|V| \times |\mathcal{F}|} \in \mathbb{R}^{|V| \times |\mathcal{F}|}$ is the image instance-feature co-occurrence matrix, and $B_{|W| \times |\mathcal{F}|} \in \mathbb{R}^{|W| \times |\mathcal{F}|}$ is the text-to-image feature-level co-occurrence matrix. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- Furthermore, $\lambda$ acts as a tradeoff parameter between the co-occurrence matrices $A$ and $B$. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- Input: The $V$-$\mathcal{F}$ co-occurrence matrix $A$ and $W$-$\mathcal{F}$ co-occurrence matrix $B$. (Page 6, “Image Clustering with Annotated Auxiliary Data”)
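
As a rough illustration of how the two co-occurrence matrices described in the excerpts above could be built, the following Python sketch counts feature occurrences. It assumes that every image has already been quantised into a list of integer image-feature ids and every annotation into a list of integer tag ids; the function names and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter


def instance_feature_matrix(instances, num_features):
    """A[i][j] = how often image feature j appears in target image i."""
    matrix = []
    for feature_ids in instances:
        counts = Counter(feature_ids)
        matrix.append([counts.get(j, 0) for j in range(num_features)])
    return matrix


def text_feature_matrix(annotated, num_tags, num_features):
    """B[l][j] = how often tag l and image feature j occur on the same annotated image."""
    matrix = [[0] * num_features for _ in range(num_tags)]
    for tag_ids, feature_ids in annotated:
        counts = Counter(feature_ids)
        for tag in set(tag_ids):  # each tag is counted once per image
            for j, n in counts.items():
                matrix[tag][j] += n
    return matrix


# Toy data (hypothetical): two target images, two annotated auxiliary images.
target_images = [[0, 0, 2], [1, 2, 2]]
annotated_images = [([0], [0, 1]), ([0, 1], [2, 2])]

A = instance_feature_matrix(target_images, num_features=3)
B = text_feature_matrix(annotated_images, num_tags=2, num_features=3)
print(A)
print(B)
```

Both matrices share the same column space (the image features), which is what lets the auxiliary tags inform the clustering of the untagged target images.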

Appears in 12 sentences as: (1) latent variable (5) latent variables (6)

- In order to unify those two separate PLSA models, these two steps are done simultaneously with common latent variables used as a bridge linking them. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- Through these common latent variables, which are now constrained by both target image data and auxiliary annotation data, a better clustering result is expected for the target data. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- Let $Z = \{z_i\}_{i=1}^{|Z|}$ be the latent variable set in our aPLSA model. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- In clustering, each latent variable $z_i \in Z$ corresponds to a certain cluster. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- In our image clustering task, PLSA decomposes the instance-feature co-occurrence matrix $A$ under the assumption of conditional independence of image instances $V$ and image features $\mathcal{F}$, given the latent variables $Z$. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- In Figure 3, the latent variables $Z$ depend not only on the correlation between image instances $V$ and image features $\mathcal{F}$, but also the correlation between text features $W$ and image features $\mathcal{F}$. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- Therefore, the auxiliary socially-annotated image data can be used to help the target image clustering performance by estimating a good set of latent variables $Z$. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- E-Step: calculate the posterior probability of each latent variable $z$ given the observation of image features $f$, image instances $v$ and text features $w$ based on the old estimate of $P(f|z)$, and $P(z|v)$: (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- and conditional probability $P(f_j|z_k)$, which is a mixture portion of posterior probability of latent variables (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- Output: A clustering (partition) function $g : V \mapsto Z$, which maps an image instance $v \in V$ to a latent variable $z \in Z$. (Page 6, “Image Clustering with Annotated Auxiliary Data”)
- The entropy of $g$ on a single latent variable $z$ is defined to be $H(g, z) \triangleq -\sum_{c \in C} P(c|z) \log_2 P(c|z)$, where $C$ is the class (Page 7, “Experiments”)
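
The per-cluster entropy in the last excerpt can be computed directly from the true class labels of the images assigned to one cluster. The sketch below is a minimal illustration that assumes $P(c|z)$ is estimated as the fraction of images in cluster $z$ carrying class label $c$; the function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels_in_cluster):
    """H(g, z) = -sum_c P(c|z) * log2 P(c|z) for the images assigned to one cluster z."""
    total = len(labels_in_cluster)
    if total == 0:
        return 0.0  # convention chosen here: an empty cluster contributes no entropy
    counts = Counter(labels_in_cluster)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


mixed = cluster_entropy(["cat", "cat", "dog", "dog"])  # two classes, evenly split
pure = cluster_entropy(["cat", "cat", "cat"])          # a single class
print(mixed, pure)
```

Lower values mean purer clusters: a cluster holding a single class scores zero, while an even split over two classes scores one bit.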


Appears in 7 sentences as: Latent Semantic (1) latent semantic (7)

- What we need to do is to uncover this latent semantic information by finding out what is common among them. (Page 2, “Related Works”)
- Probabilistic latent semantic analysis (PLSA) is a widely used probabilistic model (Hofmann, 1999), and could be considered as a probabilistic implementation of latent semantic analysis (LSA) (Deerwester et al., 1990). (Page 3, “Related Works”)
- In this section, we present our annotation-based probabilistic latent semantic analysis algorithm (aPLSA), which extends the traditional PLSA model by incorporating annotated auxiliary image data. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- 3.1 Probabilistic Latent Semantic Analysis (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- To formally introduce the aPLSA model, we start from the probabilistic latent semantic analysis (PLSA) (Hofmann, 1999) model. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- PLSA is a probabilistic implementation of latent semantic analysis (LSA) (Deerwester et al., 1990). (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- From the above equations, we can derive our annotation-based probabilistic latent semantic analysis (aPLSA) algorithm. (Page 5, “Image Clustering with Annotated Auxiliary Data”)


Appears in 5 sentences as: development sets (5)

- To empirically investigate the parameter $\lambda$ and the convergence of our algorithm aPLSA, we generated five more data sets as the development sets. (Page 6, “Experiments”)
- The detailed description of these five development sets, namely tune1 to tune5, is listed in Table 1 as well. (Page 6, “Experiments”)
- We have done some experiments on the development sets to investigate how different values of $\lambda$ affect the performance of aPLSA. (Page 8, “Experiments”)
- The result in average entropy over all development sets is shown in Figure 4(b). (Page 8, “Experiments”)
- Figure 4(c) shows the average entropy curve given by aPLSA over all development sets. (Page 8, “Experiments”)


Appears in 4 sentences as: Graphical model (2) graphical model (2)

- Figure 2: Graphical model representation of PLSA model. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- The graphical model representation of PLSA is shown in Figure 2. (Page 4, “Image Clustering with Annotated Auxiliary Data”)
- Figure 3: Graphical model representation of aPLSA model. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- Based on the graphical model representation in Figure 3, we derive the log-likelihood objective function, in a similar way as in (Cohn and Hofmann, 2000), as follows (Page 5, “Image Clustering with Annotated Auxiliary Data”)
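
The figures themselves are not reproduced in these excerpts. As a reading aid, the factorisations that such graphical models usually encode can be written out as below. The first line is the standard PLSA decomposition under the conditional-independence assumption quoted earlier. The second line is an assumption about how the text side of aPLSA mirrors it while sharing $P(f_j \mid z_k)$, inferred from the excerpts rather than copied from the paper's numbered equations.

```latex
% PLSA: image instances and image features are independent given z
P(f_j \mid v_i) = \sum_{k=1}^{|Z|} P(f_j \mid z_k)\, P(z_k \mid v_i)

% Assumed aPLSA counterpart: text features explain image features through the same z
P(f_j \mid w_l) = \sum_{k=1}^{|Z|} P(f_j \mid z_k)\, P(z_k \mid w_l)
```

Because $P(f_j \mid z_k)$ appears in both lines, the latent variables act as the bridge between the target images and the annotated auxiliary data.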


Appears in 4 sentences as: machine learning (4)

- Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space. (Page 1, “Introduction”)
- Various machine learning strategies have been proposed to address this problem, including semi-supervised learning (Zhu, 2007), domain adaptation (Wu and Dietterich, 2004; Blitzer et al., 2006; Blitzer et al., 2007; Arnold et al., 2007; Chan and Ng, 2007; Daume, 2007; Jiang and Zhai, 2007; Reichart (Page 1, “Introduction”)
- To consider how heterogeneous transfer learning relates to other types of learning, Figure 1 presents an intuitive illustration of four learning strategies, including traditional machine learning, transfer learning across different distributions, multi-view learning and heterogeneous transfer learning. (Page 1, “Introduction”)
- However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly. (Page 2, “Related Works”)


Appears in 3 sentences as: conditional probabilities (2) conditional probability (1)

- Then we apply the EM algorithm (Dempster et al., 1977) to estimate the conditional probabilities $P(f|z)$, and $P(z|v)$ with respect to each dependence in Figure 3 as follows. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- M-Step: re-estimates conditional probabilities $P(z_k|v_i)$ and $P(z_k|w_l)$: (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- and conditional probability $P(f_j|z_k)$, which is a mixture portion of posterior probability of latent variables (Page 5, “Image Clustering with Annotated Auxiliary Data”)
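
The update equations themselves are missing from these excerpts, so the following pure-Python sketch only illustrates one way an EM loop of this shape might look. It assumes row-normalised counts, a weight `lam` on target-image rows and `1 - lam` on text-feature rows when re-estimating the shared $P(f_j|z_k)$, and a hard cluster assignment by the largest $P(z_k|v_i)$. All names, defaults and toy data are hypothetical, and this is not the paper's exact algorithm.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [x / total for x in values]


def scale_row(counts):
    """Row-normalise raw counts; an all-zero row stays all zero."""
    total = sum(counts)
    return [x / total if total else 0.0 for x in counts]


def aplsa_em(A, B, num_clusters, lam=0.5, iterations=100, seed=0):
    rng = random.Random(seed)
    num_features = len(A[0])
    # Target-image rows carry weight lam, text-feature rows carry weight 1 - lam.
    rows = [(lam, scale_row(r)) for r in A] + [(1.0 - lam, scale_row(r)) for r in B]
    # Random positive initialisation of P(z|row) and the shared P(f|z).
    p_z_row = [normalize([rng.random() + 0.1 for _ in range(num_clusters)]) for _ in rows]
    p_f_z = [normalize([rng.random() + 0.1 for _ in range(num_features)])
             for _ in range(num_clusters)]

    for _ in range(iterations):
        acc_z_row = [[0.0] * num_clusters for _ in rows]
        acc_f_z = [[0.0] * num_features for _ in range(num_clusters)]
        for r, (weight, freqs) in enumerate(rows):
            for j, freq in enumerate(freqs):
                if freq == 0.0:
                    continue
                # E-step: posterior of z_k for this (row, feature) pair,
                # proportional to P(f_j|z_k) * P(z_k|row).
                post = normalize([p_f_z[k][j] * p_z_row[r][k] for k in range(num_clusters)])
                for k in range(num_clusters):
                    acc_z_row[r][k] += freq * post[k]
                    acc_f_z[k][j] += weight * freq * post[k]
        # M-step: re-estimate P(z|v), P(z|w) and the shared P(f|z).
        p_z_row = [normalize(acc) for acc in acc_z_row]
        p_f_z = [normalize(acc) for acc in acc_f_z]

    p_z_v = p_z_row[:len(A)]
    clusters = [p.index(max(p)) for p in p_z_v]  # g(v_i) = argmax_k P(z_k|v_i)
    return clusters, p_z_v, p_f_z


# Toy data (hypothetical): two groups of target images, two tags.
A = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
B = [[3, 3, 0, 0], [0, 0, 3, 3]]
clusters, p_z_v, p_f_z = aplsa_em(A, B, num_clusters=2, lam=0.5)
print(clusters)
```

Setting `lam=1.0` in this sketch removes the influence of the text-feature rows on $P(f_j|z_k)$, which reduces it to plain PLSA on the target images.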


Appears in 3 sentences as: learning algorithm (1) learning algorithms (2)

- However, because the labeled Chinese Web pages are still not sufficient, we often find it difficult to achieve high accuracy by applying traditional machine learning algorithms to the Chinese Web pages directly. (Page 2, “Related Works”)
- Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space. (Page 3, “Related Works”)
- For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains. (Page 3, “Related Works”)


Appears in 3 sentences as: objective function (3)

- Based on the graphical model representation in Figure 3, we derive the log-likelihood objective function, in a similar way as in (Cohn and Hofmann, 2000), as follows (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- objective function ignores all the biases from the (Page 5, “Image Clustering with Annotated Auxiliary Data”)
- points based on the objective function $\mathcal{L}$ in Equation (5). (Page 5, “Image Clustering with Annotated Auxiliary Data”)
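
Equation (5) is not reproduced in these excerpts. One plausible form, assuming a Cohn and Hofmann style combination in which each row of $A$ and $B$ is normalised and $\lambda$ weights the two log-likelihood terms, is sketched below. It should be read as an illustration consistent with the quoted sentences, not as the paper's verbatim equation.

```latex
\mathcal{L} = \sum_{j} \left[
    \lambda \sum_{i} \frac{A_{ij}}{\sum_{j'} A_{ij'}} \log P(f_j \mid v_i)
  + (1 - \lambda) \sum_{l} \frac{B_{lj}}{\sum_{j'} B_{lj'}} \log P(f_j \mid w_l)
\right]
```

Under this reading, $\lambda = 1$ uses only the target image data, and smaller values of $\lambda$ give more influence to the annotated auxiliary data.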


Appears in 3 sentences as: probabilistic model (4)

- Probabilistic latent semantic analysis (PLSA) is a widely used probabilistic model (Hofmann, 1999), and could be considered as a probabilistic implementation of latent semantic analysis (LSA) (Deerwester et al., 1990). (Page 3, “Related Works”)
- An extension to PLSA was proposed in (Cohn and Hofmann, 2000), which incorporated the hyperlink connectivity in the PLSA model by using a joint probabilistic model for connectivity and content. (Page 3, “Related Works”)
- So we design a new PLSA model by joining the probabilistic model in Equation (1) and the probabilistic model in Equation (4) into a unified model, as shown in Figure 3. (Page 5, “Image Clustering with Annotated Auxiliary Data”)
