Abstract | In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data may lie in different feature spaces and no correspondence between data instances in these spaces is provided.
Image Clustering with Annotated Auxiliary Data | Let $\mathcal{F} = \{f_j\}$ be an image feature space, and $\mathcal{V} = \{v_i\}$ be the image data set.
Introduction | Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space.
Introduction | A commonality among these methods is that they all require the training data and test data to be in the same feature space.
Introduction | However, in practice, we often face the problem where the labeled data are scarce in their own feature space, whereas there may be a large amount of labeled heterogeneous data in another feature space.
Related Works | This example is related to our image clustering problem because they both rely on data from different feature spaces.
Related Works | Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space.
Related Works | For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains. |
Abstract | Our method is based on recent advances in the field of statistical machine learning (multivariate capabilities of Support Vector Machines) and a rich feature space.
Building a Discourse Parser | This makes SVMs well suited to classification problems involving relatively large feature spaces such as ours ($\approx 10^5$ features).
Evaluation | The feature space dimension is 136,987. |
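As a rough illustration of why a margin-based linear classifier copes with a sparse feature space of this scale, here is a minimal hinge-loss SGD trainer. This is a Pegasos-style sketch under my own assumptions (function name, data layout, hyperparameters), not the papers' actual implementation:

```python
import numpy as np

def sparse_hinge_sgd(samples, labels, dim, eta=0.1, lam=0.001, epochs=20):
    """Train a linear classifier with the hinge loss by SGD.

    Each sample is just the list of its active feature indices, so the
    dot product and the gradient step touch only those few features,
    however large `dim` (the feature space dimension) is.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for feats, y in zip(samples, labels):   # y in {-1, +1}
            margin = y * sum(w[j] for j in feats)
            w *= 1.0 - eta * lam                # L2 shrinkage toward zero
            if margin < 1.0:                    # hinge-loss subgradient step
                for j in feats:
                    w[j] += eta * y
    return w
```

With, say, `dim=136987` the weight vector is large but each update remains cheap, since only the handful of active features per sample is touched by the gradient step.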
Methods 2.1 Document Level and Profile Based CDC | Kernelization (Schölkopf and Smola, 2002) is a machine learning technique to transform patterns in the data space to a high-dimensional feature space so that the structure of the data can be more easily and adequately discovered.
Methods 2.1 Document Level and Profile Based CDC | Using the kernel trick, the squared distance between $\Phi(r_j)$ and $\Phi(w_i)$ in the feature space $\mathcal{H}$ can be computed as: $\|\Phi(r_j) - \Phi(w_i)\|^2 = K(r_j, r_j) - 2K(r_j, w_i) + K(w_i, w_i)$.
Methods 2.1 Document Level and Profile Based CDC | We measure the kernelized XBI (KXBI) in the feature space as,
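The kernel-trick distance computation described above can be sketched in a few lines. The RBF kernel choice and the function names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2); an example kernel choice
    return np.exp(-gamma * np.sum((x - y) ** 2))

def feature_space_sq_dist(x, y, kernel):
    # Squared distance between Phi(x) and Phi(y) in the induced feature
    # space, computed without ever forming Phi explicitly:
    #   ||Phi(x) - Phi(y)||^2 = k(x, x) - 2*k(x, y) + k(y, y)
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
```

The point of the identity is that the (possibly infinite-dimensional) mapping $\Phi$ never has to be materialized; only kernel evaluations on the original data are needed.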
Introduction | In NLP applications, the dimension of the feature space tends to be very large; it can easily reach several million, so applying the L1 penalty to all features significantly slows down the weight update process.
Log-Linear Models | Since the dimension of the feature space can be very large, applying the penalty to all features can significantly slow down the weight update process.
Log-Linear Models | Another merit is that it allows the L1 penalty to be applied in a lazy fashion: we do not need to update the weights of features that are absent from the current sample, which leads to much faster training when the dimension of the feature space is large.
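The lazy L1 application described above can be sketched with cumulative-penalty bookkeeping: track the total penalty every weight should have received so far, and clip a weight toward zero only when its feature actually fires. This is a simplified sketch for binary logistic regression; the function and variable names are my own assumptions:

```python
import numpy as np

def sgd_lazy_l1(samples, labels, dim, eta=0.1, lam=0.01, epochs=10):
    """SGD for binary logistic regression with a lazily applied L1 penalty.

    Each sample is a list of active feature indices; per update, only
    those few weights are touched, regardless of `dim`.
    """
    w = np.zeros(dim)
    u = 0.0               # total L1 penalty each weight could have received
    q = np.zeros(dim)     # signed penalty actually applied to each weight

    def apply_penalty(j):
        # Clip w[j] toward zero by the penalty it has not yet received,
        # never letting it cross zero.
        z = w[j]
        if w[j] > 0:
            w[j] = max(0.0, w[j] - (u + q[j]))
        elif w[j] < 0:
            w[j] = min(0.0, w[j] + (u - q[j]))
        q[j] += w[j] - z

    for _ in range(epochs):
        for feats, y in zip(samples, labels):   # y in {0, 1}
            p = 1.0 / (1.0 + np.exp(-sum(w[j] for j in feats)))
            u += eta * lam                      # accrue penalty for everyone
            for j in feats:                     # but touch only active features
                w[j] += eta * (y - p)           # gradient step
                apply_penalty(j)
    return w
```

Weights of features that never fire are never visited at all; their pending penalty simply accumulates in `u` until (and unless) they next appear.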