Cross-Language Text Classification Using Structural Correspondence Learning
Prettenhofer, Peter and Stein, Benno

Article Structure

Abstract

We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation.

Introduction

This paper deals with cross-language text classification problems.

Related Work

Cross-Language Text Classification Bel et al.

Cross-Language Text Classification

This section introduces basic models and terminology.

Cross-Language Structural Correspondence Learning

We now present a novel method for learning a map θ by exploiting relations from unlabeled documents written in S and T.

Experiments

We evaluate CL-SCL for the task of cross-language sentiment classification using English as source language and German, French, and Japanese as target languages.

Conclusion

The paper introduces a novel approach to cross-language text classification, called cross-language structural correspondence learning.

Topics

text classification

Appears in 25 sentences as: Text Classification (1) text classification (23) text classifier (1)
In Cross-Language Text Classification Using Structural Correspondence Learning
  1. We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation.
    Page 1, “Abstract”
  2. This paper deals with cross-language text classification problems.
    Page 1, “Introduction”
Stated precisely: We are given a text classification task γ in a target language T for which no labeled documents are available.
    Page 1, “Introduction”
Such cross-language text classification problems are addressed by constructing a classifier f_S with training documents written in S and by applying f_S to unlabeled documents written in T.
    Page 1, “Introduction”
  5. Here we propose a different approach to cross-language text classification which adopts ideas from the field of multitask learning (Ando and Zhang, 2005a).
    Page 1, “Introduction”
Contributions Our contributions to the outlined field are threefold: First, the identification and utilization of the theory of SCL to cross-language text classification, which has, to the best of our knowledge, not been investigated before.
    Page 2, “Introduction”
Second, the further development and adaptation of SCL towards a technology that is competitive with the state-of-the-art in cross-language text classification.
    Page 2, “Introduction”
Section 3 states the terminology for cross-language text classification.
    Page 2, “Introduction”
  9. Section 4 describes our main contribution, a new approach to cross-language text classification based on structural correspondence learning.
    Page 2, “Introduction”
  10. Cross-Language Text Classification Bel et al.
    Page 2, “Related Work”
Bel et al. (2003) belong to the first who explicitly considered the problem of cross-language text classification.
    Page 2, “Related Work”

cross-lingual

Appears in 15 sentences as: Cross-Lingual (1) cross-lingual (15)
In Cross-Language Text Classification Using Structural Correspondence Learning
  1. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences.
    Page 1, “Abstract”
As we will see, a small number of pivots can capture a sufficiently large part of the correspondences between S and T in order to (1) construct a cross-lingual representation and (2) learn a classifier f_γ for the task γ that operates on this representation.
    Page 1, “Introduction”
  3. Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation.
    Page 2, “Introduction”
One way to overcome this “feature barrier” is to find a cross-lingual representation for documents written in S and T, which enables the transfer of classification knowledge between the two languages.
    Page 3, “Cross-Language Text Classification”
  5. Intuitively, one can understand such a cross-lingual representation as a concept space that underlies both languages.
    Page 3, “Cross-Language Text Classification”
In the following, we will use θ to denote a map that associates the original |V|-dimensional representation of a document d written in S or T with its cross-lingual representation.
    Page 3, “Cross-Language Text Classification”
  7. Once such a mapping is found the cross-language text classification problem reduces to a standard classification problem in the cross-lingual space.
    Page 3, “Cross-Language Text Classification”
  8. The resulting classifier fST, which will operate in the cross-lingual setting, is defined as follows:
    Page 5, “Cross-Language Structural Correspondence Learning”
  9. In this study, we are interested in whether the cross-lingual representation induced by CL-SCL captures the difference between positive and negative reviews; by balancing the reviews we ensure that the imbalance does not affect the learned model.
    Page 6, “Experiments”
Since SGD is sensitive to feature scaling, the projection θx is post-processed as follows: (1) Each feature of the cross-lingual representation is standardized to zero mean and unit variance, where mean and variance are estimated on D_S ∪ D_u.
    Page 6, “Experiments”
(2) The cross-lingual document representations are scaled by a constant α such that |D_S|^-1 Σ_{x ∈ D_S} ‖αθx‖ = 1.
    Page 6, “Experiments”
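The post-processing described in sentences 10 and 11 above is simple to sketch. The following is a minimal illustration rather than the authors' code; it assumes theta is a k x |V| numpy array and that X_s, X_u are dense document-term matrices for D_S and D_u.

```python
import numpy as np

def postprocess_projection(theta, X_s, X_u):
    """Sketch of the post-processing of the projection theta*x described above:
    (1) standardize each cross-lingual feature to zero mean and unit variance
        (statistics estimated on D_S union D_u), and
    (2) rescale by a constant alpha so that the average document norm over D_S is 1."""
    Z_s, Z_u = X_s @ theta.T, X_u @ theta.T          # project documents into the k-dimensional space
    Z_all = np.vstack([Z_s, Z_u])
    mean = Z_all.mean(axis=0)
    std = Z_all.std(axis=0) + 1e-12                  # avoid division by zero for constant features
    Z_s, Z_u = (Z_s - mean) / std, (Z_u - mean) / std
    alpha = 1.0 / np.linalg.norm(Z_s, axis=1).mean() # average norm over D_S becomes 1
    return alpha * Z_s, alpha * Z_u
```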

sentiment classification

Appears in 9 sentences as: sentiment classification (9)
In Cross-Language Text Classification Using Structural Correspondence Learning
We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages.
    Page 1, “Abstract”
γ may be a spam filtering task, a topic categorization task, or a sentiment classification task.
    Page 1, “Introduction”
  3. In this connection we compile extensive corpora in the languages English, German, French, and Japanese, and for different sentiment classification tasks.
    Page 2, “Introduction”
Section 5 presents experimental results in the context of cross-language sentiment classification.
    Page 2, “Introduction”
Considering our sentiment classification example, the word pair {excellent_S, exzellent_T} satisfies both conditions: (1) the words are strong indicators of positive sentiment,
    Page 4, “Cross-Language Structural Correspondence Learning”
We evaluate CL-SCL for the task of cross-language sentiment classification using English as source language and German, French, and Japanese as target languages.
    Page 6, “Experiments”
  7. We compiled a new dataset for cross-language sentiment classification by crawling product reviews from Amazon.
    Page 6, “Experiments”
  8. Statistical machine translation technology offers a straightforward solution to the problem of cross-language text classification and has been used in a number of cross-language sentiment classification studies (Hiroshi et al., 2004; Bautin et al., 2008; Wan, 2009).
    Page 7, “Experiments”
  9. The analysis covers performance and robustness issues in the context of cross-language sentiment classification with English as source language and German, French, and Japanese as target languages.
    Page 9, “Conclusion”

machine translation

Appears in 8 sentences as: machine translation (8)
In Cross-Language Text Classification Using Structural Correspondence Learning
For the application of f_S under language T, different approaches are current practice: machine translation of unlabeled documents from T to S, dictionary-based translation of unlabeled
    Page 1, “Introduction”
The approach solves the classification problem directly, instead of resorting to a more general and potentially much harder problem such as machine translation.
    Page 2, “Introduction”
  3. Recent work in cross-language text classification focuses on the use of automatic machine translation technology.
    Page 2, “Related Work”
Most of these methods involve two steps: (1) translation of the documents into the source or the target language, and (2) dimensionality reduction or semi-supervised learning to reduce the noise introduced by the machine translation.
    Page 2, “Related Work”
We chose a machine translation baseline to compare CL-SCL to another cross-language method.
    Page 7, “Experiments”
  6. Statistical machine translation technology offers a straightforward solution to the problem of cross-language text classification and has been used in a number of cross-language sentiment classification studies (Hiroshi et al., 2004; Bautin et al., 2008; Wan, 2009).
    Page 7, “Experiments”
  7. This difference can be explained by the fact that machine translation works better for European than for Asian languages such as Japanese.
    Page 7, “Experiments”
The results show that CL-SCL is competitive with state-of-the-art machine translation technology while requiring fewer resources.
    Page 9, “Conclusion”

unlabeled data

Appears in 8 sentences as: Unlabeled Data (1) Unlabeled data (1) unlabeled data (6)
In Cross-Language Text Classification Using Structural Correspondence Learning
  1. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling.
    Page 1, “Abstract”
  2. In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
    Page 2, “Related Work”
  3. SCL then models the correlation between the pivots and all other features by training linear classifiers on the unlabeled data from both domains.
    Page 3, “Related Work”
Ando and Zhang (2005b) present a semi-supervised learning method based on this paradigm, which generates related tasks from unlabeled data.
    Page 3, “Related Work”
Note that the support of w_S and w_T can be determined from the unlabeled data D_u.
    Page 4, “Cross-Language Structural Correspondence Learning”
Input: Labeled source data D_S; Unlabeled data D_u = D_{S,u} ∪ D_{T,u}
    Page 5, “Cross-Language Structural Correspondence Learning”
Due to the use of task-specific, unlabeled data, relevant characteristics are captured by the pivot classifiers.
    Page 8, “Experiments”
  8. Unlabeled Data The first row of Figure 2 shows the performance of CL-SCL as a function of the ratio of labeled and unlabeled documents.
    Page 9, “Experiments”

domain adaptation

Appears in 7 sentences as: Domain Adaptation (1) Domain adaptation (1) domain adaptation (6)
In Cross-Language Text Classification Using Structural Correspondence Learning
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation.
    Page 1, “Abstract”
  2. Our approach builds upon structural correspondence learning, SCL, a recently proposed theory for domain adaptation in the field of natural language processing (Blitzer et al., 2006).
    Page 1, “Introduction”
  3. Domain Adaptation Domain adaptation refers to the problem of adapting a statistical classifier trained on data from one (or more) source domains (e.g., newswire texts) to a different target domain (e.g., legal texts).
    Page 2, “Related Work”
  4. In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
    Page 2, “Related Work”
The latter setting is referred to as unsupervised domain adaptation.
    Page 2, “Related Work”
Note that cross-language text classification can be cast as an unsupervised domain adaptation problem by considering each language as a separate domain.
    Page 3, “Related Work”
Blitzer et al. (2006) propose an effective algorithm for unsupervised domain adaptation, called structural correspondence learning.
    Page 3, “Related Work”

SVD

Appears in 6 sentences as: SVD (6)
In Cross-Language Text Classification Using Structural Correspondence Learning
UΣV^T = SVD(W)
    Page 5, “Cross-Language Structural Correspondence Learning”
COMPUTESVD(W, k): UΣV^T = SVD(W)
    Page 5, “Cross-Language Structural Correspondence Learning”
By computing SVD(W) one obtains a compact representation of this column space in the form of an orthonormal basis θ^T.
    Page 5, “Cross-Language Structural Correspondence Learning”
  4. The computational bottleneck of CL-SCL is the SVD of the dense parameter matrix W. Here we follow Blitzer et al.
    Page 6, “Experiments”
For the SVD computation the Lanczos algorithm provided by SVDLIBC is employed. We investigated an alternative approach to obtain a sparse W by directly enforcing sparse pivot predictors w_l through L1-regularization (Tsuruoka et al., 2009), but didn’t pursue this strategy due to unstable results.
    Page 6, “Experiments”
  6. Obviously the SVD is crucial to the success of CL-SCL if m is sufficiently large.
    Page 9, “Experiments”
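To make the COMPUTESVD step concrete, here is a rough Python sketch. It substitutes scipy's sparse SVD routine for the SVDLIBC Lanczos implementation mentioned above; W is assumed to be the |V| x m matrix whose columns are the pivot predictor weight vectors, and all names are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import svds

def compute_theta(W, k):
    """COMPUTESVD(W, k) sketch: truncated SVD of the pivot-predictor matrix W,
    keeping the top k left singular vectors as the projection theta (k x |V|).
    Requires k < min(W.shape)."""
    U, s, Vt = svds(W, k=k)          # k largest singular triplets, returned in ascending order
    order = np.argsort(-s)           # reorder so the largest singular value comes first
    theta = U[:, order].T            # theta = U[1:k]^T, an orthonormal basis of W's column space
    return theta
```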

classification task

Appears in 5 sentences as: classification task (4) classification tasks (1)
In Cross-Language Text Classification Using Structural Correspondence Learning
Stated precisely: We are given a text classification task γ in a target language T for which no labeled documents are available.
    Page 1, “Introduction”
γ may be a spam filtering task, a topic categorization task, or a sentiment classification task.
    Page 1, “Introduction”
In this connection we compile extensive corpora in the languages English, German, French, and Japanese, and for different sentiment classification tasks.
    Page 2, “Introduction”
  4. We refer to this classification task as the target task.
    Page 4, “Cross-Language Structural Correspondence Learning”
For each of the nine target-language-category combinations a text classification task is created by taking the training set of the product category in S and the test set of the same product category in T.
    Page 6, “Experiments”

hyperparameters

Appears in 5 sentences as: hyperparameter (1) hyperparameters (4)
In Cross-Language Text Classification Using Structural Correspondence Learning
  1. Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation.
    Page 2, “Introduction”
Special emphasis is put on corpus construction, determination of upper bounds and baselines, and a sensitivity analysis of important hyperparameters.
    Page 6, “Experiments”
SGD receives two hyperparameters as input: the number of iterations T, and the regularization parameter λ.
    Page 6, “Experiments”
  4. Recall that CL-SCL receives three hyperparameters as input: the number of pivots m, the dimensionality of the cross-lingual representation k,
    Page 7, “Experiments”
  5. In the following we discuss the sensitivity of each hyperparameter in isolation while keeping
    Page 8, “Experiments”

parallel corpus

Appears in 5 sentences as: parallel corpus (5)
In Cross-Language Text Classification Using Structural Correspondence Learning
The approach uses unlabeled documents from both languages along with a small number (100–500) of translated words, instead of employing a parallel corpus or an extensive bilingual dictionary.
    Page 2, “Introduction”
Dumais et al. (1997) is considered seminal work in CLIR: they propose a method which induces semantic correspondences between two languages by performing latent semantic analysis, LSA, on a parallel corpus.
    Page 2, “Related Work”
The major limitation of these approaches is their computational complexity and, in particular, the dependence on a parallel corpus, which is hard to obtain, especially for less resource-rich languages.
    Page 2, “Related Work”
  4. Gliozzo and Strapparava (2005) circumvent the dependence on a parallel corpus by using so-called multilingual domain models, which can be acquired from comparable corpora in an unsupervised manner.
    Page 2, “Related Work”
For instance, cross-language latent semantic indexing (Dumais et al., 1997) and cross-language explicit semantic analysis (Potthast et al., 2008) estimate θ using a parallel corpus.
    Page 3, “Cross-Language Text Classification”

labeled data

Appears in 4 sentences as: labeled data (4)
In Cross-Language Text Classification Using Structural Correspondence Learning
  1. In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
    Page 2, “Related Work”
  2. Beyond this setting one can further distinguish whether a small amount of labeled data from the target domain is available (Daume, 2007; Finkel and Manning, 2009) or not (Blitzer et al., 2006; Jiang and Zhai, 2007).
    Page 2, “Related Work”
  3. (2007) apply structural learning to image classification in settings where little labeled data is given.
    Page 3, “Related Work”
The confidence, however, can only be determined for w_S since the setting gives us access to labeled data from S only.
    Page 4, “Cross-Language Structural Correspondence Learning”

feature space

Appears in 3 sentences as: feature space (3)
In Cross-Language Text Classification Using Structural Correspondence Learning
I.e., documents from the training set and the test set map onto two non-overlapping regions of the feature space.
    Page 3, “Cross-Language Text Classification”
MASK(x, p_l) is a function that returns a copy of x where the components associated with the two words in p_l are set to zero, which is equivalent to removing these words from the feature space.
    Page 5, “Cross-Language Structural Correspondence Learning”
Since (θ^T v)^T = v^T θ, it follows that this view of CL-SCL corresponds to the induction of a new feature space given by Equation 2.
    Page 5, “Cross-Language Structural Correspondence Learning”
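Since the MASK operation above is central to training the pivot predictors, a small illustrative sketch may help; the dictionary-based vocabulary lookup and variable names are assumptions, not the paper's implementation.

```python
import numpy as np

def mask(x, pivot_pair, vocab_index):
    """Return a copy of x with the components of the two pivot words set to zero,
    i.e. remove the pair {w_S, w_T} from the feature space, as described above."""
    x_masked = np.array(x, copy=True)
    for word in pivot_pair:              # pivot_pair = (w_S, w_T)
        i = vocab_index.get(word)        # vocab_index maps words to feature indices (assumed)
        if i is not None:
            x_masked[i] = 0.0
    return x_masked
```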

feature vector

Appears in 3 sentences as: feature vector (3)
In Cross-Language Text Classification Using Structural Correspondence Learning
In standard text classification, a document d is represented under the bag-of-words model as a |V|-dimensional feature vector x ∈ X, where V, the vocabulary, denotes an ordered set of words, x_i ∈ x denotes the normalized frequency of word i in d, and X is an inner product space.
    Page 3, “Cross-Language Text Classification”
D_S denotes the training set and comprises tuples of the form (x, y), which associate a feature vector x ∈ X with a class label y ∈ Y.
    Page 3, “Cross-Language Text Classification”
A document d is described as a normalized feature vector x under a unigram bag-of-words document representation.
    Page 6, “Experiments”
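A minimal sketch of the unigram bag-of-words representation with normalized term frequencies described above; the whitespace tokenizer and the choice of relative-frequency normalization are simplifying assumptions.

```python
from collections import Counter
import numpy as np

def bag_of_words(document, vocabulary):
    """Map a document d to its |V|-dimensional feature vector x of normalized word frequencies."""
    vocab_set = set(vocabulary)
    tokens = document.lower().split()                        # naive tokenization (assumption)
    counts = Counter(t for t in tokens if t in vocab_set)
    x = np.array([counts[w] for w in vocabulary], dtype=float)
    total = x.sum()
    return x / total if total > 0 else x                     # relative frequencies of vocabulary words
```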

loss function

Appears in 3 sentences as: loss function (3)
In Cross-Language Text Classification Using Structural Correspondence Learning
L is a loss function that measures the quality of the classifier, λ is a nonnegative regularization parameter that penalizes model complexity, and ‖w‖^2 = w^T w.
    Page 3, “Cross-Language Text Classification”
  2. Different choices for L entail different classifier types; e.g., when choosing the hinge loss function for L one obtains the popular Support Vector Machine classifier (Zhang, 2004).
    Page 3, “Cross-Language Text Classification”
In particular, the learning rate schedule from PEGASOS is adopted (Shalev-Shwartz et al., 2007), and the modified Huber loss, introduced by Zhang (2004), is chosen as loss function L.
    Page 6, “Experiments”
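To make the objective concrete, here is a small sketch of the modified Huber loss and the regularized empirical risk it plugs into (average loss plus λ‖w‖^2); the code is illustrative rather than the authors' implementation.

```python
import numpy as np

def modified_huber(y, score):
    """Modified Huber loss (Zhang, 2004) for labels y in {-1, +1}: with margin z = y * score,
    the loss is max(0, 1 - z)^2 for z >= -1 and -4z otherwise."""
    z = y * score
    return np.where(z >= -1.0, np.maximum(0.0, 1.0 - z) ** 2, -4.0 * z)

def regularized_risk(w, X, y, lam):
    """Average loss over the training data plus the penalty lam * ||w||^2."""
    return modified_huber(y, X @ w).mean() + lam * np.dot(w, w)
```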

weight vector

Appears in 3 sentences as: weight vector (2) weight vectors (1)
In Cross-Language Text Classification Using Structural Correspondence Learning
w is a weight vector that parameterizes the classifier; ^T denotes the matrix transpose.
    Page 3, “Cross-Language Text Classification”
to constrain the hypothesis space, i.e., the space of possible weight vectors, of the target task by considering multiple different but related prediction tasks.
    Page 5, “Cross-Language Structural Correspondence Learning”
The subspace is used to constrain the learning of the target task by restricting the weight vector w to lie in the subspace defined by θ^T.
    Page 5, “Cross-Language Structural Correspondence Learning”
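The constraint described in sentences 2 and 3 above amounts to learning a low-dimensional weight vector v on the projected data and mapping it back via w = θ^T v. A rough sketch, using scikit-learn's SGD classifier with the modified Huber loss as a stand-in for the paper's own SGD implementation (hyperparameters are illustrative):

```python
from sklearn.linear_model import SGDClassifier

def train_constrained(theta, X_s, y_s):
    """Learn the target classifier in the subspace defined by theta^T:
    train v on the projected documents theta*x and set w = theta^T v."""
    Z_s = X_s @ theta.T                                     # k-dimensional cross-lingual representation
    clf = SGDClassifier(loss="modified_huber", alpha=1e-4)  # illustrative settings, not the paper's
    clf.fit(Z_s, y_s)
    v = clf.coef_.ravel()
    w = theta.T @ v                                         # weight vector restricted to the subspace
    return w
```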

word pairs

Appears in 3 sentences as: word pair (1) word pairs (2)
In Cross-Language Text Classification Using Structural Correspondence Learning
CL-SCL comprises three steps: In the first step, CL-SCL selects word pairs {w_S, w_T}, called pivots, where w_S ∈ V_S and w_T ∈ V_T.
    Page 4, “Cross-Language Structural Correspondence Learning”
Considering our sentiment classification example, the word pair {excellent_S, exzellent_T} satisfies both conditions: (1) the words are strong indicators of positive sentiment,
    Page 4, “Cross-Language Structural Correspondence Learning”
Second, for each word w_S ∈ V_P we find its translation in the target vocabulary V_T by querying the translation oracle; we refer to the resulting set of word pairs as the candidate pivots, P′:
    Page 4, “Cross-Language Structural Correspondence Learning”
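A rough sketch of the candidate-pivot construction described above: query a word translation oracle for each pre-selected source word and keep the pairs whose translation occurs in the target vocabulary. The oracle function and the shape of V_P are assumptions for illustration.

```python
def candidate_pivots(vp_words, translate, target_vocab):
    """Build the candidate pivot set P': for each w_S in V_P, query the translation oracle
    and keep the pair (w_S, w_T) if the translation w_T is in the target vocabulary V_T."""
    target_vocab = set(target_vocab)
    pairs = []
    for w_s in vp_words:
        w_t = translate(w_s)             # word translation oracle; assumed to return a word or None
        if w_t is not None and w_t in target_vocab:
            pairs.append((w_s, w_t))
    return pairs
```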
