Learning to Predict Distributions of Words Across Domains
Bollegala, Danushka and Weir, David and Carroll, John

Article Structure

Abstract

Although the distributional hypothesis has been applied successfully in many natural language processing tasks, systems using distributional information have been limited to a single domain because the distribution of a word can vary between domains as the word’s predominant meaning changes.

Introduction

The Distributional Hypothesis, summarised by the memorable line of Firth (1957) — You shall know a word by the company it keeps — has inspired a diverse range of research in natural language processing.

Related Work

Learning semantic representations for words using documents from a single domain has received much attention lately (Vincent et al., 2010; Socher et al., 2013; Baroni and Lenci, 2010).

Distribution Prediction

3.1 In-domain Feature Vector Construction

Domain Adaptation

The main reason that a model trained only on the source domain labeled data performs poorly in the target domain is the feature mismatch — few features in target domain test instances appear in source domain training instances.
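
The scale of this mismatch is easy to check directly. Below is a toy illustration (our own, with made-up feature sets, not data from the paper) of counting how many target domain features never occur in the source domain training data:

```python
# Toy illustration of feature mismatch across domains: count how many
# target domain test features were never observed in the source domain
# training data. The feature sets are invented for this example.
source_train_features = {"well+research", "author", "plot", "excellent+read"}
target_test_features = {"battery", "lightweight", "durable", "author"}

unseen = target_test_features - source_train_features
print(f"{len(unseen)}/{len(target_test_features)} target features unseen "
      f"in the source training data: {sorted(unseen)}")
```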

Datasets

To evaluate DA for POS tagging, following Blitzer et al. (2006), we use sections 2–21 from the Wall Street Journal (WSJ) as the source domain labeled data.

Experiments and Results

For each domain D in the SANCL (POS tagging) and Amazon review (sentiment classification) datasets, we create a PPMI-weighted co-occurrence matrix F_D.
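
A minimal sketch of the PPMI weighting step (our own code; the `ppmi` helper and the toy count matrix are illustrative, not the authors' implementation):

```python
import numpy as np

def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive Pointwise Mutual Information weighting of a raw
    co-occurrence count matrix (rows: features, columns: context unigrams)."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # marginal counts of row features
    col = counts.sum(axis=0, keepdims=True)  # marginal counts of column features
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # undefined PMI (zero counts) becomes 0
    return np.maximum(pmi, 0.0)   # keep only positive associations

A = np.array([[10, 0, 3],
              [2, 5, 0],
              [0, 1, 8]], dtype=float)
F_D = ppmi(A)  # the PPMI-weighted co-occurrence matrix for one domain
print(F_D.round(2))
```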

Topics

sentiment classification

Appears in 25 sentences as: Sentiment Classification (2) sentiment classification (22) sentiment classifier (3)
In Learning to Predict Distributions of Words Across Domains
  1. We evaluate our method on two tasks: cross-domain part-of-speech tagging and cross-domain sentiment classification.
    Page 1, “Abstract”
  2. For example, unsupervised cross-domain sentiment classification (Blitzer et al., 2007; Aue and Gamon, 2005) involves using sentiment-labeled user reviews from the source domain, and unlabeled reviews from both the source and the target domains to learn a sentiment classifier for the target domain.
    Page 1, “Introduction”
  3. Domain adaptation (DA) of sentiment classification becomes extremely challenging when the distributions of words in the source and the target domains are very different, because the features learnt from the source domain labeled reviews might not appear in the target domain reviews that must be classified.
    Page 1, “Introduction”
  4. Using the learnt distribution prediction model, we propose a method to learn a cross-domain sentiment classifier.
    Page 2, “Introduction”
  5. Prior knowledge of the sentiment of words, such as sentiment lexicons, has been incorporated into cross-domain sentiment classification.
    Page 3, “Related Work”
  6. Although incorporation of prior sentiment knowledge is a promising technique to improve accuracy in cross-domain sentiment classification, it is complementary to our task of distribution prediction across domains.
    Page 3, “Related Work”
  7. Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
    Page 3, “Distribution Prediction”
  8. As we go on to show in Section 6, this enables us to use the same distribution prediction method for both POS tagging and sentiment classification.
    Page 5, “Distribution Prediction”
  9. We consider two DA tasks: (a) cross-domain POS tagging (Section 4.1), and (b) cross-domain sentiment classification (Section 4.2).
    Page 5, “Domain Adaptation”
  10. 4.2 Cross-Domain Sentiment Classification
    Page 5, “Domain Adaptation”
  11. Unlike in POS tagging, where we must individually tag each word in a target domain test sentence, in sentiment classification we must classify the sentiment for the entire review.
    Page 5, “Domain Adaptation”

POS tagging

Appears in 22 sentences as: POS tagged (1) POS tagger (2) POS taggers (1) POS Tagging (2) POS tagging (16)
In Learning to Predict Distributions of Words Across Domains
  1. Using the learnt distribution prediction model, we propose a method to learn a cross-domain POS tagger.
    Page 2, “Introduction”
  2. Blitzer et al. (2006) append the source domain labeled data with predicted pivots (i.e. words that appear in both the source and target domains) to adapt a POS tagger to a target domain.
    Page 2, “Related Work”
  3. Choi and Palmer (2012) propose a cross-domain POS tagging method by training two separate models: a generalised model and a domain-specific model.
    Page 2, “Related Work”
  4. Adding latent states to the smoothing model further improves the POS tagging accuracy (Huang and Yates, 2012).
    Page 2, “Related Work”
  5. As we go on to show in Section 6, this enables us to use the same distribution prediction method for both POS tagging and sentiment classification.
    Page 5, “Distribution Prediction”
  6. We consider two DA tasks: (a) cross-domain POS tagging (Section 4.1), and (b) cross-domain sentiment classification (Section 4.2).
    Page 5, “Domain Adaptation”
  7. 4.1 Cross-Domain POS Tagging
    Page 5, “Domain Adaptation”
  8. for each word w in a source domain labeled (i.e. manually POS tagged) sentence, we select its neighbours N(w) in the source domain as additional features.
    Page 5, “Domain Adaptation”
  9. Unlike in POS tagging, where we must individually tag each word in a target domain test sentence, in sentiment classification we must classify the sentiment for the entire review.
    Page 5, “Domain Adaptation”
  10. For both POS tagging and sentiment classification, we experimented with several alternative approaches for feature weighting, representation, and similarity measures using development data, which we randomly selected from the training instances of the datasets described in Section 5.
    Page 6, “Domain Adaptation”
  11. For POS tagging, we measured the effect of varying r, the number of distributional features, using a development dataset.
    Page 6, “Domain Adaptation”

bigrams

Appears in 15 sentences as: Bigram (1) bigram (5) Bigrams (1) bigrams (11)
In Learning to Predict Distributions of Words Across Domains
  1. For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
    Page 3, “Distribution Prediction”
  2. Next, we generate bigrams of word lemmas and remove any bigrams that consist only of stop words.
    Page 3, “Distribution Prediction”
  3. Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
    Page 3, “Distribution Prediction”
  4. Table 1 shows the unigram and bigram features we extract for a sentence using this procedure.
    Page 3, “Distribution Prediction” (a runnable sketch of this extraction follows this list)
  5. unigrams (lemma): this, be, an, interest, and, well, research, book
     unigrams (features): interest, well, research, book
     bigrams (lemma): this+be, be+an, an+interest, interest+and, and+well, well+research, research+book
     bigrams (features): an+interest, interest+and, and+well, well+research, research+book
    Page 3, “Distribution Prediction”
  6. Table 1: Extracting unigram and bigram features.
    Page 3, “Distribution Prediction”
  7. For each domain, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
    Page 3, “Distribution Prediction”
  8. Typically, the number of unique bigrams is much larger than that of unigrams.
    Page 3, “Distribution Prediction”
  9. Moreover, co-occurrences of bigrams are rare compared to co-occurrences of unigrams, as are co-occurrences involving a unigram and a bigram.
    Page 3, “Distribution Prediction”
  10. Consequently, in matrix A, we consider co-occurrences only between unigrams vs. unigrams, and bigrams vs. unigrams.
    Page 3, “Distribution Prediction”
  11. unigrams or bigrams) in a particular domain over the unigram features extracted from that domain (represented by the columns of A).
    Page 3, “Distribution Prediction”
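
The quotes above describe the whole in-domain feature extraction pipeline. Here is a self-contained sketch of it; the toy lemma dictionary and stop word list stand in for a real lemmatiser and stop list, so only the mechanics follow the paper:

```python
# Sketch of unigram/bigram feature extraction (cf. Table 1).
STOP = {"this", "be", "an", "and"}
LEMMA = {"is": "be", "interesting": "interest", "researched": "research"}

def extract_features(sentence: str):
    tokens = sentence.lower().replace(".", "").split()
    lemmas = [LEMMA.get(t, t) for t in tokens]
    unigrams = [l for l in lemmas if l not in STOP]  # drop stop words
    bigrams = [f"{a}+{b}" for a, b in zip(lemmas, lemmas[1:])]
    bigrams = [b for b in bigrams
               if not all(w in STOP for w in b.split("+"))]  # drop all-stop-word bigrams
    return unigrams, bigrams

print(extract_features("This is an interesting and well researched book."))
# (['interest', 'well', 'research', 'book'],
#  ['an+interest', 'interest+and', 'and+well', 'well+research', 'research+book'])
```

On the example sentence of Table 1 this reproduces the listed unigram and bigram features.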

unigrams

Appears in 15 sentences as: unigram (8) unigrams (14)
In Learning to Predict Distributions of Words Across Domains
  1. For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
    Page 3, “Distribution Prediction”
  2. Using a standard stop word list, we filter out frequent non-content unigrams and select the remainder as unigram features to represent a sentence.
    Page 3, “Distribution Prediction”
  3. Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
    Page 3, “Distribution Prediction”
  4. Table 1 shows the unigram and bigram features we extract for a sentence using this procedure.
    Page 3, “Distribution Prediction”
  5. unigrams (surface): this, is, an, interesting, and, well, researched, book
    Page 3, “Distribution Prediction”
  6. unigrams (lemma): this, be, an, interest, and, well, research, book
     unigrams (features): interest, well, research, book
     bigrams (lemma): this+be, be+an, an+interest, interest+and, and+well, well+research, research+book
     bigrams (features): an+interest, interest+and, and+well, well+research, research+book
    Page 3, “Distribution Prediction”
  7. Table 1: Extracting unigram and bigram features.
    Page 3, “Distribution Prediction”
  8. For each domain, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
    Page 3, “Distribution Prediction”
  9. Typically, the number of unique bigrams is much larger than that of unigrams.
    Page 3, “Distribution Prediction”
  10. Moreover, co-occurrences of bigrams are rare compared to co-occurrences of unigrams, as are co-occurrences involving a unigram and a bigram.
    Page 3, “Distribution Prediction”
  11. Consequently, in matrix A, we consider co-occurrences only between unigrams vs. unigrams, and bigrams vs. unigrams.
    Page 3, “Distribution Prediction”

feature space

Appears in 11 sentences as: feature space (9) feature spaces (4)
In Learning to Predict Distributions of Words Across Domains
  1. First, we create latent feature spaces separately for the source and the target domains using Singular Value Decomposition (SVD).
    Page 2, “Introduction”
  2. Second, we learn a mapping from the source domain latent feature space to the target domain latent feature space using Partial Least Square Regression (PLSR).
    Page 2, “Introduction” (a PLSR sketch follows this list)
  3. The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
    Page 2, “Introduction”
  4. To reduce the dimensionality of the feature space and create dense representations for words, we perform SVD on F. We use the left singular vectors corresponding to the k largest singular values to compute a rank-k approximation F̂ of F. We perform truncated SVD using SVDLIBC.
    Page 4, “Distribution Prediction”
  5. Each row in F̂ is considered as representing a word in a lower k (≪ n_C) dimensional feature space corresponding to a particular domain.
    Page 4, “Distribution Prediction”
  6. Distribution prediction in this lower dimensional feature space is preferable to prediction over the original feature space because there are reductions in overfitting, feature sparseness, and learning time.
    Page 4, “Distribution Prediction”
  7. PLSR has been used widely in Chemometrics (Geladi and Kowalski, 1986), producing stable prediction models even when the number of samples is considerably smaller than the dimensionality of the feature space.
    Page 4, “Distribution Prediction”
  8. increased the train time due to the larger feature space.
    Page 6, “Domain Adaptation”
  9. Therefore, when the overlap between the vocabularies used in the source and the target domains is small, the learnt prediction model cannot reduce the mismatch between the feature spaces.
    Page 7, “Experiments and Results”
  10. All methods are evaluated under the same settings, including train/test split, feature spaces, pivots, and classification algorithms, so that any differences in performance can be directly attributed to their domain adaptability.
    Page 7, “Experiments and Results”
  11. Because the dimensionality of the source and target domain feature spaces is equal to k, the complexity of the least square regression problem increases with k. Therefore, larger k values result in overfitting to the train data and classification accuracy is reduced on the target test data.
    Page 9
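
Item 2 above is the core of the method: a regression from the source latent space to the target latent space. A minimal sketch with scikit-learn's PLSRegression (our choice of library; random matrices stand in for the real pivot vectors):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_pivots, k = 500, 100                 # pivots shared by the two domains; latent dim k
U_source = rng.random((n_pivots, k))   # rows: pivot vectors in the source latent space
U_target = rng.random((n_pivots, k))   # rows: the same pivots in the target latent space

pls = PLSRegression(n_components=50)   # L = 50, the value used in the experiments
pls.fit(U_source, U_target)            # learn the source-to-target mapping

w_source = rng.random((1, k))          # some word's source domain vector
w_target_pred = pls.predict(w_source)  # its predicted target domain vector
print(w_target_pred.shape)             # (1, 100)
```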

labeled data

Appears in 10 sentences as: labeled data (10)
In Learning to Predict Distributions of Words Across Domains
  1. Our proposed cross-domain word distribution prediction method is unsupervised in the sense that it does not require any labeled data in either of the two steps.
    Page 2, “Introduction”
  2. Blitzer et al. (2006) append the source domain labeled data with predicted pivots (i.e. words that appear in both the source and target domains) to adapt a POS tagger to a target domain.
    Page 2, “Related Work”
  3. The unsupervised DA setting that we consider does not assume the availability of labeled data for the target domain.
    Page 3, “Related Work”
  4. However, if a small amount of labeled data is available for the target domain, it can be used to further improve the performance of DA tasks (Xiao et al., 2013; Daume III, 2007).
    Page 3, “Related Work”
  5. Our distribution prediction learning method is unsupervised in the sense that it does not require manually labeled data for a particular task from any of the domains.
    Page 5, “Distribution Prediction”
  6. The main reason that a model trained only on the source domain labeled data performs poorly in the target domain is the feature mismatch — few features in target domain test instances appear in source domain training instances.
    Page 5, “Domain Adaptation”
  7. Following Blitzer et al. (2006), we use sections 2–21 from the Wall Street Journal (WSJ) as the source domain labeled data.
    Page 6, “Datasets”
  8. For each domain, the accuracy obtained by a classifier trained using labeled data from that domain serves as an upper baseline.
    Page 7, “Experiments and Results”
  9. This upper baseline represents the classification accuracy we could hope to obtain if we were to have labeled data for the target domain.
    Page 8, “Experiments and Results”
  10. Unlike our distribution prediction method, which is unsupervised, SST requires labeled data for the source domain to learn a feature mapping between a source and a target domain in the form of a thesaurus.
    Page 8

SVD

Appears in 9 sentences as: SVD (10)
In Learning to Predict Distributions of Words Across Domains
  1. First, we create latent feature spaces separately for the source and the target domains using Singular Value Decomposition (SVD).
    Page 2, “Introduction”
  2. The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
    Page 2, “Introduction”
  3. Linear predictors are then learnt to predict the occurrence of those pivots, and SVD is used to construct a lower dimensional representation in which a binary classifier is trained.
    Page 2, “Related Work”
  4. To reduce the dimensionality of the feature space and create dense representations for words, we perform SVD on F. We use the left singular vectors corresponding to the k largest singular values to compute a rank-k approximation F̂ of F. We perform truncated SVD using SVDLIBC.
    Page 4, “Distribution Prediction” (a scipy-based sketch follows this list)
  5. The number of singular vectors k selected in SVD, and the number of PLSR dimensions L, are set respectively to 1000 and 50 for the remainder of the experiments described in the paper.
    Page 7, “Experiments and Results”
  6. To evaluate the overall effect of the number of singular vectors k used in the SVD step, and the number of PLSR components L used in Algorithm 1, we conduct two experiments.
    Page 8
  7. Figure 3: The effect of SVD dimensions.
    Page 9
  8. To evaluate the effect of the SVD dimensions, we fixed L = 100 and measured the cross-domain sentiment classification accuracy for different k values as shown in Figure 3.
    Page 9
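
A sketch of the SVD smoothing step, substituting scipy's svds for SVDLIBC and a random sparse matrix for the PPMI matrix F (the paper uses k = 1000; we use a smaller toy setting):

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

F = sparse_random(2000, 1500, density=0.01, format="csr", random_state=0)

k = 300                  # number of singular vectors to keep
U, s, Vt = svds(F, k=k)  # truncated SVD of the sparse PPMI matrix
F_hat = U * s            # dense k-dim word vectors: left singular vectors scaled by s
print(F_hat.shape)       # (2000, 300)
```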

feature vectors

Appears in 8 sentences as: Feature Vector (2) feature vector (2) feature vectors (4)
In Learning to Predict Distributions of Words Across Domains
  1. The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier.
    Page 3, “Related Work”
  2. 3.1 In-domain Feature Vector Construction
    Page 3, “Distribution Prediction”
  3. 3.2 Cross-Domain Feature Vector Prediction
    Page 4, “Distribution Prediction”
  4. We model distribution prediction as a multivariate regression problem where, given a set {(w_i(S), w_i(T))}, i = 1, ..., |W|, consisting of pairs of feature vectors selected from each domain for the pivots in W, we learn a mapping
    Page 4, “Distribution Prediction”
  5. In particular, PLSR fits a smaller number of latent variables (10 to 100 in practice) such that the correlation between the feature vectors for pivots in the two domains is maximised in this latent space.
    Page 4, “Distribution Prediction”
  6. First, we lemmatise each word in a source domain labeled review x(S), and extract both unigrams and bigrams as features to represent x(S) by a binary-valued feature vector.
    Page 5, “Domain Adaptation”
  7. Next, we train a binary classification model, θ, using those feature vectors.
    Page 5, “Domain Adaptation”
  8. At test time, we represent a test target review H using a binary-valued feature vector h of unigrams and bigrams of lemmas of the words in H, as we did for source domain labeled train reviews.
    Page 5, “Domain Adaptation”

co-occurrence

Appears in 6 sentences as: co-occurrence (7)
In Learning to Predict Distributions of Words Across Domains
  1. For each domain, we construct a feature co-occurrence matrix A in which columns correspond to unigram features and rows correspond to either unigram or bigram features.
    Page 3, “Distribution Prediction”
  2. The value of the element a_ij in the co-occurrence matrix A is set to the number of sentences in which the i-th and j-th features co-occur.
    Page 3, “Distribution Prediction”
  3. We apply Positive Pointwise Mutual Information (PPMI) to the co-occurrence matrix A.
    Page 3, “Distribution Prediction”
  4. Let F be the matrix that results when PPMI is applied to A. Matrix F has the same number of rows, m_C, and columns, n_C, as the raw co-occurrence matrix A.
    Page 3, “Distribution Prediction”
  5. For example, one can limit the definition of co-occurrence to words that are linked by some dependency relation (Pado and Lapata, 2007), or extend the window of co-occurrence to the entire document (Baroni and Lenci, 2010).
    Page 3, “Distribution Prediction”
  6. For each domain D in the SANCL (POS tagging) and Amazon review (sentiment classification) datasets, we create a PPMI-weighted co-occurrence matrix F_D.
    Page 6, “Experiments and Results”

binary classifier

Appears in 5 sentences as: binary classification (2) binary classifier (3)
In Learning to Predict Distributions of Words Across Domains
  1. Linear predictors are then learnt to predict the occurrence of those pivots, and SVD is used to construct a lower dimensional representation in which a binary classifier is trained.
    Page 2, “Related Work”
  2. The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier.
    Page 3, “Related Work”
  3. Next, we train a binary classification model, θ, using those feature vectors.
    Page 5, “Domain Adaptation”
  4. Any binary classification algorithm can be used to learn θ.
    Page 5, “Domain Adaptation”
  5. Finally, we classify h using the trained binary classifier θ.
    Page 6, “Domain Adaptation”

u

Appears in 5 sentences as: u (6)
In Learning to Predict Distributions of Words Across Domains
  1. Specifically, we measure the similarity, sim(u(S), w(S)), between the source domain distributions of u and w, and select the top r similar neighbours u for each word w as additional features for w.
    Page 5, “Domain Adaptation” (a sketch of this neighbour selection follows this list)
  2. The value of a neighbour u selected as a distributional feature is set to its similarity score sim(u(S), w(S)).
    Page 5, “Domain Adaptation”
  3. At test time, for each word w that appears in a target domain test sentence, we measure the similarity, sim(M(u(S)), w(T)), and select the most similar r words u in the source domain labeled sentences as the distributional features for w, with their values set to sim(M(u(S)), w(T)).
    Page 5, “Domain Adaptation”
  4. sim(M(u(S)), w(T)), between the target domain distribution of w, and each feature (unigram or bigram) u in the source domain labeled reviews.
    Page 6, “Domain Adaptation”
  5. For representation, we considered ranking distributional features u in descending order of their scores given by Equation 4, and taking the inverse-rank as the values for the distributional features (Bollegala et al., 2011).
    Page 6, “Domain Adaptation”
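
Putting the quotes above together, neighbour selection reduces to ranking source domain features by similarity to the word's target distribution. A sketch (random vectors stand in for the real distributions; cosine similarity, which the paper settles on, is used):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
vocab = ["good", "excellent", "heavy", "compact", "durable"]
predicted = {u: rng.random(50) for u in vocab}  # M(u(S)) for each source feature u
w_target = rng.random(50)                       # w(T): the word's target distribution

r = 3  # number of neighbours kept as distributional features
scores = {u: cosine(v, w_target) for u, v in predicted.items()}
neighbours = sorted(scores, key=scores.get, reverse=True)[:r]
features = {u: scores[u] for u in neighbours}   # feature value = similarity score
print(features)
```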

CRF

Appears in 5 sentences as: CRF (5)
In Learning to Predict Distributions of Words Across Domains
  1. Huang and Yates (2009) train a Conditional Random Field (CRF) tagger with features retrieved from a smoothing model trained using both source and target domain unlabeled data.
    Page 2, “Related Work”
  2. Next, we train a CRF model using all features (i.e. both the original and the distributional features).
    Page 5, “Domain Adaptation” (a sketch using a third-party CRF package follows this list)
  3. Finally, the trained CRF model is applied to a target domain test sentence.
    Page 5, “Domain Adaptation”
  4. The L-BFGS (Liu and Nocedal, 1989) method is used to train the CRF and logistic regression models.
    Page 7, “Experiments and Results”
  5. Specifically, in POS tagging, a CRF trained on source domain labeled sentences is applied to target domain test sentences, whereas in sentiment classification, a logistic regression classifier trained using source domain labeled reviews is applied to the target domain test reviews.
    Page 7, “Experiments and Results”
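
The paper does not name its CRF implementation; the following sketch recreates the same setup with the third-party sklearn-crfsuite package (an assumption of ours), which also trains with L-BFGS. The distributional features would be added to each token's feature dict:

```python
import sklearn_crfsuite

# One toy training sentence: a list of per-token feature dicts plus tags.
X_train = [[{"word": "the"}, {"word": "book"}, {"word": "sells"}]]
y_train = [["DT", "NN", "VBZ"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)  # L-BFGS training
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # predicted tags for the toy sentence
```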

similarity measures

Appears in 4 sentences as: similarity measure (2) similarity measures (3)
In Learning to Predict Distributions of Words Across Domains
  1. For both POS tagging and sentiment classification, we experimented with several alternative approaches for feature weighting, representation, and similarity measures using development data, which we randomly selected from the training instances of the datasets described in Section 5.
    Page 6, “Domain Adaptation”
  2. With respect to similarity measures, we experimented with cosine similarity and the similarity measure proposed by Lin (1998); cosine similarity performed consistently well over all the experimental settings.
    Page 6, “Domain Adaptation” (minimal implementations of both measures follow this list)
  3. The feature representation was held fixed during these similarity measure comparisons.
    Page 6, “Domain Adaptation”
  4. As an example of the distribution prediction method, in Table 3 we show the top 3 similar distributional features u in the books (source) domain, predicted for the electronics (target) domain word w = lightweight, by different similarity measures.
    Page 9
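
For reference, minimal implementations of the two measures compared above, over sparse feature-to-weight dicts (our own rendering of Lin's (1998) measure for positively weighted features):

```python
import math

def cosine(u: dict, v: dict) -> float:
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)

def lin(u: dict, v: dict) -> float:
    common = set(u) & set(v)  # features shared by both words
    return sum(u[f] + v[f] for f in common) / (sum(u.values()) + sum(v.values()))

u = {"battery": 2.1, "compact": 1.3, "light": 0.7}
v = {"compact": 0.9, "light": 1.5, "durable": 0.4}
print(cosine(u, v), lin(u, v))
```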

distributional representations

Appears in 4 sentences as: distributional representation (1) Distributional representations (1) distributional representations (2)
In Learning to Predict Distributions of Words Across Domains
  1. Distributional representations of words have been successfully used in many language processing tasks such as entity set expansion (Pantel et al., 2009), part-of-speech (POS) tagging and chunking (Huang and Yates, 2009), ontology learning (Curran, 2005), computing semantic textual similarity (Besancon et al., 1999), and lexical inference (Kotlerman et al., 2012).
    Page 1, “Introduction”
  2. Consequently, the distributional representations of the word lightweight will differ considerably between the two domains.
    Page 1, “Introduction”
  3. The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
    Page 2, “Introduction”
  4. We first create a distributional representation for a word using the data from a single domain, and then learn a Partial Least Square Regression (PLSR) model to predict the distribution of a word in a target domain given its distribution in a source domain.
    Page 9

logistic regression

Appears in 3 sentences as: logistic regression (3)
In Learning to Predict Distributions of Words Across Domains
  1. In our experiments, we used L2 regularised logistic regression.
    Page 5, “Domain Adaptation” (a minimal classifier sketch follows this list)
  2. The L-BFGS (Liu and Nocedal, 1989) method is used to train the CRF and logistic regression models.
    Page 7, “Experiments and Results”
  3. Specifically, in POS tagging, a CRF trained on source domain labeled sentences is applied to target domain test sentences, whereas in sentiment classification, a logistic regression classifier trained using source domain labeled reviews is applied to the target domain test reviews.
    Page 7, “Experiments and Results”
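
A minimal sketch of this classifier setup with scikit-learn (toy reviews; the paper's actual features are the lemma-based unigrams and bigrams described under Distribution Prediction, expanded with distributional features):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["an interesting and well researched book",
           "poorly researched and dull"]
labels = [1, 0]  # 1 = positive, 0 = negative

vec = CountVectorizer(ngram_range=(1, 2), binary=True)  # binary unigram+bigram features
X = vec.fit_transform(reviews)

clf = LogisticRegression(penalty="l2", solver="lbfgs")  # L2 regularised, L-BFGS
clf.fit(X, labels)
print(clf.predict(vec.transform(["well researched and interesting"])))
```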

sentiment lexicon

Appears in 3 sentences as: sentiment lexicon (2) sentiment lexicons (1)
In Learning to Predict Distributions of Words Across Domains
  1. Prior knowledge of the sentiment of words, such as sentiment lexicons, has been incorporated into cross-domain sentiment classification.
    Page 3, “Related Work”
  2. He et al. (2011) propose a joint sentiment-topic model that imposes a sentiment-prior depending on the occurrence of a word in a sentiment lexicon.
    Page 3, “Related Work”
  3. A sentiment lexicon is used to create features for a document.
    Page 3, “Related Work”

significantly outperforms

Appears in 3 sentences as: significantly outperform (1) significantly outperforms (2)
In Learning to Predict Distributions of Words Across Domains
  1. In both tasks, our method significantly outperforms competitive baselines and returns results that are statistically comparable to current state-of-the-art methods, while requiring no task-specific customisations.
    Page 1, “Abstract”
  2. Without requiring any task-specific customisations, systems based on our distribution prediction method significantly outperform competitive baselines in both tasks.
    Page 2, “Introduction”
  3. Except for the DE setting, in which the Proposed method significantly outperforms both SFA and SCL, the performance of the Proposed method is not statistically significantly different from that of SFA or SCL.
    Page 8, “Experiments and Results”

domain adaptation

Appears in 3 sentences as: Domain adaptation (1) domain adaptation (2)
In Learning to Predict Distributions of Words Across Domains
  1. Domain adaptation (DA) of sentiment classification becomes extremely challenging when the distributions of words in the source and the target domains are very different, because the features learnt from the source domain labeled reviews might not appear in the target domain reviews that must be classified.
    Page 1, “Introduction”
  2. We evaluated the proposed method in two domain adaptation tasks: cross-domain POS tagging and cross-domain sentiment classification.
    Page 9
  3. Our experiments show that without requiring any task-specific customisations to our distribution prediction method, it outperforms competitive baselines and achieves comparable results to the current state-of-the-art domain adaptation methods.
    Page 9
