Abstract | This study presents a novel approach to the problem of system portability across different domains: a sentiment annotation system that integrates a corpus-based classifier trained on a small set of annotated in-domain data and a lexicon-based system trained on WordNet. |
Abstract | The paper explores the challenges of system portability across domains and text genres (movie reviews, news, blogs, and product reviews), highlights the factors affecting system performance on out-of-domain and small-set in-domain data, and presents a new system consisting of an ensemble of two classifiers with precision-based vote weighting that provides significant gains in accuracy and recall over the corpus-based classifier and the lexicon-based system taken individually. |
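The precision-based weighting of the two classifiers' votes can be sketched as follows; the label encoding (+1/-1/0), the function name, and the precision values are illustrative assumptions, not taken from the paper:

```python
def weighted_vote(pred_a, prec_a, pred_b, prec_b):
    """Combine two classifiers' polarity votes, weighting each vote by
    that classifier's measured precision for the label it predicts.
    Labels: +1 positive, -1 negative, 0 abstain."""
    score = 0.0
    for pred, prec in ((pred_a, prec_a), (pred_b, prec_b)):
        if pred != 0:                   # abstentions contribute nothing
            score += pred * prec[pred]  # vote scaled by per-label precision
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0

# Hypothetical per-label precisions measured on held-out data:
corpus_prec = {1: 0.80, -1: 0.70}    # corpus-based classifier
lexicon_prec = {1: 0.60, -1: 0.85}   # lexicon-based system

# Corpus classifier votes positive, lexicon system votes negative;
# the lexicon's more precise negative vote wins (0.80 - 0.85 < 0):
label = weighted_vote(1, corpus_prec, -1, lexicon_prec)  # -> -1
```

When the two systems agree, the weighting is irrelevant; it only matters on disagreements, where the classifier that is historically more precise for its predicted label carries the vote.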
Domain Adaptation in Sentiment Research | There are two alternatives to supervised machine learning that can be used to get around this problem: on the one hand, general lists of sentiment clues/features can be acquired from domain-independent sources such as dictionaries or the Internet; on the other hand, unsupervised and weakly-supervised approaches can be used to take advantage of a small number of annotated in-domain examples and/or of unlabelled in-domain data.
Domain Adaptation in Sentiment Research | But such general word lists were shown to perform worse than statistical models built on sufficiently large in-domain training sets of movie reviews (Pang et al., 2002). |
Domain Adaptation in Sentiment Research | For instance, Aue and Gamon (2005) proposed training on a small number of labeled examples and large quantities of unlabelled in-domain data.
Introduction | Many applications require reliable processing of heterogeneous corpora, such as the World Wide Web, where the diversity of genres and domains present on the Internet limits the feasibility of in-domain training.
Introduction | A number of methods have been proposed in order to overcome this system portability limitation by using out-of-domain data, unlabelled in-domain corpora or a combination of in-domain and out-of-domain examples (Aue and Gamon, 2005; Bai et al., 2005; Dredze et al., 2007; Tan et al., 2007).
Introduction | The information contained in lexicographical sources, such as WordNet, reflects a lay person’s general knowledge about the world, while domain-specific knowledge can be acquired through classifier training on a small set of in-domain data. |
Abstract | Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Experiments | The in-domain data is collected from CWMT09, which consists of spoken dialogues in a travel setting, containing approximately 50,000 parallel sentence pairs in English and Chinese. |
Experiments | Bilingual corpus statistics: In-domain, 50K sentence pairs (360K English tokens, 310K Chinese tokens); General-domain, 16M sentence pairs (3,933M English tokens, 3,602M Chinese tokens).
Experiments | Our work relies on the use of in-domain language models and translation models to rank the sentence pairs from the general-domain bilingual training set. |
Introduction | Current data selection methods mostly use language models trained on small-scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
Related Work | (2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models. |
Training Data Selection Methods | These methods are based on a language model and a translation model, both trained on small in-domain parallel data.
Training Data Selection Methods | t(ej|fi) is the translation probability of word ej conditioned on word fi, and is estimated from the small in-domain parallel data.
Training Data Selection Methods | A sentence pair with a higher score is more likely to have been generated by the in-domain translation model; it is therefore more relevant to the in-domain corpus and will be retained to expand the training data.
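The translation-model score described here can be sketched as a length-normalized IBM Model 1-style sentence probability built from the lexical probabilities t(ej|fi); the function, smoothing constant, and toy lexicon below are assumptions for illustration:

```python
import math

def model1_logprob(src_words, tgt_words, t):
    """Length-normalized IBM Model 1-style log-probability of a target
    sentence given a source sentence, using a lexical translation table
    t: dict mapping (tgt_word, src_word) -> probability.  Unseen pairs
    receive a small smoothing probability."""
    eps = 1e-6
    logp = 0.0
    for e in tgt_words:
        # Sum over all source words (plus a NULL token) of t(e | f).
        s = sum(t.get((e, f), eps) for f in src_words + ["NULL"])
        logp += math.log(s / (len(src_words) + 1))
    return logp / len(tgt_words)  # normalize by target length

# Toy in-domain (travel) lexicon with hypothetical probabilities:
t = {("ticket", "piao"): 0.9, ("train", "huoche"): 0.8}

in_dom = model1_logprob(["piao", "huoche"], ["train", "ticket"], t)
out_dom = model1_logprob(["gupiao", "shichang"], ["stock", "market"], t)
# in_dom > out_dom: the travel-domain pair would be retained.
```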
Experiments | We have introduced the out-of-domain data and the electronic in-domain lexicon in Section 3. |
Experiments | Besides the in-domain lexicon, we have collected 1 million monolingual sentences each in the electronics domain from the web.
Experiments | We construct two kinds of phrase-based models using Moses (Koehn et al., 2007): one uses out-of-domain data and the other uses in-domain data. |
Introduction | Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012). |
Introduction | Since many researchers have studied bilingual lexicon induction, in this paper we mainly concentrate on phrase pair induction given a probabilistic bilingual lexicon and two large in-domain monolingual corpora (source and target language).
Probabilistic Bilingual Lexicon Acquisition | In order to induce the phrase pairs from the in-domain monolingual data for domain adaptation, the probabilistic bilingual lexicon is essential. |
Probabilistic Bilingual Lexicon Acquisition | In this paper, we acquire the probabilistic bilingual lexicon from two approaches: 1) build a bilingual lexicon from large-scale out-of-domain parallel data; 2) adopt a manually collected in-domain lexicon. |
Probabilistic Bilingual Lexicon Acquisition | This paper uses Chinese-to-English translation as a case study, and the electronics domain is the in-domain data we focus on.
Related Work | One is using an in-domain probabilistic bilingual lexicon to extract sub-sentential parallel fragments from comparable corpora (Munteanu and Marcu, 2006; Quirk et al., 2007; Cettolo et al., 2010). |
Related Work | Munteanu and Marcu (2006) first extract the candidate parallel sentences from the comparable corpora and further extract the accurate sub-sentential bilingual fragments from the candidate parallel sentences using the in-domain probabilistic bilingual lexicon. |
Conclusion | However, our results are also positive in that we find that nearly all model summary caseframes can be found in the source text together with some in-domain documents. |
Conclusion | Domain inference, on the other hand, and a greater use of in-domain documents as a knowledge source for it, are very promising indeed.
Experiments | Traditional systems that perform semantic inference do so from a set of known facts about the domain in the form of a knowledge base, but as we have seen, most extractive summarization systems do not make much use of in-domain corpora. |
Experiments | We examine adding in-domain text to the source text to see how this would affect coverage. |
Experiments | As shown in Table 6, the effect of adding more in-domain text on caseframe coverage is substantial, and noticeably more than using out-of-domain text. |
Introduction | Third, we consider how domain knowledge may be useful as a resource for an abstractive system, by showing that key parts of model summaries can be reconstructed from the source plus related in-domain documents. |
Related Work | They were originally proposed in the context of single-document summarization, where they were calculated using in-domain (relevant) vs. out-of-domain (irrelevant) text. |
Related Work | In multi-document summarization, the in-domain text has been replaced by the source text cluster (Conroy et al., 2006), thus they are now |
Experiments | We test our system on two scenarios: In-domain: the system is trained and evaluated on the source domain (bn+nw, 5-fold cross validation); Out-of-domain: the system is trained on the source domain and evaluated on the target development set of bc (bc dev).
Experiments | All the in-domain improvements in rows 2, 6, and 7 of Table 2 are significant at confidence levels ≥ 95%.
Experiments | For HLBL embeddings of 50 and 100 dimensions, the most effective way to introduce word embeddings is to add embeddings to the heads of the two mentions (row 2; both in-domain and out-of-domain), although the effect is less pronounced for the 50-dimensional HLBL embeddings.
Introduction | In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction). |
Introduction | We aim to build an open-domain supervised SRL system; that is, one whose performance on out-of-domain tests approaches the same level of performance as that of state-of-the-art systems on in-domain tests. |
Introduction | Experiments on out-of-domain test sets show that our learned representations can dramatically improve out-of-domain performance, and narrow the gap between in-domain and out-of-domain performance by half.
Abstract | We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset.
Introduction | We study the polarity-bearing topics extracted by the JST model and show that by augmenting the original feature space with polarity-bearing topics, the performance of in-domain supervised classifiers learned from augmented feature representation improves substantially, reaching the state-of-the-art results of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. |
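The augmentation step, concatenating topic features onto the original feature space, can be sketched as follows; the key scheme and toy topic proportions are illustrative assumptions, not the paper's actual feature encoding:

```python
def augment(bow_features, polarity_topics):
    """Return a copy of a bag-of-words feature dict augmented with
    polarity-bearing topic features, e.g. the proportion of a document's
    tokens assigned to each (sentiment, topic) pair.  Keys are prefixed
    so the two feature spaces cannot collide."""
    features = dict(bow_features)
    for (sentiment, topic), weight in polarity_topics.items():
        features["topic_%s_%d" % (sentiment, topic)] = weight
    return features

# Toy document: unigram counts plus hypothetical JST-style topic proportions.
doc_bow = {"great": 2, "plot": 1}
doc_topics = {("pos", 0): 0.6, ("neg", 3): 0.1}
x = augment(doc_bow, doc_topics)
# x now holds both word features and topic features:
# {"great": 2, "plot": 1, "topic_pos_0": 0.6, "topic_neg_3": 0.1}
```

A supervised classifier is then trained on the augmented vectors exactly as it would be on plain bag-of-words features.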
Joint Sentiment-Topic (JST) Model | The adaptation loss is calculated with respect to the in-domain gold standard classification result.
Joint Sentiment-Topic (JST) Model | For example, the in-domain gold standard for the Book domain is 79.96%.
Joint Sentiment-Topic (JST) Model | Table 3: Adaptation loss with respect to the in-domain gold standard.
Abstract | We achieve strong gains in NER performance on news, in-domain and out-of-domain, and on web queries. |
Conclusion | Even in-domain, we were able to get a smaller, but still noticeable improvement of 4.2% due to piggyback features.
Experimental data | In our experiments, we train an NER classifier on an in-domain data set and test it on two different out-of-domain data sets. |
Experimental data | A reviewer points out that we use the terms in-domain and out-of-domain somewhat liberally.
Experimental setup | For our in-domain evaluation, we tune T on a 10% development sample of the CoNLL data and test on the remaining 10%. |
Related work | Another source of world knowledge for NER is Wikipedia: Kazama and Torisawa (2007) show that pseudocategories extracted from Wikipedia help for in-domain NER. |
Results and discussion | Even though the emphasis of this paper is on cross-domain robustness, we can see that our approach also has clear in-domain benefits. |
Results and discussion | While the improvement due to piggyback features increases as out-of-domain data become more different from the in-domain training set, performance declines in absolute terms from .930 (CoNLL) to .681 (IEER) and .438 (KDD-T).
Language model adaptation | Many methods (Lin et al., 1997; Gao et al., 2002; Klakow, 2000; Moore and Lewis, 2010; Axelrod et al., 2011) rank sentences in the general-domain data according to their similarity to the in-domain data and select only those with score higher than some threshold.
Language model adaptation | However, sometimes it is hard to say whether a sentence is totally in-domain or out-of-domain; for example, quoted speech in a news report might be partly in-domain if the domain of interest is broadcast conversation. |
Language model adaptation | They first train two language models: p_in on a set of in-domain data, and p_out on a set of general-domain data.
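The standard way to combine the two models is the cross-entropy difference score of Moore and Lewis (2010), ranking each sentence by H_in(s) - H_out(s), where lower means more in-domain. A minimal sketch with add-one-smoothed unigram models (the counts and vocabulary size are toy assumptions):

```python
import math

def cross_entropy(sentence, counts, vocab_size):
    """Per-word cross-entropy of a sentence under an add-one-smoothed
    unigram model given as a dict of word counts."""
    total = sum(counts.values())
    h = 0.0
    for w in sentence:
        p = (counts.get(w, 0) + 1.0) / (total + vocab_size)
        h -= math.log(p)
    return h / len(sentence)

def ce_difference(sentence, in_counts, out_counts, vocab_size):
    """Moore-Lewis score H_in(s) - H_out(s): lower = more in-domain."""
    return (cross_entropy(sentence, in_counts, vocab_size)
            - cross_entropy(sentence, out_counts, vocab_size))

# Toy unigram counts for p_in and p_out, with an assumed vocabulary size:
p_in = {"flight": 5, "ticket": 4, "hotel": 3}
p_out = {"the": 10, "market": 5, "flight": 1, "said": 4}
V = 1000

travel = ce_difference(["flight", "ticket"], p_in, p_out, V)
news = ce_difference(["market", "said"], p_in, p_out, V)
# travel < news: the travel sentence ranks as more in-domain.
```

Subtracting the general-domain cross-entropy penalizes sentences that are merely frequent everywhere, rather than specifically in-domain.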
Predicting politeness | Classification results We evaluate the classifiers both in an in-domain setting, with a standard leave-one-out cross validation procedure, and in a cross-domain setting, where we train on one domain and test on the other (Table 4). |
Predicting politeness | For both our development and our test domains, and in both the in-domain and cross-domain settings, the linguistically informed features give 3-4% absolute improvement over the bag of words model. |
Predicting politeness | While the in-domain results are within 3% of human performance, the greater room for improvement in the cross-domain setting motivates further research on linguistic cues of politeness. |
Relation to social factors | Encouraged by the close-to-human performance of our in-domain classifiers, we use them to assign politeness labels to our full dataset and then compare these labels to independent measures of power and status in our data. |
Relation to social factors | Table 4 settings: In-domain (train on Wiki, test on Wiki; train on SE, test on SE); Cross-domain (train on Wiki, test on SE; train on SE, test on Wiki).
Relation to social factors | Table 4: Accuracies of our two classifiers for Wikipedia (Wiki) and Stack Exchange (SE), for in-domain and cross-domain settings. |
Abstract | Pure statistical parsing systems achieve high in-domain accuracy but perform poorly out-domain.
Dependency Parsing with HPSG | With the extra features, we hope that the training of the statistical model will not overfit the in-domain data, but be able to deal with domain independent linguistic phenomena as well. |
Experiment Results & Error Analyses | With both parsers, we see slight performance drops with both HPSG feature models on in-domain tests (WSJ), compared with the original models.
Experiment Results & Error Analyses | When we look at the performance difference between in-domain and out-domain tests for each feature model, we observe that the drop is significantly smaller for the extended models with HPSG features.
Experiment Results & Error Analyses | Admittedly, the results on PCHEMTB are lower than the best reported results in the CoNLL 2007 Shared Task, but we note that we are not using any in-domain unlabeled data.
Abstract | The general idea is first to create a vector profile for the in-domain development (“dev”) set. |
Experiments | The other is a linear combination of TMs trained on each subcorpus, with the weights of each model learned with an EM algorithm to maximize the likelihood of joint empirical phrase pair counts for in-domain dev data.
Introduction | In transductive learning, an MT system trained on general domain data is used to translate in-domain monolingual data. |
Introduction | Data selection approaches (Zhao et al., 2004; Hildebrand et al., 2005; Lu et al., 2007; Moore and Lewis, 2010; Axelrod et al., 2011) search for bilingual sentence pairs that are similar to the in-domain “dev” data, then add them to the training data. |
Vector space model adaptation | For the in-domain dev set, we first run word alignment and phrase extraction in the usual way, then sum the distribution of each phrase pair (fj, ek) extracted from the dev data across subcorpora to represent its domain information.
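The resulting dev-set profile can then be compared against each phrase pair's count distribution across subcorpora, e.g. with cosine similarity; the subcorpus names and counts below are toy assumptions, not the paper's data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as
    dicts keyed by subcorpus name."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical phrase-pair counts across three subcorpora; the dev
# profile is the summed distribution of dev-set phrase pairs.
dev_profile = {"news": 1.0, "web": 8.0, "subs": 1.0}
phrase_a = {"web": 20.0, "subs": 2.0}   # distributed like the dev data
phrase_b = {"news": 30.0, "subs": 1.0}  # mostly newswire

# phrase_a's distribution is closer to the dev profile than phrase_b's:
sim_a = cosine(phrase_a, dev_profile)
sim_b = cosine(phrase_b, dev_profile)
```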
Introduction | The resulting systems yield results comparable to those from the same system trained on in-domain data, and statistically significantly outperform supervised extractive summarization approaches trained on in-domain data. |
Results | Table 3 indicates that, with both true clusterings and system clusterings, our system trained on out-of-domain data achieves comparable performance with the same system trained on in-domain data.
Results | for OUR SYSTEM trained on IN-domain data and OUT-of-domain data, and for the utterance-level extraction system (SVM-DA) trained on in-domain data.
Conclusions | This paper has presented a constraint-based approach to in-domain relation discovery. |
Experimental Setup | methods ultimately aim to capture domain-specific relations expressed with varying verbalizations, and both operate over in-domain input corpora supplemented with syntactic information. |
Introduction | In this paper, we introduce a novel approach for the unsupervised learning of relations and their instantiations from a set of in-domain documents. |
Introduction | Clusters of similar in-domain documents are |
Model | Our work performs in-domain relation discovery by leveraging regularities in relation expression at the lexical, syntactic, and discourse levels. |
Experiments | In-domain multiclass classifier: This is a support vector machine (SVM; Fan et al., 2008) using one-versus-rest decoding, without removing positive labeled data (Jiang and Zhai, 2007b) from the target domain.
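The one-versus-rest decoding step can be sketched as follows; the relation labels, threshold, and decision values are illustrative assumptions:

```python
def one_vs_rest_decode(scores, threshold=0.0):
    """One-versus-rest decoding: each class has its own binary scorer;
    return the class with the highest decision value, or "NONE" when no
    scorer exceeds the threshold."""
    best_label, best_score = "NONE", threshold
    for label, score in scores.items():
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical per-relation decision values from the binary SVMs:
label1 = one_vs_rest_decode({"PART-WHOLE": 0.3, "EMP-ORG": 0.7})  # EMP-ORG
label2 = one_vs_rest_decode({"PART-WHOLE": -0.2})                  # NONE
```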
Experiments | From Table 3 and Table 5, we see that the proposed method has the best F1 among all methods, except for the supervised upper bound (In-domain).
Experiments | Performance Gap: From Tables 2 to 4, we observe that the smallest performance gap between RDA and the in-domain settings is still large (about 12% with k = 5) on ACE 2004.
Inferring a learning curve from mostly monolingual data | Given a small “seed” parallel corpus, the translation system can be used to train small in-domain models and the evaluation score can be measured at a few initial sample sizes {(x1, y1), (x2, y2), ..., (xp, yp)}.
Inferring a learning curve from mostly monolingual data | For the cases where a slightly larger in-domain “seed” parallel corpus is available, we introduced an extrapolation method and a combined method yielding high-precision predictions: using models trained on up to 20K sentence pairs we can predict performance on a given test set with a root mean squared error in the order of 1 BLEU point at 75K sentence pairs, and in the order of 2-4 BLEU points at 500K.
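The extrapolation idea can be sketched by fitting a simple growth curve to the measured (size, score) points and evaluating it at larger sizes; the logarithmic curve family and the BLEU numbers below are assumptions for illustration, not the paper's actual curve family or results:

```python
import math

def fit_log_curve(points):
    """Least-squares fit of y = a + b * ln(x) to (size, score) points.
    This is one simple growth-curve family; the actual family used for
    BLEU extrapolation may differ."""
    xs = [math.log(x) for x, _ in points]
    ys = [y for _, y in points]
    n = float(len(points))
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical BLEU scores measured at small seed-corpus sample sizes:
seed_points = [(5000, 18.0), (10000, 20.1), (20000, 22.2)]
a, b = fit_log_curve(seed_points)

# Extrapolate to 75K sentence pairs:
predicted_75k = a + b * math.log(75000)
```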
Introduction | This prediction, or more generally the prediction of the learning curve of an SMT system as a function of available in-domain parallel data, is the objective of this paper. |
Introduction | In the second scenario (S2), an additional small seed parallel corpus is given that can be used to train small in-domain models and measure (with some variance) the evaluation score at a few points on the initial portion of the learning curve. |
Empirical Evaluation | We compare them with two supervised methods: a supervised model (Base) which is trained on the source domain data only, and another supervised model (In-domain) which is learned on the labeled data from the target domain.
Empirical Evaluation | The Base model can be regarded as a natural baseline model, whereas the In-domain model is essentially an upper-bound for any domain-adaptation method. |
Empirical Evaluation | First, observe that the total drop in the accuracy when moving to the target domain is 8.9%: from 84.6% demonstrated by the In-domain classifier to 75.6% shown by the non-adapted Base classifier. |
Related Work | The drop in accuracy for the SCL method in Table 1 is computed with respect to the less accurate supervised in-domain classifier considered in Blitzer et al.
Comparative Evaluation | Next, for each domain and language, we manually calculated the fraction of terms for which an in-domain definition was provided by Google Define and GlossBoot. |
Results and Discussion | In Table 5 we show examples of the possible scenarios for terms: in-domain extracted terms which are also found in the gold standard (column 2), in-domain extracted terms not in the gold standard (column 3), out-of-domain extracted terms (column 4), and domain terms in the gold standard but not extracted by our approach (column 5).
Results and Discussion | Table 6), because the retrieved glosses of domain terms are usually in-domain too and, since they come from glossaries, follow a definitional style.
Empirical Analysis | The aim of the evaluation is to measure the reachable accuracy of the simple model proposed and to compare its impact over in-domain and out-of-domain semantic role labeling tasks. |
Empirical Analysis | The in-domain test has been run over the FrameNet annotated corpus, derived from the British National Corpus (BNC). |
Empirical Analysis | Corpus statistics: training (FN-BNC): 134,697 / 271,560; in-domain test (FN-BNC): 14,952 / 30,173.
Experiments | We start off by presenting the results for the traditional in-domain setting, where both TRAIN and TEST come from the same domain, e.g., AUTO or TABLETS. |
Experiments | 5.3.1 In-domain experiments |
Experiments | Figure 2: In-domain learning curves. |
Introduction | Domain adaptation techniques aim at finding ways to adjust an out-of-domain (OUT) model to represent a target domain (in-domain or IN).
Introduction | In addition to the basic approach of concatenation of in-domain and out-of-domain data, we also trained a log-linear mixture model (Foster and Kuhn, 2007) |
Related Work 5.1 Domain Adaptation | Other methods include using self-training techniques to exploit monolingual in-domain data (Ueffing et al., 2007; |
Introduction | — We used phrase-table merging (Nakov and Ng, 2009) to utilize MSA/English parallel data with the available in-domain parallel data. |
Previous Work | In contrast, we showed that training on in-domain dialectal data irrespective of its small size is better than training on large MSA/English data. |
Previous Work | Our LM experiments also affirmed the importance of in-domain English LMs. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | The ensemble model without LS (third line) has a nearly identical P@1 score as the equivalent in-domain model (line 13 in Table 1), while slightly surpassing in-domain MRR performance. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | The in-domain performance of the ensemble model is similar to that of the single classifier in both YA and Bio HOW so we omit these results here for simplicity. |
Related Work | Inspired by this previous work and recent work in discourse parsing (Feng and Hirst, 2012), our work is the first to systematically explore structured discourse features driven by several discourse representations, combine discourse with lexical semantic models, and evaluate these representations on thousands of questions using both in-domain and cross-domain experiments. |
Introduction | While most previous work focuses on in-domain sequential labelling or cross-domain classification tasks, we are the first to learn representations for web-domain structured prediction.
Introduction | Our results suggest that while both strategies improve in-domain tagging accuracies, keeping the learned representation unchanged consistently results in better cross-domain accuracies. |
Related Work | (2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling. |
Experimental Evaluation | Listed together with their PARSEVAL F-measures these are: gold-standard parses from the treebank (GoldSyn, 100%), a parser trained on WSJ plus a small number of in-domain training sentences required to achieve good performance, 20 for CLANG (Syn20, 88.21%) and 40 for GEOQUERY (Syn40, 91.46%), and a parser trained on no in-domain data (Syn0, 82.15% for CLANG and 76.44% for GEOQUERY). |
Experimental Evaluation | Parsers trained on more in-domain data improved our approach.
Experimental Evaluation | Table 5: Performance on GEO25 0 (20 in-domain sentences are used in SYN20 to train the syntactic parser). |