Index of papers in Proc. ACL 2008 that mention
  • in-domain
Andreevskaia, Alina and Bergler, Sabine
Abstract
This study presents a novel approach to the problem of system portability across different domains: a sentiment annotation system that integrates a corpus-based classifier trained on a small set of annotated in-domain data and a lexicon-based system trained on WordNet.
Abstract
The paper explores the challenges of system portability across domains and text genres (movie reviews, news, blogs, and product reviews), highlights the factors affecting system performance on out-of-domain and small-set in-domain data, and presents a new system consisting of an ensemble of two classifiers with precision-based vote weighting that provides significant gains in accuracy and recall over the corpus-based classifier and the lexicon-based system taken individually.
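The precision-based vote weighting mentioned above can be illustrated with a minimal sketch. The sentence does not give the paper's exact weighting scheme, so the following is an assumption: each classifier's vote for a label is weighted by that classifier's precision on that label, measured on a held-out set. The names `per_class_precision` and `weighted_vote`, the label strings, and the toy data are all hypothetical.

```python
from typing import Dict, Sequence

Label = str  # hypothetical label set: "pos" or "neg"


def per_class_precision(
    preds: Sequence[Label],
    gold: Sequence[Label],
    labels: Sequence[Label] = ("pos", "neg"),
) -> Dict[Label, float]:
    """Precision of one classifier for each label on a held-out set."""
    precision = {}
    for label in labels:
        # Gold labels of the items this classifier tagged as `label`.
        tagged = [g for p, g in zip(preds, gold) if p == label]
        correct = sum(1 for g in tagged if g == label)
        precision[label] = correct / len(tagged) if tagged else 0.0
    return precision


def weighted_vote(
    votes: Sequence[Label],
    precisions: Sequence[Dict[Label, float]],
) -> Label:
    """Combine one vote per classifier, weighting each vote by that
    classifier's held-out precision for the label it voted for
    (an assumed scheme, not necessarily the paper's)."""
    scores: Dict[Label, float] = {}
    for vote, prec in zip(votes, precisions):
        scores[vote] = scores.get(vote, 0.0) + prec.get(vote, 0.0)
    return max(scores, key=scores.get)


# Toy held-out data (fabricated for illustration only).
gold = ["pos", "neg", "neg", "neg"]
corpus_prec = per_class_precision(["pos", "pos", "neg", "neg"], gold)
lexicon_prec = per_class_precision(["pos", "pos", "pos", "neg"], gold)

# The corpus-based classifier says "pos", the lexicon-based one says "neg".
decision = weighted_vote(["pos", "neg"], [corpus_prec, lexicon_prec])
```

In this toy setup the lexicon-based vote wins, because its held-out precision on "neg" is higher than the corpus-based classifier's precision on "pos". Under this scheme, a classifier that is reliable only for one class dominates the ensemble only when it votes for that class.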
Domain Adaptation in Sentiment Research
There are two alternatives to supervised machine learning that can be used to get around this problem: on the one hand, general lists of sentiment clues/features can be acquired from domain-independent sources such as dictionaries or the Internet; on the other hand, unsupervised and weakly-supervised approaches can be used to take advantage of a small number of annotated in-domain examples and/or of unlabelled in-domain data.
Domain Adaptation in Sentiment Research
But such general word lists were shown to perform worse than statistical models built on sufficiently large in-domain training sets of movie reviews (Pang et al., 2002).
Domain Adaptation in Sentiment Research
For instance, Aue and Gamon (2005) proposed training on a small number of labeled examples and large quantities of unlabelled in-domain data.
Introduction
Many applications require reliable processing of heterogeneous corpora, such as the World Wide Web, where the diversity of genres and domains present in the Internet limits the feasibility of in-domain training.
Introduction
A number of methods have been proposed in order to overcome this system portability limitation by using out-of-domain data, unlabelled in-domain corpora or a combination of in-domain and out-of-domain examples (Aue and Gamon, 2005; Bai et al., 2005; Dredze et al., 2007; Tan et al., 2007).
Introduction
The information contained in lexicographical sources, such as WordNet, reflects a lay person’s general knowledge about the world, while domain-specific knowledge can be acquired through classifier training on a small set of in-domain data.
in-domain is mentioned in 26 sentences in this paper.
Topics mentioned in this paper: