Index of papers in Proc. ACL 2014 that mention
  • in-domain
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods rely solely on language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Experiments
The in-domain data is collected from CWMT09, which consists of spoken dialogues in a travel setting, containing approximately 50,000 parallel sentence pairs in English and Chinese.
Experiments
Bilingual Corpus   #sentence (Eng / Chn)   #token (Eng / Chn)
In-domain          50K / 50K               360K / 310K
General-domain     16M / 16M               393.3M / 360.2M
Experiments
Our work relies on the use of in-domain language models and translation models to rank the sentence pairs from the general-domain bilingual training set.
Introduction
Current data selection methods mostly use language models trained on small-scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
Related Work
(2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models.
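A minimal runnable sketch of this kind of perplexity-based ranking, with a toy unigram model standing in for the in-domain n-gram LMs (the data and add-one smoothing are illustrative, not from the paper):

```python
import math
from collections import Counter

def train_unigram_lm(in_domain_sentences):
    """Unigram LM over the small in-domain corpus (add-one smoothing)."""
    counts = Counter(tok for sent in in_domain_sentences for tok in sent)
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def perplexity(sentence, prob):
    """Per-word perplexity of one sentence under the LM."""
    logp = sum(math.log2(prob(tok)) for tok in sentence)
    return 2 ** (-logp / max(len(sentence), 1))

# Rank general-domain sentences: lower perplexity = more domain-relevant.
lm = train_unigram_lm([["how", "much", "is", "the", "fare"]])
general = [["the", "fare", "is", "high"], ["stock", "prices", "fell"]]
ranked = sorted(general, key=lambda s: perplexity(s, lm))
```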
Training Data Selection Methods
These methods are based on a language model and a translation model, both trained on small in-domain parallel data.
Training Data Selection Methods
t(ej|fi) is the translation probability of word ej conditioned on word fi, and is estimated from the small in-domain parallel data.
Training Data Selection Methods
A sentence pair with a higher score is more likely to be generated by the in-domain translation model; thus, it is more relevant to the in-domain corpus and will be retained to expand the training data.
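A rough sketch of how such a translation-model score might be computed, using IBM Model 1-style word translation probabilities; the table `t`, the probability floor, and the length normalization are illustrative assumptions, not details from the paper:

```python
import math

def tm_score(e_tokens, f_tokens, t, floor=1e-10):
    """Length-normalized log score of a sentence pair under a lexical
    translation model: each target word e may align to any source word f,
    with t[(e, f)] = t(e|f) estimated from the small in-domain data."""
    logp = 0.0
    for e in e_tokens:
        p = sum(t.get((e, f), floor) for f in f_tokens) / len(f_tokens)
        logp += math.log(p)
    return logp / len(e_tokens)

# Toy translation table; pairs scoring higher would be retained.
t = {("fare", "piaojia"): 0.6, ("the", "piaojia"): 0.1}
score = tm_score(["the", "fare"], ["piaojia"], t)
```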
in-domain is mentioned in 22 sentences in this paper.
Nguyen, Thien Huu and Grishman, Ralph
Experiments
We test our system on two scenarios: In-domain: the system is trained and evaluated on the source domain (bn+nw, 5-fold cross-validation); Out-of-domain: the system is trained on the source domain and evaluated on the target development set of bc (bc dev).
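The two scenarios can be reproduced schematically as follows; the toy features and the LinearSVC stand-in are assumptions, not the paper's model:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Toy stand-ins for the bn+nw source domain and the bc target dev set.
X_src, y_src = np.random.rand(100, 5), np.random.randint(0, 2, 100)
X_tgt, y_tgt = np.random.rand(30, 5), np.random.randint(0, 2, 30)

clf = LinearSVC()
# In-domain: train and evaluate on the source domain via 5-fold CV.
in_domain = cross_val_score(clf, X_src, y_src, cv=5).mean()
# Out-of-domain: train on the source, evaluate on the target dev set.
out_of_domain = clf.fit(X_src, y_src).score(X_tgt, y_tgt)
```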
Experiments
All the in-domain improvements in rows 2, 6, and 7 of Table 2 are significant at confidence levels ≥ 95%.
Experiments
For HLBL embeddings of 50 and 100 dimensions, the most effective way to introduce word embeddings is to add them to the heads of the two mentions (row 2; both in-domain and out-of-domain), although this is less pronounced for the 50-dimensional HLBL embeddings.
in-domain is mentioned in 8 sentences in this paper.
Zhang, Hui and Chiang, David
Language model adaptation
Many methods (Lin et al., 1997; Gao et al., 2002; Klakow, 2000; Moore and Lewis, 2010; Axelrod et al., 2011) rank sentences in the general-domain data according to their similarity to the in-domain data and select only those whose score exceeds some threshold.
Language model adaptation
However, sometimes it is hard to say whether a sentence is totally in-domain or out-of-domain; for example, quoted speech in a news report might be partly in-domain if the domain of interest is broadcast conversation.
Language model adaptation
They first train two language models: p_in on a set of in-domain data, and p_out on a set of general-domain data.
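This setup matches the cross-entropy difference criterion of Moore and Lewis (2010), cited above. A minimal runnable sketch, assuming the two LMs are exposed as per-word probability callables (an illustrative interface, not the paper's code):

```python
import math

def cross_entropy(sentence, prob):
    """Per-word cross-entropy H(s) = -(1/|s|) * sum_w log2 prob(w)."""
    return -sum(math.log2(prob(w)) for w in sentence) / len(sentence)

def select(corpus, p_in, p_out, threshold):
    """Keep sentences with H_in(s) - H_out(s) below a tuned threshold,
    i.e. sentences that look more in-domain than general-domain."""
    return [s for s in corpus
            if cross_entropy(s, p_in) - cross_entropy(s, p_out) < threshold]

# Toy usage with uniform stand-in models.
kept = select([["the", "fare"]], lambda w: 0.5, lambda w: 0.25, threshold=0.0)
```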
in-domain is mentioned in 6 sentences in this paper.
Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong
Experiments
In-domain multiclass classifier: This is a support vector machine (SVM; Fan et al., 2008) using one-versus-rest decoding, without removing positively labeled data (Jiang and Zhai, 2007b) from the target domain.
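For reference, one-versus-rest decoding with LIBLINEAR (the toolkit of Fan et al., 2008) can be set up in scikit-learn roughly as follows; the toy data is illustrative:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# LinearSVC wraps LIBLINEAR (Fan et al., 2008); OneVsRestClassifier
# trains one binary classifier per relation class.
X_train = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # toy feature vectors
y_train = [0, 1, 2]                              # toy class labels
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X_train, y_train)
print(clf.predict([[0.9, 0.1]]))
```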
Experiments
From Table 3 and Table 5, we see that the proposed method has the best F1 among all methods, except for the supervised upper bound (In-domain).
Experiments
Performance Gap: From Tables 2 to 4, we observe that the smallest performance gap between RDA and the in-domain settings is still large (about 12% with k = 5) on ACE 2004.
in-domain is mentioned in 5 sentences in this paper.
Severyn, Aliaksei and Moschitti, Alessandro and Uryupina, Olga and Plank, Barbara and Filippova, Katja
Experiments
We start off by presenting the results for the traditional in-domain setting, where both TRAIN and TEST come from the same domain, e.g., AUTO or TABLETS.
Experiments
5.3.1 In-domain experiments
Experiments
Figure 2: In-domain learning curves.
in-domain is mentioned in 4 sentences in this paper.
Jansen, Peter and Surdeanu, Mihai and Clark, Peter
CR + LS + DMM + DPM
The ensemble model without LS (third line) has a nearly identical P@1 score to that of the equivalent in-domain model (line 13 in Table 1), while slightly surpassing the in-domain MRR performance.
CR + LS + DMM + DPM
The in-domain performance of the ensemble model is similar to that of the single classifier in both YA and Bio HOW, so we omit these results here for simplicity.
Related Work
Inspired by this previous work and recent work in discourse parsing (Feng and Hirst, 2012), our work is the first to systematically explore structured discourse features driven by several discourse representations, combine discourse with lexical semantic models, and evaluate these representations on thousands of questions using both in-domain and cross-domain experiments.
in-domain is mentioned in 3 sentences in this paper.
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Introduction
While most previous work focuses on in-domain sequential labelling or cross-domain classification tasks, we are the first to learn representations for web-domain structured prediction.
Introduction
Our results suggest that while both strategies improve in-domain tagging accuracies, keeping the learned representation unchanged consistently results in better cross-domain accuracies.
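The two strategies amount to toggling gradient updates on the embedding layer; a minimal PyTorch sketch with assumed sizes (not the authors' code):

```python
import torch
import torch.nn as nn

VOCAB, DIM = 10000, 50  # assumed vocabulary and embedding sizes
pretrained = torch.randn(VOCAB, DIM)  # stand-in for the learned representation

# Strategy 1: fine-tune the representation during supervised training.
emb_tuned = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Strategy 2: keep the learned representation unchanged (freeze=True),
# which the paper reports gives better cross-domain accuracies.
emb_fixed = nn.Embedding.from_pretrained(pretrained, freeze=True)
```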
Related Work
(2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling.
in-domain is mentioned in 3 sentences in this paper.