Index of papers in Proc. ACL 2014 that mention
  • model training
Ma, Xuezhe and Xia, Fei
Conclusion
By presenting a model training framework, our approach can utilize parallel text to estimate the transferring distribution with the help of a well-developed dependency parser for a resource-rich language, and use unlabeled data for entropy regularization.
Experiments
For projective parsing, several algorithms (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012) have been proposed to solve the model training problems (calculation of objective function and gradient) for different factorizations.
Our Approach
2.2 Model Training
Our Approach
One of the most common model training methods for supervised dependency parsers is maximum conditional likelihood estimation.
Our Approach
For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of the K function (Abney, 2004), and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007).
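For orientation, combining maximum conditional likelihood on sentences with (projected) trees and entropy regularization on unlabeled sentences gives an objective of the following generic form; this is a textbook sketch, not the exact objective of Ma and Xia or of Smith and Eisner (2007):

    \mathcal{L}(\theta) \;=\; \sum_{(x,\,y) \in \mathcal{D}_L} \log p_\theta(y \mid x)
      \;-\; \lambda \sum_{x \in \mathcal{D}_U} H\!\big(p_\theta(\cdot \mid x)\big),
    \qquad
    H\!\big(p_\theta(\cdot \mid x)\big) \;=\; -\sum_{y} p_\theta(y \mid x)\,\log p_\theta(y \mid x)

Here \mathcal{D}_L stands for sentences paired with (projected) trees, \mathcal{D}_U for unlabeled sentences, and \lambda weights the entropy penalty; all three symbols are generic placeholders, not the paper's notation.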
model training is mentioned in 6 sentences in this paper.
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Discussion
Reordering accuracy analysis: The reordering type distribution on the reordering model training data in Table 3 suggests that semantic reordering is more difficult than syntactic reordering.
Experiments
4.2 Model Training
Experiments
However, our preliminary experiments showed that the reordering models trained on gold alignments yielded higher improvements.
Experiments
Table 3: Reordering type distribution over the reordering model’s training data.
model training is mentioned in 5 sentences in this paper.
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Experiments and Results
The language model is a 5-gram language model trained with the target sentences in the training data.
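The excerpt does not say which toolkit built this 5-gram model; as an illustrative sketch only, a standard ARPA-format model can be estimated from the target side of the training data with KenLM's lmplz (file names below are hypothetical):

    import subprocess

    # Build a 5-gram ARPA language model from the target side of the parallel
    # training data. lmplz reads tokenized text on stdin and writes ARPA to stdout.
    with open("train.tgt", "rb") as text, open("target.5gram.arpa", "wb") as arpa:
        subprocess.run(["lmplz", "-o", "5"], stdin=text, stdout=arpa, check=True)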
Experiments and Results
Our baseline decoder is an in-house implementation of Bracketing Transduction Grammar (BTG) (Wu, 1997) in CKY-style decoding with a lexical reordering model trained with maximum entropy (Xiong et al., 2006).
Model Training
Due to the inexact search nature of SMT decoding, search errors may inevitably break theoretical properties, and the final translation results may not be suitable for model training.
Phrase Pair Embedding
Forced decoding is utilized to get positive samples, and contrastive divergence is used for model training.
Related Work
The combination of reconstruction error and reordering error is used as the objective function for model training.
model training is mentioned in 5 sentences in this paper.
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Conclusion
We present three novel methods for translation model training data selection, which are based on the translation model and language model.
Experiments
Meanwhile, we use the language model training scripts integrated in the NiuTrans toolkit to train another 4-gram language model, which is used in MT tuning and decoding.
Introduction
However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest.
Introduction
Current data selection methods mostly use language models trained on small-scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
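A minimal sketch of this kind of selection, assuming an in-domain language model has already been trained (e.g., with KenLM) and is queried through the kenlm Python bindings; the function and file names are illustrative, not the paper's:

    import kenlm

    def per_word_logprob(model, sentence):
        # Average log10 probability per token, so longer sentences are not penalized.
        tokens = sentence.split()
        return model.score(sentence, bos=True, eos=True) / max(len(tokens), 1)

    def select_domain_relevant(pairs, indomain_lm_path, top_n):
        # pairs: list of (source, target) sentence strings from the general-domain corpus.
        # Rank pairs by how well the in-domain LM scores the target side; keep the top N.
        model = kenlm.Model(indomain_lm_path)
        ranked = sorted(pairs, key=lambda p: per_word_logprob(model, p[1]), reverse=True)
        return ranked[:top_n]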
model training is mentioned in 5 sentences in this paper.
Zhang, Zhe and Singh, Munindar P.
Experiments
Figure 7 shows the results of a 10-fold cross validation on the 200-review dataset (light grey bars show the accuracy of the model trained without using transition cue features).
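A sketch of the kind of comparison behind Figure 7, assuming the feature matrices are already built; the classifier and variable names are placeholders, not the authors' setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def cv_accuracy(X, y, folds=10):
        # Mean accuracy over a 10-fold cross validation, as in the reported experiment.
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()

    # acc_all  = cv_accuracy(X_all_features, labels)        # all feature sets
    # acc_no_t = cv_accuracy(X_without_transition, labels)  # transition-cue features dropped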
Experiments
In particular, the accuracy of 0D is markedly improved by adding T. The model trained using all the feature sets yields the best accuracy.
Experiments
We compare models trained using (1) our domain-specific lexicon, (2) Affective Norms for English Words (ANEW) (Bradley and Lang, 1999), and (3) Linguistic Inquiry and Word Count (LIWC) (Tausczik and Pennebaker, 2010).
model training is mentioned in 5 sentences in this paper.
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Adaptive MT Quality Estimation
Figure 2: Correlation coefficient r between predicted TER (x-axis) and true TER (y-axis) for QE models trained from the same document (top figure) or a different document (bottom figure).
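The coefficient r in Figure 2 is presumably the standard Pearson correlation between predicted and true TER; a small sketch with placeholder numbers:

    import numpy as np
    from scipy.stats import pearsonr

    predicted_ter = np.array([0.32, 0.41, 0.28, 0.55, 0.47])  # hypothetical predictions
    true_ter      = np.array([0.30, 0.45, 0.25, 0.60, 0.43])  # hypothetical references

    r, p_value = pearsonr(predicted_ter, true_ter)
    print(f"r = {r:.3f}, p = {p_value:.3f}")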
Discussion and Conclusion
However, the QE model training data is no longer constant.
Document-specific MT System
…alignment (HMM (Vogel et al., 1996) and MaxEnt (Ittycheriah and Roukos, 2005) alignment models), phrase pair extraction, MT model training (Ittycheriah and Roukos, 2007), and LM model training.
Experiments
As our MT model training data include proprietary data, the MT performance is significantly better than that of publicly available MT software.
model training is mentioned in 4 sentences in this paper.
Bhat, Suma and Xue, Huichao and Yoon, Su-Youn
Experimental Setup
The ASR set, with 47,227 responses, was used for ASR training and POS similarity model training.
Experimental Setup
Although the skewed distribution limits the number of score-specific instances for the highest and lowest scores available for model training, we used the data without modifying the distribution since it is representative of responses in a large-scale language assessment scenario.
Models for Measuring Grammatical Competence
A distinguishing feature of the current study is that the measure is based on a comparison of characteristics of the test response to models trained on large amounts of data from each score point, as opposed to measures that are simply characteristics of the responses themselves (which is how measures have been considered in prior studies).
model training is mentioned in 3 sentences in this paper.
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Experiments
We selected a threshold for binarization from a grid of 1001 points from 1 to 4 that maximized the accuracy of binarized predictions from a model trained on the training set and evaluated on the binarized development set.
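A sketch of that threshold search: 1001 evenly spaced candidate thresholds between 1 and 4, scored by binarized accuracy on the development set (the variable names and the >= cutoff direction are assumptions):

    import numpy as np

    def pick_threshold(dev_scores, dev_labels_binary):
        # dev_scores: real-valued predictions of the trained model on the development set.
        # dev_labels_binary: gold 0/1 labels after binarizing the development set.
        best_t, best_acc = None, -1.0
        for t in np.linspace(1.0, 4.0, 1001):          # 1001 points from 1 to 4
            pred = (dev_scores >= t).astype(int)       # cutoff direction is an assumption
            acc = float(np.mean(pred == dev_labels_binary))
            if acc > best_acc:
                best_t, best_acc = t, acc
        return best_t, best_acc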
System Description
The model computes the following features from a 5-gram language model trained on the same three sections of English Gigaword using the SRILM toolkit (Stolcke, 2002):
System Description
Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).
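An illustrative reimplementation of those two features (average log-probability and OOV count); the paper used SRILM, whereas this sketch queries an equivalent model through the kenlm Python bindings, and the model path is hypothetical:

    import kenlm

    def lm_features(model, sentence):
        # full_scores yields (log10 prob, matched n-gram length, is_oov) per token
        # (including the end-of-sentence token).
        scores = list(model.full_scores(sentence, bos=True, eos=True))
        avg_logprob = sum(lp for lp, _n, _oov in scores) / max(len(scores), 1)
        oov_count = sum(1 for _lp, _n, oov in scores if oov)
        return avg_logprob, oov_count

    # model = kenlm.Model("nonnative.arpa")   # hypothetical path to the "nonnative LM"
    # avg_lp, n_oov = lm_features(model, "this are a example sentences")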
model training is mentioned in 3 sentences in this paper.