Index of papers in Proc. ACL that mention
  • model trained
Uszkoreit, Jakob and Brants, Thorsten
Abstract
In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora.
Distributed Clustering
The quality of class-based models trained using the resulting clusterings did not differ noticeably from those trained using clusterings for which the full vocabulary was considered in each iteration.
Experiments
Table 1: BLEU scores of the Arabic-English system using models trained on the English en_target data set
Experiments
We used these models in addition to a word-based 6-gram model created by combining models trained on all four English data sets.
Experiments
Table 2 shows the BLEU scores of the machine translation system using only this word-based model, the scores after adding the class-based model trained on the en_target data set, and when using all three models.
Introduction
Class-based n-gram models have also been shown to benefit from their reduced number of parameters when scaling to higher-order n-grams (Goodman and Gao, 2000), and even despite the increasing size and decreasing sparsity of language model training corpora (Brants et al., 2007), class-based n-gram models might lead to improvements when increasing the n-gram order.
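For readers skimming this entry, the factorization behind such class-based models is the standard one from Brown et al. (1992); whether Uszkoreit and Brants use exactly this two-sided form is not stated in the snippets above, so the equation is only a reminder of the general idea (c(w) denotes the class of word w):

    P(w_i \mid w_{i-1}) \;\approx\; P\big(c(w_i) \mid c(w_{i-1})\big)\, P\big(w_i \mid c(w_i)\big)

Because transitions are modeled between classes rather than words, the transition parameters shrink from |V|^2 to |C|^2 (plus |V| emission parameters), which is what the snippet means by a reduced number of parameters.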
Introduction
We then show that using partially class-based language models trained using the resulting classifications together with word-based language models in a state-of-the-art statistical machine translation system yields improvements despite the very large size of the word-based models used.
model trained is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Kauchak, David
Abstract
We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
Abstract
We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data.
Introduction
Finally, many recent text simplification systems have utilized language models trained only on simplified data (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012); improvements in simple language modeling could translate into improvements for these systems.
Language Model Evaluation: Perplexity
Figure 1: Language model perplexities on the held-out test data for models trained on increasing amounts of data.
Language Model Evaluation: Perplexity
As expected, when trained on the same amount of data, the language models trained on simple data perform significantly better than language models trained on normal data.
Language Model Evaluation: Perplexity
The perplexity for the simple-ALL+normal model, which starts with all available simple data, continues to improve as normal data is added, resulting in a 23% improvement over the model trained with only simple data (from a perplexity of 129 down to 100).
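To make the combination described above concrete, here is a minimal sketch of linearly interpolating two language models and computing perplexity on held-out text. The callables p_simple and p_normal (returning P(word | history)) and the mixture weight lam are hypothetical stand-ins; the snippets do not say how Kauchak actually combines the two models.

    import math

    def interpolated_logprob(tokens, p_simple, p_normal, lam=0.5):
        # Sum of log P(w_i | history) under a linear mixture of the two LMs.
        total = 0.0
        for i, w in enumerate(tokens):
            history = tuple(tokens[max(0, i - 4):i])  # up to 4 words of context (5-gram style)
            p = lam * p_simple(w, history) + (1.0 - lam) * p_normal(w, history)
            total += math.log(p)
        return total

    def perplexity(tokens, p_simple, p_normal, lam=0.5):
        # Perplexity = exp of the negative average per-token log-likelihood.
        return math.exp(-interpolated_logprob(tokens, p_simple, p_normal, lam) / len(tokens))

A lower perplexity (such as the 129 to 100 drop quoted above) means the combined model assigns higher average probability to the held-out simple-English text.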
Related Work
Similarly, for summarization, systems have employed language models trained only on unsummarized text (Banko et al., 2000; Daume and Marcu, 2002).
model trained is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Generating reference reordering from parallel sentences
However, as we will see in the experimental results, the quality of a reordering model trained from automatic alignments is very sensitive to the quality of alignments.
Generating reference reordering from parallel sentences
In addition to the original source and target sentence, we also feed the predictions of the reordering model trained in Step 1 to this alignment model (see section 4.2 for details of the model itself).
Generating reference reordering from parallel sentences
Step 3: Finally, we use the predictions of the alignment model trained in Step 2 to train reordering models C(π | w_s, w_t, a) (see section 4.3 for details on the reordering model itself).
Introduction
A reordering model trained on such incorrect reorderings would obviously perform poorly.
Introduction
Our experiments show that reordering models trained using these improved machine alignments perform significantly better than models trained only on manual word alignments.
Introduction
This results in a 1.8 BLEU point gain in machine translation performance on an Urdu-English machine translation task over a preordering model trained using only manual word alignments.
Results and Discussions
Using fewer features: We compare the performance of a model trained using lexical features for all words (Column 2 of Table 1) with a model trained using lexical features only for the 1000 most frequent words (Column 3 of Table 1).
Results and Discussions
Table 2: mBLEU scores for Urdu to English reordering using models trained on different data sources and tested on a development set of 8017 Urdu tokens.
Results and Discussions
Table 3: mBLEU with different methods to generate reordering model training data from a machine aligned parallel corpus in addition to manual word alignments.
model trained is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
Sentences for which the decoder cannot find an alignment are discarded for the phrase model training.
Alignment
phrase model training.
Alignment
, N is used for both the initialization of the translation model p(f|é) and the phrase model training.
Conclusion
While models trained from Viterbi alignments already lead to good results, we have demonstrated
Experimental Evaluation
), the count model trained with leaving-one-out (l1o) and cross-validation (cv), the weighted count model and the full model.
Experimental Evaluation
Further, scores for fixed log-linear interpolation of the count model trained with leaving-one-out with the heuristic as well as a feature-wise combination are shown.
Introduction
Figure (flowchart residue): two pipelines compared — Viterbi word alignment with word translation models trained by the EM algorithm and heuristic phrase counts, versus phrase alignment with phrase translation models trained by the EM algorithm and phrase translation probabilities; both produce a phrase translation table.
Introduction
Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline.
model trained is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Finkel, Jenny Rose and Manning, Christopher D.
Abstract
Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data.
Hierarchical Joint Learning
Our resulting joint model is of higher quality than a comparable joint model trained on only the jointly-annotated data, due to all of the evidence provided by the additional single-task data.
Hierarchical Joint Learning
When we rescale the model-specific prior, we rescale based on the number of data in that model’s training set, not the total number of data in all the models combined.
Hierarchical Joint Learning
Having uniformly randomly drawn datum d ∈ ⋃_{m∈M} D_m, let m(d) ∈ M tell us to which model’s training data the datum belongs.
Introduction
We built a joint model of parsing and named entity recognition (Finkel and Manning, 2009b), which had small gains on parse performance and moderate gains on named entity performance, when compared with single-task models trained on the same data.
Introduction
entity models trained on larger corpora, annotated with only one type of information.
Introduction
We use a hierarchical prior to link a joint model trained on jointly-annotated data with other single-task models trained on single-task annotated data.
model trained is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Ma, Xuezhe and Xia, Fei
Conclusion
By presenting a model training framework, our approach can utilize parallel text to estimate transferring distribution with the help of a well-developed resource-rich language dependency parser, and use unlabeled data as entropy regularization.
Experiments
For projective parsing, several algorithms (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012) have been proposed to solve the model training problems (calculation of objective function and gradient) for different factorizations.
Our Approach
2.2 Model Training
Our Approach
One of the most common model training methods for supervised dependency parsers is maximum conditional likelihood estimation.
Our Approach
For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of K function (Abney, 2004), and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007).
model trained is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Conclusion
Models trained on parser-annotated Wikipedia text and MEDLINE text had improved performance on these target domains, in terms of both speed and accuracy.
Results
For all four algorithms the training time is proportional to the amount of data, but the GIS and BFGS models trained on only CCGbank took 4,500 and 4,200 seconds to train, while the equivalent perceptron and MIRA models took 90 and 95 seconds to train.
Results
For speed improvement these were MIRA models trained on 4,000,000 parser-
Results
In particular, note that models trained on Wikipedia or the biomedical data produce lower F-scores than the baseline on newswire.
model trained is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Mairesse, François and Walker, Marilyn
Conclusion
Additionally, our data-driven approach can be applied to any dimension that is meaningful to human judges, and it provides an elegant way to project multiple dimensions simultaneously, by including the relevant dimensions as features of the parameter models’ training data.
Conclusion
In terms of our research questions in Section 3.1, we show that models trained on expert judges to project multiple traits in a single utterance generate utterances whose personality is recognized by naive judges.
Evaluation Experiment
Q1: Is the personality projected by models trained on
Introduction
Another thread investigates SNLG scoring models trained using higher-level linguistic features to replicate human judgments of utterance quality (Rambow et al., 2001; Nakatsu and White, 2006; Stent and Guo, 2005).
Parameter Estimation Models
2.3 Statistical Model Training
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhang, Zhe and Singh, Munindar P.
Experiments
Figure 7 shows the results of a 10-fold cross validation on the 200-review dataset (light grey bars show the accuracy of the model trained without using transition cue features).
Experiments
In particular, the accuracy of 0D is markedly improved by adding T. The model trained using all the feature sets yields the best accuracy.
Experiments
We compare models trained using (1) our domain-specific lexicon, (2) Affective Norms for English Words (ANEW) (Bradley and Lang, 1999), and (3) Linguistic Inquiry and Word Count (LIWC) (Tausczik and Pennebaker, 2010).
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on a small scale in-domain data to select domain-relevant sentence pairs from general-domain parallel corpus.
Conclusion
We present three novel methods for translation model training data selection, which are based on the translation model and language model.
Experiments
Meanwhile, we use the language model training scripts integrated in the NiuTrans toolkit to train another 4-gram language model, which is used in MT tuning and decoding.
Introduction
However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest.
Introduction
Current data selection methods mostly use language models trained on small scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Experiments and Results
The language model is a 5-gram language model trained with the target sentences in the training data.
Experiments and Results
Our baseline decoder is an in-house implementation of Bracketing Transduction Grammar (BTG) (Wu, 1997) in CKY-style decoding with a lexical reordering model trained with maximum entropy (Xiong et al., 2006).
Model Training
Due to the inexact search nature of SMT decoding, search errors may inevitably break theoretical properties, and the final translation results may not be suitable for model training.
Phrase Pair Embedding
Forced decoding is utilized to get positive samples, and contrastive divergence is used for model training.
Related Work
The combination of reconstruction error and reordering error is used as the objective function for the model training.
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Discussion
Reordering accuracy analysis: The reordering type distribution on the reordering model training data in Table 3 suggests that semantic reordering is more difficult than syntactic reordering.
Experiments
4.2 Model Training
Experiments
However, our preliminary experiments showed that the reordering models trained on gold alignment yielded higher improvement.
Experiments
Table 3: Reordering type distribution over the reordering model’s training data.
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Experiments
The source side data statistics for the reordering model training is given in Table 2 (target side has only nine labels).
Experiments
Table 2: tagging-style model training data statistics
Experiments
will show later, the model trained with both CRFs and RNN helps to improve the translation quality.
Tagging-style Reordering Model
Once the model training is finished, we make inference on the development and test corpora, which means that we get the labels of the source sentences that need to be translated.
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
Model training.
Collaborative Decoding
2.5 Model Training
Collaborative Decoding
Model training for co-decoding
Experiments
The language model used for all models (including decoding models and system combination models described in Section 2.6) is a 5-gram model trained with the English part of bilingual data and the Xinhua portion of the LDC English Gigaword corpus version 3.
Experiments
We parsed the language model training data with Berkeley parser, and then trained a dependency language model based on the parsing output.
model trained is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
Clickthrough Data and Spelling Correction
Unfortunately, we found in our experiments that the pairs extracted using the method are too noisy for reliable error model training, even with a very tight threshold, and we did not see any significant improvement.
Clickthrough Data and Spelling Correction
Although by doing so we could miss some basic, obvious spelling corrections, our experiments show that the negative impact on error model training is negligible.
Clickthrough Data and Spelling Correction
We found that the error models trained using the data directly extracted from the query reformulation sessions suffer from the problem of underestimating the self-transformation probability of a query P(Q2=Q1|Q1), because we only included in the training data the pairs where the query is different from the correction.
The Baseline Speller System
The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing.
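For reference, a textbook absolute-discounting backoff bigram has the form below; the snippet confirms absolute discounting is used, but the exact variant (backoff vs. interpolated) and the discount D are assumptions here. α(v) is the backoff weight chosen so the distribution normalizes, and P(w) is the lower-order (unigram) estimate:

    P(w \mid v) =
    \begin{cases}
      \dfrac{\max\{c(v,w) - D,\, 0\}}{c(v)} & \text{if } c(v,w) > 0,\\[1ex]
      \alpha(v)\, P(w) & \text{otherwise.}
    \end{cases}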
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Conclusion
We used about 70k sentence pairs for CE model training, while Wang et al.
Conclusion
(2008) used about 100k sentence pairs, a CE translation dictionary and more monolingual corpora for model training.
Experiments
Table 2 describes the data used for model training in this paper, including the BTEC (Basic Travel Expression Corpus) Chinese-English (CE) corpus and the BTEC English-Spanish (ES) corpus provided by IWSLT 2008 organizers, the HIT Olympic CE corpus (2004-863-008) and the Europarl ES corpus.
Experiments
Here, we used the synthetic CE Olympic corpus to train a model, which was interpolated with the CE model trained with both the BTEC CE1 corpus and the synthetic BTEC corpus to obtain an interpolated CE translation model.
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Experiments and Results
Our baseline decoder is an in-house implementation of Bracketing Transduction Grammar (BTG) (Wu, 1997) in CKY-style decoding with a lexical reordering model trained with maximum entropy (Xiong et al., 2006).
Experiments and Results
The language model is a 5-gram language model trained with the target sentences in the training data.
Experiments and Results
The language model is a 5-gram language model trained with the Giga-Word corpus plus the English sentences in the training data.
Graph Construction
Note that, due to pruning in both decoding and translation model training, forced alignment may fail, i.e.
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
In scenario S2, the models trained from the seed parallel corpus and the features used for inference (Section 4) provide complementary information.
Inferring a learning curve from mostly monolingual data
Using the models trained for the experiments in Section 3, we estimate the squared extrapolation error at the anchors s_j when using models trained on sizes up to s_i, and set the confidence in the extrapolations for u to its inverse:
Inferring a learning curve from mostly monolingual data
For the cases where a slightly larger in-domain “seed” parallel corpus is available, we introduced an extrapolation method and a combined method yielding high-precision predictions: using models trained on up to 20K sentence pairs we can predict performance on a given test set with a root mean squared error in the order of 1 BLEU point at 75K sentence pairs, and in the order of 2-4 BLEU points at 500K.
Selecting a parametric family of curves
For a certain bilingual test dataset d, we consider a set of observations O_d = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i is the performance on d (measured using BLEU (Papineni et al., 2002)) of a translation model trained on a parallel corpus of size x_i.
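As a rough illustration of fitting such observations, the sketch below fits one candidate parametric family, a saturating power law y = c - a·x^(-b), with scipy. This family and the initial guess p0 are assumptions chosen for illustration; the snippets do not say which family Kolachina et al. actually select.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(x, c, a, b):
        # Saturating learning curve: approaches the ceiling c as corpus size x grows.
        return c - a * np.power(x, -b)

    def fit_learning_curve(sizes, bleu_scores, p0=(30.0, 100.0, 0.3)):
        # Least-squares fit of the curve to observed (corpus size, BLEU) pairs.
        params, _ = curve_fit(power_law, np.asarray(sizes, dtype=float),
                              np.asarray(bleu_scores, dtype=float), p0=p0, maxfev=10000)
        return params  # (c, a, b)

Once fitted on observations up to, say, 20K sentence pairs, evaluating power_law(500e3, *params) extrapolates the curve to 500K pairs, which is the kind of prediction the RMSE figures above evaluate.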
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Adaptive MT Quality Estimation
Figure 2: Correlation coefficient r between predicted TER (x-axis) and true TER (y-axis) for QE models trained from the same document (top figure) or different document (bottom figure).
Discussion and Conclusion
However, the QE model training data is no longer constant.
Document-specific MT System
ment (HMM (Vogel et al., 1996) and MaxEnt (Ittycheriah and Roukos, 2005) alignment models, phrase pair extraction, MT model training (Ittycheriah and Roukos, 2007) and LM model training.
Experiments
As our MT model training data include proprietary data, the MT performance is significantly better than publicly available MT software.
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and Kozhevnikov, Mikhail
Empirical Evaluation
Similarly, we call the models trained from this data supervised though full supervision was not available.
Empirical Evaluation
Additionally, we report the results of the model trained with all the 750 texts labeled (Supervised UB), its scores can be regarded as an upper bound on the results of the semi-supervised models.
Empirical Evaluation
Surprisingly, its precision is higher than that of the model trained on 750 labeled examples, though admittedly it is achieved at a very different recall level.
model trained is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Parsing
Table fragment — columns: Models, Training data, LR (%), LP (%), F (%); rows: GP, CTB, 79.9, 82.2, 81.0 and RP, CTB, 82.0, 84.6, 83.3.
Experiments of Parsing
Table header fragment (all the sentences): Models, Training data, LR (%), LP (%), F (%).
Experiments of Parsing
Table header fragment: Models, Training data, LR (%), LP (%), F (%).
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Huang, Liang and Liu, Qun
Experiments
While in the subtable below, JST F1 is also undefined since the model trained on PD gives a POS set different from that of CTB.
Experiments
We also see that for both segmentation and Joint S&T, the performance sharply declines when a model trained on PD is tested on CTB (row 2 in each subtable).
Experiments
This obviously falls behind those of the models trained on CTB itself (row 3 in each subtable), about 97% F1, which are used as the baselines of the following annotation adaptation experiments.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Experiments
We selected a threshold for binarization from a grid of 1001 points from 1 to 4 that maximized the accuracy of binarized predictions from a model trained on the training set and evaluated on the binarized development set.
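The threshold sweep described in that footnote can be written in a few lines; the function and variable names below are illustrative, not taken from the authors' code, and the 0/1 gold labels are assumed to come from binarizing the development-set scores.

    import numpy as np

    def pick_binarization_threshold(dev_scores, dev_binary_gold):
        # Try 1001 evenly spaced thresholds between 1 and 4 (as in the footnote)
        # and keep the one that maximizes binarized accuracy on the dev set.
        scores = np.asarray(dev_scores, dtype=float)
        gold = np.asarray(dev_binary_gold, dtype=int)
        best_threshold, best_accuracy = None, -1.0
        for t in np.linspace(1.0, 4.0, 1001):
            accuracy = np.mean((scores >= t).astype(int) == gold)  # >= vs > is an assumption
            if accuracy > best_accuracy:
                best_threshold, best_accuracy = t, accuracy
        return best_threshold, best_accuracy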
System Description
The model computes the following features from a 5-gram language model trained on the same three sections of English Gigaword using the SRILM toolkit (Stolcke, 2002):
System Description
Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bhat, Suma and Xue, Huichao and Yoon, Su-Youn
Experimental Setup
The ASR set, with 47,227 responses, was used for ASR training and POS similarity model training .
Experimental Setup
Although the skewed distribution limits the number of score-specific instances for the highest and lowest scores available for model training , we used the data without modifying the distribution since it is representative of responses in a large-scale language assessment scenario.
Models for Measuring Grammatical Competence
A distinguishing feature of the current study is that the measure is based on a comparison of characteristics of the test response to models trained on large amounts of data from each score point, as opposed to measures that are simply characteristics of the responses themselves (which is how measures have been considered in prior studies).
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Experiments
In the experiments, the language model is a Chinese 5-gram language model trained with the Chinese part of the LDC parallel corpus and the Xinhua part of the Chinese Gigaword corpus with about 27 million words.
Experiments
The Bi-ME model is trained with FBIS corpus, whose size is smaller than that used in Mo-ME model training .
Experiments
We can see that the Bi-ME model can achieve better results than the Mo-ME model in both recall and precision metrics although only a small sized bilingual corpus is used for Bi-ME model training .
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni
Experiment
When segmenting texts of the target domain using models trained on the source domain, the performance will be hurt as more falsely segmented instances are added into the training set.
INTRODUCTION
These new features of micro-blogs make the Chinese Word Segmentation (CWS) models trained on the source domain, such as news corpus, fail to perform equally well when transferred to texts from micro-blogs.
Our method
Because of this, the model trained on this unbalanced corpus tends to be biased.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Experimental Setup
Beam size is fixed at 2000. Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Sentence Compression
As the space of possible compressions is exponential in the number of leaves in the parse tree, instead of looking for the globally optimal solution, we use beam search to find a set of highly likely compressions and employ a language model trained on a large corpus for evaluation.
Sentence Compression
Given the N-best compressions from the decoder, we evaluate the yield of the trimmed trees using a language model trained on the Gigaword (Graff, 2003) corpus and return the compression with the highest probability.
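The final selection step described above amounts to re-scoring an N-best list with a language model. The sketch below assumes a hypothetical lm_logprob callable (for example a wrapper around an SRILM- or KenLM-style model), not the authors' actual interface; the optional length normalization is likewise an assumption, since the snippet only says the highest-probability compression is returned.

    def best_compression(nbest_compressions, lm_logprob, length_normalize=False):
        # nbest_compressions: list of token lists produced by the beam-search decoder.
        # lm_logprob: returns the total log-probability of a token sequence.
        def score(tokens):
            lp = lm_logprob(tokens)
            return lp / len(tokens) if (length_normalize and tokens) else lp
        return max(nbest_compressions, key=score)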
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Experiments
Figure 6: Mean reciprocal ratio on Xinhua test set vs. alignment entropy and F-score for models trained with different affinity alignments.
Experiments
Figure 7: Mean reciprocal ratio on Xinhua test set vs. alignment entropy and F-score for models trained with different phonological alignments.
Related Work
Although the direct orthographic mapping approach advocates a direct transfer of grapheme at runtime, we still need to establish the grapheme correspondence at the model training stage, when phoneme level alignment can help.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Roark, Brian and Allauzen, Cyril and Riley, Michael
Experimental results
For all results reported here, we use the SRILM toolkit for baseline model training and pruning, then convert from the resulting ARPA format model to an OpenFst format (Allauzen et al., 2007), as used in the OpenGrm n-gram library (Roark et al., 2012).
Experimental results
The model was trained on the 1996 and 1997 Hub4 acoustic model training sets (about 150 hours of data) using semi-tied covariance modeling and CMLLR-based speaker adaptive training and 4 iterations of boosted MMI.
Introduction
This is done via a marginal distribution constraint which requires the expected frequency of the lower-order n-grams to match their observed frequency in the training data, much as is commonly done for maximum entropy model training.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Experiments
For example, the performance of the dep1c and dep2c models trained on 1k sentences is roughly the same as the performance of the dep1 and dep2 models, respectively, trained on 2k sentences.
Experiments
For example, in scenario 1 the dep2c model trained on 1k sentences is close in performance to the dep1 model trained on 4k sentences, and the dep2c model trained on 4k sentences is close to the dep1 model trained on the entire training set (roughly 40k sentences).
Experiments
For example, the dep1c model trained on 4k sentences is roughly as good as the dep1 model trained on 8k sentences.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lan, Man and Xu, Yu and Niu, Zhengyu
Abstract
However, a previous study (Sporleder and Lascarides, 2008) showed that models trained on these synthetic data do not generalize very well to natural (i.e.
Multitask Learning for Discourse Relation Prediction
However, (Sporleder and Lascarides, 2008) found that the model trained on synthetic implicit data has not performed as well as expected in natural implicit data.
Related Work
Unlike their previous work, our previous work (Zhou et al., 2010) presented a method to predict the missing connective based on a language model trained on an unannotated corpus.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ó Séaghdha, Diarmuid
Results
Table 1 shows sample semantic classes induced by models trained on the corpus of BNC verb-object co-occurrences.
Results
unsurprising given that they are structurally similar models trained on the same data.
Results
It would be interesting to do a full comparison that controls for size and type of corpus data; in the meantime, we can report that the LDA and ROOTH-LDA models trained on verb-object observations in the BNC (about 4 times smaller than AQUAINT) also achieve a perfect score on the Holmes et al.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Markov Topic Regression - MTR
We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features.
Semi-Supervised Semantic Labeling
They decode unlabeled queries from target domain (t) using a CRF model trained on the POS-labeled newswire data (source domain (0)).
Semi-Supervised Semantic Labeling
They use a small value for τ to enable the new model to be as close as possible to the initial model trained on source data.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zweig, Geoffrey and Platt, John C. and Meek, Christopher and Burges, Christopher J.C. and Yessenalina, Ainur and Liu, Qiang
Experimental Results 5.1 Data Resources
Note that the latter are derived from models trained with the Los Angeles Times data, while the Holmes results are derived from models trained with 19th-century novels.
Sentence Completion via Language Modeling
Our baseline model is a Good-Turing smoothed model trained with the CMU language modeling toolkit (Clarkson and Rosenfeld, 1997).
Sentence Completion via Language Modeling
For the SAT task, we used a trigram language model trained on 1.1B words of newspaper data, described in Section 5.1.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Experiments
In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank.
Experiments
Table 5: Classification accuracies on the noisy WSJ for models trained on WSJ Sections 2—21 and our 1B token corpus.
Experiments
In Table 4, we also show the performance of the generative models trained on our 1B corpus.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and DeNero, John
Conclusion and Outlook
We achieved best results when the model training data, MT tuning set, and MT evaluation set contained roughly the same genre.
Discussion of Translation Results
For comparison, +POS indicates our class-based model trained on the 11 coarse POS tags only (e.g., “Noun”).
Discussion of Translation Results
The best result—a +1.04 BLEU average gain—was achieved when the class-based model training data, MT tuning set, and MT evaluation set contained the same genre.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Danescu-Niculescu-Mizil, Cristian and Cheng, Justin and Kleinberg, Jon and Lee, Lillian
Hello. My name is Inigo Montoya.
First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts.
Hello. My name is Inigo Montoya.
In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes.
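Concretely, the comparison above boils down to scoring the same token sequence under two language models and seeing which assigns it a higher average per-token log-likelihood. In the sketch below, lm_memorable and lm_nonmemorable are hypothetical callables returning log P(token | history); they stand in for models trained on memorable and non-memorable quotes respectively.

    def avg_logprob(tokens, lm):
        # Average log P(w_i | w_1..w_{i-1}) under the given language model.
        history, total = [], 0.0
        for w in tokens:
            total += lm(w, tuple(history))
            history.append(w)
        return total / len(tokens) if tokens else float("-inf")

    def closer_to_memorable(tokens, lm_memorable, lm_nonmemorable):
        # True if the memorable-quote model likes the sequence better.
        return avg_logprob(tokens, lm_memorable) > avg_logprob(tokens, lm_nonmemorable)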
Never send a human to do a machine’s job.
In particular, the newswire section of the Brown corpus is predicted better at the lexical level by the language model trained on non-memorable quotes.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zarriess, Sina and Cahill, Aoife and Kuhn, Jonas
Abstract
We show that with an appropriately underspecified input, a linguistically informed realisation model trained to regenerate strings from the underlying semantic representation achieves 91.5% accuracy (over a baseline of 82.5%) in the prediction of the original voice.
Experiments
In Table 4, we report the performance of ranking models trained on the different feature subsets introduced in Section 4.
Experiments
The union of the features corresponds to the model trained on SEMh in Experiment 1.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Experimental results
For the composite 5-gram/PLSA model trained on the 1.3 billion tokens corpus, 400 cores have to be used to keep the top 5 most likely topics.
Experimental results
gram/PLSA model trained on the 44M tokens corpus, the computation time increases drastically with less than 5% perplexity improvement.
Experimental results
Its decoder uses a trigram language model trained with modified Kneser-Ney smoothing (Kneser and Ney, 1995) on a 200 million tokens corpus.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
When 11 is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
A Joint Model with Unlabeled Parallel Text
Train two initial monolingual models: train and initialize θ1(0) and θ2(0) on the labeled data. 2.
Results and Analysis
When 11 is set to 0, the joint model degenerates to two MaxEnt models trained with only the labeled data.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Khapra, Mitesh M. and Joshi, Salil and Chatterjee, Arindam and Bhattacharyya, Pushpak
Abstract
We then use bilingual bootstrapping, wherein, a model trained using the seed annotated data of L1 is used to annotate the untagged data of L2 and vice versa using parameter projection.
Bilingual Bootstrapping
repeat
  θ1 := model trained using LD1
  θ2 := model trained using LD2
Bilingual Bootstrapping
repeat
  θ1 := model trained using LD1
  θ2 := model trained using LD2
  for all u1 ∈ UD1 do
    s := sense assigned by θ1 to u1
    if confidence(s) > ε then
      LD1 := LD1 + u1
      UD1 := UD1 − u1
    end if
  end for
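Below is a rough Python rendering of one direction of the loop reconstructed above, under the assumption that train, assign_sense, and confidence exist with the signatures shown and that eps is a fixed confidence threshold; the symmetric pass over the other language's data and the parameter-projection step mentioned in the abstract are elided in the fragment and therefore also here.

    def bootstrap_one_direction(LD, UD, train, assign_sense, confidence, eps, max_iters=10):
        # LD: list of labeled instances; UD: list of untagged instances.
        for _ in range(max_iters):
            theta = train(LD)                  # theta := model trained using LD
            added = False
            for u in list(UD):                 # iterate over a copy so items can be removed
                s = assign_sense(theta, u)     # s := sense assigned by theta to u
                if confidence(s) > eps:
                    LD.append((u, s))          # LD := LD + u (with its predicted sense)
                    UD.remove(u)               # UD := UD - u
                    added = True
            if not added:                      # stop once no instance clears the threshold
                break
        return LD, UD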
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, David and Dolan, William
Experiments
Overall, all the trained models produce reasonable paraphrase systems, even the model trained on just 28K single parallel sentences.
Experiments
Examples of the outputs produced by the models trained on single parallel sentences and on all parallel sentences are shown in Table 2.
Experiments
We randomly selected 200 source sentences and generated 2 paraphrases for each, representing the two extremes: one paraphrase produced by the model trained with single parallel sentences, and the other by the model trained with all parallel sentences.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cai, Peng and Gao, Wei and Zhou, Aoying and Wong, Kam-Fai
Evaluation
Adaptation takes place when ranking tasks are performed by using the models trained on the domains in which they were originally defined to rank the documents in other domains.
Introduction
This motivated the popular domain adaptation solution based on instance weighting, which assigns larger weights to those transferable instances so that the model trained on the source domain can adapt more effectively to the target domain (Jiang and Zhai, 2007).
Related Work
In (Geng et al., 2009; Chen et al., 2008b), the parameters of the ranking model trained on the source domain were adjusted with the small set of labeled data in the target domain.
model trained is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: