Abstract | We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
Abstract | We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data.
Introduction | Finally, many recent text simplification systems have utilized language models trained only on simplified data (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012); improvements in simple language modeling could translate into improvements for these systems. |
Language Model Evaluation: Perplexity | Figure 1: Language model perplexities on the held-out test data for models trained on increasing amounts of data. |
Language Model Evaluation: Perplexity | As expected, when trained on the same amount of data, the language models trained on simple data perform significantly better than language models trained on normal data. |
Language Model Evaluation: Perplexity | The perplexity for the simple-ALL+normal model, which starts with all available simple data, continues to improve as normal data is added, resulting in a 23% improvement over the model trained with only simple data (from a perplexity of 129 down to 100).
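As a reference point for the perplexity numbers above: perplexity is the exponentiated average negative log-likelihood per token. The sketch below (plain Python, illustrative only; no particular toolkit is assumed) shows how a reported perplexity relates to per-token probabilities.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-likelihood."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that assigns each of four test tokens probability 1/100
# has perplexity 100, the score reported for the combined model.
lp = [math.log(1 / 100)] * 4
print(round(perplexity(lp)))  # -> 100
```

A drop from 129 to 100 therefore means the combined model is, on average, as uncertain as a uniform choice among 100 words rather than 129.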
Related Work | Similarly, for summarization, systems have employed language models trained only on unsummarized text (Banko et al., 2000; Daume and Marcu, 2002).
Generating reference reordering from parallel sentences | However, as we will see in the experimental results, the quality of a reordering model trained from automatic alignments is very sensitive to the quality of alignments. |
Generating reference reordering from parallel sentences | In addition to the original source and target sentence, we also feed the predictions of the reordering model trained in Step 1 to this alignment model (see section 4.2 for details of the model itself). |
Generating reference reordering from parallel sentences | Step 3: Finally, we use the predictions of the alignment model trained in Step 2 to train reordering models C(π | ws, wt, a) (see section 4.3 for details on the reordering model itself).
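The three-step bootstrapping procedure above can be sketched as follows. This is a toy illustration only: the function names and the trivial stand-in "models" are ours, not the paper's actual components.

```python
# Toy sketch of the three-step bootstrapping pipeline.
# All names and models here are illustrative stand-ins.

def train_reorderer(examples):
    # Stand-in reordering model: "learns" to reverse source tokens.
    return lambda src: list(reversed(src))

def train_aligner(examples):
    # Stand-in alignment model: identity alignment over positions.
    return lambda src, tgt: list(range(min(len(src), len(tgt))))

def bootstrap(manual_pairs, parallel_pairs):
    # Step 1: train a reordering model on manual word alignments.
    reorder_1 = train_reorderer(manual_pairs)
    # Step 2: train an alignment model whose input also includes
    # the Step-1 reordering predictions.
    feats = [(s, t, reorder_1(s)) for s, t in parallel_pairs]
    aligner = train_aligner(feats)
    # Step 3: retrain the reordering model on the (larger) corpus
    # of automatically aligned sentence pairs.
    auto = [(s, t, aligner(s, t)) for s, t in parallel_pairs]
    return train_reorderer(auto)
```

The point of the design is that the small manually aligned set seeds Step 1, while Step 3 benefits from the much larger automatically aligned corpus.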
Introduction | A reordering model trained on such incorrect reorderings would obviously perform poorly. |
Introduction | Our experiments show that reordering models trained using these improved machine alignments perform significantly better than models trained only on manual word alignments. |
Introduction | This results in a 1.8 BLEU point gain in machine translation performance on an Urdu-English machine translation task over a preordering model trained using only manual word alignments. |
Results and Discussions | Using fewer features: We compare the performance of a model trained using lexical features for all words (Column 2 of Table 1) with a model trained using lexical features only for the 1000 most frequent words (Column 3 of Table 1).
Results and Discussions | Table 2: mBLEU scores for Urdu to English reordering using models trained on different data sources and tested on a development set of 8017 Urdu tokens. |
Results and Discussions | Table 3: mBLEU with different methods to generate reordering model training data from a machine aligned parallel corpus in addition to manual word alignments. |
Experiments | The source-side data statistics for the reordering model training are given in Table 2 (the target side has only nine labels).
Experiments | Table 2: tagging-style model training data statistics |
Experiments | As we will show later, the model trained with both CRFs and RNNs helps to improve the translation quality.
Tagging-style Reordering Model | Once model training is finished, we perform inference on the development and test corpora, i.e., we obtain the labels of the source sentences that need to be translated.
Markov Topic Regression - MTR | We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features. |
Semi-Supervised Semantic Labeling | They decode unlabeled queries from the target domain (t) using a CRF model trained on the POS-labeled newswire data (source domain (o)).
Semi-Supervised Semantic Labeling | They use a small value for τ to enable the new model to be as close as possible to the initial model trained on source data.
Abstract | However, a previous study (Sporleder and Lascarides, 2008) showed that models trained on these synthetic data do not generalize very well to natural (i.e., implicit) data.
Multitask Learning for Discourse Relation Prediction | However, Sporleder and Lascarides (2008) found that the model trained on synthetic implicit data did not perform as well as expected on natural implicit data.
Related Work | In contrast to their work, our previous work (Zhou et al., 2010) presented a method to predict the missing connective based on a language model trained on an unannotated corpus.
Experimental results | For all results reported here, we use the SRILM toolkit for baseline model training and pruning, then convert from the resulting ARPA format model to an OpenFst format (Allauzen et al., 2007), as used in the OpenGrm n-gram library (Roark et al., 2012). |
Experimental results | The model was trained on the 1996 and 1997 Hub4 acoustic model training sets (about 150 hours of data) using semi-tied covariance modeling and CMLLR-based speaker adaptive training and 4 iterations of boosted MMI. |
Introduction | This is done via a marginal distribution constraint which requires the expected frequency of the lower-order n-grams to match their observed frequency in the training data, much as is commonly done for maximum entropy model training.
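The marginal distribution constraint described above can be written as follows. The notation here is our own (a common form of the constraint, as used in Kneser-Ney-style derivations), not necessarily the paper's:

```latex
\sum_{h \,:\, \beta(h) = h'} p(h)\, p(w \mid h) \;=\; \tilde{p}(h'w)
```

where $h$ ranges over the higher-order histories whose backoff $\beta(h)$ is the lower-order history $h'$, $p(h)$ is the model's marginal probability of the history, and $\tilde{p}(h'w)$ is the observed relative frequency of the lower-order n-gram $h'w$ in the training data.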
Experimental Setup | Beam size is fixed at 2000. Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Sentence Compression | As the space of possible compressions is exponential in the number of leaves in the parse tree, instead of looking for the globally optimal solution, we use beam search to find a set of highly likely compressions and employ a language model trained on a large corpus for evaluation. |
Sentence Compression | Given the N -best compressions from the decoder, we evaluate the yield of the trimmed trees using a language model trained on the Gigaword (Graff, 2003) corpus and return the compression with the highest probability. |
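The rerank-by-language-model step above can be sketched as follows. This is a minimal illustration with a toy add-alpha bigram model standing in for the Gigaword 5-gram model; all names are ours, not the paper's.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=1.0):
    """Toy add-alpha bigram LM, a stand-in for the Gigaword 5-gram model."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)
    def logprob(sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + alpha) /
                            (unigrams[a] + alpha * V))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def best_compression(candidates, lm):
    # Return the N-best candidate with the highest LM log-probability.
    return max(candidates, key=lm)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = train_bigram_lm(corpus)
print(best_compression([["cat", "the", "sat"], ["the", "cat", "sat"]], lm))
# -> ['the', 'cat', 'sat']
```

The decoder only needs to produce a reasonably diverse N-best list; the large LM then acts as a fluency filter over the trimmed-tree yields.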
Experiment | When segmenting texts of the target domain using models trained on the source domain, performance is hurt as more falsely segmented instances are added into the training set.
INTRODUCTION | These new features of micro-blogs make Chinese Word Segmentation (CWS) models trained on the source domain, such as a news corpus, fail to perform equally well when transferred to texts from micro-blogs.
Our method | Because of this, the model trained on this unbalanced corpus tends to be biased. |