Index of papers in Proc. ACL 2013 that mention
  • language model
Nuhn, Malte and Ney, Hermann
Abstract
In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.
Definitions
denotes the language model .
Definitions
Depending on the structure of the language model, Equation 2 can be further simplified.
Definitions
Similarly, we define language model matrices S for the unigram and the bigram case.
Introduction
The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language.
Introduction
This might be related to the fact that a statistical formulation of the decipherment problem has not been analyzed with respect to n-gram language models: This paper shows the close relationship of the decipherment problem to the quadratic assignment problem.
Introduction
In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP).
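As a hedged illustration of this LSAP correspondence (not the authors' code), the sketch below builds a unigram cost matrix from made-up ciphertext counts and plaintext log-probabilities and solves the assignment with SciPy's Hungarian-algorithm routine; all symbols and counts are hypothetical.

```python
# Sketch: unigram decipherment of a 1:1 substitution cipher posed as a
# linear sum assignment problem (LSAP).  Counts and probabilities are toy
# values, not from the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Unigram log-probabilities under an assumed plaintext language model.
plain_logp = {"e": -1.0, "t": -1.5, "a": -1.7, "o": -1.9}
# Unigram counts of ciphertext symbols observed in the ciphertext.
cipher_count = {"X": 120, "Q": 90, "Z": 75, "K": 60}

plain = sorted(plain_logp)
cipher = sorted(cipher_count)

# cost[i][j] = negated log-likelihood contribution of mapping ciphertext
# symbol i to plaintext symbol j; minimizing the total cost maximizes the
# unigram LM score of the decipherment.
cost = np.array([[-cipher_count[c] * plain_logp[p] for p in plain]
                 for c in cipher])

rows, cols = linear_sum_assignment(cost)
print({cipher[i]: plain[j] for i, j in zip(rows, cols)})
```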
Related Work
n-gram language model.
language model is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Experiments
Then, Tesseract uses a classifier, aided by a word-unigram language model , to recognize whole words.
Experiments
6.3 Language Model
Learning
The number of states in the dynamic programming lattice grows exponentially with the order of the language model (Jelinek, 1998; Koehn, 2004).
Learning
As a result, inference can become slow when the language model order n is large.
Learning
On each iteration of EM, we perform two passes: a coarse pass using a low-order language model, and a fine pass using a high-order language model (Petrov et al., 2008; Zhang and Gildea, 2008).
Model
P(E, T, R, X) = P(E) [Language model] · P(T|E) [Typesetting model] · P(R) [Inking model] · P(X|E, T, R) [Noise model]
Model
3.1 Language Model P(E)
Model
Our language model , P(E), is a Kneser-Ney smoothed character n-gram model (Kneser and Ney, 1995).
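For readers who want to reproduce the flavour of this component, here is a minimal character-level Kneser-Ney sketch using NLTK's lm package; the training strings and the model order are placeholder assumptions, not the paper's setup.

```python
# Sketch: a Kneser-Ney smoothed character n-gram model in the spirit of the
# P(E) component above, built with NLTK.  Toy training data.
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

order = 3
corpus = ["the old printing press", "the faded metal type"]
char_sents = [list(line) for line in corpus]   # character sequences

train, vocab = padded_everygram_pipeline(order, char_sents)
lm = KneserNeyInterpolated(order)
lm.fit(train, vocab)

# Log-probability of the character 'e' given the preceding characters 't','h'.
print(lm.logscore("e", ["t", "h"]))
```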
Related Work
Work that has directly addressed historical documents has done so using a pipelined approach, and without fully integrating a strong language model (Vamvakas et al., 2008; Kluzner et al., 2009; Kae et al., 2010; Kluzner et al., 2011).
Related Work
They integrated typesetting models with language models , but did not model noise.
Related Work
Our approach is also similar in that we use a strong language model (in conjunction with the constraint that the correspondence be regular) to learn the correct mapping.
language model is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Nagata, Ryo and Whittaker, Edward
Approach
In his method, a variety of languages are modeled by their spelling systems (i.e., character-based n-gram language models ).
Approach
Then, agglomerative hierarchical clustering is applied to the language models to reconstruct a language family tree.
Approach
The similarity used for clustering is based on a divergence-like distance between two language models that was originally proposed by Juang and Rabiner (1985).
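A hedged sketch of that recipe follows: per-language character-bigram models, a symmetrized Juang-Rabiner-style distance, and agglomerative clustering with SciPy. The tiny corpora and the add-alpha smoothing are illustrative assumptions.

```python
# Sketch: reconstructing a "family tree" by clustering character-bigram
# language models with a divergence-like distance.  Toy corpora.
from collections import Counter
from itertools import combinations
import math
from scipy.cluster.hierarchy import linkage

corpora = {
    "en": "the quick brown fox jumps over the lazy dog",
    "de": "der schnelle braune fuchs springt ueber den faulen hund",
    "nl": "de snelle bruine vos springt over de luie hond",
}
# Shared bigram vocabulary so every model assigns mass to every observed bigram.
VOCAB = {bg for t in corpora.values() for bg in zip(t, t[1:])}

def bigram_model(text, alpha=0.5):
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    return {bg: (counts[bg] + alpha) / (total + alpha * len(VOCAB)) for bg in VOCAB}

models = {lang: bigram_model(text) for lang, text in corpora.items()}

def cross_entropy(text, model):
    bigrams = list(zip(text, text[1:]))
    return -sum(math.log(model[bg]) for bg in bigrams) / len(bigrams)

def distance(a, b):
    # Symmetrized, divergence-like distance between two language models.
    d = 0.5 * ((cross_entropy(corpora[a], models[b]) - cross_entropy(corpora[a], models[a]))
               + (cross_entropy(corpora[b], models[a]) - cross_entropy(corpora[b], models[b])))
    return max(d, 0.0)

langs = sorted(corpora)
condensed = [distance(a, b) for a, b in combinations(langs, 2)]
print(linkage(condensed, method="average"))   # agglomerative clustering (the "tree")
```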
Methods
Similarly, let M_i be a language model trained using D_i.
Methods
2, we use an n-gram language model based on a mixture of word and POS tokens instead of a simple word-based language model .
Methods
In this language model , content words in n-grams are replaced with their corresponding POS tags.
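The mixed token stream can be illustrated in a few lines; which tags count as "content" and the tagger output below are simplified assumptions.

```python
# Sketch: replace content words with their POS tags, keep function words,
# then train an n-gram LM on the resulting mixed tokens.
CONTENT_TAGS = {"NN", "NNS", "VB", "VBD", "VBZ", "JJ", "RB"}

def mix_tokens(tagged_sentence):
    """tagged_sentence: list of (word, POS) pairs from any tagger."""
    return [pos if pos in CONTENT_TAGS else word.lower()
            for word, pos in tagged_sentence]

print(mix_tokens([("I", "PRP"), ("bought", "VBD"), ("a", "DT"), ("book", "NN")]))
# -> ['i', 'VBD', 'a', 'NN']
```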
language model is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Abstract
As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models .
Introduction
In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
Introduction
As a result, phrase-based decoders only need to maintain the boundary words on one end to calculate language model probabilities.
Introduction
Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007).
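The boundary-word bookkeeping can be pictured with a toy hypothesis structure; the LM lookup below is a stub, and none of this is the cited decoders' actual code.

```python
# Sketch: in left-to-right decoding only the last n-1 words of a hypothesis
# (its LM state) matter for future n-gram LM scores, so hypotheses sharing
# that state can be recombined.
N = 3  # n-gram order

def lm_state(words, n=N):
    return tuple(words[-(n - 1):])

def extend(hypothesis, phrase, lm_logprob):
    """Append a target phrase, adding LM cost using only boundary context."""
    words, score = hypothesis
    for i, w in enumerate(phrase):
        context = tuple((words + phrase[:i])[-(N - 1):])
        score += lm_logprob(context, w)   # stub LM lookup
    return words + phrase, score

hyp = (["the", "house"], -2.3)
new_hyp = extend(hyp, ["is", "small"], lambda ctx, w: -1.0)
print(lm_state(new_hyp[0]), new_hyp[1])   # ('is', 'small') -4.3
```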
language model is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Chong, Tze Yuang and E. Banchs, Rafael and Chng, Eng Siong and Li, Haizhou
Abstract
In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling.
Introduction
Language models have been extensively studied in natural language processing.
Introduction
The role of a language model is to measure how likely a (target) word is to occur based on some given evidence extracted from the history-context.
Language Modeling with TD and TO
A language model estimates word probabilities given their history, i.e.
Language Modeling with TD and TO
In order to define the TD and TO components for language modeling, we express the observation of an arbitrary history-word, w_{i-k}, at the kth position behind the target-word, as the joint of two events: i) the word w_{i-k} occurs within the history-context: w_{i-k} ∈ h, and ii) it occurs at distance k from the target-word: Δ(w_{i-k}) = k (Δ_k for brevity); i.e.
Language Modeling with TD and TO
In fact, the TO model is closely related to the trigger language model (Rosenfeld 1996), as the prediction of the target-word (the triggered word) is based on the presence of a history-word (the trigger).
Motivation of the Proposed Approach
The attributes of distance and co-occurrence are exploited and modeled differently in each language modeling approach.
Related Work
Latent-semantic language model approaches (Bellegarda 1998, Coccaro 2005) weight word counts with TFIDF to highlight their semantic importance towards the prediction.
Related Work
Other approaches such as the class-based language model (Brown 1992, Kneser & Ney 1993)
Related Work
The structured language model (Chelba & Jelinek 2000) determines the “heads” in the history-context by using a parsing tree.
language model is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Introduction
Further, decoding with nonlocal (or state-dependent) features, such as a language model , is also a problem.
Introduction
Actually, even for the (log-) linear model, efficient decoding with the language model is not trivial (Chiang, 2007).
Introduction
For the nonlocal features such as the language model , Chiang (2007) proposed a cube-pruning method for efficient decoding.
language model is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Roark, Brian and Allauzen, Cyril and Riley, Michael
Abstract
We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing.
Introduction
Smoothed n-gram language models are the de facto standard statistical models of language for a wide range of natural language applications, including speech recognition and machine translation.
Introduction
constraints for language modeling
Introduction
As a result, statistical language models — an important component of many such applications — are often trained on very large corpora, then modified to fit within some pre-specified size bound.
Preliminaries
N-gram language models are typically presented mathematically in terms of words w, the strings (histories) h that precede them, and the suffixes of the histories (backoffs) h′ that are used in the smoothing recursion.
Preliminaries
N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored.
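The sparse storage plus backoff recursion can be sketched in a few lines; the toy probability and backoff tables are invented for illustration.

```python
# Sketch: backoff n-gram lookup.  Only some n-grams are stored explicitly;
# otherwise apply the backoff weight of the history h and recurse on its
# suffix h'.
import math

logprob = {("the", "cat"): math.log(0.2), ("cat",): math.log(0.01), (): math.log(1e-4)}
backoff = {("the",): math.log(0.4)}

def logp(word, history):
    """log P(word | history) under a standard backoff model."""
    if history + (word,) in logprob:
        return logprob[history + (word,)]
    if not history:
        return logprob[()]                      # unknown-word floor
    return backoff.get(history, 0.0) + logp(word, history[1:])

print(logp("cat", ("the",)))   # stored bigram
print(logp("cat", ("a",)))     # backs off to the unigram
```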
language model is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Kauchak, David
Abstract
In this paper we examine language modeling for text simplification.
Abstract
Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model .
Abstract
We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
Introduction
An important component of many text-to-text translation systems is the language model which predicts the likelihood of a text sequence being produced in the output language.
Introduction
In some problem domains, such as machine translation, the translation is between two distinct languages and the language model can only be trained on data in the output language.
Introduction
In these monolingual problems, text could be used from both the input and output domain to train a language model .
Related Work
If we view the normal data as out-of-domain data, then the problem of combining simple and normal data is similar to the language model domain adaption problem (Suzuki and Gao, 2005), in particular cross-domain adaptation (Bellegarda, 2004) where a domain-specific model is improved by incorporating additional general data.
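One concrete way to combine the two sources, in line with the adaptation work cited above, is linear interpolation of a simple-English and a normal-English LM; the toy component models and the mixture weight below are assumptions for illustration.

```python
# Sketch: linear interpolation of a "simple English" LM and a "normal
# English" LM.  The component models are stubs returning probabilities.
def interpolate(p_simple, p_normal, lam=0.7):
    return lambda w, ctx: lam * p_simple(w, ctx) + (1 - lam) * p_normal(w, ctx)

p_simple = lambda w, ctx: 0.02 if w == "big" else 0.001
p_normal = lambda w, ctx: 0.005 if w == "big" else 0.002

p_mixed = interpolate(p_simple, p_normal)
print(p_mixed("big", ("a",)))   # 0.7*0.02 + 0.3*0.005 = 0.0155
```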
language model is mentioned in 53 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Translation Model Architecture
We train a language model on the source language side of each of the n component bitexts, and compute an n-dimensional vector for each sentence by computing its entropy with each language model .
Translation Model Architecture
Our aim is not to discriminate between sentences that are more likely and unlikely in general, but to cluster on the basis of relative differences between the language model entropies.
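A minimal sketch of that feature vector, assuming stub unigram component models and mean-centring to keep only the relative differences:

```python
# Sketch: per-sentence entropy under each component LM, centred so that
# clustering sees relative differences only.  Toy unigram models.
import math

component_lms = [
    {"parliament": 0.01, "the": 0.05, "exchange": 0.0005},
    {"parliament": 0.0005, "the": 0.05, "exchange": 0.01},
]

def entropy(sentence, unigram):
    words = sentence.split()
    return -sum(math.log(unigram.get(w, 1e-6)) for w in words) / len(words)

def features(sentence):
    vec = [entropy(sentence, lm) for lm in component_lms]
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]

print(features("the parliament"))
print(features("the exchange"))
```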
Translation Model Architecture
While it is not the focus of this paper, we also evaluate language model adaptation.
language model is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Wu, Yuanbin and Ng, Hwee Tou
Experiments
As described in Section 3.2, the weight of each variable is a linear combination of the language model score, three classifier confidence scores, and three classifier disagreement scores.
Experiments
We use the Web 1T 5-gram corpus (Brants and Franz, 2006) to compute the language model score for a sentence.
Experiments
Finally, the language model score, classifier confidence scores, and classifier disagreement scores are normalized to take values in [0, 1], based on the HOO 2011 development data.
Inference with First Order Variables
The language model score h(s′, LM) of s′ based on a large web corpus;
Inference with First Order Variables
Next, to compute whpyg, we collect language model score and confidence scores from the article (ART), preposition (PREP), and noun number (NOUN) classifier, i.e., E = {ART, PREP, NOUN}.
Inference with Second Order Variables
When measuring the gain due to setting the corresponding second-order variable to 1 (change cat to cats), its weight is likely to be small, since A cats will get a low language model score, a low article classifier confidence score, and a low noun number classifier confidence score.
Related Work
Features used in classification include surrounding words, part-of-speech tags, language model scores (Gamon, 2010), and parse tree structures (Tetreault et al., 2010).
language model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Experiment
SRILM Toolkit (Stolcke, 2002) is employed to train 4-gram language models on the Xinhua portion of the Gigaword corpus, while for the IWSLT2012 data set, only its training set is used.
Experiment
The similarity between the data from each domain and the test data is calculated using the perplexity measure with 5-gram language model .
Hierarchical Phrase Table Combination
The Pitman-Yor process is also employed in n-gram language models, which are hierarchically represented through the hierarchical Pitman-Yor process with switch priors to integrate different domains at all levels (Wood and Teh, 2009).
Phrase Pair Extraction with Unsupervised Phrasal ITGs
P_base is a base measure defined as a combination of the IBM Models in two directions and the unigram language models on both sides.
Related Work
The translation model and language model are primary components in SMT.
Related Work
Previous work proved successful in the use of large-scale data for language models from diverse domains (Brants et al., 2007; Schwenk and Koehn, 2008).
Related Work
Alternatively, the language model is incrementally updated by using a succinct data structure with an interpolation technique (Levenberg and Osborne, 2009; Levenberg et al., 2011).
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith
Bayesian MT Decipherment via Hash Sampling
Secondly, for Bayesian inference we need to sample from a distribution that involves computing probabilities for all the components ( language model , translation model, fertility, etc.)
Bayesian MT Decipherment via Hash Sampling
Note that the (translation) model in our case consists of multiple exponential-family components: a multinomial pertaining to the language model (which remains fixed), and other components pertaining to translation probabilities P_θ(f_i|e_i), fertility, etc.
Bayesian MT Decipherment via Hash Sampling
where p_old(·) and p_new(·) are the true conditional likelihood probabilities according to our model (including the language model component) for the old and new samples, respectively.
Decipherment Model for Machine Translation
For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
Decipherment Model for Machine Translation
Generate a target (e.g., English) string e = e_1 … e_l with probability P(e) according to an n-gram language model.
Experiments and Results
The latter is used to construct a target language model used for decipherment training.
Experiments and Results
Overall, using a 3-gram language model (instead of 2-gram) for decipherment training improves the performance for all methods.
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Eliciting Addressee’s Emotion
We use GIZA++ and SRILM for learning the translation model and 5-gram language model, respectively.
Eliciting Addressee’s Emotion
We use the emotion-tagged dialogue corpus to learn eight translation models and language models , each of which is specialized in generating the response that elicits one of the eight emotions (Plutchik, 1980).
Eliciting Addressee’s Emotion
In this case, the first two utterances are used to learn the translation model, while only the second utterance is used to learn the language model .
Experiments
Table 6: The number of utterance pairs used for training classifiers in emotion prediction and learning the translation models and language models in response generation.
Experiments
We use the utterance pairs summarized in Table 6 to learn the translation models and language models for eliciting each emotional category.
Related Work
The linear interpolation of translation and/or language models is a widely-used technique for adapting machine translation systems to new domains (Sennrich, 2012).
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Zong, Chengqing
Experiments
For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs, and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword.
Experiments
An in-domain 5-gram English language model is trained with the target 1 million monolingual data.
Experiments
Wu et al. (2008) regard the in-domain lexicon with corpus translation probability as another phrase table and further use the in-domain language model besides the out-of-domain language model.
Probabilistic Bilingual Lexicon Acquisition
In order to assign probabilities to each entry, we apply the Corpus Translation Probability used in (Wu et al., 2008): given in-domain source-language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon and the in-domain target-language monolingual data (for language model estimation).
Related Work
For the target-side monolingual data, they just use it to train the language model; for the source-side monolingual data, they employ a baseline (word-based SMT or phrase-based SMT trained with small-scale bitext) to first translate the source sentences, combining each source sentence and its target translation into a bilingual sentence pair, and then training a new phrase-based SMT system with these pseudo sentence pairs.
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Semi-supervised Parsing with Large Data
These relations are captured by word clustering, lexical dependencies, and a dependency language model , respectively.
Semi-supervised Parsing with Large Data
4.3 Structural Relations: Dependency Language Model
Semi-supervised Parsing with Large Data
The dependency language model is proposed by Shen et al.
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Collocational Lexicon Induction
It has been used as word similarity measure in language modeling (Dagan et al., 1999).
Experiments & Results 4.1 Experimental Setup
For the end-to-end MT pipeline, we used Moses (Koehn et al., 2007) with these standard features: relative-frequency and lexical translation model (TM) probabilities in both directions; distortion model; language model (LM) and word count.
Experiments & Results 4.1 Experimental Setup
For the language model, we used the KenLM toolkit (Heafield, 2011) to create a 5-gram language model on the target side of the Europarl corpus (V7) with approximately 54M tokens with Kneser-Ney smoothing.
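Querying such a model from Python can look like the hedged sketch below (the ARPA file name is a placeholder; the 5-gram model itself would be built offline with KenLM's lmplz/build_binary tools).

```python
# Sketch: scoring sentences with a prebuilt KenLM model via its Python
# bindings.  The model path is hypothetical.
import kenlm

model = kenlm.Model("europarl.en.5gram.arpa")
sentence = "the committee approved the proposal"
print(model.score(sentence, bos=True, eos=True))   # total log10 probability
print(model.perplexity(sentence))
```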
Experiments & Results 4.1 Experimental Setup
However, in an MT pipeline, the language model is supposed to rerank the hypotheses and move more appropriate translations (in terms of fluency) to the top of the list.
Introduction
Even noisy translation of OOVs can aid the language model to better
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Darwish, Kareem and Belinkov, Yonatan
Conclusion
Also, we believe that improving English language modeling to match the genre of the translated sentences can have significant positive impact on translation quality.
Previous Work
They used two language models built from the English GigaWord corpus and from a large web crawl.
Previous Work
For language modeling, we used either EGen or the English side of the AR corpus plus the English side of NIST12 training data and English GigaWord v5.
Previous Work
— B2-B4 systems used identical training data, namely EG, with the GW, EGen, or both for B2, B3, and B4 respectively for language modeling .
Proposed Methods 3.1 Egyptian to EG’ Conversion
Using both language models (S2) led to slight improvement.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Experimental Setup
Beam size is fixed at 2000. Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Sentence Compression
As the space of possible compressions is exponential in the number of leaves in the parse tree, instead of looking for the globally optimal solution, we use beam search to find a set of highly likely compressions and employ a language model trained on a large corpus for evaluation.
Sentence Compression
Given the N -best compressions from the decoder, we evaluate the yield of the trimmed trees using a language model trained on the Gigaword (Graff, 2003) corpus and return the compression with the highest probability.
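The reranking step reduces to picking the LM-preferred candidate; a minimal sketch with a stub scorer standing in for the trained Gigaword model:

```python
# Sketch: choose, among N-best compressions, the one with the highest
# language model score.  The scorer here is a stub.
def rerank(nbest, lm_logprob):
    """nbest: list of token lists; lm_logprob: callable on a token list."""
    return max(nbest, key=lm_logprob)

nbest = [["the", "bill", "passed"], ["bill", "the", "passed"]]
stub_lm = lambda toks: -1.0 if toks[0] == "the" else -5.0
print(rerank(nbest, stub_lm))   # ['the', 'bill', 'passed']
```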
Sentence Compression
Thus, the decoder is quite flexible — its learned scoring function allows us to incorporate features salient for sentence compression while its language model guarantees the linguistic quality of the compressed string.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Andreas, Jacob and Vlachos, Andreas and Clark, Stephen
Discussion
The first is the incorporation of a language model (or comparable long-distance structure-scoring model) to assign scores to predicted parses independent of the transformation model.
Experimental setup
The best symmetrization algorithm, translation and language model weights for each language are selected using cross-validation on the development set.
MT—based semantic parsing
In order to learn a semantic parser using MT we linearize the MRs, learn alignments between the MRL and the NL, extract translation rules, and learn a language model for the MRL.
MT—based semantic parsing
Language modeling: In addition to translation rules learned from a parallel corpus, MT systems also rely on an n-gram language model for the target language, estimated from a (typically larger) monolingual corpus.
MT—based semantic parsing
In the case of SP, such a monolingual corpus is rarely available, and we instead use the MRs available in the training data to learn a language model of the MRL.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas
Decoding
The language model (LM) scoring is directly integrated into the cube pruning algorithm.
Decoding
Naturally, we also had to adjust hypothesis expansion and, most importantly, language model scoring inside the cube pruning algorithm.
Experiments
Our German 4-gram language model was trained on the German sentences in the training data augmented by the Stuttgart SdeWaC corpus (Web-as-Corpus Consortium, 2008), whose generation is detailed in (Baroni et al., 2009).
Translation Model
(1) The forward translation weight using the rule weights as described in Section 2; (2) the indirect translation weight using the rule weights as described in Section 2; (3) lexical translation weight source → target; (4) lexical translation weight target → source; (5) target side language model; (6) number of words in the target sentences; (7) number of rules used in the pre-translation; (8) number of target side sequences, here k times the number of sequences used in the pre-translations that constructed τ (gap penalty). The rule weights required for (1) are relative frequencies normalized over all rules with the same left-hand side.
Translation Model
The computation of the language model estimates for (6) is adapted to score partial translations consisting of discontiguous units.
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
O'Connor, Brendan and Stewart, Brandon M. and Smith, Noah A.
Inference
After randomly initializing all η_{k,s,r,t}, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model (z, φ), the context model (α, γ, β, p), and the η, θ variables, which bottleneck between the submodels.
Inference
The language model sampler sequentially updates every z^{(i)} (and implicitly φ via collapsing) in the manner of Griffiths and Steyvers (2004): p(z^{(i)} | θ, w^{(i)}, b) ∝ θ_{s,r,t,z} (n_{w,z} + b/V) / (n_z + b), where the counts n are for all event tuples besides i.
Model
• Language model:
Model
Thus the language model is very similar to a topic model’s generation of token topics and wordtypes.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nguyen, ThuyLinh and Vogel, Stephan
Experiment Results
The language model is the interpolation of 5-gram language models built from news corpora of the NIST 2012 evaluation.
Experiment Results
The language model is the trigram SRI language model built from the Xinhua corpus of 180 million words.
Experiment Results
The language model is a trigram model trained with SRILM on the target side of the training corpora.
Introduction
Many features are shared between phrase-based and tree-based systems including language model , word count, and translation model features.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam
Abstract
In all experiments we include the target side of the mined parallel data in the language model , in order to distinguish whether results are due to influences from parallel or monolingual data.
Abstract
In these experiments, we use 5-gram language models when the target language is English or German, and 4—gram language models for French and Spanish.
Abstract
The baseline system was trained using only the Europarl corpus (Koehn, 2005) as parallel data, and all experiments use the same language model trained on the target sides of Europarl, the English side of all linked Spanish-English Wikipedia articles, and the English side of the mined CommonCrawl data.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Deveaud, Romain and SanJuan, Eric and Bellot, Patrice
Introduction
The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results.
Topic-Driven Relevance Models
where Θ is a set of pseudo-relevant feedback documents and θ_D is the language model of document D. This notion of estimating a query model is
Topic-Driven Relevance Models
We tackle the null probabilities problem by smoothing the document language model using the well-known Dirichlet smoothing (Zhai and Lafferty, 2004).
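The Dirichlet-smoothed document model has a standard closed form, p(w|D) = (c(w,D) + μ·p(w|C)) / (|D| + μ); a small sketch with toy counts and an arbitrary μ:

```python
# Sketch: Dirichlet smoothing of a document language model against a
# collection model.  Toy collection, toy document, arbitrary mu.
from collections import Counter

collection = "language models for information retrieval and topic models".split()
p_coll = {w: c / len(collection) for w, c in Counter(collection).items()}

def dirichlet_lm(doc_tokens, mu=2000.0):
    counts, dlen = Counter(doc_tokens), len(doc_tokens)
    return lambda w: (counts[w] + mu * p_coll.get(w, 1e-9)) / (dlen + mu)

p_doc = dirichlet_lm("topic models improve document language models".split())
print(p_doc("topic"), p_doc("retrieval"))   # unseen terms still get mass
```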
Topic-Driven Relevance Models
Instead of viewing Θ as a set of document language models that are likely to contain topical information about the query, we take a probabilistic topic modeling approach.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Haffari, Gholamreza
Experiments
In the end-to-end MT pipeline we use a standard set of features: relative-frequency and lexical translation model probabilities in both directions; distance-based distortion model; language model and word count.
Experiments
We train 3-gram language models using modified Kneser-Ney smoothing.
Experiments
For AR-EN experiments the language model is trained on English data as (Blunsom et al., 2009a), and for FA-EN and UR-EN the English data are the target sides of the bilingual training data.
Introduction
We develop a Bayesian approach using a Pitman-Yor process prior, which is capable of modelling a diverse range of geometrically decaying distributions over infinite event spaces (here translation phrase-pairs), an approach shown to be state of the art for language modelling (Teh, 2006).
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Alfonseca, Enrique and Pighin, Daniele and Garrido, Guillermo
Experiment settings
• TopicSum: we use TopicSum (Haghighi and Vanderwende, 2009), a 3-layer hierarchical topic model, to infer the language model that is most central for the collection.
Experiment settings
divergence with respect to the collection language model is the one chosen.
Related work
(2007) generate novel utterances by combining Prim’s maximum-spanning-tree algorithm with an n-gram language model to enforce fluency.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space.
Experiments
Row 3 is the word-based translation model (Jeon et al., 2005), and row 4 is the word-based translation language model, which linearly combines the word-based translation model and language model into a unified framework (Xue et al., 2008).
Experiments
(2009) in Table 3 because previous work (Ming et al., 2010) demonstrated that the word-based translation language model (Xue et al., 2008) obtained superior performance compared to the syntactic tree matching (Wang et al., 2009).
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Markov Topic Regression - MTR
Language Model Prior (η_w): probabilities on word transitions denoted as η_w = p(w_i = v | w_{i-1}).
Markov Topic Regression - MTR
We built a language model using SRILM (Stolcke, 2002) on the domain specific sources such as top wiki pages and blogs on online movie reviews, etc., to obtain the probabilities of domain-specific n-grams, up to 3-grams.
Markov Topic Regression - MTR
In Eq. (1), we assume that the prior on the semantic tags, η_s, is more indicative of the decision for sampling a w_i from a new tag compared to language model posteriors on word sequences, η_w.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing
Experiment
We train a 5-gram language model with the Xinhua portion of English Gigaword corpus and target part of the training data.
Integrating into the PAS-based Translation Framework
The weights of the MEPD feature can be tuned by MERT (Och, 2003) together with other translation features, such as language model .
PAS-based Translation Framework
The target-side-like PAS is selected only according to the language model and translation probabilities, without considering any context information of PAS.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
Related Work
(Bengio et al., 2006) proposed to use a multilayer neural network for the language modeling task.
Related Work
(Niehues and Waibel, 2012) shows that machine translation results can be improved by combining a neural language model with a traditional n-gram language model.
Related Work
(Son et al., 2012) improves the translation quality of an n-gram translation model by using a bilingual neural language model.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tomeh, Nadi and Habash, Nizar and Roth, Ryan and Farra, Noura and Dasigi, Pradeep and Diab, Mona
Abstract
Optical Character Recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency.
Discriminative Reranking for OCR
The LM models are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
Introduction
The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Elfardy, Heba and Diab, Mona
Approach to Sentence-Level Dialect Identification
The aforementioned approach relies on language models (LM) and MSA and EDA Morphological Analyzer to decide whether each word is (a) MSA, (b) EDA, (c) Both (MSA & EDA) or (d) OOV.
Approach to Sentence-Level Dialect Identification
The perplexity of a language model on a given test sentence, S(w_1, …, w_n), is defined as:
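The formula itself did not survive extraction; the standard sentence-level definition, which this snippet presumably refers to, is:

```latex
% Standard perplexity of a language model on S = w_1 ... w_n
\mathrm{PPL}(S) \;=\; P(w_1,\dots,w_n)^{-1/n}
            \;=\; \exp\!\Big(-\frac{1}{n}\sum_{i=1}^{n}\log P(w_i \mid w_1,\dots,w_{i-1})\Big)
```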
Related Work
Amazon Mechanical Turk and try a language modeling (LM) approach to solve the problem.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Feng, Yang and Cohn, Trevor
Experiments
The language model is a 3-gram language model trained using the SRILM toolkit (Stolcke, 2002) on the English side of the training data.
Experiments
The language model is a 3-gram LM trained on the Xinhua portion of the Gigaword corpus using the SRILM toolkit with modified Kneser-Ney smoothing.
Related Work
(2011) develop a bilingual language model which incorporates words in the source and target languages to predict the next unit, which they use as a feature in a translation system.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Experiment
We used 5-gram language models that were trained using the English side of each set of bilingual training data.
Experiment
The common SMT feature set consists of: four translation model features, phrase penalty, word penalty, and a language model feature.
Introduction
A language model also supports the estimation.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hagiwara, Masato and Sekine, Satoshi
Introduction
Our approach is based on semi-Markov discriminative structure prediction, and it incorporates English back-transliteration and English language models (LMs) into WS in a seamless way.
Use of Language Model
Language Model Augmentation: Analogous to Koehn and Knight (2003), we can exploit the fact that レッド reddo (red) in the example is such a common word that one can expect it appears frequently in the training corpus.
Use of Language Model
4.1 Language Model Projection
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Experiments
For this test set, we used 8 million sentences from the full NIST parallel dataset as the language model training data.
Experiments
If either the source or the target side of a training instance had an edit distance of less than 10%, we removed it. As for the language models, we collected a further 10M tweets from Twitter for the English language model and another 10M tweets from Weibo for the Chinese language model.
Experiments
As the language model, we use a 5-gram model with Kneser-Ney smoothing.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: