Index of papers in Proc. ACL 2013 that mention

language model

Seen in text as:

language model (213)
language models (88)
language modeling (21)
Language Model (10)
Language model (4)
Language Modeling (3)

Seen in 314 sentences in 36 papers.

1. Decipherment Complexity in 1:1 Substitution Ciphers

Nuhn, Malte and Ney, Hermann

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.
Definitions	denotes the language model .
Definitions	Depending on the structure of the language model Equation 2 can be further simplified.
Definitions	Similarly, we define language model matrices S for the unigram and the bigram case.
Introduction	The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language.
Introduction	This might be related to the fact that a statistical formulation of the decipherment problem has not been analyzed with respect to n-gram language models: This paper shows the close relationship of the decipherment problem to the quadratic assignment problem.
Introduction	In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP).
Related Work	gram language model.

language model is mentioned in 20 sentences in this paper.
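
The unigram-to-LSAP correspondence quoted above can be illustrated with a small sketch. It is not the authors' code: the function name `unigram_decipher`, the toy language model, and the brute-force search are assumptions made for illustration. With a unigram model the log-likelihood of a decipherment is a sum of independent per-symbol scores, which is the defining property of a linear sum assignment problem.

```python
import math
from collections import Counter
from itertools import permutations

def unigram_decipher(ciphertext, unigram_lm):
    """Brute-force the 1:1 substitution that maximizes unigram log-likelihood.

    Hypothetical helper: it assumes there are at most as many cipher symbols
    as plaintext symbols, and it only scales to tiny alphabets.
    """
    counts = Counter(ciphertext)
    cipher_syms = sorted(counts)
    plain_syms = sorted(unigram_lm)
    # score[f][e] = count(f) * log p(e): the gain from mapping f -> e.
    # The total objective is a sum of one such entry per cipher symbol.
    score = {f: {e: counts[f] * math.log(unigram_lm[e]) for e in plain_syms}
             for f in cipher_syms}
    best_map, best_ll = None, -math.inf
    for perm in permutations(plain_syms, len(cipher_syms)):
        ll = sum(score[f][e] for f, e in zip(cipher_syms, perm))
        if ll > best_ll:
            best_map, best_ll = dict(zip(cipher_syms, perm)), ll
    return best_map, best_ll

# Toy unigram model over three plaintext symbols (made-up probabilities).
lm = {"a": 0.5, "b": 0.3, "c": 0.2}
mapping, ll = unigram_decipher("xxxyyz", lm)
plaintext = "".join(mapping[ch] for ch in "xxxyyz")
```

Because the score matrix is all that matters, a polynomial-time assignment solver (for example the Hungarian algorithm) could replace the enumeration of permutations; the brute-force loop is used here only to keep the sketch dependency-free.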

2. Unsupervised Transcription of Historical Documents

Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan

Experiments	Then, Tesseract uses a classifier, aided by a word-unigram language model, to recognize whole words.
Experiments	6.3 Language Model
Learning	The number of states in the dynamic programming lattice grows exponentially with the order of the language model (Jelinek, 1998; Koehn, 2004).
Learning	As a result, inference can become slow when the language model order n is large.
Learning	On each iteration of EM, we perform two passes: a coarse pass using a low-order language model, and a fine pass using a high-order language model (Petrov et al., 2008; Zhang and Gildea, 2008).
Model	P(E, T, R, X) = P(E) [Language model] · P(T | E) [Typesetting model] · P(R) [Inking model] · P(X | E, T, R) [Noise model]
Model	3.1 Language Model P(E)
Model	Our language model, P(E), is a Kneser-Ney smoothed character n-gram model (Kneser and Ney, 1995).
Related Work	Work that has directly addressed historical documents has done so using a pipelined approach, and without fully integrating a strong language model (Vamvakas et al., 2008; Kluzner et al., 2009; Kae et al., 2010; Kluzner et al., 2011).
Related Work	They integrated typesetting models with language models, but did not model noise.
Related Work	Our approach is also similar in that we use a strong language model (in conjunction with the constraint that the correspondence be regular) to learn the correct mapping.

language model is mentioned in 27 sentences in this paper.

3. Reconstructing an Indo-European Family Tree from Non-native English Texts

Nagata, Ryo and Whittaker, Edward

Approach	In his method, a variety of languages are modeled by their spelling systems (i.e., character-based n-gram language models).
Approach	Then, agglomerative hierarchical clustering is applied to the language models to reconstruct a language family tree.
Approach	The similarity used for clustering is based on a divergence-like distance between two language models that was originally proposed by Juang and Rabiner (1985).
Methods	Similarly, let M_i be a language model trained using D_i.
Methods	2, we use an n-gram language model based on a mixture of word and POS tokens instead of a simple word-based language model.
Methods	In this language model, content words in n-grams are replaced with their corresponding POS tags.

language model is mentioned in 20 sentences in this paper.

4. A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

Liu, Yang

Abstract	As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models.
Introduction	In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
Introduction	As a result, phrase-based decoders only need to maintain the boundary words on one end to calculate language model probabilities.
Introduction	Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007).

language model is mentioned in 15 sentences in this paper.

5. Modeling of term-distance and term-occurrence information for improving n-gram language model performance

Chong, Tze Yuang and Banchs, Rafael E. and Chng, Eng Siong and Li, Haizhou

Abstract	In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling.
Introduction	Language models have been extensively studied in natural language processing.
Introduction	The role of a language model is to measure how probably a (target) word would occur based on some given evidence extracted from the history-context.
Language Modeling with TD and TO	A language model estimates word probabilities given their history, i.e.
Language Modeling with TD and TO	In order to define the TD and TO components for language modeling, we express the observation of an arbitrary history-word, w_{i-k}, at the k-th position behind the target-word, as the joint of two events: i) the word w_{i-k} occurs within the history-context: w_{i-k} ∈ h, and ii) it occurs at distance k from the target-word: Δ(w_{i-k}) = k (Δ = k for brevity); i.e.
Language Modeling with TD and TO	In fact, the TO model is closely related to the trigger language model (Rosenfeld 1996), as the prediction of the target-word (the triggered word) is based on the presence of a history-word (the trigger).
Motivation of the Proposed Approach	The attributes of distance and co-occurrence are exploited and modeled differently in each language modeling approach.
Related Work	Latent-semantic language model approaches (Bellegarda 1998, Coccaro 2005) weight word counts with TFIDF to highlight their semantic importance towards the prediction.
Related Work	Other approaches such as the class-based language model (Brown 1992, Kneser & Ney 1993)
Related Work	The structured language model (Chelba & Jelinek 2000) determines the “heads” in the history-context by using a parsing tree.

language model is mentioned in 12 sentences in this paper.

6. Additive Neural Networks for Statistical Machine Translation

Liu, Lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun

Introduction	Further, decoding with nonlocal (or state-dependent) features, such as a language model, is also a problem.
Introduction	Actually, even for the (log-) linear model, efficient decoding with the language model is not trivial (Chiang, 2007).
Introduction	For the nonlocal features such as the language model, Chiang (2007) proposed a cube-pruning method for efficient decoding.

language model is mentioned in 17 sentences in this paper.

7. Smoothed marginal distribution constraints for language modeling

Roark, Brian and Allauzen, Cyril and Riley, Michael

Abstract	We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing.
Introduction	Smoothed n-gram language models are the de facto standard statistical models of language for a wide range of natural language applications, including speech recognition and machine translation.
Introduction	As a result, statistical language models — an important component of many such applications — are often trained on very large corpora, then modified to fit within some pre-specified size bound.
Preliminaries	N-gram language models are typically presented mathematically in terms of words w, the strings (histories) h that precede them, and the suffixes of the histories (backoffs) h’ that are used in the smoothing recursion.
Preliminaries	N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored.

language model is mentioned in 16 sentences in this paper.

8. Improving Text Simplification Language Modeling Using Unsimplified Text Data

Kauchak, David

Abstract	In this paper we examine language modeling for text simplification.
Abstract	Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model.
Abstract	We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each.
Introduction	An important component of many text-to-text translation systems is the language model which predicts the likelihood of a text sequence being produced in the output language.
Introduction	In some problem domains, such as machine translation, the translation is between two distinct languages and the language model can only be trained on data in the output language.
Introduction	In these monolingual problems, text could be used from both the input and output domain to train a language model.
Related Work	If we view the normal data as out-of-domain data, then the problem of combining simple and normal data is similar to the language model domain adaption problem (Suzuki and Gao, 2005), in particular cross-domain adaptation (Bellegarda, 2004) where a domain-specific model is improved by incorporating additional general data.

language model is mentioned in 53 sentences in this paper.

9. A Multi-Domain Translation Model Framework for Statistical Machine Translation

Sennrich, Rico and Schwenk, Holger and Aransa, Walid

Translation Model Architecture	We train a language model on the source language side of each of the n component bitexts, and compute an n-dimensional vector for each sentence by computing its entropy with each language model.
Translation Model Architecture	Our aim is not to discriminate between sentences that are more likely and unlikely in general, but to cluster on the basis of relative differences between the language model entropies.
Translation Model Architecture	While it is not the focus of this paper, we also evaluate language model adaptation.

language model is mentioned in 9 sentences in this paper.
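
The per-sentence entropy vector described in the snippets above can be sketched as follows. This is an illustration under stated assumptions, not the paper's setup: the add-one smoothed unigram models, the toy "component" corpora, and the helper names `train_unigram`, `cross_entropy`, and `entropy_vector` are all hypothetical stand-ins for the real n-gram models trained on each bitext.

```python
import math
from collections import Counter

def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed, shared vocabulary."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def cross_entropy(sentence, lm):
    """Average negative log2 probability per token of the sentence."""
    toks = sentence.split()
    return -sum(math.log2(lm[t]) for t in toks) / len(toks)

def entropy_vector(sentence, lms):
    """One entropy value per component language model."""
    return [cross_entropy(sentence, lm) for lm in lms]

# Two made-up source-side "component bitexts" (n = 2).
components = [
    ["the court ruled today", "the judge ruled"],
    ["the patient took the drug", "the drug helped"],
]
vocab = {tok for comp in components for s in comp for tok in s.split()}
lms = [train_unigram(comp, vocab) for comp in components]
vec = entropy_vector("the judge ruled today", lms)
```

Clustering would then operate on the relative differences between the entries of `vec`, in line with the stated aim of comparing language model entropies rather than absolute likelihood. The sketch assumes every test token is in the shared vocabulary.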

10. Grammatical Error Correction Using Integer Linear Programming

Wu, Yuanbin and Ng, Hwee Tou

Experiments	As described in Section 3.2, the weight of each variable is a linear combination of the language model score, three classifier confidence scores, and three classifier disagreement scores.
Experiments	We use the Web 1T 5-gram corpus (Brants and Franz, 2006) to compute the language model score for a sentence.
Experiments	Finally, the language model score, classifier confidence scores, and classifier disagreement scores are normalized to take values in [0, 1], based on the HOO 2011 development data.
Inference with First Order Variables	The language model score h(s’, LM) of s’ based on a large web corpus;
Inference with First Order Variables	Next, to compute whpyg, we collect language model score and confidence scores from the article (ART), preposition (PREP), and noun number (NOUN) classifier, i.e., E = {ART, PREP, NOUN}.
Inference with Second Order Variables	When measuring the gain due to 21131213312 2 1 (change cat to cats), the weight wNoungmluml is likely to be small since A cats will get a low language model score, a low article classifier confidence score, and a low noun number classifier confidence score.
Related Work	Features used in classification include surrounding words, part-of-speech tags, language model scores (Gamon, 2010), and parse tree structures (Tetreault et al., 2010).

language model is mentioned in 8 sentences in this paper.

11. Hierarchical Phrase Table Combination for Machine Translation

Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun

Experiment	SRILM Toolkit (Stolcke, 2002) is employed to train 4-gram language models on the Xinhua portion of Gigaword corpus, while for the IWSLT2012 data set, only its training set is used.
Experiment	The similarity between the data from each domain and the test data is calculated using the perplexity measure with 5-gram language model.
Hierarchical Phrase Table Combination	Pitman-Yor process is also employed in n-gram language models which are hierarchically represented through the hierarchical Pitman-Yor process with switch priors to integrate different domains in all the levels (Wood and Teh, 2009).
Phrase Pair Extraction with Unsupervised Phrasal ITGs	Pbase is a base measure defined as a combination of the IBM Models in two directions and the unigram language models in both sides.
Related Work	The translation model and language model are primary components in SMT.
Related Work	Previous work proved successful in the use of large-scale data for language models from diverse domains (Brants et al., 2007; Schwenk and Koehn, 2008).
Related Work	Alternatively, the language model is incrementally updated by using a succinct data structure with an interpolation technique (Levenberg and Osborne, 2009; Levenberg et al., 2011).

language model is mentioned in 7 sentences in this paper.

12. Scalable Decipherment for Machine Translation via Hash Sampling

Ravi, Sujith

Bayesian MT Decipherment via Hash Sampling	Secondly, for Bayesian inference we need to sample from a distribution that involves computing probabilities for all the components (language model, translation model, fertility, etc.)
Bayesian MT Decipherment via Hash Sampling	Note that the (translation) model in our case consists of multiple exponential families components—a multinomial pertaining to the language model (which remains fixed5), and other components pertaining to translation probabilities P9(fi\|ei), fertility ngert, etc.
Bayesian MT Decipherment via Hash Sampling	where p_old(·), p_new(·) are the true conditional likelihood probabilities according to our model (including the language model component) for the old, new sample respectively.
Decipherment Model for Machine Translation	For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
Decipherment Model for Machine Translation	Generate a target (e.g., English) string e = e_1 ... e_l, with probability P(e) according to an n-gram language model.
Experiments and Results	The latter is used to construct a target language model used for decipherment training.
Experiments and Results	Overall, using a 3-gram language model (instead of 2-gram) for decipherment training improves the performance for all methods.

language model is mentioned in 7 sentences in this paper.

13. Predicting and Eliciting Addressee's Emotion in Online Dialogue

Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi

Eliciting Addressee’s Emotion	We use GIZA++ and SRILM for learning translation model and 5-gram language model, re-
Eliciting Addressee’s Emotion	We use the emotion-tagged dialogue corpus to learn eight translation models and language models, each of which is specialized in generating the response that elicits one of the eight emotions (Plutchik, 1980).
Eliciting Addressee’s Emotion	In this case, the first two utterances are used to learn the translation model, while only the second utterance is used to learn the language model.
Experiments	Table 6: The number of utterance pairs used for training classifiers in emotion prediction and learning the translation models and language models in response generation.
Experiments	We use the utterance pairs summarized in Table 6 to learn the translation models and language models for eliciting each emotional category.
Related Work	The linear interpolation of translation and/or language models is a widely-used technique for adapting machine translation systems to new domains (Sennrich, 2012).

language model is mentioned in 7 sentences in this paper.
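
The Related Work snippet above refers to linear interpolation of language models for domain adaptation. A minimal sketch of that general technique follows; the function name `interpolate`, the toy distributions, and the weights are illustrative assumptions and do not come from the paper or from any toolkit.

```python
def interpolate(models, weights):
    """Linearly interpolate word distributions: p(w) = sum_i lambda_i * p_i(w).

    `models` is a list of dicts mapping words to probabilities; a word that
    is missing from a model is treated as having probability 0 there.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*models)
    return {w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
            for w in vocab}

# Made-up in-domain and general-domain unigram distributions.
in_domain = {"hi": 0.6, "bye": 0.4}
general = {"hi": 0.2, "bye": 0.3, "ok": 0.5}
mixed = interpolate([in_domain, general], [0.75, 0.25])
```

In practice the weights are usually tuned on held-out in-domain data (for example by minimizing perplexity) rather than fixed by hand as they are here.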

14. Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

Zhang, Jiajun and Zong, Chengqing

Experiments	For the out-of-domain data, we build the phrase table and reordering table using the 2.08 million Chinese-to-English sentence pairs, and we use the SRILM toolkit (Stolcke, 2002) to train the 5-gram English language model with the target part of the parallel sentences and the Xinhua portion of the English Gigaword.
Experiments	An in-domain 5-gram English language model is trained with the target 1 million monolingual data.
Experiments	(2008) regards the in-domain lexicon with corpus translation probability as another phrase table and further use the in-domain language model besides the out-of-domain language model.
Probabilistic Bilingual Lexicon Acquisition	In order to assign probabilities to each entry, we apply the Corpus Translation Probability which used in (Wu et al., 2008): given an in-domain source language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon and the in-domain target language monolingual data (for language model estimation).
Related Work	For the target-side monolingual data, they just use it to train language model , and for the source-side monolingual data, they employ a baseline (word-based SMT or phrase-based SMT trained with small-scale bitext) to first translate the source sentences, combining the source sentence and its target translation as a bilingual sentence pair, and then train a new phrase-base SMT with these pseudo sentence pairs.

language model is mentioned in 7 sentences in this paper.

15. Fast and Accurate Shift-Reduce Constituent Parsing

Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo

Semi-supervised Parsing with Large Data	These relations are captured by word clustering, lexical dependencies, and a dependency language model, respectively.
Semi-supervised Parsing with Large Data	4.3 Structural Relations: Dependency Language Model
Semi-supervised Parsing with Large Data	The dependency language model is proposed by Shen et al.

language model is mentioned in 7 sentences in this paper.

16. Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop

Collocational Lexicon Induction	It has been used as word similarity measure in language modeling (Dagan et al., 1999).
Experiments & Results 4.1 Experimental Setup	For the end-to-end MT pipeline, we used Moses (Koehn et al., 2007) with these standard features: relative-frequency and lexical translation model (TM) probabilities in both directions; distortion model; language model (LM) and word count.
Experiments & Results 4.1 Experimental Setup	For the language model, we used the KenLM toolkit (Heafield, 2011) to create a 5-gram language model on the target side of the Europarl corpus (V7) with approximately 54M tokens with Kneser-Ney smoothing.
Experiments & Results 4.1 Experimental Setup	However, in an MT pipeline, the language model is supposed to rerank the hypotheses and move more appropriate translations (in terms of fluency) to the top of the list.
Introduction	Even noisy translation of oovs can aid the language model to better

language model is mentioned in 6 sentences in this paper.

17. Translating Dialectal Arabic to English

Sajjad, Hassan and Darwish, Kareem and Belinkov, Yonatan

Conclusion	Also, we believe that improving English language modeling to match the genre of the translated sentences can have significant positive impact on translation quality.
Previous Work	They used two language models built from the English GigaWord corpus and from a large web crawl.
Previous Work	For language modeling, we used either EGen or the English side of the AR corpus plus the English side of NIST12 training data and English GigaWord v5.
Previous Work	B2-B4 systems used identical training data, namely EG, with the GW, EGen, or both for B2, B3, and B4 respectively for language modeling.
Proposed Methods 3.1 Egyptian to EG’ Conversion	Using both language models (52) led to slight improvement.

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

BLEU (13)
parallel data (9)
LM (8)

18. A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire

Experimental Setup	Beam size is fixed at 2000. Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Sentence Compression	As the space of possible compressions is exponential in the number of leaves in the parse tree, instead of looking for the globally optimal solution, we use beam search to find a set of highly likely compressions and employ a language model trained on a large corpus for evaluation.
Sentence Compression	Given the N -best compressions from the decoder, we evaluate the yield of the trimmed trees using a language model trained on the Gigaword (Graff, 2003) corpus and return the compression with the highest probability.
Sentence Compression	Thus, the decoder is quite flexible — its learned scoring function allows us to incorporate features salient for sentence compression while its language model guarantees the linguistic quality of the compressed string.

language model is mentioned in 6 sentences in this paper.

19. Semantic Parsing as Machine Translation

Andreas, Jacob and Vlachos, Andreas and Clark, Stephen

Discussion	The first is the incorporation of a language model (or comparable long-distance structure-scoring model) to assign scores to predicted parses independent of the transformation model.
Experimental setup	The best symmetrization algorithm, translation and language model weights for each language are selected using cross-validation on the development set.
MT-based semantic parsing	In order to learn a semantic parser using MT we linearize the MRs, learn alignments between the MRL and the NL, extract translation rules, and learn a language model for the MRL.
MT-based semantic parsing	Language modeling: In addition to translation rules learned from a parallel corpus, MT systems also rely on an n-gram language model for the target language, estimated from a (typically larger) monolingual corpus.
MT-based semantic parsing	In the case of SP, such a monolingual corpus is rarely available, and we instead use the MRs available in the training data to learn a language model of the MRL.

language model is mentioned in 6 sentences in this paper.

20. Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas

Decoding	The language model (LM) scoring is directly integrated into the cube pruning algorithm.
Decoding	Naturally, we also had to adjust hypothesis expansion and, most importantly, language model scoring inside the cube pruning algorithm.
Experiments	Our German 4-gram language model was trained on the German sentences in the training data augmented by the Stuttgart SdeWaC corpus (Web-as-Corpus Consortium, 2008), whose generation is detailed in (Baroni et al., 2009).
Translation Model	(1) The forward translation weight using the rule weights as described in Section 2 (2) The indirect translation weight using the rule weights as described in Section 2 (3) Lexical translation weight source → target (4) Lexical translation weight target → source (5) Target side language model (6) Number of words in the target sentences (7) Number of rules used in the pre-translation (8) Number of target side sequences; here k times the number of sequences used in the pre-translations that constructed 7' (gap penalty). The rule weights required for (1) are relative frequencies normalized over all rules with the same left-hand side.
Translation Model	The computation of the language model estimates for (6) is adapted to score partial translations consisting of discontiguous units.

language model is mentioned in 5 sentences in this paper.

21. Learning to Extract International Relations from Political Context

O'Connor, Brendan and Stewart, Brandon M. and Smith, Noah A.

Inference	After randomly initializing all η_{k,s,r,t}, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model (z, φ), context model (α, γ, β, p), and the η, θ variables, which bottleneck between the submodels.
Inference	The language model sampler sequentially updates every z^(i) (and implicitly φ via collapsing) in the manner of Griffiths and Steyvers (2004): p(z^(i) | θ, w^(i), b) ∝ θ_{s,r,t,z} (n_{w,z} + b/V) / (n_z + b), where counts n are for all event tuples besides i.
Model	Language model:
Model	Thus the language model is very similar to a topic model’s generation of token topics and wordtypes.

language model is mentioned in 4 sentences in this paper.

22. Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

Nguyen, ThuyLinh and Vogel, Stephan

Experiment Results	The language model is the interpolation of 5-gram language models built from news corpora of the NIST 2012 evaluation.
Experiment Results	The language model is the trigram SRI language model built from Xinhua corpus of 180 million words.
Experiment Results	The language model is three-gram SRILM trained from the target side of the training corpora.
Introduction	Many features are shared between phrase-based and tree-based systems including language model, word count, and translation model features.

language model is mentioned in 4 sentences in this paper.

23. Dirt Cheap Web-Scale Parallel Text from the Common Crawl

Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam

Abstract	In all experiments we include the target side of the mined parallel data in the language model, in order to distinguish whether results are due to influences from parallel or monolingual data.
Abstract	In these experiments, we use 5-gram language models when the target language is English or German, and 4-gram language models for French and Spanish.
Abstract	The baseline system was trained using only the Europarl corpus (Koehn, 2005) as parallel data, and all experiments use the same language model trained on the target sides of Europarl, the English side of all linked Spanish-English Wikipedia articles, and the English side of the mined CommonCrawl data.

language model is mentioned in 4 sentences in this paper.

24. Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

Deveaud, Romain and SanJuan, Eric and Bellot, Patrice

Introduction	The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results.
Topic-Driven Relevance Models	where Θ is a set of pseudo-relevant feedback documents and θ_D is the language model of document D. This notion of estimating a query model is
Topic-Driven Relevance Models	We tackle the null probabilities problem by smoothing the document language model using the well-known Dirichlet smoothing (Zhai and Lafferty, 2004).
Topic-Driven Relevance Models	Instead of viewing Θ as a set of document language models that are likely to contain topical information about the query, we take a probabilistic topic modeling approach.

language model is mentioned in 4 sentences in this paper.
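
The Dirichlet smoothing mentioned in the snippets above is commonly written as p(w | D) = (c(w, D) + μ · p(w | C)) / (|D| + μ), where C is the whole collection. The sketch below shows that standard formula on toy data; the function name `dirichlet_lm`, the default μ, and the example texts are assumptions for illustration and not the paper's configuration.

```python
from collections import Counter

def dirichlet_lm(doc_tokens, collection_tokens, mu=2000.0):
    """Return a function giving the Dirichlet-smoothed p(word | document).

    `mu` is the smoothing parameter; the default here is only a placeholder.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    coll_total = len(collection_tokens)
    doc_len = len(doc_tokens)

    def prob(word):
        # The collection model backs off for words unseen in the document,
        # so they no longer receive a null probability.
        p_coll = coll_counts[word] / coll_total
        return (doc_counts[word] + mu * p_coll) / (doc_len + mu)

    return prob

collection = "a b c a b a".split()
doc = "a a b".split()
p = dirichlet_lm(doc, collection, mu=3.0)
unseen = p("c")  # "c" does not occur in the document
total = sum(p(w) for w in set(collection))
```

Words that appear in the collection but not in the document get a small non-zero probability, and the smoothed distribution still sums to one over the collection vocabulary.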

25. An Infinite Hierarchical Bayesian Model of Phrasal Translation

Cohn, Trevor and Haffari, Gholamreza

Experiments	In the end-to-end MT pipeline we use a standard set of features: relative-frequency and lexical translation model probabilities in both directions; distance-based distortion model; language model and word count.
Experiments	We train 3-gram language models using modified Kneser-Ney smoothing.
Experiments	For AR-EN experiments the language model is trained on English data as (Blunsom et al., 2009a), and for FA-EN and UR-EN the English data are the target sides of the bilingual training data.
Introduction	We develop a Bayesian approach using a Pitman-Yor process prior, which is capable of modelling a diverse range of geometrically decaying distributions over infinite event spaces (here translation phrase-pairs), an approach shown to be state of the art for language modelling (Teh, 2006).

language model is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

26. HEADY: News headline abstraction through event pattern clustering

Alfonseca, Enrique and Pighin, Daniele and Garrido, Guillermo

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment settings	TopicSum: we use TopicSum (Haghighi and Vanderwende, 2009), a 3-layer hierarchical topic model, to infer the language model that is most central for the collection.
Experiment settings	divergence with respect to the collection language model is the one chosen.
Related work	(2007) generate novel utterances by combining Prim’s maximum-spanning-tree algorithm with an n-gram language model to enforce fluency.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

27. Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space.
Experiments	Row 3 is the word-based translation model (Jeon et al., 2005), and row 4 is the word-based translation language model, which linearly combines the word-based translation model and language model into a unified framework (Xue et al., 2008).
Experiments	(2009) in Table 3 because previous work (Ming et al., 2010) demonstrated that the word-based translation language model (Xue et al., 2008) outperformed syntactic tree matching (Wang et al., 2009).

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

28. Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Markov Topic Regression - MTR	(19) Language Model Prior (η_W): Probabilities on word transitions denoted as η_W = p(w_i = v | w_{i-1}).
Markov Topic Regression - MTR	We built a language model using SRILM (Stolcke, 2002) on the domain specific sources such as top wiki pages and blogs on online movie reviews, etc., to obtain the probabilities of domain-specific n-grams, up to 3-grams.
Markov Topic Regression - MTR	(l), we assume that the prior on the semantic tags, η_S, is more indicative of the decision for sampling a w_i from a new tag compared to language model posteriors on word sequences, η_W.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

29. Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation

Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment	We train a 5-gram language model with the Xinhua portion of English Gigaword corpus and target part of the training data.
Integrating into the PAS-based Translation Framework	The weights of the MEPD feature can be tuned by MERT (Och, 2003) together with other translation features, such as language model.
PAS-based Translation Framework	The target-side-like PAS is selected only according to the language model and translation probabilities, without considering any context information of PAS.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

30. Word Alignment Modeling with Context Dependent Deep Neural Network

Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Related Work	(Bengio et al., 2006) proposed using a multilayer neural network for the language modeling task.
Related Work	(Niehues and Waibel, 2012) shows that machine translation results can be improved by combining neural language model with n-gram traditional language.
Related Work	(Son et al., 2012) improves translation quality of n-gram translation model by using a bilingual neural language model.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

31. Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition

Tomeh, Nadi and Habash, Nizar and Roth, Ryan and Farra, Noura and Dasigi, Pradeep and Diab, Mona

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Optical Character Recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency.
Discriminative Reranking for OCR	The LM models are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
Introduction	The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

reranking (14)
LM (9)
error rate (4)

32. Sentence Level Dialect Identification in Arabic

Elfardy, Heba and Diab, Mona

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Approach to Sentence-Level Dialect Identification	The aforementioned approach relies on language models (LM) and MSA and EDA Morphological Analyzer to decide whether each word is (a) MSA, (b) EDA, (c) Both (MSA & EDA) or (d) OOV.
Approach to Sentence-Level Dialect Identification	The perplexity of a language model on a given test sentence S(w1, ..., wn) is defined as:
Related Work	Amazon Mechanical Turk and try a language modeling (LM) approach to solve the problem.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

33. A Markov Model of Machine Translation using Non-parametric Bayesian Inference

Feng, Yang and Cohn, Trevor

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	The language model is a 3-gram language model trained using the SRILM toolkit (Stolcke, 2002) on the English side of the training data.
Experiments	The language model is a 3-gram LM trained on Xinhua portion of the Gigaword corpus using the SRILM toolkit with modified Kneser-Ney smoothing.
Related Work	(2011) develop a bilingual language model which incorporates words in the source and target languages to predict the next unit, which they use as a feature in a translation system.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

34. Distortion Model Considering Rich Context for Statistical Machine Translation

Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiment	We used 5-gram language models that were trained using the English side of each set of bilingual training data.
Experiment	The common SMT feature set consists of: four translation model features, phrase penalty, word penalty, and a language model feature.
Introduction	A language model also supports the estimation.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

35. Accurate Word Segmentation using Transliteration and Language Model Projection

Hagiwara, Masato and Sekine, Satoshi

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	Our approach is based on semi-Markov discriminative structure prediction, and it incorporates English back-transliteration and English language models (LMs) into WS in a seamless way.
Use of Language Model	Language Model Augmentation: Analogous to Koehn and Knight (2003), we can exploit the fact that レッド reddo (red) in the example ブラキッシュレッド is such a common word that one can expect it to appear frequently in the training corpus.
Use of Language Model	4.1 Language Model Projection

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

LM (19)
PoS tags (4)
bigram (3)

36. Microblogs as Parallel Corpora

Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel

In Proc. ACL 2013, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	For this test set, we used 8 million sentences from the full NIST parallel dataset as the language model training data.
Experiments	If either the source or the target side of a training instance had an edit distance of less than 10%, we removed it. As for the language models, we collected a further 10M tweets from Twitter for the English language model and another 10M tweets from Weibo for the Chinese language model.
Experiments	As the language model, we use a 5-gram model with Kneser-Ney smoothing.

language model is mentioned in 3 sentences in this paper.

Topics mentioned in this paper: