Mention Resolution Approach | We define a mention m as a tuple ⟨l_m, e_m⟩, where l_m is the “literal” string of characters that represents m and e_m is the email where m is observed.1 We assume that m can be resolved to a distinguishable participant for whom at least one email address is present in the collection.2 |
Mention Resolution Approach | Select a specific lexical reference l_m to refer to a given identity in the context e_m.
Mention Resolution Approach | 1The exact position in e_m where l_m is observed should also be included in the definition, but we ignore it, assuming that all matched literal mentions in one email refer to the same identity.
Baselines | It adds a LM component to the MLF baseline. |
Baselines | This LM baseline allows the comparison of classification through L1 fragments in an L2 context, with a more traditional L2 context modelling (i.e. |
Discussion and conclusion | LM baseline l1r1
Discussion and conclusion | l1r1+LM auto
Discussion and conclusion | LM baseline l1r1
Experiments & Results | As expected, the LM baseline substantially outperforms the context-insensitive MLF baseline. |
Experiments & Results | Second, our classifier approach attains a substantially higher accuracy than the LM baseline. |
Experiments & Results | The same significance level was found when comparing l1r1+LM against l1r1, auto+LM against auto, as well as the LM baseline against the MLF baseline.
Abstract | Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM).
Methods | In order to annotate lattice edges with an n-gram LM, every path coming into a node must end with the same sequence of (n − 1) tokens.
Methods | If this property does not hold, then nodes must be split until it does.4 This property is maintained by the decoder’s recombination rules for the segmented LM, but it is not guaranteed for the desegmented LM . |
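As a toy illustration of the node-splitting property above (not the authors' decoder), the bigram case n = 2 can be sketched: every node is cloned once per distinct incoming token, so all paths entering a clone agree on the single (n − 1 = 1) word of context the LM needs. The edge encoding and names here are invented for the sketch.

```python
from collections import defaultdict

def split_for_bigram(edges, start):
    """Clone lattice nodes so that every edge entering a clone carries the
    same token, i.e. all incoming paths agree on the 1-word context a
    bigram LM needs. Edges are (src, dst, token) triples."""
    outgoing = defaultdict(list)
    for src, dst, tok in edges:
        outgoing[src].append((dst, tok))
    new_edges, seen = [], set()
    frontier = [(start, (start, None))]   # (original node, its clone id)
    while frontier:
        node, clone = frontier.pop()
        if clone in seen:
            continue
        seen.add(clone)
        for dst, tok in outgoing[node]:
            dst_clone = (dst, tok)        # one clone per distinct incoming token
            new_edges.append((clone, dst_clone, tok))
            frontier.append((dst, dst_clone))
    return new_edges
```

A node reached by two different tokens (e.g. "a" and "b") is split into two clones, and its outgoing edges are duplicated onto each clone, which is exactly why desegmentation can blow up the lattice for a word-level LM.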
Methods | Indeed, the expanded word-level context is one of the main benefits of incorporating a word-level LM . |
Abstract | These techniques speed up NNJM computation by a factor of 10,000x, making the model as fast as a standard back-off LM.
Decoding with the NNJ M | When performing hierarchical decoding with an n-gram LM, the leftmost and rightmost n − 1 words from each constituent must be stored in the state space.
Model Variations | 4-gram Kneser-Ney LM
Model Variations | Dependency LM (Shen et al., 2010) Contextual lexical smoothing (Devlin, 2009) Length distribution (Shen et al., 2010) |
Model Variations | • LM adaptation (Snover et al., 2008)
Neural Network Joint Model (NNJ M) | Formally, our model approximates the probability of target hypothesis T conditioned on source sentence S. We follow the standard n-gram LM decomposition of the target, where each target word ti is conditioned on the previous n − 1 target words.
Neural Network Joint Model (NNJ M) | It is clear that this model is effectively an (n+m)-gram LM, and a 15-gram LM would be
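A minimal sketch of that decomposition (a hypothetical helper, not the NNJM implementation): for each target word, the conditioning context is the n − 1 previous target words plus an m-word source window centred on the affiliated source word, giving (n+m)-gram-style events.

```python
def nnjm_contexts(source, target, affiliation, n=4, m=3):
    """For each target word t_i, build the conditioning context of an
    (n+m)-gram style joint model: the n-1 previous target words plus an
    m-word source window centred on the source word affiliated with t_i.
    `affiliation[i]` gives the source index aligned to target word i.
    Toy re-implementation for illustration only."""
    pad = ["<s>"]
    contexts = []
    for i, t in enumerate(target):
        hist = (pad * (n - 1) + target)[i : i + n - 1]   # n-1 target history words
        a, half = affiliation[i], m // 2
        win = [source[j] if 0 <= j < len(source) else "<pad>"
               for j in range(a - half, a + half + 1)]   # m-word source window
        contexts.append((t, tuple(hist), tuple(win)))
    return contexts
```

Each `(word, history, window)` triple is one training/scoring event; a neural model then estimates P(t_i | history, window) for it.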
Neural Network Joint Model (NNJ M) | Although self-normalization significantly improves the speed of NNJM lookups, the model is still several orders of magnitude slower than a back-off LM . |
Baseline MT | The LM used for decoding is a log-linear combination of four word n-gram LMs which are built on different English corpora (details described in section 5.1), with the LM weights optimized on a development set and determined by minimum error rate training (MERT), to estimate the probability of a word given the preceding words.
Experiments | LM1 is a 7-gram LM trained on the tar- |
Experiments | LM2 is a 7-gram LM trained only on the English monolingual discussion forums data listed above. |
Experiments | LM3 is a 4-gram LM trained on the web genre among the target side of all parallel text (i.e., web text from pre-BOLT parallel text and BOLT released discussion forum parallel text). |
Introduction | Optimize name translation and context translation simultaneously and conduct name translation driven decoding with language model ( LM ) based selection (Section 3.2). |
Related Work | enable the LM to decide which translations to choose when encountering the names in the texts (Ji et al., 2009). |
Related Work | The LM selection method often assigns an inappropriate weight to the additional name translation table because it is constructed independently from translation of context words; therefore after weighted voting most correct name translations are not used in the final translation output. |
Related Work | More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names. |
Abstract | We propose an online approach, integrating a source LM and/or back-transliteration and an English LM.
Experiments | We observed slight improvement by incorporating the source LM, and observed a 0.48 point F-value increase over baseline, which translates to a 4.65 point Katakana F-value change and 16.0% (3.56% to 2.99%) WER reduction, mainly due to its higher Katakana word rate (11.2%).
Experiments | H. This type of error is reduced by +LM-P, e.g., *purasu chikku “*plus tick” to purasuchikku “plastic”, due to LM projection.
Experiments | performance, which may be because one cannot limit where the source LM features are applied. |
Introduction | We refer to this process of transliterating unknown words into another language and using the target LM as LM projection. |
Introduction | Since the model employs a general transliteration model and a general English LM , it achieves robust WS for unknown words. |
Use of Language Model | As we mentioned in Section 2, English LM knowledge helps split transliterated compounds. |
Use of Language Model | We use LM projection, which is a combination of back-transliteration and an English model, by extending the normal lattice building process as follows:
Use of Language Model | Then, edges are spanned between these extended English nodes, instead of between the original nodes, by additionally taking into consideration English LM features (ID 21 and 22 in Table 1): φ_1^LMP(w_i) = log p(w_i) and φ_2^LMP(w_{i−1}, w_i) = log p(w_{i−1}, w_i).
Word Segmentation Model | Figure 1: Example lattice with LM projection |
Comparative Study | Figure 4: bilingual LM illustration. |
Comparative Study | 4.3 Bilingual LM |
Comparative Study | We build a 9-gram LM using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing.
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model ( LM ), a phrase translation model and a word-based lexicon model. |
Experiments | • lowercased training data from the GALE task (Table 1, UN corpus not included); alignment trained with GIZA++
Experiments | • tuning corpus: NIST06; test corpora: NIST02, 03, 04, 05 and 08
Experiments | • 5-gram LM (1,694,412,027 running words) trained by the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing; training data: target side of bilingual data.
Approach to Sentence-Level Dialect Identification | The aforementioned approach relies on language models (LMs) and an MSA and EDA morphological analyzer to decide whether each word is (a) MSA, (b) EDA, (c) both (MSA & EDA), or (d) OOV.
Approach to Sentence-Level Dialect Identification | The following variants of the underlying token-level system are built to assess the effect of varying the level of preprocessing on the underlying LM on the performance of the overall sentence level dialect identification process: (1) Surface, (2) Tokenized, (3) CODAfied, and (4) Tokenized-CODA. |
Approach to Sentence-Level Dialect Identification | Orthography Normalized (CODAfied) LM : since DA is not originally a written form of Arabic, no standard orthography exists for it. |
Related Work | Amazon Mechanical Turk and try a language modeling ( LM ) approach to solve the problem. |
Decoding | The language model ( LM ) scoring is directly integrated into the cube pruning algorithm. |
Decoding | Thus, LM estimates are available for all considered hypotheses. |
Decoding | Because the output trees can remain discontiguous after hypothesis creation, LM scoring has to be done individually over all output trees. |
Conclusion | We proposed a method for annotating senses to terms in short queries, and also described an approach to integrate senses into an LM approach for IR. |
Conclusion | Our experimental results showed that the incorporation of senses improved a state-of-the-art baseline, a stem-based LM approach with PRF method. |
Conclusion | We also proposed a method to further integrate the synonym relations to the LM approaches. |
Experiments | We use the Lemur toolkit (Ogilvie and Callan, 2001) version 4.11 as the basic retrieval tool, and select the default unigram LM approach based on KL-divergence and Dirichlet-prior smoothing method in Lemur as our basic retrieval approach. |
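For orientation, query likelihood with Dirichlet-prior smoothing (which KL-divergence ranking reduces to for a maximum-likelihood query model) can be sketched as follows. This is a toy re-implementation, not Lemur's code; `mu` is the usual Dirichlet prior parameter.

```python
import math
from collections import Counter

def dirichlet_score(query, doc, collection, mu=2000):
    """Query-likelihood retrieval score under a unigram LM with
    Dirichlet-prior smoothing: p(w|d) = (c(w,d) + mu*p(w|C)) / (|d| + mu).
    `doc` and `collection` are token lists; toy sketch only."""
    doc_tf = Counter(doc)
    col_tf = Counter(collection)
    col_len = len(collection)
    score = 0.0
    for w in query:
        p_c = col_tf[w] / col_len                        # collection LM
        p_d = (doc_tf[w] + mu * p_c) / (len(doc) + mu)   # smoothed document LM
        score += math.log(p_d)
    return score
```

Documents are then ranked by this score for each query; a larger `mu` pulls every document model closer to the collection model.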
Incorporating Senses into Language Modeling Approaches | In this section, we propose to incorporate senses into the LM approach to IR. |
Incorporating Senses into Language Modeling Approaches | In this part, we further integrate the synonym relations of senses into the LM approach. |
Introduction | We incorporate word senses into the language modeling ( LM ) approach to IR (Ponte and Croft, 1998), and utilize sense synonym relations to further improve the performance. |
Introduction | Section 3 introduces the LM approach to IR, including the pseudo relevance feedback method. |
Introduction | generating word senses for query terms in Section 4, followed by presenting our novel method of incorporating word senses and their synonyms into the LM approach in Section 5. |
The Language Modeling Approach to IR | This section describes the LM approach to IR and the pseudo relevance feedback approach. |
Word Sense Disambiguation | Suppose we already have a basic IR method which does not require any sense information, such as the stem-based LM approach. |
Introduction | Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006), (Moore and Quirk, 2007) and more efficient language model ( LM ) querying (Heafield, 2011). |
Introduction | We show that taking a heuristic LM score estimate for pre-sorting the |
Introduction | Further, we introduce two novel LM lookahead methods. |
Search Algorithm Extensions | In addition to sorting according to the purely phrase-internal scores, which is common practice, we compute an estimate q_LME(ẽ) for the LM score of each target phrase ẽ. q_LME(ẽ) is the weighted LM score we receive by assuming ẽ to be a complete sentence without using sentence start and end markers.
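That pre-sorting heuristic can be sketched with invented toy tables: the phrase is scored as a standalone sentence with no boundary markers, the first word falling back to its unigram log-probability.

```python
def lm_estimate(phrase, bigram_lp, unigram_lp):
    """Heuristic LM score estimate q_LME for a target phrase: score it as if
    it were a complete sentence, without sentence start/end markers, so the
    first word gets its unigram log-probability. Toy log-prob tables."""
    words = phrase.split()
    score = unigram_lp[words[0]]
    for prev, w in zip(words, words[1:]):
        score += bigram_lp.get((prev, w), unigram_lp[w])  # crude backoff
    return score

def presort(phrases, bigram_lp, unigram_lp):
    # Sort candidate target phrases so stronger LM candidates are tried first.
    return sorted(phrases,
                  key=lambda p: lm_estimate(p, bigram_lp, unigram_lp),
                  reverse=True)
```

The estimate is cheap because it needs no decoder context, which is what makes it usable for pre-sorting phrase candidates before search.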
Search Algorithm Extensions | LM score computations are among the most expensive in decoding. |
Search Algorithm Extensions | (2006) report significant improvements in runtime by removing unnecessary LM lookups via early pruning. |
Experiments | We use a modified Kneser-Ney (KN) backoff 4-gram baseline LM.
Incorporating Syntactic Structures | [Diagram: each word w_k is mapped by external tool(s) to syntactic structures s_k^1, …, s_k^n, from which the LM computes features F(w_k, s_k^1, …, s_k^n).] Here, {w1, .
Introduction | Language models ( LM ) are crucial components in tasks that require the generation of coherent natural language text, such as automatic speech recognition (ASR) and machine translation (MT). |
Introduction | For example, incorporating long-distance dependencies and syntactic structure can help the LM better predict words by complementing the predictive power of n-grams (Chelba and Jelinek, 2000; Collins et al., 2005; Filimonov and Harper, 2009; Kuo et al., 2009). |
Introduction | Even so, the reliance on auxiliary tools slows LM application to the point of being impractical for real-time systems.
Syntactic Language Models | (2009) integrate syntactic features into a neural network LM for Arabic speech recognition. |
Syntactic Language Models | We work with a discriminative LM with long-span dependencies. |
Syntactic Language Models | The LM score S(w, a) for each hypothesis w of a speech utterance with acoustic sequence a is based on the baseline ASR system score b(w, a) (initial n-gram LM score and the acoustic score) and c_m, the weight assigned to the baseline score.1 The score is defined as:
Experimental Evaluation | Using a 2-gram LM they obtain 15.3 BLEU and with a whole segment LM , they achieve 19.3 BLEU. |
Experimental Evaluation | In comparison to this baseline we run our algorithm with N0 = 50 candidates per source word for both a 2-gram and a 3-gram LM.
Experimental Evaluation | Figure 3 and Figure 4 show the evolution of BLEU and TER scores for applying our method using a 2-gram and a 3-gram LM.
Training Algorithm and Implementation | Our FST representation of the LM makes use of failure transitions as described in (Allauzen et al., 2003). |
Machine Translation as a Decipherment Task | For P (e), we use a word n-gram LM trained on monolingual English data. |
Machine Translation as a Decipherment Task | Better LMs yield better MT results for both parallel and decipherment training—for example, using a segment-based English LM instead of a 2-gram LM yields a 24% reduction in edit distance and a 9% improvement in BLEU score for EM decipherment. |
Word Substitution Decipherment | We model P(e) using a statistical word n-gram English language model ( LM ). |
Word Substitution Decipherment | We use an English word bi-gram LM as the base distribution (P0) for the source model and specify a uniform P0 distribution for the |
Word Substitution Decipherment | In order to sample at position i, we choose the top K English words Y ranked by P(X Y Z), which can be computed offline from a statistical word bigram LM.
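The offline top-K computation can be sketched as below (toy bigram table; the real model's probabilities and smoothing are not shown): candidates Y between fixed neighbours X and Z are ranked by P(Y | X) · P(Z | Y).

```python
def top_k_choices(x, z, vocab, bigram_p, k=3):
    """Rank candidate words Y for the position between known neighbours X
    and Z by P(X Y Z) ∝ P(Y|X)·P(Z|Y) under a bigram LM; the top K per
    (X, Z) pair can be precomputed offline. Toy probabilities, with a
    small floor for unseen bigrams."""
    scored = [(bigram_p.get((x, y), 1e-9) * bigram_p.get((y, z), 1e-9), y)
              for y in vocab]
    scored.sort(reverse=True)
    return [y for _, y in scored[:k]]
```

Restricting the sampler to these K words is what keeps each sampling step fast even with a large English vocabulary.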
Experiments | To test our LM implementations, we performed experiments with two different language models. |
Introduction | Where classic LMs take word tuples and produce counts or probabilities, we propose an LM that takes a word-and-context encoding (so the context need not be re-looked up) and returns both the probability and also the context encoding for the suffix of the original query.
Introduction | Our LM toolkit, which is implemented in Java and compatible with the standard ARPA file formats, is available on the web.1 |
Preliminaries | Tries or variants thereof are implemented in many LM tool kits, including SRILM (Stolcke, 2002), IRSTLM (Federico and Cettolo, 2007), CMU SLM (Whittaker and Raj, 2001), and MIT LM (Hsu and Glass, 2008). |
Speeding up Decoding | Figure 3: Queries issued when scoring trigrams that are created when a state with LM context “the cat” combines with “fell down”. |
Speeding up Decoding | We found in our experiments that a cache using linear probing provided marginal performance increases of about 40%, largely because of cached back-off computation, while our simpler cache increases performance by about 300% even over our HASH LM implementation. |
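The "simpler cache" idea can be illustrated with a direct-mapped table in front of an arbitrary scoring function (a hypothetical stand-in, not the toolkit's implementation): each query hashes to exactly one slot, and a collision simply overwrites the slot, so there is no probing.

```python
class DirectMappedCache:
    """Fixed-size, direct-mapped cache in front of an LM scoring function.
    Repeated queries (very common during decoding) hit the cached value;
    a colliding query just overwrites the slot. Toy sketch."""
    def __init__(self, lm_score, size=1024):
        self.lm_score = lm_score
        self.keys = [None] * size
        self.vals = [None] * size
        self.size = size
        self.hits = self.misses = 0

    def score(self, ngram):
        slot = hash(ngram) % self.size
        if self.keys[slot] == ngram:
            self.hits += 1
            return self.vals[slot]
        self.misses += 1
        val = self.lm_score(ngram)
        self.keys[slot] = ngram
        self.vals[slot] = val
        return val
```

The appeal is constant-time lookup with no eviction bookkeeping: decoding query streams are so repetitive that even this crude policy absorbs most lookups.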
Speeding up Decoding | As discussed in Section 3, our LM implementations can answer queries about context-encoded n-grams faster than explicitly encoded n-grams. |
Experiments | imizing the LM score over all possible permutations. |
Experiments | Usually the LM score is used as one component of a more complex decoder score which also includes biphrase and distortion scores. |
Experiments | But in this particular “translation task” from bad to good English, we consider that all “biphrases” are of the form e–e, where e is an English word, and we do not take into account any distortion: we only consider the quality of the permutation as it is measured by the LM component.
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Decipherment | We build a statistical English language model ( LM ) for the plaintext source model P (p), which assigns a probability to any English letter sequence. |
Decipherment | We build an interpolated word+n-gram LM and use it to assign P(p) probabilities to any plaintext letter sequence p1...pn.3 The advantage is that it helps direct the sampler towards plaintext hypotheses that resemble natural language: high-probability letter sequences which form valid words such as “H E L L O” instead of sequences like “T X H R T”.
Decipherment | 3We set the interpolation weights for the word and n-gram LM as (0.9, 0.1). |
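A toy version of that interpolation (hypothetical helpers, with the weights (0.9, 0.1) from the footnote): a uniform word model over an English vocabulary is mixed with an add-one-smoothed letter bigram model, so sequences that form valid words score far higher than junk sequences.

```python
from collections import Counter

def make_letter_bigram(corpus_words):
    """Train a toy add-one-smoothed letter-bigram model from a word list;
    '#' marks the word start."""
    counts, totals = Counter(), Counter()
    alphabet_size = 27                       # a-z plus the '#' marker
    for w in corpus_words:
        seq = "#" + w
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    def prob(word):
        seq = "#" + word
        p = 1.0
        for a, b in zip(seq, seq[1:]):
            p *= (counts[(a, b)] + 1) / (totals[a] + alphabet_size)
        return p
    return prob

def interp_prob(word, vocab, letter_prob, lam=0.9):
    """word+n-gram interpolation: lam * P_word + (1 - lam) * P_letters,
    with a uniform word model over `vocab`. Toy sketch."""
    p_word = 1.0 / len(vocab) if word in vocab else 0.0
    return lam * p_word + (1 - lam) * letter_prob(word)
```

Because the word model dominates the mixture, hypotheses that segment into dictionary words get most of the probability mass, which is exactly the pressure the sampler needs.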
Experiments and Results | Even with a 3-gram letter LM , our method yields a +63% improvement in decipherment accuracy over EM on the homophonic cipher with spaces. |
Experiments and Results | We observe that the word+3-gram LM proves highly effective when tackling more complex ciphers and cracks the Zodiac-408 cipher.
Experiments and Results | • Letter n-gram versus word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in a +75% improvement in decipherment accuracy.
Dependency Language Model | Suppose we use a trigram dependency LM, |
Discussion | Only translation probability P was employed in the construction of the target forest due to the complexity of the syntax-based LM . |
Discussion | Since our dependency LM models structures over target words directly based on dependency trees, we can build a single-step system. |
Discussion | This dependency LM can also be used in hierarchical MT systems using lexical-ized CFG trees. |
Experiments | 0 str-dep: a string-to-dependency system with a dependency LM . |
Experiments | The English side of this subset was also used to train a 3-gram dependency LM . |
Experiments | Table: BLEU% (lower / mixed) and TER% (lower / mixed):
Experiments | Decoding (3-gram LM): baseline 38.18 / 35.77, 58.91 / 56.60; filtered 37.92 / 35.48, 57.80 / 55.43; str-dep 39.52 / 37.25, 56.27 / 54.07
Experiments | Rescoring (5-gram LM): baseline 40.53 / 38.26, 56.35 / 54.15; filtered 40.49 / 38.26, 55.57 / 53.47; str-dep 41.60 / 39.47, 55.06 / 52.96
Implementation Details | We rescore 1000-best translations (Huang and Chiang, 2005) by replacing the 3-gram LM score with the 5-gram LM score computed offline. |
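Mechanically, offline rescoring of an n-best list amounts to subtracting the weighted 3-gram component from each hypothesis total and adding the weighted 5-gram score before re-ranking; a minimal sketch with invented scores:

```python
def rescore_nbest(nbest, lm_weight, lm5):
    """Rescore an n-best list by swapping the decoder's 3-gram LM component
    for an offline 5-gram score, then re-rank (higher total is better).
    Each entry is (hypothesis, total_score, lm3_score); `lm5` maps a
    hypothesis to its 5-gram log score. Toy sketch."""
    rescored = []
    for hyp, total, lm3 in nbest:
        new_total = total - lm_weight * lm3 + lm_weight * lm5(hyp)
        rescored.append((new_total, hyp))
    rescored.sort(reverse=True)
    return [hyp for _, hyp in rescored]
```

Because only the LM term changes, the expensive 5-gram lookups can be batched offline exactly as described above.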
Discriminative Reranking for OCR | Base features include the HMM and LM scores produced by the OCR system. |
Discriminative Reranking for OCR | Word LM features (“LM-word”) include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, .
Discriminative Reranking for OCR | The LMs are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
Experiments | Our baseline is based on the sum of the logs of the HMM and LM scores. |
Experiments | For LM training we used 220M words from Arabic Gigaword 3, and 2.4M words from each “print” and “hand” ground truth annotations. |
Introduction | The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
Introduction | For an input image, the OCR decoder generates an n-best list of hypotheses each of which is associated with HMM and LM scores. |
Related Work | Like (Galley and Manning, 2009) our work implements an incremental syntactic language model; our approach differs by calculating syntactic LM scores over all available phrase-structure parses at each hypothesis instead of the 1-best dependency parse.
Related Work | Table: LM, in-domain (WSJ §23 ppl) / out-of-domain (ur-en dev ppl):
Related Work | WSJ 1-gram 1973.57 / 3581.72; WSJ 2-gram 349.18 / 1312.61; WSJ 3-gram 262.04 / 1264.47; WSJ 4-gram 244.12 / 1261.37; WSJ 5-gram 232.08 / 1261.90; WSJ HHMM 384.66 / 529.41; Interpolated WSJ 5-gram + HHMM 209.13 / 225.48; Giga 5-gram 258.35 / 312.28; Interp.
Related Work | HHMM parser beam sizes are indicated for the syntactic LM . |
Experimental Setup | 4.1 Distributed LM Framework |
Experimental Setup | We deploy the randomized LM in a distributed framework which allows it to scale more easily by distributing it across multiple language model servers. |
Experimental Setup | The proposed randomized LM can encode parameters estimated using any smoothing scheme (e.g. |
Experiments | Table: LM, size (GB) / dev MT04 / test MT05 / test MT06:
Experiments | unpruned block 116 / 0.5304 / 0.5697 / 0.4663; unpruned rand 69 / 0.5299 / 0.5692 / 0.4659; pruned block 42 / 0.5294 / 0.5683 / 0.4665; pruned rand 27 / 0.5289 / 0.5679 / 0.4656
Introduction | Using higher-order models and larger amounts of training data can significantly improve performance in applications, however the size of the resulting LM can become prohibitive. |
Introduction | Efficiency is paramount in applications such as machine translation which make huge numbers of LM requests per sentence. |
Perfect Hash-based Language Models | Our randomized LM is based on the Bloomier filter (Chazelle et al., 2004). |
Scaling Language Models | In the next section we describe our randomized LM scheme based on perfect hash functions. |
Introduction | This provides a compelling advantage over previous dependency language models for MT (Shen et al., 2008), which use a 5-gram LM only during reranking.
Introduction | In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores. |
Machine translation experiments | This LM required 16GB of RAM during training. |
Machine translation experiments | Regarding the small difference in BLEU scores on MT08, we would like to point out that tuning on MT05 and testing on MT08 had a rather adverse effect with respect to translation length: while the two systems are relatively close in terms of BLEU scores (24.83 and 24.91, respectively), the dependency LM provides a much bigger gain when evaluated with BLEU precision (27.73 vs. 28.79), i.e., by ignoring the brevity penalty.
Machine translation experiments | [Table columns: LM, newswire, web, speech, all]
Related work | In the latter paper, Huang and Chiang introduce rescoring methods named “cube pruning” and “cube growing”, which first use a baseline decoder (either a synchronous CFG or a phrase-based system) and no LM to generate a hypergraph, and then rescore this hypergraph with a language model.
A Class-based Model of Agreement | For contexts in which the LM is guaranteed to back off (for instance, after an unseen bigram), our decoder maintains only the minimal state needed (perhaps only a single word). |
Inference during Translation Decoding | initialize η to −∞; set η(t) = 0; compute τ* from parameters ⟨S, s̃, π, is_goal⟩; compute q(e_{l+1}) = p(τ*) under the generative LM; set model state a_new = ⟨s̃, τ̃⟩ for prefix e_1^l; Output: q(e_{l+1})
Introduction | Intuition might suggest that the standard n-gram language model (LM) is sufficient to handle agreement phenomena.
Introduction | However, LM statistics are sparse, and they are made sparser by morphological variation. |
Introduction | For English-to-Arabic translation, we achieve a +1.04 BLEU average improvement by tiling our model on top of a large LM . |
Related Work | They used a target-side LM over Combinatorial Categorial Grammar (CCG) supertags, along with a penalty for the number of operator violations, and also modified the phrase probabilities based on the tags. |
Related Work | Then they mixed the classes into a word-based LM . |
Related Work | Target-Side Syntactic LMs Our agreement model is a form of syntactic LM , of which there is a long history of research, especially in speech processing.5 Syntactic LMs have traditionally been too slow for scoring during MT decoding. |
Previous Work | Our LM experiments also affirmed the importance of in-domain English LMs. |
Previous Work | [Table 1 columns: Train, LM, BLEU, OOV]
Previous Work | Table 1: Baseline results using the EG and AR training sets with GW and EGen corpora for LM training
Proposed Methods 3.1 Egyptian to EG’ Conversion | Then we used a trigram LM that we built from the aforementioned Aljazeera articles to pick the most likely candidate in context. |
Proposed Methods 3.1 Egyptian to EG’ Conversion | We simply multiplied the character-level transformation probability with the LM probability — giving them equal weight. |
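The combination can be sketched as follows (toy tables, not the actual Aljazeera-trained model): each candidate's score is, in log space with equal weight, the sum of its character-level transformation log-probability and its trigram LM log-probability in context.

```python
import math

def pick_candidate(prev2, prev1, candidates, trigram_lp):
    """Pick the most likely conversion candidate in context by combining
    the character-level transformation probability with a trigram LM
    probability at equal weight. `candidates` is a list of
    (candidate_word, transform_prob); `trigram_lp` maps
    (w-2, w-1, w) to a log-probability, with a floor for unseen trigrams."""
    best, best_score = None, -math.inf
    for cand, p_trans in candidates:
        score = math.log(p_trans) + trigram_lp.get((prev2, prev1, cand), -10.0)
        if score > best_score:
            best, best_score = cand, score
    return best
```

Either component can override the other: a strong LM preference flips the choice even when the transformation model prefers a different candidate.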
Proposed Methods 3.1 Egyptian to EG’ Conversion | - S1 and S2 trained on the EG’ with EGen and both EGen and GW for LM training respectively.
Experimental results and discussions 6.1 Baseline experiments | In the first set of experiments, we evaluate the baseline performance of the LM and BC summarizers (cf. |
Experimental results and discussions 6.1 Baseline experiments | Second, the supervised summarizer (i.e., BC) outperforms the unsupervised summarizer (i.e., LM ). |
Experimental results and discussions 6.1 Baseline experiments | One is that BC is trained with the handcrafted document-summary sentence labels in the development set while LM is instead conducted in a purely unsupervised manner. |
Proposed Methods | In order to estimate the sentence generative probability, we explore the language modeling ( LM ) approach, which has been introduced to a wide spectrum of IR tasks and demonstrated with good empirical success, to predict the sentence generative probability. |
Proposed Methods | In the LM approach, each sentence S in a document D can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009): P(D|S) = ∏_{w∈D} P(w|S)^{c(w,D)}, where c(w,D) is the count of w in D.
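Under that view, a sentence's generative score for its document can be sketched as below. The additive smoothing constant `alpha` is an assumption added here to keep zero-count words finite, not part of the cited model.

```python
import math
from collections import Counter

def sentence_generative_logprob(sentence, document, alpha=0.5):
    """Log of the sentence generative probability under the bag-of-words
    unigram LM view: the sentence is treated as a unigram distribution
    that generates the document, log P(D|S) = sum_w c(w,D) * log P(w|S).
    `alpha` is an assumed additive-smoothing constant."""
    s_tf = Counter(sentence)
    vocab = set(sentence) | set(document)
    denom = len(sentence) + alpha * len(vocab)
    logp = 0.0
    for w, c in Counter(document).items():
        logp += c * math.log((s_tf[w] + alpha) / denom)
    return logp
```

Ranking a document's sentences by this score gives the unsupervised extractive summarizer referred to as LM above.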
Experiments | The Chinese part of the corpus is segmented into words before LM training. |
Experiments | The pinyin syllable segmentation already has very high (over 98%) accuracy with a trigram LM using improved Kneser-Ney smoothing. |
Experiments | We consider different LM smoothing methods including Kneser-Ney (KN), improved Kneser-Ney (IKN), and Witten-Bell (WB). |
Pinyin Input Method Model | w_E(v_{i,j} → v_{j+1,k}) = −log P(v_{j+1,k} | v_{i,j}). Although the model is formulated on a first-order HMM, i.e., the LM used for the transition probability is a bigram one, it is easy to extend the model to take advantage of a higher-order n-gram LM, by tracking a longer history while traversing the graph.
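A first-order Viterbi search over such a lattice can be sketched as follows (simplified: the segmentation is fixed so the lattice is a list of candidate sets, probabilities are toys, and unseen bigrams get a small floor):

```python
import math

def viterbi_ime(lattice, bigram_p):
    """Bigram Viterbi over a word lattice with edge weight
    w_E = -log P(next | prev), as in the formulation above. `lattice` is
    a list of candidate-word lists, one per (fixed) segment. Toy sketch."""
    prev_states = {"<s>": 0.0}       # word -> best accumulated -log prob
    back = {}
    for i, cands in enumerate(lattice):
        cur = {}
        for w in cands:
            cost, p = min((c - math.log(bigram_p.get((p, w), 1e-6)), p)
                          for p, c in prev_states.items())
            cur[w] = cost
            back[(i, w)] = p         # best predecessor of w at segment i
        prev_states = cur
    w = min(prev_states, key=prev_states.get)
    out = [w]
    for i in range(len(lattice) - 1, 0, -1):
        w = back[(i, w)]
        out.append(w)
    return out[::-1]
```

Extending this to a trigram LM only changes the state from a single word to a word pair, exactly the "longer history" mentioned above.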
Related Works | Various approaches were made for the task including language model ( LM ) based methods (Chen et al., 2013), ME model (Han and Chang, 2013), CRF (Wang et al., 2013d; Wang et al., 2013a), SMT (Chiu et al., 2013; Liu et al., 2013), and graph model (Jia et al., 2013), etc. |
Bayesian MT Decipherment via Hash Sampling | 5A high value for the LM concentration parameter α ensures that the LM probabilities do not deviate too far from the original fixed base distribution during sampling.
Decipherment Model for Machine Translation | For P(e), we use a word n-gram language model ( LM ) trained on monolingual target text. |
Experiments and Results | We observe that our method produces much better results than the others even with a 2-gram LM . |
Experiments and Results | With a 3-gram LM , the new method achieves the best performance; the highest BLEU score reported on this task. |
Experiments and Results | Bayesian Hash Sampling with 2-gram LM: vocab=full (Ve), add_fertility=no: 4.2; vocab=pruned*, add_fertility=yes: 5.3
Abstract | Features used in a phrase-based system usually include LM , reordering model, word and phrase counts, and phrase and lexicon translation models. |
Abstract | Other models used in the baseline system include lexicalized ordering model, word count and phrase count, and a 3-gram LM trained on the English side of the parallel training corpus. |
Abstract | In our system, a primary phrase table is trained from the 110K TED parallel training data, and a 3-gram LM is trained on the English side of the parallel data. |
Experimental Setup | [Table rows: Match 15.45 / 15.04 / 11.89; LM BLEU 0.68 / 0.68 / 0.65]
Experiments | In Table 2, we compare the performance of the linguistically informed model described in Section 4 on the candidates sets against a random choice and a language model ( LM ) baseline. |
Experiments | statistically significant.5 In general, the linguistic model largely outperforms the LM and is less sensitive to the additional confusion introduced by the SEMh input. |
Experiments | Table 5 also reports the performance of an unlabelled model that additionally integrates LM scores. |
Human Evaluation | 6(Nakanishi et al., 2005) also note a negative effect of including LM scores in their model, pointing out that the LM was not trained on enough data. |
Human Evaluation | The corpus used for training our LM might also have been too small or distinct in genre. |
Experimental Setup and Results | As a baseline system, we built a standard phrase-based system, using the surface forms of the words without any transformations, and with a 3-gram LM in the decoder.
Experimental Setup and Results | We believe that the use of multiple language models (some much less sparse than the surface LM ) in the factored baseline is the main reason for the improvement. |
Experimental Setup and Results | Using a 4-gram root LM , considerably less sparse than word forms but more sparse that tags, we get a BLEU score of 22.80 (max: 24.07, min: 21.57, std: 0.85). |
Experiments with Constituent Reordering | 16These experiments were done on top of the model in 3.2.3 with 3-gram word and root LMs and an 8-gram tag LM.
Forest-based translation | The decoder performs two tasks on the translation forest: 1-best search with integrated language model (LM), and k-best search with LM to be used in minimum error rate training.
Forest-based translation | For 1-best search, we use the cube pruning technique (Chiang, 2007; Huang and Chiang, 2007) which approximately intersects the translation forest with the LM.
Forest-based translation | Basically, cube pruning works bottom up in a forest, keeping at most k +LM items at each node, and uses the best-first expansion idea from the Algorithm 2 of Huang and Chiang (2005) to speed |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | g(d; w, m, lm) = w_m · f_m(d) + w_lm · lm(d) (4)
A Skeleton-based Approach to MT 2.1 Skeleton Identification | lm(d) and w_lm are the score and weight of the language model, respectively.
Inflection prediction models | [Table rows: LM 81.0 / 69.4; Model 91.6 / 91.0; Avg |I| 13.9 / 24.1]
Integration of inflection models with MT systems | PLM) is the joint probability of the sequence of inflected words according to a trigram language model ( LM ). |
Integration of inflection models with MT systems | The LM used for the integration is the same LM used in the base MT system that is trained on fully inflected word forms (the base MT system trained on stems uses an LM trained on a stem sequence). |
Integration of inflection models with MT systems | Equation (1) shows that the model first selects the best sequence of inflected forms for each MT hypothesis Si according to the LM and the inflection model. |
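That selection step can be sketched greedily with a bigram LM stand-in (the description above uses a trigram LM; all names and tables here are hypothetical toys):

```python
def best_inflection_sequence(stem_hyp, inflections, lm_lp, infl_lp):
    """Greedy left-to-right choice of inflected forms for one MT hypothesis:
    each candidate inflection of a stem is scored by LM log-prob (bigram
    here, for brevity) plus the inflection model's log-prob, with a floor
    for unseen events. Toy sketch, not the paper's exact Equation (1)."""
    out, prev = [], "<s>"
    for stem in stem_hyp:
        best = max(inflections[stem],
                   key=lambda c: lm_lp.get((prev, c), -10.0)
                                 + infl_lp.get((stem, c), -10.0))
        out.append(best)
        prev = best
    return out
```

The LM term ties each inflection choice to its left context, so agreement with the preceding word can overrule the inflection model's context-free preference.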
Experimental Evaluation | [Table rows: TM (en) 1.80M / 1.62M / 1.35M / 2.38M; TM (other) 1.85M / 1.82M / 1.56M / 2.78M; LM (en) 52.7M / 52.7M / 52.7M / 44.7M]
Experimental Evaluation | Table 1: The number of words in each corpus for TM and LM training, tuning, and testing. |
Experimental Evaluation | We use the news commentary corpus for training the TM, and the news commentary and Europarl corpora for training the LM . |
Inference with First Order Variables | The language model score h(s′, LM) of s′ based on a large web corpus;
Inference with First Order Variables | w_{lm,c} = ν_LM h(s′, LM) + Σ_t λ_t f(s′,
Inference with First Order Variables | w_{Noun,2,singular} = ν_LM h(s′, LM) + λ_ART f(s′, ART) + λ_PREP f(s′, PREP) + λ_NOUN f(s′, NOUN) + μ_ART g(s′, ART) + μ_PREP g(s′, PREP) + μ_NOUN g(s′, NOUN). |
Inference with Second Order Variables | w_… = ν_LM h(s′, LM) + Σ_t λ_t f(s′, |
Experiments and Results | The LM corpus is the English side of the parallel data (BTEC, CJK and CWMT08) (1.34M sentences). |
Experiments and Results | The LM corpus is the English side of the parallel data as well as the English Gigaword corpus (LDC2007T07) (11.3M sentences). |
Input Features for DNN Feature Learning | 1 The backward LM was introduced by Xiong et al. |
Input Features for DNN Feature Learning | (2011), which successfully captures both the preceding and succeeding contexts of the current word; we estimate the backward LM by reversing the token order of each sentence in the training data. |
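Preparing training data for such a backward LM is just a matter of reversing each sentence before estimating a standard n-gram model; a sketch (the helper name is ours):

```python
def backward_lm_corpus(sentences):
    """Build the training corpus for a backward LM by reversing the token
    order of every sentence; the LM itself is then estimated with any
    standard n-gram toolkit on the reversed text."""
    return [list(reversed(s)) for s in sentences]
```

An n-gram over the reversed corpus then conditions each word on its *following* context in the original order.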
Related Work | (2012) improved the translation quality of an n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. |
Latent Structure in Dialogues | The simplest formulation we consider is an HMM where each state contains a unigram language model (LM), proposed by Chotimongkol (2008) for task-oriented dialogue and originally |
Latent Structure in Dialogues | w_1, …, w_n are generated (independently) according to the LM. |
Latent Structure in Dialogues | (2010) extends the LM-HMM to allow words to be emitted from two additional sources: the topic of the current dialogue φ, or a background LM shared across all dialogues. |
Approach | In order to estimate the error-rate, we build a trigram language model (LM) using ukWaC (ukWaC LM) (Ferraresi et al., 2008), a large corpus of English containing more than 2 billion tokens. |
Approach | (correlation / correlation) word ngrams: 0.601 / 0.598; +PoS ngrams: 0.682 / 0.687; +script length: 0.692 / 0.689; +PS rules: 0.707 / 0.708; +complexity: 0.714 / 0.712; Error-rate features: +ukWaC LM: 0.735 / 0.758; +CLC LM: 0.741 / 0.773; +true CLC error-rate: 0.751 / 0.789 |
Approach | CLC (CLC LM). |
Evaluation | feature (correlation / correlation) — none: 0.741 / 0.773; word ngrams: 0.713 / 0.762; PoS ngrams: 0.724 / 0.737; script length: 0.734 / 0.772; PS rules: 0.712 / 0.731; complexity: 0.738 / 0.760; ukWaC+CLC LM: 0.714 / 0.712 |
Evaluation | In the experiments reported hereafter, we use the ukWaC+CLC LM to calculate the error-rate. |
Experiments | Other features include lexical weighting in both directions, word count, a distance-based RM, a 4-gram LM trained on the target side of the parallel data, and a 6-gram English Gigaword LM. |
Experiments | Some of the results reported above involved linear TM mixtures, but none of them involved linear LM mixtures. |
Experiments | For instance, with an initial Chinese system that employs linear mixture LM adaptation (lin-lm) and has a BLEU of 32.1, adding 1-feature VSM adaptation (+vsm, joint) improves performance to 33.1 (improvement significant at p < 0.01), while adding 3-feature VSM instead (+vsm, 3 feat.) |
Introduction | Both were studied in (Foster and Kuhn, 2007), which concluded that the best approach was to combine sub-models of the same type (for instance, several different TMs or several different LMs) linearly, while combining models of different types (for instance, a mixture TM with a mixture LM) log-linearly. |
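A linear mixture of LMs, as described here, is a weighted sum of component probabilities; a minimal sketch with hypothetical component interfaces (a log-linear combination would instead multiply weighted scores):

```python
def linear_mixture_prob(word, history, models, lambdas):
    """Linear LM mixture: P(w|h) = sum_i lambda_i * P_i(w|h).
    models: list of callables P_i(word, history) returning probabilities;
    lambdas: mixture weights, assumed to sum to 1."""
    return sum(l * m(word, history) for l, m in zip(lambdas, models))
```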
Experiments | The first is a 4-gram LM which is estimated on the target side of the texts used in the large data condition (below). |
Experiments | The second is a 5-gram LM estimated on English Gigaword. |
Experiments | We used two LMs in log-linear combination: a 4-gram LM trained on the target side of the parallel |
Experiment | VS: 0.4196 / 0.4542 / 0.6600; BM25: 0.4235† / 0.4579 / 0.6600; LM: 0.4158 / 0.4520 / 0.6560; PMI: 0.4177 / 0.4538 / 0.6620; LSA: 0.4155 / 0.4526 / 0.6480; WP: 0.4165 / 0.4533 / 0.6640 |
Experiment | Model: Precision / Recall / F-Measure — BASELINE: 0.305 / 0.866 / 0.451; VS: 0.331 / 0.807 / 0.470; BM25: 0.327 / 0.795 / 0.464; LM: 0.325 / 0.794 / 0.461; LSA: 0.315 / 0.806 / 0.453; PMI: 0.342 / 0.603 / 0.436; DTP: 0.322 / 0.778 / 0.455; VS-LSA: 0.335 / 0.769 / 0.466; VS-PMI: 0.311 / 0.833 / 0.453; VS-DTP: 0.342 / 0.745 / 0.469 |
Experiment | Differences in effectiveness of VS, BM25, and LM come from parameter tuning and corpus differences. |
Term Weighting and Sentiment Analysis | IR models, such as Vector Space (VS), probabilistic models such as BM25, and Language Modeling (LM), albeit with different modelling approaches and measures, employ heuristics and formal models to effectively evaluate the relevance of a term to a document (Fang et al., 2004). |
Term Weighting and Sentiment Analysis | In our experiments, we use the Vector Space model with Pivoted Normalization (VS), Probabilistic model (BM25), and Language modeling with Dirichlet Smoothing (LM). |
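The BM25 weighting mentioned above follows the standard Okapi formula; a minimal sketch of the per-term weight (the parameter defaults k1=1.2, b=0.75 are conventional, not taken from this paper):

```python
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 weight for one term in one document: an IDF component
    times a saturated, length-normalized term frequency."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm
```

The k1 parameter controls how quickly repeated occurrences saturate, and b controls how strongly long documents are penalized.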
Experiments | In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, Hs denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in the translation, Slex denotes surrounding word features in the source sentence, and Pos denotes the Part-Of-Speech feature. |
Experiments | Feature setting: Precision / Recall — Baseline: 54.82% / 45.61%; Lm: 51.11% / 41.24%; +Tmh: 61.43% / 49.22%; +Punc: 62.54% / 50.08%; +Tlex: 64.80% / 51.87% |
Experiments | Feature setting: Precision / Recall — Baseline: 54.82% / 45.61%; Lm: 51.11% / 41.24%; +Tmh+Smh: 64.50% / 51.64%; +Hs: 65.32% / 52.26%; +Punc: 66.29% / 53.10%; +Pos: 66.53% / 53.25%; +Tlex: 67.50% / 54.02%; +Slex: 69.52% / 55.54% |
Joint Model | • LINK: O(n²) boolean variables L_{ij} corresponding to a possible link between each pair |
Joint Model | L_{ij} = true when there is a dependency link from the word w_i to the word w_j. |
Joint Model | • POS-LINK: There are O(n²m²) such ternary factors, each connected to the variables L_{ij}, POS_{w_i}, and POS_{w_j}. |
Experiments | # / Methods / MAP / P@10 — 1: VSM, 0.242 / 0.226; 2: LM, 0.385 / 0.242; 3: Jeon et al. |
Experiments | Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space. |
Experiments | (1) Monolingual translation models significantly outperform the VSM and LM (row 1 and |
Introduction | As a principled approach to capture semantic word relations, word-based translation models are built by using the IBM model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval. |
The normalization models | All tokens Tj of S are concatenated together and composed with the lexical language model LM. |
The normalization models | S′ = BestPath((T_1 ⋯ T_n) ∘ LM) (6) |
The normalization models | LM = FirstProjection(L ∘ LM_w) (13) |
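The BestPath-over-composition idea in Equations (6) and (13) can be emulated for a toy case by Viterbi search over per-token normalization candidates scored with a bigram LM. All interfaces here are ours, not the paper's FST machinery:

```python
def best_path(candidate_cols, bigram_logp):
    """Toy stand-in for BestPath((T1 ... Tn) o LM): Viterbi over columns of
    normalization candidates, scored by bigram_logp(prev, word), a
    hypothetical bigram LM log-probability function."""
    prev = {'<s>': (0.0, None)}  # word -> (best score, predecessor)
    history = [prev]
    for col in candidate_cols:
        cur = {}
        for w in col:
            cur[w] = max(((s + bigram_logp(p, w), p)
                          for p, (s, _) in prev.items()),
                         key=lambda x: x[0])
        history.append(cur)
        prev = cur
    # backtrace from the best word in the final column
    word = max(prev, key=lambda w: prev[w][0])
    path = []
    for col_states in reversed(history[1:]):
        path.append(word)
        word = col_states[word][1]
    return list(reversed(path))
```

Composing the token lattice with the LM transducer performs the same search, with the LM states threaded through the composition instead of tracked explicitly.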
Translation Model Architecture | We find that an adaptation of the TM and LM to the full development set (system “1 cluster”) yields the smallest improvements over the unadapted baseline. |
Translation Model Architecture | For the IT test set, the system with gold labels and TM adaptation yields an improvement of 0.7 BLEU (21.1 → 21.8), LM adaptation yields 1.3 BLEU (21.1 → 22.4), and adapting both models outperforms the baseline by 2.1 BLEU (21.1 → 23.2). |
Translation Model Architecture | TM adaptation with 8 clusters (21.1 → 21.8 → 22.1), or LM adaptation with 4 or 8 clusters (21.1 → 22.4 → 23.1). |
Discussion | 6 In our preliminary experiments with the smaller trigram LM, MERT did better on MT05 in the smaller feature set, and MIRA had a small advantage in two cases. |
Experiments | We trained a 4-gram LM on the |
Experiments | In preliminary experiments with a smaller trigram LM, our RM method consistently yielded the highest scores in all Chinese-English tests — up to 1.6 BLEU and 6.4 TER from MIRA, the second best performer. |
The Relative Margin Machine in SMT | 4-gram LM: 24M / 600M / — |
Experiments | In detail, a paraphrase pattern e′ of e was reranked based on a language model (LM): |
Experiments | score_LM(e′|S_E) is the LM-based score: score_LM(e′|S_E) = (1/|S′_E|) log P_LM(S′_E), where S′_E is the sentence generated by replacing e in S_E with e′. |
Experiments | To investigate the contribution of the LM-based score, we ran the experiment again with λ = 1 (ignoring the LM-based score) and found that the precision is 57.09%. |
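The length-normalized LM score and the reranking step can be sketched with a toy unigram LM standing in for the real one. The function names and the single-token substitution below are simplifications of ours, not the paper's implementation:

```python
import math

def score_lm(sentence, unigram_p):
    """Length-normalized LM score: (1/|S|) * sum_w log P(w), with a toy
    unigram LM (dict of word -> probability) as the scorer."""
    return sum(math.log(unigram_p[w]) for w in sentence) / len(sentence)

def rerank(s_e, e, paraphrases, unigram_p, lam=0.5, base_score=None):
    """Pick the paraphrase candidate (a token list) maximizing
    lam * base score + (1 - lam) * LM score of the substituted sentence."""
    base_score = base_score or (lambda c: 0.0)
    def substituted(c):
        i = s_e.index(e)            # replace the single token e with c
        return s_e[:i] + c + s_e[i + 1:]
    return max(paraphrases,
               key=lambda c: lam * base_score(c)
                             + (1 - lam) * score_lm(substituted(c), unigram_p))
```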
Experiments & Results 4.1 Experimental Setup | For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count. |
Experiments & Results 4.1 Experimental Setup | For ensemble decoding, we modified an in-house implementation of a hierarchical phrase-based system, Kriya (Sankaran et al., 2012), which uses the same features mentioned in (Chiang, 2005): forward and backward relative-frequency and lexical TM probabilities; LM; word, phrase and glue-rules penalty. |
Experiments & Results 4.1 Experimental Setup | We suspect this is because a single LM is shared between both models. |
Experiments | TM (en): 2.80M / 3.10M / 2.77M / 2.13M; TM (other): 2.56M / 2.23M / 3.05M / 2.34M; LM (en): 16.0M / 15.5M / 13.8M / 11.5M |
Experiments | LM (other): 15.3M / 11.3M / 15.6M / 11.9M |
Experiments | Table 1: The number of words in each corpus for TM and LM training, tuning, and testing. |
Keyphrase Extraction Approaches | A unigram LM and an n-gram LM are constructed for each of these two corpora. |
Keyphrase Extraction Approaches | Phraseness, defined using the foreground LM, is calculated as the loss of information incurred as a result of assuming a unigram LM (i.e., conditional independence among the words of the phrase) instead of an n-gram LM (i.e., the phrase is drawn from an n-gram LM). |
Keyphrase Extraction Approaches | Informativeness is computed as the loss that results because of the assumption that the candidate is sampled from the background LM rather than the foreground LM. |
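Under this reading, phraseness and informativeness each reduce to a pointwise KL term for the candidate phrase; a sketch with interfaces of ours (probabilities are assumed already estimated from the two corpora):

```python
import math

def phraseness(p_fg_phrase, p_fg_unigrams):
    """Pointwise KL between the foreground n-gram LM and the unigram
    (independence) model: p(phrase) * log(p(phrase) / prod_i p(w_i))."""
    indep = math.prod(p_fg_unigrams)
    return p_fg_phrase * math.log(p_fg_phrase / indep)

def informativeness(p_fg_phrase, p_bg_phrase):
    """Pointwise KL between foreground and background LMs for the phrase."""
    return p_fg_phrase * math.log(p_fg_phrase / p_bg_phrase)
```

A candidate scores high on phraseness when its words co-occur far more often than independence predicts, and high on informativeness when it is much more probable in the foreground corpus than in the background.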
System Description | Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers7 (“nonnative LM”). |
System Description | our system: 0.668; − nonnative LM (§3.2.2): 0.665; − HPSG parse (§3.2.3): 0.664; − PCFG parse (§3.2.4): 0.662 |
System Description | − spelling (§3.2.1): 0.643; − gigaword LM (§3.2.2): 0.638; − link parse (§3.2.3): 0.632 |
Discussion | Method: SAT / Holmes — Chance: 20% / 20%; GT n-gram LM: 42 / 39; RNN: 42 / 45 |
Experimental Results 5.1 Data Resources | These data sources were evaluated using the baseline n-gram LM approach of Section 3.1. |
Experimental Results 5.1 Data Resources | Method: Test — 3-input LSA: 46%; LSA + Good-Turing LM: 53%; LSA + Good-Turing LM + RNN: 52% |
Pattern extraction by sentence compression | The method of Clarke and Lapata (2008) uses a trigram language model (LM) to score compressions. |
Pattern extraction by sentence compression | Since we are interested in very short outputs, an LM trained on standard, uncompressed text would not be suitable. |
Pattern extraction by sentence compression | Instead, we chose to modify the method of Filippova and Altun (2013) because it relies on dependency parse trees and does not use any LM scoring. |
Models | Secondly, it is easy to use a large language model (LM) with Moses. |
Models | We build the LM on the target word types in the data to be filtered. |
Models | The LM is implemented as a five-gram model using the SRILM-Toolkit (Stolcke, 2002), with Add-1 smoothing for unigrams and Kneser-Ney smoothing for higher n-grams. |
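The Add-1 smoothing applied at the unigram level can be sketched as follows (Kneser-Ney for the higher orders is omitted, and the helper is a hypothetical illustration, not SRILM's API):

```python
from collections import Counter

def add_one_unigram(tokens, vocab_size=None):
    """Add-1 (Laplace) smoothed unigram probabilities. Returns a dict of
    probabilities for seen words plus the probability assigned to any
    unseen word. If vocab_size is not given, one extra slot is reserved
    for unseen words."""
    counts = Counter(tokens)
    v = vocab_size or len(counts) + 1
    denom = len(tokens) + v
    probs = {w: (c + 1) / denom for w, c in counts.items()}
    return probs, 1 / denom
```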
Evaluation | The feature set includes: a trigram language model (lm) trained |
Evaluation | Discriminative max-derivation: 25.78; Hiero (pd, gr, rc, wc): 26.48; Discriminative max-translation: 27.72; Hiero (pd, pr, p_lex_r, p_lex_d, gr, rc, wc): 28.14; Hiero (pd, pr, p_lex_r, p_lex_d, gr, rc, wc, lm): 32.00 |
Evaluation | 8 Hiero (pd, pr, p_lex_r, p_lex_d, gr, rc, wc, lm) represents state- |
Experimental Evaluation | The second one, LM, is based on statistical language models for relevant information retrieval (Ponte and Croft, 1998). |
Experimental Evaluation | Okapi: 0.827 / 0.833 / 0.807 / 0.751; Forum LM: 0.804 / 0.833 / 0.807 / 0.731; Our: 0.967 / 0.967 / 0.9 / 0.85 |
Experimental Evaluation | Okapi: 0.733 / 0.651 / 0.667 / 0.466; Blog LM: 0.767 / 0.718 / 0.70 / 0.524; Our: 0.933 / 0.894 / 0.867 / 0.756 |