Abstract | These techniques speed up NNJM computation by a factor of 10,000x, making the model as fast as a standard back-off LM.
Decoding with the NNJM | When performing hierarchical decoding with an n-gram LM, the leftmost and rightmost n−1 words from each constituent must be stored in the state space.
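As an illustration of the bookkeeping this implies, here is a minimal sketch (illustrative only, not the decoder's actual data structures) of the LM state kept for a constituent: only its leftmost and rightmost n−1 words are retained, since they are all that future n-gram scoring can see.

```python
def lm_state(words, n):
    """LM state for a constituent in hierarchical decoding (a sketch).

    Only the leftmost and rightmost n-1 words can participate in future
    n-gram scoring, so only they are stored; two hypotheses with equal
    states can be recombined.
    """
    k = n - 1
    if len(words) <= 2 * k:  # too short to elide anything: keep all words
        return tuple(words), tuple(words)
    return tuple(words[:k]), tuple(words[-k:])
```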
Model Variations | 4-gram Kneser-Ney LM
Model Variations | Dependency LM (Shen et al., 2010)
Model Variations | Contextual lexical smoothing (Devlin, 2009)
Model Variations | Length distribution (Shen et al., 2010)
Model Variations | LM adaptation (Snover et al., 2008)
Neural Network Joint Model (NNJM) | Formally, our model approximates the probability of target hypothesis T conditioned on source sentence S. We follow the standard n-gram LM decomposition of the target, where each target word t_i is conditioned on the previous n−1 target words.
Neural Network Joint Model (NNJM) | It is clear that this model is effectively an (n+m)-gram LM, and a 15-gram LM would be far too sparse to estimate with standard back-off techniques.
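Putting the two conditioning contexts together gives the factorization below; this is a sketch of the decomposition the text describes, assuming (as is standard for this model family) that each target word t_i is affiliated with a source position a_i and conditioned on a symmetric m-word source window around it:

```latex
P(T \mid S) \;\approx\; \prod_{i=1}^{|T|}
  P\!\left(t_i \;\middle|\; t_{i-n+1}, \ldots, t_{i-1},\;
           s_{a_i - \frac{m-1}{2}}, \ldots, s_{a_i + \frac{m-1}{2}}\right)
```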
Neural Network Joint Model (NNJM) | Although self-normalization significantly improves the speed of NNJM lookups, the model is still several orders of magnitude slower than a back-off LM.
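For context, self-normalization trains the network so that the softmax normalizer stays close to 1, letting the decoder skip the expensive sum over the vocabulary. A minimal sketch of such a training loss, with an assumed penalty weight alpha:

```python
import numpy as np

def self_normalized_loss(scores, target, alpha=0.1):
    """Per-example training loss for a self-normalized network (a sketch).

    scores: unnormalized output-layer scores over the vocabulary.
    The alpha * (log Z)^2 penalty drives log Z toward 0, so at decode
    time scores[target] can be used directly as a log-probability
    without computing the normalizer Z.
    """
    log_z = np.log(np.sum(np.exp(scores)))  # normalizer (use a stable log-sum-exp in practice)
    nll = -(scores[target] - log_z)         # standard cross-entropy term
    return nll + alpha * log_z ** 2
```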
Abstract | Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM).
Methods | In order to annotate lattice edges with an n-gram LM, every path coming into a node must end with the same sequence of (n−1) tokens.
Methods | If this property does not hold, then nodes must be split until it does. This property is maintained by the decoder’s recombination rules for the segmented LM, but it is not guaranteed for the desegmented LM.
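A minimal sketch of this splitting step (not the paper's algorithm verbatim), assuming the lattice is given as an adjacency dict from node to (token, next-node) edges; pairing each node with its incoming (n−1)-token history is equivalent to splitting it until every incoming path agrees on that history:

```python
def split_for_lm(edges, start, n):
    """Split lattice nodes for n-gram LM annotation (a sketch).

    edges: dict mapping node -> list of (token, next_node) pairs.
    Each output node is a (node, history) pair, where history holds the
    last n-1 tokens on the path in; this guarantees every path entering
    a node ends with the same (n-1)-token sequence.
    """
    h = n - 1
    new_edges = {}
    stack, seen = [(start, ())], {(start, ())}
    while stack:
        state = stack.pop()
        node, hist = state
        for token, nxt in edges.get(node, []):
            new_hist = (hist + (token,))[-h:] if h else ()
            nxt_state = (nxt, new_hist)
            new_edges.setdefault(state, []).append((token, nxt_state))
            if nxt_state not in seen:
                seen.add(nxt_state)
                stack.append(nxt_state)
    return new_edges
```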
Methods | Indeed, the expanded word-level context is one of the main benefits of incorporating a word-level LM.
Baselines | It adds an LM component to the MLF baseline.
Baselines | This LM baseline allows comparing classification through L1 fragments in an L2 context with more traditional L2 context modelling (i.e.
Discussion and conclusion | [Table residue: rows for the LM baseline and the l1r1, l1r1+LM, auto, and auto+LM systems; scores not recoverable]
Experiments & Results | As expected, the LM baseline substantially outperforms the context-insensitive MLF baseline. |
Experiments & Results | Second, our classifier approach attains a substantially higher accuracy than the LM baseline. |
Experiments & Results | The same significance level was found when comparing l1r1+LM against l1r1, auto+LM against auto, as well as the LM baseline against the MLF baseline.
Experiments | The Chinese part of the corpus is segmented into words before LM training. |
Experiments | The pinyin syllable segmentation already achieves very high accuracy (over 98%) with a trigram LM using improved Kneser-Ney smoothing.
Experiments | We consider different LM smoothing methods including Kneser-Ney (KN), improved Kneser-Ney (IKN), and Witten-Bell (WB). |
Pinyin Input Method Model | W_E(v_{i,j} → v_{j+1,k}) = −log P(v_{j+1,k} | v_{i,j}). Although the model is formulated as a first-order HMM, i.e., the LM used for the transition probability is a bigram, it is easy to extend the model to take advantage of a higher-order n-gram LM by tracking a longer history while traversing the graph.
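The extension amounts to keying the dynamic-programming state on the last n−1 words instead of the last word alone. A minimal sketch under that reading, where spans and lm_logprob are assumed inputs (the lattice as candidate word spans over the pinyin sequence, and an n-gram scoring function):

```python
def decode_ime(spans, lm_logprob, n=2):
    """Best-path decoding over a pinyin lattice (a sketch).

    spans: dict mapping start position j -> list of (end position k, word)
           candidates covering pinyin syllables j..k (k > j).
    lm_logprob(history, word): log P(word | history), where history is a
           tuple of up to n-1 previous words.  Carrying this history in
           the search state is what lifts the first-order (bigram)
           formulation to a higher-order n-gram LM.
    """
    length = max(k for cands in spans.values() for k, _ in cands)
    best = {(0, ()): (0.0, [])}  # (position, history) -> (score, words)
    for j in range(length):
        for (pos, hist), (score, words) in list(best.items()):
            if pos != j:
                continue
            for k, w in spans.get(j, []):
                new_hist = (hist + (w,))[-(n - 1):] if n > 1 else ()
                cand = (score + lm_logprob(hist, w), words + [w])
                key = (k, new_hist)
                if key not in best or cand[0] > best[key][0]:
                    best[key] = cand
    finals = [v for (pos, _), v in best.items() if pos == length]
    return max(finals)[1] if finals else None
```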
Related Works | Various approaches have been proposed for the task, including language model (LM) based methods (Chen et al., 2013), ME models (Han and Chang, 2013), CRFs (Wang et al., 2013d; Wang et al., 2013a), SMT (Chiu et al., 2013; Liu et al., 2013), and graph models (Jia et al., 2013).
A Skeleton-based Approach to MT 2.1 Skeleton Identification | Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | g(d; w, m, lm) = w_m · f_m(d) + w_lm · lm(d)   (4)
A Skeleton-based Approach to MT 2.1 Skeleton Identification | lm(d) and w_lm are the score and weight of the language model, respectively.
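As a concrete reading of Eq. 4, a minimal sketch with plain-list feature vectors (the argument names are illustrative, not from the paper):

```python
def derivation_score(f_m, w_m, lm_score, w_lm):
    """Model score of a derivation d, following Eq. 4: the dot product
    of translation-model features f_m(d) with their weights w_m, plus
    the weighted language-model score lm(d)."""
    return sum(w * f for w, f in zip(w_m, f_m)) + w_lm * lm_score
```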
Experiments and Results | The LM corpus is the English side of the parallel data (BTEC, CJK and CWMT08) (1.34M sentences).
Experiments and Results | The LM corpus is the English side of the parallel data as well as the English Gigaword corpus (LDC2007T07) (11.3M sentences). |
Input Features for DNN Feature Learning | The backward LM was introduced by Xiong et al.
Input Features for DNN Feature Learning | (2011); together with a standard forward LM, it captures both the preceding and succeeding contexts of the current word. We estimate the backward LM by inverting the word order of each sentence in the training data, from the original order to the reverse order.
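The estimation step is just a corpus transformation; a minimal sketch, with hypothetical input/output paths, that reverses each sentence so that any standard n-gram toolkit trained on the output yields a backward LM:

```python
def reverse_corpus(path_in, path_out):
    """Write a word-reversed copy of a tokenized, one-sentence-per-line
    corpus.  An n-gram LM trained on the output estimates the backward
    probability P(w_i | w_{i+1}, ..., w_{i+n-1})."""
    with open(path_in, encoding="utf-8") as f_in, \
         open(path_out, "w", encoding="utf-8") as f_out:
        for line in f_in:
            f_out.write(" ".join(reversed(line.split())) + "\n")
```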
Related Work | (2012) improved the translation quality of an n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
Latent Structure in Dialogues | The simplest formulation we consider is an HMM where each state contains a unigram language model (LM), proposed by Chotimongkol (2008) for task-oriented dialogue and originally
Latent Structure in Dialogues | words are generated (independently) according to the LM.
Latent Structure in Dialogues | (2010) extends the LM-HMM to allow words to be emitted from two additional sources: the topic of the current dialogue φ, or a background LM shared across all dialogues.
Keyphrase Extraction Approaches | A unigram LM and an n-gram LM are constructed for each of these two corpora. |
Keyphrase Extraction Approaches | Phraseness, defined using the foreground LM, is calculated as the loss of information incurred by assuming a unigram LM (i.e., conditional independence among the words of the phrase) instead of an n-gram LM (i.e., the phrase is drawn from an n-gram LM).
Keyphrase Extraction Approaches | Informativeness is computed as the loss that results from assuming the candidate is sampled from the background LM rather than the foreground LM.
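One common reading of these two definitions is as pointwise KL-divergence terms; a minimal sketch under that assumption, where fg_ngram, fg_unigram, and bg_ngram are hypothetical functions returning a phrase's (nonzero) probability under the respective LMs:

```python
import math

def phraseness(phrase, fg_ngram, fg_unigram):
    """Information lost by modelling the phrase with a unigram LM
    (independent words) instead of the foreground n-gram LM."""
    p_ngram = fg_ngram(phrase)
    p_indep = math.prod(fg_unigram(w) for w in phrase)
    return p_ngram * math.log(p_ngram / p_indep)

def informativeness(phrase, fg_ngram, bg_ngram):
    """Information lost by assuming the candidate is drawn from the
    background LM rather than the foreground LM."""
    return fg_ngram(phrase) * math.log(fg_ngram(phrase) / bg_ngram(phrase))
```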
System Description | Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).
System Description | Feature ablation: our system 0.668; − nonnative LM (§3.2.2) 0.665; − HPSG parse (§3.2.3) 0.664; − PCFG parse (§3.2.4) 0.662; − spelling (§3.2.1) 0.643; − gigaword LM (§3.2.2) 0.638; − link parse (§3.2.3) 0.632
Pattern extraction by sentence compression | The method of Clarke and Lapata (2008) uses a trigram language model (LM) to score compressions.
Pattern extraction by sentence compression | Since we are interested in very short outputs, an LM trained on standard, uncompressed text would not be suitable.
Pattern extraction by sentence compression | Instead, we chose to modify the method of Filippova and Altun (2013) because it relies on dependency parse trees and does not use any LM scoring. |