Index of papers in Proc. ACL 2014 that mention
  • LM
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Abstract
These techniques speed up NNJM computation by a factor of 10,000x, making the model as fast as a standard back-off LM.
Decoding with the NNJM
When performing hierarchical decoding with an n-gram LM, the leftmost and rightmost n − 1 words from each constituent must be stored in the state space.
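As an illustration of the state described above, a minimal sketch (our own helper, not from the paper): only the leftmost and rightmost n − 1 words of a constituent can ever take part in an LM query at its boundaries, so they are all the decoder needs to store.

    # Minimal sketch (not the paper's code): the n-gram LM state of a constituent
    # in hierarchical decoding is just its outermost n-1 words on each side.
    def lm_state(words, n):
        """Return the (prefix, suffix) of length n-1 kept in the decoder state."""
        k = n - 1
        if len(words) <= k:
            # Short constituents expose all of their words on both sides.
            return tuple(words), tuple(words)
        return tuple(words[:k]), tuple(words[-k:])

    # With a 4-gram LM only 3 words per side are stored.
    prefix, suffix = lm_state("the quick brown fox jumps over".split(), n=4)
    print(prefix)   # ('the', 'quick', 'brown')
    print(suffix)   # ('fox', 'jumps', 'over')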
Model Variations
4-gram Kneser-Ney LM
Model Variations
Dependency LM (Shen et al., 2010); Contextual lexical smoothing (Devlin, 2009); Length distribution (Shen et al., 2010)
Model Variations
LM adaptation (Snover et al., 2008)
Neural Network Joint Model (NNJM)
Formally, our model approximates the probability of target hypothesis T conditioned on source sentence S. We follow the standard n-gram LM decomposition of the target, where each target word t_i is conditioned on the previous n − 1 target words.
Neural Network Joint Model (NNJM)
It is clear that this model is effectively an (n+m)-gram LM, and a 15-gram LM would be
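To make that decomposition concrete, here is a hedged sketch of how such a joint model factors P(T | S): each target word is scored given its n − 1 predecessors plus an m-word source window centred on its aligned (affiliated) source position. The nnjm_logprob scorer and the padding symbols are placeholders of ours, not the paper's API; with n = 4 and m = 11 each prediction sees 14 context words, which is the 15-gram LM referred to above.

    def joint_logprob(target, source, affiliation, nnjm_logprob, n=4, m=11):
        """Sum log P(t_i | previous n-1 target words, m-word source window).

        affiliation[i] is the source index that target word i is aligned to;
        nnjm_logprob(word, history, window) is a placeholder scoring function.
        """
        total = 0.0
        half = m // 2
        padded = ["<s>"] * (n - 1) + list(target)
        for i, word in enumerate(target):
            history = padded[i:i + n - 1]              # n-1 preceding target words
            a = affiliation[i]
            window = [source[j] if 0 <= j < len(source) else "<pad>"
                      for j in range(a - half, a + half + 1)]   # m source words
            total += nnjm_logprob(word, history, window)
        return total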
Neural Network Joint Model (NNJM)
Although self-normalization significantly improves the speed of NNJM lookups, the model is still several orders of magnitude slower than a back-off LM.
LM is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Abstract
Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM).
Methods
In order to annotate lattice edges with an n-gram LM, every path coming into a node must end with the same sequence of (n − 1) tokens.
Methods
If this property does not hold, then nodes must be split until it does. This property is maintained by the decoder’s recombination rules for the segmented LM, but it is not guaranteed for the desegmented LM.
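A rough sketch of that splitting step (illustrative only; the lattice encoding and names are our assumptions, not the authors' code): each node is expanded into one copy per distinct (n − 1)-token history reachable on its incoming paths, after which an n-gram LM can score every outgoing edge unambiguously.

    from collections import defaultdict

    def expand_with_lm_histories(edges, start, n):
        """Split lattice nodes so that all incoming paths of an expanded node
        end with the same n-1 tokens (assumes an acyclic lattice and n >= 2).

        edges maps node -> list of (token, next_node); the result maps
        (node, history) -> list of (token, (next_node, next_history)).
        """
        k = n - 1
        new_edges = defaultdict(list)
        agenda = [(start, ())]                 # empty history at the start node
        seen = set(agenda)
        while agenda:
            node, hist = agenda.pop()
            for token, nxt in edges.get(node, []):
                nxt_hist = (hist + (token,))[-k:]   # keep only the last n-1 tokens
                new_edges[(node, hist)].append((token, (nxt, nxt_hist)))
                if (nxt, nxt_hist) not in seen:
                    seen.add((nxt, nxt_hist))
                    agenda.append((nxt, nxt_hist))
        return dict(new_edges)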
Methods
Indeed, the expanded word-level context is one of the main benefits of incorporating a word-level LM.
LM is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Baselines
It adds an LM component to the MLF baseline.
Baselines
This LM baseline allows the comparison of classification through L1 fragments in an L2 context, with a more traditional L2 context modelling (i.e.
Discussion and conclusion
LM baseline l1r1
Discussion and conclusion
l1r1+LM auto
Discussion and conclusion
LM baseline l1r1
Experiments & Results
As expected, the LM baseline substantially outperforms the context-insensitive MLF baseline.
Experiments & Results
Second, our classifier approach attains a substantially higher accuracy than the LM baseline.
Experiments & Results
The same significance level was found when comparing l1r1+LM against l1r1, auto+LM against auto, as well as the LM baseline against the MLF baseline.
LM is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Experiments
The Chinese part of the corpus is segmented into words before LM training.
Experiments
The pinyin syllable segmentation already has very high (over 98%) accuracy with a trigram LM using improved Kneser-Ney smoothing.
Experiments
We consider different LM smoothing methods including Kneser-Ney (KN), improved Kneser-Ney (IKN), and Witten-Bell (WB).
Pinyin Input Method Model
W_E(V_{i,j} → V_{j+1,k}) = −log P(V_{j+1,k} | V_{i,j}). Although the model is formulated on a first-order HMM, i.e., the LM used for the transition probability is a bigram one, it is easy to extend the model to take advantage of a higher-order n-gram LM by tracking a longer history while traversing the graph.
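A compressed sketch of that idea (our own data structures, not the authors' implementation): with edge weights −log P(next | current), decoding is a shortest-path search; moving from a bigram to a trigram LM only enlarges the search state from the current vertex to the current vertex plus its predecessor.

    import math, heapq
    from itertools import count

    def best_path(succ, bigram_prob, start, goal):
        """Shortest path over the conversion graph, where each edge weight is
        W_E(v -> w) = -log P(w | v) as in the formula above.  succ[v] lists the
        successors of vertex v; bigram_prob(v, w) is a placeholder LM lookup."""
        tie = count()                      # tie-breaker for the priority queue
        frontier = [(0.0, next(tie), start, [start])]
        done = set()
        while frontier:
            cost, _, v, path = heapq.heappop(frontier)
            if v == goal:
                return cost, path
            if v in done:
                continue
            done.add(v)
            for w in succ.get(v, []):
                step = -math.log(bigram_prob(v, w))     # W_E(v -> w)
                heapq.heappush(frontier, (cost + step, next(tie), w, path + [w]))
        return math.inf, []

    # With a trigram LM the search state becomes the pair (previous vertex,
    # current vertex) and the weight -log P(w | previous, current); the
    # traversal itself is unchanged, which is the extension described above.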
Related Works
Various approaches have been proposed for the task, including language model (LM) based methods (Chen et al., 2013), ME models (Han and Chang, 2013), CRFs (Wang et al., 2013d; Wang et al., 2013a), SMT (Chiu et al., 2013; Liu et al., 2013), and graph models (Jia et al., 2013).
LM is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
A Skeleton-based Approach to MT 2.1 Skeleton Identification
Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
A Skeleton-based Approach to MT 2.1 Skeleton Identification
g(d; w, m, lm) = w_m · f_m(d) + w_lm · lm(d)   (4)
A Skeleton-based Approach to MT 2.1 Skeleton Identification
lm(d) and w_lm are the score and weight of the language model, respectively.
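Read literally, Equation (4) above is just a weighted combination of the translation-model features and the LM score; a two-line sketch (variable names are ours) makes that explicit.

    def model_score(w_m, f_m_d, w_lm, lm_d):
        """g(d; w, m, lm) = w_m · f_m(d) + w_lm · lm(d), Equation (4) above."""
        return sum(wi * fi for wi, fi in zip(w_m, f_m_d)) + w_lm * lm_d

    # Example: two translation-model features plus one language-model score.
    print(model_score(w_m=[0.3, 0.2], f_m_d=[-1.5, -0.7], w_lm=0.5, lm_d=-12.4))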
LM is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Experiments and Results
The LM corpus is the English side of the parallel data (BTEC, CJK and CWMT08) (1.34M sentences).
Experiments and Results
The LM corpus is the English side of the parallel data as well as the English Gigaword corpus (LDC2007T07) (11.3M sentences).
Input Features for DNN Feature Learning
Backward LM has been introduced by Xiong et al.
Input Features for DNN Feature Learning
(2011), which successfully captures both the preceding and succeeding contexts of the current word; we estimate the backward LM by reversing the word order of each sentence in the training data.
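The backward LM described above amounts to reversing each training sentence before estimation and reversing again at scoring time; a minimal sketch, where forward_scorer stands in for whatever n-gram toolkit is actually used:

    def reverse_corpus(sentences):
        """Invert the word order of every sentence so that an ordinary n-gram
        estimator learns a backward LM (succeeding rather than preceding context)."""
        return [list(reversed(sentence)) for sentence in sentences]

    def backward_logprob(sentence, forward_scorer):
        """Score a sentence under the backward LM by reversing it first.
        forward_scorer is a placeholder for any standard LM scoring function."""
        return forward_scorer(list(reversed(sentence)))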
Related Work
(2012) improved translation quality of n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
LM is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhai, Ke and Williams, Jason D
Latent Structure in Dialogues
The simplest formulation we consider is an HMM where each state contains a unigram language model (LM), proposed by Chotimongkol (2008) for task-oriented dialogue and originally
Latent Structure in Dialogues
Words are generated (independently) according to the LM.
Latent Structure in Dialogues
(2010) extends the LM-HMM to allow words to be emitted from two additional sources: the topic of the current dialogue φ, or a background LM shared across all dialogues.
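A generative sketch of that emission step, under our own notation and with made-up distributions (the paper's exact parameterization is not reproduced here): each word first picks one of the three sources, then is drawn from that source's unigram LM.

    import random

    def emit_word(state_lm, topic_lm, background_lm, source_probs, rng=random):
        """Pick a source for the word (state LM, dialogue topic, or background LM),
        then draw the word from that source's unigram distribution.
        Each *_lm argument is a dict mapping word -> probability."""
        source = rng.choices(["state", "topic", "background"],
                             weights=source_probs, k=1)[0]
        lm = {"state": state_lm, "topic": topic_lm, "background": background_lm}[source]
        words, probs = zip(*lm.items())
        return rng.choices(words, weights=probs, k=1)[0]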
LM is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Hasan, Kazi Saidul and Ng, Vincent
Keyphrase Extraction Approaches
A unigram LM and an n-gram LM are constructed for each of these two corpora.
Keyphrase Extraction Approaches
Phraseness, defined using the foreground LM, is calculated as the loss of information incurred as a result of assuming a unigram LM (i.e., conditional independence among the words of the phrase) instead of an n-gram LM (i.e., the phrase is drawn from an n-gram LM).
Keyphrase Extraction Approaches
Informativeness is computed as the loss that results because of the assumption that the candidate is sampled from the background LM rather than the foreground LM.
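Both information-loss quantities can be written as log-probability ratios; a minimal sketch under that reading (function names are ours, and the original pointwise-KL formulation additionally weights each ratio by the foreground probability):

    import math

    def phraseness(phrase, fg_ngram_prob, fg_unigram_prob):
        """Loss from assuming independence: log P_fg(phrase as an n-gram)
        minus the summed log P_fg of its individual words."""
        return (math.log(fg_ngram_prob(phrase))
                - sum(math.log(fg_unigram_prob(w)) for w in phrase))

    def informativeness(phrase, fg_ngram_prob, bg_ngram_prob):
        """Loss from assuming the candidate is sampled from the background LM
        rather than the foreground LM."""
        return math.log(fg_ngram_prob(phrase)) - math.log(bg_ngram_prob(phrase))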
LM is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
System Description
Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).
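Those two LM-derived features are easy to picture with a short sketch; the scorer and vocabulary below are placeholders, not the system's actual components.

    def lm_features(tokens, sentence_logprob, vocab):
        """Average per-token log-probability and out-of-vocabulary count,
        the two features described above (illustrative implementation).
        sentence_logprob(tokens) is a placeholder for the essay LM's total log-prob."""
        avg_logprob = sentence_logprob(tokens) / max(len(tokens), 1)
        oov_count = sum(1 for t in tokens if t not in vocab)
        return avg_logprob, oov_count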
System Description
Feature ablation: our system 0.668; −nonnative LM (§3.2.2) 0.665; −HPSG parse (§3.2.3) 0.664; −PCFG parse (§3.2.4) 0.662
System Description
Feature ablation: −spelling (§3.2.1) 0.643; −gigaword LM (§3.2.2) 0.638; −link parse (§3.2.3) 0.632
LM is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pighin, Daniele and Cornolti, Marco and Alfonseca, Enrique and Filippova, Katja
Pattern extraction by sentence compression
The method of Clarke and Lapata (2008) uses a trigram language model (LM) to score compressions.
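Scoring a compression with a trigram LM reduces to summing trigram log-probabilities over the output; a brief hedged sketch, with trigram_logprob standing in for the actual model:

    def trigram_score(tokens, trigram_logprob):
        """Sum of log P(w_i | w_{i-2}, w_{i-1}) over the compression, padded
        with sentence-boundary symbols; trigram_logprob is a placeholder."""
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        return sum(trigram_logprob(padded[i], padded[i - 2], padded[i - 1])
                   for i in range(2, len(padded)))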
Pattern extraction by sentence compression
Since we are interested in very short outputs, an LM trained on standard, uncompressed text would not be suitable.
Pattern extraction by sentence compression
Instead, we chose to modify the method of Filippova and Altun (2013) because it relies on dependency parse trees and does not use any LM scoring.
LM is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: