Mention Resolution Approach | We define a mention m as a tuple ⟨lm, em⟩, where lm is the “literal” string of characters that represents m and em is the email in which m is observed.¹ We assume that m can be resolved to a distinguishable participant for whom at least one email address is present in the collection.² |
Mention Resolution Approach | Select a specific lexical reference lm to refer to a given participant given the context.
Mention Resolution Approach | ¹ The exact position in em where lm is observed should also be included in the definition, but we ignore it, assuming that all matched literal mentions in one email refer to the same identity.
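Mention Resolution Approach | The tuple definition above can be illustrated with a minimal sketch; the alias-matching rule and all names here are hypothetical stand-ins, not the paper's actual resolver:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    literal: str   # l_m: the literal string of characters observed
    email_id: str  # e_m: the email in which the mention is observed

def resolve(mention, participants):
    """Return the participant whose known names or addresses match
    the mention's literal string (case-insensitive), else None."""
    needle = mention.literal.lower()
    for person, aliases in participants.items():
        if any(needle == a.lower() for a in aliases):
            return person
    return None

# Hypothetical participant list with known aliases and addresses.
people = {"p1": ["Jane Doe", "jane@example.com", "Jane"]}
m = Mention("jane", "email-042")
assert resolve(m, people) == "p1"
assert resolve(Mention("bob", "email-001"), people) is None
```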
Dependency Language Model | Suppose we use a trigram dependency LM.
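Dependency Language Model | A toy illustration of what a trigram dependency LM scores, assuming dependents are generated outward from the head and each dependent is conditioned on the head and the previous sibling; the add-one smoothing and exact conditioning here are illustrative assumptions, not the paper's model:

```python
import math
from collections import defaultdict

class TrigramDependencyLM:
    """Toy trigram dependency LM: score each dependent word as
    P(dep | head, prev_sibling), with add-one smoothing."""

    def __init__(self):
        self.tri = defaultdict(int)  # counts of (head, prev, dep)
        self.bi = defaultdict(int)   # counts of (head, prev)
        self.vocab = set()

    def observe(self, head, dependents):
        prev = "<s>"
        for dep in dependents:
            self.tri[(head, prev, dep)] += 1
            self.bi[(head, prev)] += 1
            self.vocab.update([head, prev, dep])
            prev = dep

    def logprob(self, head, dependents):
        score, prev = 0.0, "<s>"
        V = max(len(self.vocab), 1)
        for dep in dependents:
            p = (self.tri[(head, prev, dep)] + 1) / (self.bi[(head, prev)] + V)
            score += math.log(p)
            prev = dep
        return score

lm = TrigramDependencyLM()
lm.observe("saw", ["boy", "yesterday"])
# The observed dependent order scores higher than an unseen order.
assert lm.logprob("saw", ["boy", "yesterday"]) > lm.logprob("saw", ["yesterday", "boy"])
```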
Discussion | Only the translation probability P was employed in constructing the target forest, due to the complexity of the syntax-based LM.
Discussion | Since our dependency LM models structures over target words directly based on dependency trees, we can build a single-step system. |
Discussion | This dependency LM can also be used in hierarchical MT systems using lexicalized CFG trees.
Experiments | str-dep: a string-to-dependency system with a dependency LM.
Experiments | The English side of this subset was also used to train a 3-gram dependency LM . |
Experiments |
                        BLEU%           TER%
                        lower   mixed   lower   mixed
  Decoding (3-gram LM)
    baseline            38.18   35.77   58.91   56.60
    filtered            37.92   35.48   57.80   55.43
    str-dep             39.52   37.25   56.27   54.07
  Rescoring (5-gram LM)
    baseline            40.53   38.26   56.35   54.15
    filtered            40.49   38.26   55.57   53.47
    str-dep             41.60   39.47   55.06   52.96
Implementation Details | We rescore 1000-best translations (Huang and Chiang, 2005) by replacing the 3-gram LM score with the 5-gram LM score computed offline. |
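Implementation Details | The rescoring step can be sketched as follows: subtract the 3-gram LM component from each hypothesis score and add the offline-computed 5-gram LM score. The score decomposition and weight handling are assumptions for illustration:

```python
def rescore_nbest(nbest, lm5_logprob, lm_weight):
    """Rescore n-best translations by swapping the 3-gram LM score
    for a 5-gram LM score computed offline.

    nbest: list of (hypothesis, total_score, lm3_score) triples.
    lm5_logprob: dict mapping hypothesis -> 5-gram LM log-probability.
    """
    rescored = []
    for hyp, total_score, lm3_score in nbest:
        new_score = total_score - lm_weight * lm3_score + lm_weight * lm5_logprob[hyp]
        rescored.append((new_score, hyp))
    return max(rescored)[1]  # hypothesis with the best new score

# Illustrative numbers: the 5-gram LM prefers the second hypothesis.
nbest = [("a b c", -10.0, -4.0), ("a b d", -11.0, -6.0)]
lm5 = {"a b c": -5.0, "a b d": -2.0}
assert rescore_nbest(nbest, lm5, lm_weight=1.0) == "a b d"
```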
Experimental Setup | 4.1 Distributed LM Framework |
Experimental Setup | We deploy the randomized LM in a distributed framework that scales more easily by spreading the model across multiple language model servers.
Experimental Setup | The proposed randomized LM can encode parameters estimated using any smoothing scheme (e.g. |
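Experimental Setup | A minimal sketch of the distributed deployment, assuming n-grams are partitioned across servers by hashing; the in-process dicts here stand in for networked LM servers, and the framework's actual protocol is not shown:

```python
import hashlib

class DistributedLMClient:
    """Toy distributed LM client: each n-gram is owned by exactly
    one shard, chosen by hashing the n-gram, and lookups are routed
    to the owning shard."""

    def __init__(self, shards):
        self.shards = shards  # list of {ngram_tuple: logprob} "servers"

    def _owner(self, ngram):
        h = hashlib.md5(" ".join(ngram).encode()).hexdigest()
        return int(h, 16) % len(self.shards)

    def logprob(self, ngram, unk=-10.0):
        shard = self.shards[self._owner(ngram)]
        return shard.get(ngram, unk)

shards = [dict(), dict()]
client = DistributedLMClient(shards)
# Store an n-gram on the shard that owns it, then look it up.
shards[client._owner(("the", "cat"))][("the", "cat")] = -1.5
assert client.logprob(("the", "cat")) == -1.5
assert client.logprob(("zzz", "qqq")) == -10.0  # unseen -> unknown score
```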
Experiments |
  LM                size (GB)   dev MT04   test MT05   test MT06
  unpruned block    116         0.5304     0.5697      0.4663
  unpruned rand      69         0.5299     0.5692      0.4659
  pruned block       42         0.5294     0.5683      0.4665
  pruned rand        27         0.5289     0.5679      0.4656
Introduction | Using higher-order models and larger amounts of training data can significantly improve performance in applications; however, the size of the resulting LM can become prohibitive.
Introduction | Efficiency is paramount in applications such as machine translation which make huge numbers of LM requests per sentence. |
Perfect Hash-based Language Models | Our randomized LM is based on the Bloomier filter (Chazelle et al., 2004). |
Scaling Language Models | In the next section we describe our randomized LM scheme based on perfect hash functions. |
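Perfect Hash-based Language Models | A much-simplified sketch of the fingerprinting idea: store only a small fingerprint and a value per n-gram, accepting a small false-positive rate for unseen n-grams. This toy version uses a single hash with linear probing rather than the full Bloomier-filter construction:

```python
import hashlib

class RandomizedLM:
    """Toy randomized LM: each entry keeps only a fingerprint of its
    n-gram plus the stored value. Lookups of unseen n-grams may return
    a false positive with probability about 2^-fp_bits."""

    def __init__(self, size, fp_bits=12):
        self.size = size
        self.fp_mask = (1 << fp_bits) - 1
        self.table = [None] * size  # (fingerprint, value) pairs

    def _hash_fp(self, ngram):
        h = int(hashlib.md5(" ".join(ngram).encode()).hexdigest(), 16)
        return h % self.size, (h >> 64) & self.fp_mask

    def insert(self, ngram, value):
        idx, fp = self._hash_fp(ngram)
        while self.table[idx] is not None:  # linear probing on collision
            idx = (idx + 1) % self.size
        self.table[idx] = (fp, value)

    def lookup(self, ngram, default=-10.0):
        idx, fp = self._hash_fp(ngram)
        for _ in range(self.size):
            entry = self.table[idx]
            if entry is None:
                return default          # definitely unseen
            if entry[0] == fp:
                return entry[1]         # seen, or a rare false positive
            idx = (idx + 1) % self.size
        return default

rlm = RandomizedLM(size=64)
rlm.insert(("the", "cat", "sat"), -1.25)
assert rlm.lookup(("the", "cat", "sat")) == -1.25
```

The space saving comes from never storing the n-gram itself, only its short fingerprint, at the cost of occasional false positives.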
Forest-based translation | The decoder performs two tasks on the translation forest: 1-best search with an integrated language model (LM), and k-best search with the LM to be used in minimum error rate training.
Forest-based translation | For 1-best search, we use the cube pruning technique (Chiang, 2007; Huang and Chiang, 2007), which approximately intersects the translation forest with the LM.
Forest-based translation | Basically, cube pruning works bottom up in a forest, keeping at most k +LM items at each node, and uses the best-first expansion idea from Algorithm 2 of Huang and Chiang (2005) to speed up the search.
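Forest-based translation | The best-first expansion at the core of cube pruning (Algorithm 2 of Huang and Chiang, 2005) can be sketched as a lazy k-best merge over a grid formed by two sorted cost lists:

```python
import heapq

def k_best_combinations(a, b, k):
    """Lazily enumerate the k lowest-cost pairs (a[i] + b[j]) from two
    sorted cost lists, expanding grid neighbors best-first via a heap."""
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append((cost, i, j))
        # Push the two grid neighbors of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out

costs = k_best_combinations([0.0, 1.0, 4.0], [0.5, 2.0], k=3)
assert [c for c, _, _ in costs] == [0.5, 1.5, 2.0]
```

In real cube pruning the added LM score makes the grid non-monotonic, so this expansion is only approximately best-first, which is why the intersection with the LM is approximate.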
Inflection prediction models |
  LM        81.0   69.4
  Model     91.6   91.0
  Avg |I|   13.9   24.1
Integration of inflection models with MT systems | PLM is the joint probability of the sequence of inflected words according to a trigram language model (LM).
Integration of inflection models with MT systems | The LM used for the integration is the same LM used in the base MT system, trained on fully inflected word forms (the base MT system trained on stems uses an LM trained on stem sequences).
Integration of inflection models with MT systems | Equation (1) shows that the model first selects the best sequence of inflected forms for each MT hypothesis Si according to the LM and the inflection model. |
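Integration of inflection models with MT systems | A toy version of the selection in Equation (1), exhaustively scoring inflected-form sequences for one hypothesis; the interpolation weight and candidate sets are illustrative assumptions, and a real system would use dynamic programming rather than exhaustive search:

```python
import math
from itertools import product

def best_inflection(stems, candidates, lm_logprob, infl_logprob, lam=0.5):
    """For one MT hypothesis (a stem sequence), pick the sequence of
    inflected forms maximizing a weighted sum of the LM score and the
    inflection-model score."""
    best, best_score = None, -math.inf
    for forms in product(*(candidates[s] for s in stems)):
        score = lam * lm_logprob(forms) + (1 - lam) * sum(
            infl_logprob(s, f) for s, f in zip(stems, forms))
        if score > best_score:
            best, best_score = forms, score
    return best

# Toy models: the LM prefers "went"; the inflection model is flat.
cands = {"go": ["go", "went", "gone"], "home": ["home"]}
lm = lambda forms: -1.0 if forms[0] == "went" else -3.0
im = lambda stem, form: 0.0
assert best_inflection(["go", "home"], cands, lm, im) == ("went", "home")
```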
Experiments | In the tables, Lm denotes the n-gram language model feature; Tmh denotes the collocation between target head words and the candidate measure word; Smh denotes the collocation between source head words and the candidate measure word; Hs denotes source head word selection; Punc denotes the target punctuation position; Tlex denotes surrounding word features in the translation; Slex denotes surrounding word features in the source sentence; and Pos denotes the part-of-speech feature.
Experiments |
  Feature setting   Precision   Recall
  Baseline          54.82%      45.61%
  Lm                51.11%      41.24%
  +Tmh              61.43%      49.22%
  +Punc             62.54%      50.08%
  +Tlex             64.80%      51.87%
Experiments |
  Feature setting   Precision   Recall
  Baseline          54.82%      45.61%
  Lm                51.11%      41.24%
  +Tmh+Smh          64.50%      51.64%
  +Hs               65.32%      52.26%
  +Punc             66.29%      53.10%
  +Pos              66.53%      53.25%
  +Tlex             67.50%      54.02%
  +Slex             69.52%      55.54%
Experiments | In detail, a paraphrase pattern e′ of e was reranked based on a language model (LM):
Experiments | scoreLM(e′|SE) is the LM-based score: scoreLM(e′|SE) = (1/|S′E|) log PLM(S′E), where S′E is the sentence generated by replacing e in SE with e′.
Experiments | To investigate the contribution of the LM-based score, we ran the experiment again with λ = 1 (ignoring the LM-based score) and found that the precision was 57.09%.
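Experiments | The reranking described above can be sketched as follows, assuming a length-normalized LM score interpolated with the paraphrase-model score via λ (here `lam`); all names and toy scores are illustrative:

```python
def rerank_paraphrases(sentence, target, paraphrases, lm_logprob, lam=0.5):
    """Rerank paraphrase candidates e' of a phrase e:
        score(e') = lam * score_para(e')
                    + (1 - lam) * (1/|S'_E|) * log P_LM(S'_E)
    where S'_E is the sentence with e replaced by e'.
    Setting lam = 1 ignores the LM-based score entirely."""
    ranked = []
    for e_prime, para_score in paraphrases:
        s_prime = sentence.replace(target, e_prime)
        n = len(s_prime.split())
        lm_score = lm_logprob(s_prime) / n  # length-normalized LM score
        ranked.append((lam * para_score + (1 - lam) * lm_score, e_prime))
    ranked.sort(reverse=True)
    return [e for _, e in ranked]

# Toy LM assigning a fixed cost per word.
lm = lambda s: -float(len(s.split()))
paras = [("excellent", -0.5), ("not bad", -0.2)]
ranked = rerank_paraphrases("the movie is great", "great", paras, lm)
assert ranked[0] == "not bad"
```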
Evaluation | The feature set includes: a trigram language model (lm) trained
Evaluation |
  Discriminative max-derivation               25.78
  Hiero (pd, gr, re, we)                      26.48
  Discriminative max-translation              27.72
  Hiero (pd, pr, pwr, pwd, gr, re, we)        28.14
  Hiero (pd, pr, pwr, pwd, gr, re, we, lm)    32.00
Evaluation | ⁸ Hiero (pd, pr, pwr, pwd, gr, re, we, lm) represents the state-of-the-art