Experiments and Results | In the contrastive experiments, our DAE and HCDAE features are appended to the phrase table as extra features.
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: the bidirectional phrase translation probabilities (P(e|f) and P(f|e)) and the bidirectional lexical weights (Lex(e|f) and Lex(f|e)).
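As a concrete illustration, the 4-dimensional input vector for one phrase pair could be read off a Moses-style phrase-table line as in the sketch below; the score ordering assumed here follows the Moses convention and is an assumption, not something stated in the papers.

```python
def phrase_pair_features(line):
    """Parse one Moses-style phrase-table line and return the 4 DNN input
    features: P(e|f), P(f|e), Lex(e|f), Lex(f|e).
    Assumed score ordering: phi(f|e) lex(f|e) phi(e|f) lex(e|f)."""
    fields = line.strip().split(" ||| ")
    scores = fields[2].split()
    p_f_e, lex_f_e, p_e_f, lex_e_f = map(float, scores[:4])
    return [p_e_f, p_f_e, lex_e_f, lex_f_e]

example = "ni hao ||| hello ||| 0.4 0.3 0.5 0.35 2.718"
print(phrase_pair_features(example))  # [0.5, 0.4, 0.35, 0.3]
```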
Introduction | Using the 4 original phrase features in the phrase table as the input features, they pre-trained the DBN by contrastive divergence (Hinton, 2002), and generated new unsupervised DBN features using forward computation. |
Introduction | These new features are appended as extra features to the phrase table for the translation decoder. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To speed up the pre-training, we subdivide all phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
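As a rough sketch of this mini-batch pre-training, the following CD-1 update for a single RBM layer processes the phrase-pair feature matrix in batches of 100 and updates the weights after each batch; the layer size, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(X, n_hidden=16, lr=0.01, epochs=5, batch_size=100, seed=0):
    """Mini-batch CD-1 pre-training of one RBM layer (Hinton, 2002)."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)          # phrase-pair feature vectors
    n_visible = X.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        rng.shuffle(X)                    # new mini-batch order each epoch
        for start in range(0, len(X), batch_size):
            v0 = X[start:start + batch_size]
            h0 = sigmoid(v0 @ W + b_h)                    # positive phase
            h0_sample = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h0_sample @ W.T + b_v)           # reconstruction
            h1 = sigmoid(v1 @ W + b_h)                    # negative phase
            W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)   # update after each batch
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (h0 - h1).mean(axis=0)
    return W, b_v, b_h
```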
Semi-Supervised Deep Auto-encoder Features Learning for SMT | After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation.
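A minimal sketch of that forward computation, assuming the pre-trained stack is available as a list of (weight matrix, hidden bias) pairs such as those returned by the CD-1 sketch above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_features(x, layers):
    """Pass the original phrase features x through the pre-trained stack and
    return the top-layer activations as the new (unsupervised) features."""
    h = np.asarray(x, dtype=float)
    for W, b_h in layers:
        h = sigmoid(h @ W + b_h)
    return h
```

The resulting vector is what gets appended to the phrase pair as extra features.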
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To determine an adequate number of epochs and to avoid over-fitting, we fine-tune on a fraction of the phrase table and test performance on the remaining validation portion, and then repeat fine-tuning on the entire phrase table for 100 epochs.
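A sketch of this split-then-retrain procedure; the `train_epoch`, `validation_error`, and `reset` hooks are hypothetical names, and the fixed 100-epoch re-run from the text is generalized to whatever epoch count the validation portion selects.

```python
def finetune_with_validation(model, phrase_table, train_fraction=0.9, max_epochs=100):
    """Pick an epoch count on a held-out validation split, then fine-tune on
    the full phrase table for that many epochs (hypothetical model hooks)."""
    split = int(train_fraction * len(phrase_table))
    train, valid = phrase_table[:split], phrase_table[split:]
    best_error, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        model.train_epoch(train)
        error = model.validation_error(valid)
        if error < best_error:
            best_error, best_epoch = error, epoch
    model.reset()                                  # restart from the pre-trained weights
    for _ in range(best_epoch):                    # repeat fine-tuning on the entire table
        model.train_epoch(phrase_table)
    return model
```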
Abstract | We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates.
Experiments | Two tasks are involved in the experiments: phrase table pruning, which discards entries whose semantic similarity is very low, and decoding with the phrasal semantic similarities as additional new features.
Experiments | 4.3 Phrase Table Pruning |
Experiments | Pruning most of the phrase table without much impact on translation quality is very important for translation, especially in environments where memory and time constraints are imposed.
Introduction | Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase have the same meaning.
Introduction | In phrase table pruning, we discard the phrasal translation rules with low semantic similarity. |
Introduction | The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and that decoding with phrasal semantic similarities achieves an improvement of up to 1.7 BLEU points over the state-of-the-art baseline.
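A minimal sketch of such threshold-based pruning; `similarity` stands in for the learned phrasal semantic similarity model (e.g. the BRAE score), and the threshold is a tunable assumption, not a value from the paper.

```python
def prune_phrase_table(entries, similarity, threshold):
    """Keep only translation rules whose source-target semantic similarity
    reaches the threshold; returns the kept rules and the discarded fraction."""
    kept = [e for e in entries if similarity(e["src"], e["tgt"]) >= threshold]
    discarded = 1.0 - len(kept) / max(len(entries), 1)
    return kept, discarded
```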
Evaluation | The baseline is a state-of-the-art phrase-based system; we perform word alignment using a lexicalized hidden Markov model, and then the phrase table is extracted using the grow-diag-final heuristic (Koehn et al., 2003).
Evaluation | The 13 baseline features (2 lexical, 2 phrasal, 5 HRM, plus the language model, word penalty, phrase length, and distortion penalty features) were tuned using MERT (Och, 2003), which is also used to tune the 4 feature weights introduced by the secondary phrase table (2 lexical and 2 phrasal, with the other features shared between the two tables).
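For illustration, all of these features enter a single weighted log-linear score at decoding time; the feature names and weights in the sketch below are invented placeholders, not the actual tuned values.

```python
import math

def loglinear_score(features, weights):
    """Weighted log-linear combination of decoder features; one MERT-tuned
    weight per feature, shared or table-specific as described above."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"lm": 0.5, "p_e_f": 0.2, "lex_e_f": 0.1, "word_penalty": -0.3}
features = {"lm": -12.4, "p_e_f": math.log(0.4),
            "lex_e_f": math.log(0.3), "word_penalty": -3.0}
print(loglinear_score(features, weights))
```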
Generation & Propagation | Our goal is to obtain translation distributions for source phrases that are not present in the phrase table extracted from the parallel corpus. |
Generation & Propagation | If a source phrase is found in the baseline phrase table, it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label, and is sub-
Generation & Propagation | Prior to generation, one phrase node for each target phrase occurring in the baseline phrase table is added to the target graph (black nodes in Fig. |
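A generic label-propagation sketch of this setup (not the exact update rule of the paper): labeled phrase nodes keep their empirical distributions over target phrases, while unlabeled nodes repeatedly take similarity-weighted averages of their neighbours' distributions.

```python
import numpy as np

def propagate_labels(adjacency, labels, labeled_idx, iterations=20):
    """adjacency: (n, n) phrase-similarity matrix over graph nodes.
    labels: (n, k) distributions over k target phrases; rows for unlabeled
    nodes start at zero. labeled_idx: indices of labeled phrases (clamped)."""
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels, dtype=float)
    Q = labels.copy()
    row_sums = adjacency.sum(axis=1, keepdims=True)
    P = np.divide(adjacency, row_sums,
                  out=np.zeros_like(adjacency), where=row_sums > 0)
    for _ in range(iterations):
        Q = P @ Q                                      # average over graph neighbours
        Q[labeled_idx] = labels[labeled_idx]           # clamp labeled phrases
        norm = Q.sum(axis=1, keepdims=True)
        Q = np.divide(Q, norm, out=Q, where=norm > 0)  # renormalize distributions
    return Q
```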
Introduction | The additional phrases are incorporated in the SMT system through a secondary phrase table (§2.5). |
Related Work | (2012) propose a method that utilizes a preexisting phrase table and a small bilingual lexicon, and performs BLI using monolingual corpora. |
Related Work | Decipherment-based approaches (Ravi and Knight, 2011; Dou and Knight, 2012) have generally taken a monolingual view to the problem and combine phrase tables through the log-linear model during feature weight training. |
Paraphrasing | We define associations in x and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013).
Paraphrasing | We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams. |
Paraphrasing | This results in a phrase table with approximately 1.3 million phrase pairs. |
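A minimal sketch of the consistent phrase-pair heuristic over spans of up to 5 tokens, assuming the word alignment is given as a set of (source index, target index) links; it omits the usual extension to unaligned boundary words.

```python
def extract_phrase_pairs(src_len, alignment, max_len=5):
    """Return all (source span, target span) pairs such that no alignment
    link crosses the box they define (Och and Ney, 2004)."""
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            tgt_pts = [t for (s, t) in alignment if s1 <= s <= s2]
            if not tgt_pts:
                continue
            t1, t2 = min(tgt_pts), max(tgt_pts)
            if t2 - t1 + 1 > max_len:
                continue
            # consistency: every link inside the target span stays inside the source span
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.append(((s1, s2), (t1, t2)))
    return pairs

# toy example: 3-word source, diagonal alignment to a 3-word target
print(extract_phrase_pairs(3, {(0, 0), (1, 1), (2, 2)}))
```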
Methods | This is sometimes referred to as a word graph (Ueffing et al., 2002), although in our case the segmented phrase table also produces tokens that correspond to morphemes. |
Methods | (2010) address this problem by forcing the decoder’s phrase table to respect word boundaries, guaranteeing that each de-segmentable token sequence is local to an edge. |
Related Work | Alternatively, one can reparameterize existing phrase tables as exponential models, so that translation probabilities account for source context and morphological features (Jeong et al., 2010; Subotin, 2011).