Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo

Article Structure

Abstract

In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model.

Introduction

Recently, many new features have been explored for SMT and significant performance improvements have been obtained in terms of translation quality, such as syntactic features, sparse features, and reordering features.

Related Work

Recently, there has been growing interest in the use of DNN for SMT tasks.

Input Features for DNN Feature Learning

The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model.

Semi-Supervised Deep Auto-encoder Features Learning for SMT

Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.

Experiments and Results

5.1 Experimental Setup

Conclusions

The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.

Topics

phrase pair

Appears in 13 sentences as: Phrase pair (1) phrase pair (10) phrase pairs (2)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. First, the input original features for DBN feature learning are too simple: only the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), are used, which is a bottleneck for learning an effective feature representation.
    Page 1, “Introduction”
  2. To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning, and these features have shown significant improvements for SMT, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), which also show further improvement for new phrase feature learning in our experiments.
    Page 1, “Introduction”
  3. Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features, bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)),
    Page 3, “Input Features for DNN Feature Learning”
  4. 3.2 Phrase pair similarity
    Page 3, “Input Features for DNN Feature Learning”
  5. Zhao et al. (2004) proposed a way of using term weight based models in a vector space as additional evidence for phrase pair translation quality.
    Page 3, “Input Features for DNN Feature Learning”
  6. This model employs phrase pair similarity to encode the weights of content and non-content words in phrase translation pairs.
    Page 3, “Input Features for DNN Feature Learning”
  7. Following (Zhao et al., 2004), we calculate bidirectional phrase pair similarity using cosine distance and BM25 distance as,
    Page 3, “Input Features for DNN Feature Learning”
  8. We build the DAE network where the first layer has 16 visible nodes, and each visible node v_i corresponds to one of the above original features X of each phrase pair.
    Page 4, “Input Features for DNN Feature Learning”
  9. To speed up the pre-training, we subdivide the entire set of phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  10. Each layer is greedily pre-trained for 50 epochs through the entire set of phrase pairs.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  11. After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
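
Sentences 8-11 above outline the pre-training setup: 16 visible units (one per original feature in X), mini-batches of 100 phrase pairs, 50 epochs of greedy layer-wise training, and a forward pass to produce DBN features for every phrase pair. The sketch below is an illustrative reconstruction, not the authors' implementation; the hidden-layer sizes, learning rate, and CD-1 details are my assumptions.

```python
# Minimal sketch (assumed details, not the paper's code): greedy layer-wise
# pre-training of an RBM stack over 16-dim phrase-pair feature vectors, then a
# forward pass that turns each phrase pair's features X into DBN features.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        # one contrastive-divergence (CD-1) step on a mini-batch
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs((h0 > np.random.rand(*h0.shape)).astype(float))
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def pretrain_dbn(X, layer_sizes=(16, 32, 16), epochs=50, batch=100):
    """Greedily pre-train one RBM per layer over all phrase-pair features X."""
    rbms, data = [], X
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        rbm = RBM(n_vis, n_hid)
        for _ in range(epochs):
            for i in range(0, len(data), batch):
                rbm.cd1_update(data[i:i + batch])
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # feed hidden activities to the next layer
    return rbms

def dbn_features(rbms, X):
    """Forward computation: pass original features X through the trained stack."""
    h = X
    for rbm in rbms:
        h = rbm.hidden_probs(h)
    return h

# X: one 16-dimensional feature vector per phrase pair (hypothetical data here)
X = np.random.rand(1000, 16)
rbms = pretrain_dbn(X)
new_feats = dbn_features(rbms, X)  # later appended to the phrase table as extra features
```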

semi-supervised

Appears in 12 sentences as: semi-supervised (13) “semi-supervised” (1)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. Using the unsupervised pre-trained deep belief net (DBN) to initialize DAE’s parameters and using the input original phrase features as a teacher for semi-supervised fine-tuning, we learn new semi-supervised DAE features, which are more effective and stable than the unsupervised DBN features.
    Page 1, “Abstract”
  2. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  3. al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model.
    Page 2, “Introduction”
  4. By using the input data as the teacher, the “semi-supervised” fine-tuning process of DAE addresses the problem of “back-propagation without a teacher” (Rumelhart et al., 1986), which makes the DAE learn more powerful and abstract features (Hinton and Salakhutdinov, 2006).
    Page 2, “Introduction”
  5. For our semi-supervised DAE feature learning task, we use the unsupervised pre-trained DBN to initialize DAE’s parameters and use the input original phrase features as the “teacher” for semi-supervised back-propagation.
    Page 2, “Introduction”
  6. Compared with the unsupervised DBN features, our semi-supervised DAE features are more effective and stable.
    Page 2, “Introduction”
  7. Our semi-supervised DAE features significantly outperform the unsupervised DBN features and the baseline features, and our introduced input phrase features significantly improve the performance of DAE feature learning.
    Page 2, “Introduction”
  8. Section 4 describes how to learn our semi-supervised DAE features for SMT.
    Page 2, “Introduction”
  9. Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  10. Figure 2: After the unsupervised pre-training, the DBNs are “unrolled” to create a semi-supervised DAE, which is then fine-tuned using back-propagation of error derivatives.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  11. To learn a semi-supervised DAE, we first “unroll” the above n-layer DBN by using its weight matrices to create a deep, 2n-1 layer network whose lower layers use the matrices to “encode” the input and whose upper layers use the matrices in reverse order to “decode” the input (Hinton and Salakhutdinov, 2006; Salakhutdinov and Hinton, 2009; Deng et al., 2010), as shown in Figure 2.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
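
Sentences 1, 5, 10 and 11 above describe the core procedure: the pre-trained DBN weight matrices are "unrolled" into an encoder/decoder network, and the original input features X serve as the "teacher" for back-propagation. The sketch below is my illustration of that reading, not the paper's code; the squared-error loss, sigmoid output, and learning rate are assumptions (it presumes the features are scaled to [0, 1]).

```python
# Minimal sketch (assumed details): unroll pre-trained DBN weights into a DAE
# (encoder uses W1..Wn, decoder reuses the transposes in reverse order), then
# fine-tune by back-propagating the reconstruction error of the input X itself.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll(dbn_weights, dbn_hbiases, dbn_vbiases):
    """Encoder params come from the DBN, decoder params from the transposes."""
    enc = [(W.copy(), c.copy()) for W, c in zip(dbn_weights, dbn_hbiases)]
    dec = [(W.T.copy(), b.copy()) for W, b in zip(reversed(dbn_weights),
                                                  reversed(dbn_vbiases))]
    return enc + dec  # 2n weight layers, i.e. the unrolled "2n-1 layer" network

def forward(layers, X):
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def finetune_step(layers, X, lr=0.01):
    """One back-propagation step with the input X itself as the target."""
    acts = forward(layers, X)
    delta = (acts[-1] - X) * acts[-1] * (1 - acts[-1])  # squared error + sigmoid
    for i in reversed(range(len(layers))):
        W, b = layers[i]
        grad_W = acts[i].T @ delta / len(X)
        grad_b = delta.mean(axis=0)
        if i > 0:  # propagate the error to the layer below before updating
            delta = (delta @ W.T) * acts[i] * (1 - acts[i])
        layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers

def dae_features(layers, X, n_encoder_layers):
    """After fine-tuning, keep only the encoder half to produce DAE features."""
    return forward(layers[:n_encoder_layers], X)[-1]
```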

phrase table

Appears in 11 sentences as: phrase table (13)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. Using the 4 original phrase features in the phrase table as the input features, they pre-trained the DBN by contrastive divergence (Hinton, 2002), and generated new unsupervised DBN features using forward computation.
    Page 1, “Introduction”
  2. These new features are appended as extra features to the phrase table for the translation decoder.
    Page 1, “Introduction”
  3. Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features, bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)),
    Page 3, “Input Features for DNN Feature Learning”
  4. To speed up the pre-training, we subdivide the entire set of phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  5. After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  6. To determine an adequate number of epochs and to avoid over-fitting, we fine-tune on a fraction of the phrase table and test performance on the remaining validation phrase table, and then repeat fine-tuning on the entire phrase table for 100 epochs.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  7. After the fine-tuning, for each phrase pair in the phrase table, we estimate our DAE features by passing the original phrase features X through the “encoder” part of the DAE using forward computation.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  8. Then, we append these features for each phrase pair to the phrase table as extra features.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  9. Thus, these new (m1 + m2)-dimensional DAE features are added as extra features to the phrase table.
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  10. In our task, we introduce differences by using different initializations and different fractions of the phrase table.
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  11. In the contrast experiments, our DAE and HCDAE features are appended as extra features to the phrase table.
    Page 7, “Experiments and Results”
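
Sentences 2, 8, 9 and 11 above all describe the same final step: the learned feature values for each phrase pair are appended to the phrase table as extra scores for the decoder. The snippet below is a hypothetical helper for that step, assuming a Moses-style text phrase table whose fields are separated by " ||| " with the translation scores in the third field; it is not the authors' tooling.

```python
# Illustrative sketch (assumed phrase-table layout): append learned DAE/HCDAE
# feature values for each phrase pair as extra scores in a Moses-style table.
def append_features(in_path, out_path, extra_feats):
    """extra_feats: dict mapping (src, tgt) -> list of new feature values."""
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fields = line.rstrip("\n").split(" ||| ")
            src, tgt, scores = fields[0], fields[1], fields[2]
            new = extra_feats.get((src, tgt), [])
            fields[2] = " ".join([scores] + [f"{v:.6f}" for v in new])
            fout.write(" ||| ".join(fields) + "\n")
```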

translation model

Appears in 10 sentences as: translation model (9) translation models (3)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model.
    Page 1, “Abstract”
  2. al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model.
    Page 2, “Introduction”
  3. (2012) improved the translation quality of the n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
    Page 2, “Related Work”
  4. Kalchbrenner and Blunsom (2013) introduced recurrent continuous translation models that comprise a class for purely continuous sentence-level translation models.
    Page 2, “Related Work”
  5. (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words.
    Page 2, “Related Work”
  6. (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
    Page 2, “Related Work”
  7. The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model.
    Page 3, “Input Features for DNN Feature Learning”
  8. This corpus is used to train the translation model in our experiments, and we will describe it in detail in section 5.1.
    Page 3, “Input Features for DNN Feature Learning”
  9. Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  10. The baseline translation models are generated by Moses with default parameter settings.
    Page 7, “Experiments and Results”

hidden layer

Appears in 8 sentences as: hidden layer (5) hidden layers (3)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. Moreover, to learn high dimensional feature representation, we introduce a natural horizontal composition of more DAEs for large hidden layers feature learning.
    Page 1, “Abstract”
  2. Moreover, to learn high dimensional feature representation, we introduce a natural horizontal composition for DAEs (HCDAE) that can be used to create large hidden layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012), which shows further improvement compared with single DAE in our experiments.
    Page 2, “Introduction”
  3. The connection weight W, hidden layer biases c and visible layer biases b can be learned efficiently using the contrastive divergence (Hinton, 2002; Carreira-Perpinan and Hinton, 2005).
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  4. When given a hidden layer h, the factorial conditional distribution of the visible layer v can be estimated by
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  5. Moreover, although we have introduced another four types of phrase features (X2, X3, X4 and X5), the 16 features in X are still a bottleneck for learning large hidden-layer feature representations: because they carry limited information, the performance of high-dimensional DAE features learned directly from a single DAE is not very satisfactory.
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  6. To learn high-dimensional feature representation and to further improve the performance, we introduce a natural horizontal composition for DAEs that can be used to create large hidden layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012), as shown in Figure 3.
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  7. Two single DAEs with architectures 16/m1/16 and 16/m2/16 can be trained and the hidden layers can be combined to yield an expanded hidden feature representation of size m1 + m2, which can then be fed to the subsequent layers of the overall architecture.
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  8. For example, the architecture 16-32-16-2 (4 layers’ network depth) corresponds to the DAE with 16-dimensional input features (X) (input layer), 32/16 hidden units (first/second hidden layer), and 2-dimensional output features (new DAE features) (output layer).
    Page 6, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
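
Sentence 4 above stops just before the equation it introduces. For a standard RBM with weights W, hidden biases c, and visible biases b (the notation of sentence 3), the factorial conditionals take the usual form (standard RBM definitions, not copied from this paper), where W_{ji} connects visible unit v_j to hidden unit h_i and sigma is the logistic sigmoid:

\[
P(v_j = 1 \mid h) = \sigma\Big(b_j + \sum_i W_{ji}\, h_i\Big), \qquad
P(h_i = 1 \mid v) = \sigma\Big(c_i + \sum_j W_{ji}\, v_j\Big)
\]

The horizontal composition in sentences 6 and 7 can also be sketched directly: two separately trained DAEs with architectures 16/m1/16 and 16/m2/16 contribute hidden representations that are concatenated into an (m1 + m2)-dimensional vector. The snippet below is an illustration under that reading; the encoder callables are placeholders for whatever encoders the two trained DAEs provide.

```python
# Illustrative sketch of HCDAE feature extraction (not the paper's code):
# concatenate the hidden layers of two independently trained DAEs to obtain
# an (m1 + m2)-dimensional feature vector per phrase pair.
import numpy as np

def hcdae_features(encode_1, encode_2, X):
    """encode_1 / encode_2 map the 16-dim input X to the m1-/m2-dim hidden
    layers of two separately trained DAEs (hypothetical callables here)."""
    h1 = encode_1(X)                         # shape (n_pairs, m1)
    h2 = encode_2(X)                         # shape (n_pairs, m2)
    return np.concatenate([h1, h2], axis=1)  # shape (n_pairs, m1 + m2)
```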

significant improvements

Appears in 6 sentences as: significant improvement (1) significant improvements (2) significantly improve (1) significantly improves (2)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  2. To address the first shortcoming, we adapt and extend some simple but effective phrase features as the input features for new DNN feature learning, and these features have shown significant improvements for SMT, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), which also show further improvement for new phrase feature learning in our experiments.
    Page 1, “Introduction”
  3. Our semi-supervised DAE features significantly outperform the unsupervised DBN features and the baseline features, and our introduced input phrase features significantly improve the performance of DAE feature learning.
    Page 2, “Introduction”
  4. Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
    Page 7, “Experiments and Results”
  5. Also, adding more input features (X vs. X1) not only significantly improves the performance of DAE feature learning, but also slightly improves the performance of DBN feature learning.
    Page 9, “Experiments and Results”
  6. The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
    Page 9, “Conclusions”

phrase-based

Appears in 6 sentences as: phrase-based (6)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model.
    Page 1, “Abstract”
  2. Instead of designing new features based on intuition, linguistic knowledge and domain, for the first time, Maskey and Zhou (2012) explored the possibility of inducing new features in an unsupervised fashion using deep belief net (DBN) (Hinton et al., 2006) for hierarchical phrase-based translation.
    Page 1, “Introduction”
  3. al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model.
    Page 2, “Introduction”
  4. The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model.
    Page 3, “Input Features for DNN Feature Learning”
  5. Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  6. We choose the Moses (Koehn et al., 2007) framework to implement our phrase-based machine translation system.
    Page 6, “Experiments and Results”

NIST

Appears in 6 sentences as: NIST (7)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  2. Finally, we conduct large-scale experiments on IWSLT and NIST Chinese-English translation tasks, respectively, and the results demonstrate that our solutions solve the two aforementioned shortcomings successfully.
    Page 2, “Introduction”
  3. NIST.
    Page 6, “Experiments and Results”
  4. Our development set is NIST 2005 MT evaluation set (1084 sentences), and our test set is NIST 2006 MT evaluation set (1664 sentences).
    Page 6, “Experiments and Results”
  5. Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
    Page 7, “Experiments and Results”
  6. The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
    Page 9, “Conclusions”

LM

Appears in 5 sentences as: LM (5)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. (2012) improved the translation quality of the n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
    Page 2, “Related Work”
  2. Backward LM has been introduced by Xiong et al.
    Page 3, “Input Features for DNN Feature Learning”
  3. (2011), which successfully captures both the preceding and succeeding contexts of the current word, and we estimate the backward LM by inverting the order in each sentence in the training data from the original order to the reverse order.
    Page 3, “Input Features for DNN Feature Learning”
  4. The LM corpus is the English side of the parallel data (BTEC, CJK and CWMT08) (1.34M sentences).
    Page 6, “Experiments and Results”
  5. The LM corpus is the English side of the parallel data as well as the English Gigaword corpus (LDC2007T07) (11.3M sentences).
    Page 6, “Experiments and Results”
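
Sentence 3 above describes the backward LM as an ordinary LM estimated on sentences whose word order has been inverted. A minimal sketch of that preprocessing step is shown below; it is my illustration of the described procedure, not the authors' script, and the file-based interface is an assumption.

```python
# Illustrative sketch: reverse the word order of every sentence so that a
# standard n-gram LM trained on the output acts as a backward LM.
def reverse_corpus(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(" ".join(reversed(line.split())) + "\n")
```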

translation quality

Appears in 4 sentences as: translation quality (4)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. Recently, many new features have been explored for SMT and significant performance improvements have been obtained in terms of translation quality, such as syntactic features, sparse features, and reordering features.
    Page 1, “Introduction”
  2. (2012) improved the translation quality of the n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
    Page 2, “Related Work”
  3. Zhao et al. (2004) proposed a way of using term weight based models in a vector space as additional evidence for phrase pair translation quality.
    Page 3, “Input Features for DNN Feature Learning”
  4. The translation quality is evaluated by case-insensitive IBM BLEU-4 metric.
    Page 7, “Experiments and Results”

translation probability

Appears in 4 sentences as: translation probabilities (1) translation probability (3)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. First, the input original features for DBN feature learning are too simple: only the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), are used, which is a bottleneck for learning an effective feature representation.
    Page 1, “Introduction”
  2. (2012) improved the translation quality of the n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
    Page 2, “Related Work”
  3. Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features, bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)),
    Page 3, “Input Features for DNN Feature Learning”
  4. where p(ej|fi) and p(fi|ej) represent the bidirectional word translation probabilities.
    Page 3, “Input Features for DNN Feature Learning”
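
For reference, the bidirectional phrase translation probabilities and lexical weightings named in sentences 1 and 3 are usually defined as in Koehn et al. (2003); the displays below are those standard definitions, not text quoted from this paper. The phrase translation probabilities are relative-frequency estimates over extracted phrase pairs,

\[
\phi(f \mid e) = \frac{\operatorname{count}(f, e)}{\sum_{f'} \operatorname{count}(f', e)}, \qquad
\phi(e \mid f) = \frac{\operatorname{count}(f, e)}{\sum_{e'} \operatorname{count}(f, e')},
\]

and the lexical weighting scores a phrase pair with word alignment a by the word translation probabilities w,

\[
\operatorname{lex}(e \mid f, a) = \prod_{i=1}^{|e|} \frac{1}{\lvert \{ j : (i, j) \in a \} \rvert} \sum_{(i, j) \in a} w(e_i \mid f_j),
\]

with the reverse direction defined symmetrically.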

Chinese-English

Appears in 4 sentences as: Chinese-English (4)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  2. Finally, we conduct large-scale experiments on IWSLT and NIST Chinese-English translation tasks, respectively, and the results demonstrate that our solutions solve the two aforementioned shortcomings successfully.
    Page 2, “Introduction”
  3. We now test our DAE features on the following two Chinese-English translation tasks.
    Page 6, “Experiments and Results”
  4. The bilingual corpus is the Chinese-English part of Basic Traveling Expression corpus (BTEC) and China-Japan-Korea (CJK) corpus (0.38M sentence pairs with 3.5/3.8M Chinese/English words).
    Page 6, “Experiments and Results”

neural networks

Appears in 3 sentences as: neural network (1) neural networks (2)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. In this paper, we strive to effectively address the above two shortcomings, and systematically explore the possibility of learning new features using deep (multilayer) neural networks (DNN, which is usually referred to under the name Deep Learning) for SMT.
    Page 1, “Introduction”
  2. (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words.
    Page 2, “Related Work”
  3. (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
    Page 2, “Related Work”

log-linear model

Appears in 3 sentences as: log-linear model (4)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
    Page 2, “Related Work”
  2. Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  3. To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
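
For reference, the log-linear model of Och and Ney (2002) into which the features are combined (sentence 2) takes the standard form below; this is a textbook definition, not text from the paper:

\[
P(e \mid f) = \frac{\exp\big(\sum_{i=1}^{M} \lambda_i\, h_i(e, f)\big)}{\sum_{e'} \exp\big(\sum_{i=1}^{M} \lambda_i\, h_i(e', f)\big)},
\]

where the h_i(e, f) are feature functions (the baseline features plus any appended DBN, DAE, or HCDAE features) and the lambda_i are their weights, typically tuned on a development set.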

log-linear

Appears in 3 sentences as: log-linear (4)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
    Page 2, “Related Work”
  2. Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
    Page 4, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”
  3. To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
    Page 5, “Semi-Supervised Deep Auto-encoder Features Learning for SMT”

deep learning

Appears in 3 sentences as: Deep Learning (1) deep learning (2)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. In this paper, we strive to effectively address the above two shortcomings, and systematically explore the possibility of learning new features using deep (multilayer) neural networks (DNN, which is usually referred to under the name Deep Learning) for SMT.
    Page 1, “Introduction”
  2. DNN features are learned from the nonlinear combination of the input original features; they strongly capture high-order correlations between the activities of the original features, and we believe this deep learning paradigm induces the original features to further reach their potential for SMT.
    Page 2, “Introduction”
  3. Compared with the original features, DNN (DAE and HCDAE) features are learned from the nonlinear combination of the original features; they strongly capture high-order correlations between the activities of the original features, and we believe this deep learning paradigm induces the original features to further reach their potential for SMT.
    Page 9, “Conclusions”

BLEU points

Appears in 3 sentences as: BLEU points (3)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  2. Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
    Page 7, “Experiments and Results”
  3. The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
    Page 9, “Conclusions”

BLEU

Appears in 3 sentences as: BLEU (3)
In Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
  1. On two Chinese-English tasks, our semi-supervised DAE features obtain statistically significant improvements of 1.34/2.45 (IWSLT) and 0.82/1.52 (NIST) BLEU points over the unsupervised DBN features and the baseline features, respectively.
    Page 1, “Abstract”
  2. Adding new DNN features as extra features significantly improves translation accuracy (row 2-17 vs. 1), with the highest increase of 2.45 (IWSLT) and 1.52 (NIST) (row 14 vs. 1) BLEU points over the baseline features.
    Page 7, “Experiments and Results”
  3. The results also demonstrate that DNN (DAE and HCDAE) features are complementary to the original features for SMT, and adding them together obtains statistically significant improvements of 3.16 (IWSLT) and 2.06 (NIST) BLEU points over the baseline features.
    Page 9, “Conclusions”
