Learning Topic Representation for SMT with Neural Networks
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun

Article Structure

Abstract

Statistical Machine Translation (SMT) usually utilizes contextual information to disambiguate translation candidates.

Introduction

Making translation decisions is a difficult task in many Statistical Machine Translation (SMT) systems.

Background: Deep Learning

Deep learning has been an active research topic in recent years and has triumphed in many machine learning research areas.

Topic Similarity Model with Neural Network

In this section, we explain our neural network based topic similarity model in detail, as well as how to incorporate the topic similarity features into the SMT decoding procedure.

Experiments

4.1 Setup

Related Work

Topic modeling was first leveraged to improve SMT performance in (Zhao and Xing, 2006; Zhao and Xing, 2007).

Conclusion and Future Work

In this paper, we propose a neural network based approach to learning bilingual topic representation for SMT.

Topics

neural network

Appears in 36 sentences as: Neural Network (1) Neural network (1) neural network (22) neural networks (12)
In Learning Topic Representation for SMT with Neural Networks
  1. In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data.
    Page 1, “Abstract”
  2. These topic-related documents are utilized to learn a specific topic representation for each sentence using a neural network based approach.
    Page 2, “Introduction”
  3. Neural networks are an effective technique for learning different levels of data representations.
    Page 2, “Introduction”
  4. The levels inferred from the neural network correspond to distinct levels of concepts, where high-level representations are obtained from the low-level bag-of-words input.
    Page 2, “Introduction”
  5. Our problem fits well into the neural network framework, and we expect it to further improve the inference of topic representations for sentences.
    Page 2, “Introduction”
  6. To incorporate topic representations as translation knowledge into SMT, our neural network based approach directly optimizes similarities between the source language and target language in a compact topic space.
    Page 2, “Introduction”
  7. This technique began to attract public attention in the mid-2000s after researchers showed how a multilayer feed-forward neural network can be effectively trained.
    Page 2, “Background: Deep Learning”
  8. In this section, we explain our neural network based topic similarity model in detail, as well as how to incorporate the topic similarity features into the SMT decoding procedure.
    Page 2, “Topic Similarity Model with Neural Network”
  9. Neural Network Training
    Page 3, “Topic Similarity Model with Neural Network”
  10. Figure 1: Overview of neural network based topic similarity model.
    Page 3, “Topic Similarity Model with Neural Network”
  11. There are two phases in our neural network training process: pre-training and fine-tuning.
    Page 3, “Topic Similarity Model with Neural Network”
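
The excerpts above describe the architecture in Figure 1: each language side encodes an enriched bag-of-words input into a compact topic vector, and the two vectors are then compared. The NumPy sketch below illustrates that forward pass only; the toy dimensions, the sigmoid activation, and all variable names are assumptions made for illustration, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

V = 1000   # toy vocabulary size (the paper uses 100,000)
L = 100    # length of the hidden (topic) layer

# One encoder per language: z = sigmoid(W x + b), as in a one-layer auto-encoder.
W_src, b_src = 0.01 * rng.standard_normal((L, V)), np.zeros(L)
W_tgt, b_tgt = 0.01 * rng.standard_normal((L, V)), np.zeros(L)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    """Map an n-of-V bag-of-words vector to an L-dimensional topic vector."""
    return sigmoid(W @ x + b)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy bag-of-words inputs standing in for the retrieved-document representations
# of a source sentence f and a target sentence e.
x_f = (rng.random(V) < 0.02).astype(float)
x_e = (rng.random(V) < 0.02).astype(float)

z_f = encode(x_f, W_src, b_src)
z_e = encode(x_e, W_tgt, b_tgt)
print("topic similarity sim(z_f, z_e) =", cosine(z_f, z_e))

Pre-training would initialize each encoder as a denoising auto-encoder on the retrieved monolingual contexts, and fine-tuning would then adjust both encoders so that this similarity is high for parallel pairs, as described under the later topics.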


parallel data

Appears in 15 sentences as: parallel data (15)
In Learning Topic Representation for SMT with Neural Networks
  1. In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data.
    Page 1, “Abstract”
  2. One typical property these approaches have in common is that they only utilize parallel data where document boundaries are explicitly given.
    Page 1, “Introduction”
  3. However, this is not always the case, since there is a considerable amount of parallel data that does not have document boundaries.
    Page 1, “Introduction”
  4. This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible.
    Page 2, “Introduction”
  5. Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase.
    Page 2, “Background: Deep Learning”
  6. learn topic representations using sentence-level parallel data.
    Page 3, “Topic Similarity Model with Neural Network”
  7. 3.2 Fine-tuning with parallel data
    Page 4, “Topic Similarity Model with Neural Network”
  8. Consequently, the whole neural network can be fine-tuned towards the supervised criteria with the help of parallel data.
    Page 4, “Topic Similarity Model with Neural Network”
  9. In the pre-training phase, all parallel data is fed into two neural networks respectively for DAE training, where network parameters W and b are randomly initialized.
    Page 6, “Experiments”
  10. The parallel data we use is released by LDC.
    Page 6, “Experiments”
  11. Translation models are trained over the parallel data that is automatically word-aligned
    Page 6, “Experiments”


sentence pair

Appears in 13 sentences as: sentence pair (9) sentence pairs (6)
In Learning Topic Representation for SMT with Neural Networks
  1. Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
    Page 3, “Topic Similarity Model with Neural Network”
  2. Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages because they express the same topic.
    Page 4, “Topic Similarity Model with Neural Network”
  3. Given a parallel sentence pair (f, e), the DAE learns representations for f and e respectively, as z_f = g(f) and z_e = g(e) in Figure 1.
    Page 4, “Topic Similarity Model with Neural Network”
  4. Since a parallel sentence pair should have the same topic, our goal is to maximize the similarity score between the source sentence and target sentence.
    Page 4, “Topic Similarity Model with Neural Network”
  5. Inspired by the contrastive estimation method (Smith and Eisner, 2005), for each parallel sentence pair (f, e) as a positive instance, we select another sentence pair (f′, e′) from the training data and treat (f, e′) as a negative instance.
    Page 4, “Topic Similarity Model with Neural Network”
  6. When a synchronous rule (α, γ) is extracted from a sentence pair (f, e), a triple instance I = ((α, γ), (f, e), c) is collected for inferring the topic representation of (α, γ), where c is the count of rule occurrences.
    Page 5, “Topic Similarity Model with Neural Network”
  7. These documents are built as an inverted index using Lucene, so they can be efficiently retrieved with the parallel sentence pairs as queries.
    Page 6, “Experiments”
  8. In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
    Page 6, “Experiments”
  9. In total, the datasets contain nearly 1.1 million sentence pairs.
    Page 6, “Experiments”
  10. In (Xiao et al., 2012), the topic of each sentence pair is exactly the same as the document it belongs to.
    Page 6, “Experiments”
  11. This is not simply coincidence since we can interpret their approach as a special case in our neural network method: when a parallel sentence pair has
    Page 7, “Experiments”
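
Items 4 and 5 above describe the fine-tuning objective: the similarity of a positive pair (f, e) should exceed that of a sampled negative pair (f, e′). The exact loss is not spelled out in these excerpts, so the sketch below uses a generic margin-based formulation as a stand-in; the margin value and function names are assumptions.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_loss(z_f, z_e, z_e_neg, margin=1.0):
    """Encourage sim(f, e) to exceed sim(f, e') for a sampled negative pair (f, e').

    A hinge-style formulation used here for illustration; the paper's exact
    objective may differ.
    """
    pos = cosine(z_f, z_e)
    neg = cosine(z_f, z_e_neg)
    return max(0.0, margin - (pos - neg))

rng = np.random.default_rng(1)
z_f, z_e, z_e_neg = rng.random(100), rng.random(100), rng.random(100)
print("contrastive loss:", contrastive_loss(z_f, z_e, z_e_neg))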


translation model

Appears in 11 sentences as: Translation model (1) translation model (5) translation modeling (2) Translation models (1) translation models (4)
In Learning Topic Representation for SMT with Neural Networks
  1. Current translation modeling approaches usually use context dependent information to disambiguate translation candidates.
    Page 1, “Introduction”
  2. Therefore, it is important to leverage topic information to learn smarter translation models and achieve better translation performance.
    Page 1, “Introduction”
  3. Attempts on topic-based translation modeling include topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007), topic similarity models for synchronous rules (Xiao et al., 2012), and document-level translation with topic coherence (Xiong and Zhang, 2013).
    Page 1, “Introduction”
  4. Therefore, it helps to train a smarter translation model with the embedded topic information.
    Page 5, “Topic Similarity Model with Neural Network”
  5. Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
    Page 5, “Topic Similarity Model with Neural Network”
  6. Translation models are trained over the parallel data that is automatically word-aligned
    Page 6, “Experiments”
  7. This implementation makes the system perform much better and the translation model size is much smaller.
    Page 6, “Experiments”
  8. Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models, where the topic information of synchronous rules was directly inferred with the help of document-level information.
    Page 8, “Related Work”
  9. They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
    Page 8, “Related Work”
  10. They estimated phrase-topic distributions in translation model adaptation and generated better translation quality.
    Page 8, “Related Work”
  11. Experimental results show that our approach is promising for SMT systems to learn a better translation model.
    Page 9, “Conclusion and Future Work”
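
Item 5 lists the standard log-linear features alongside which the topic-related features are added. The sketch below shows how such features are typically combined into a single model score; all feature values and weights are made-up placeholders, not numbers from the paper.

import math

# Hypothetical feature values h_i(f, e) for one translation candidate, following
# the feature list quoted above (standard features plus the topic-related ones).
features = {
    "p(e|f)": math.log(0.3), "p(f|e)": math.log(0.2),
    "lex(e|f)": math.log(0.25), "lex(f|e)": math.log(0.15),
    "lm": -42.7, "word_count": 12, "phrase_count": 5,
    "null_penalty": 1, "rule_count": 3,
    "rule_similarity_src": 0.81, "rule_similarity_tgt": 0.77,
    "rule_sensitivity_src": 0.40, "rule_sensitivity_tgt": 0.35,
}

# Feature weights lambda_i, as would be tuned by MERT (uniform placeholders here).
weights = {name: 0.1 for name in features}

score = sum(weights[name] * value for name, value in features.items())
print("log-linear model score:", score)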


parallel sentence

Appears in 11 sentences as: Parallel sentence (1) parallel sentence (10)
In Learning Topic Representation for SMT with Neural Networks
  1. Parallel sentence
    Page 3, “Topic Similarity Model with Neural Network”
  2. Given a parallel sentence pair (f, e), the first step is to treat f and e as queries, and use IR methods to retrieve relevant documents to enrich contextual information for them.
    Page 3, “Topic Similarity Model with Neural Network”
  3. Therefore, in this stage, parallel sentence pairs are used to help connect the vectors from different languages because they express the same topic.
    Page 4, “Topic Similarity Model with Neural Network”
  4. Given a parallel sentence pair (f, e), the DAE learns representations for f and e respectively, as z_f = g(f) and z_e = g(e) in Figure 1.
    Page 4, “Topic Similarity Model with Neural Network”
  5. Since a parallel sentence pair should have the same topic, our goal is to maximize the similarity score between the source sentence and target sentence.
    Page 4, “Topic Similarity Model with Neural Network”
  6. Inspired by the contrastive estimation method (Smith and Eisner, 2005), for each parallel sentence pair (f, e) as a positive instance, we select another sentence pair (f′, e′) from the training data and treat (f, e′) as a negative instance.
    Page 4, “Topic Similarity Model with Neural Network”
  7. These documents are built as an inverted index using Lucene, so they can be efficiently retrieved with the parallel sentence pairs as queries.
    Page 6, “Experiments”
  8. In the fine-tuning phase, for each parallel sentence pair, we randomly select ten other sentence pairs that satisfy the criterion as negative instances.
    Page 6, “Experiments”
  9. This is not simply coincidence since we can interpret their approach as a special case in our neural network method: when a parallel sentence pair has
    Page 7, “Experiments”
  10. In addition, our method directly maximizes the similarity between parallel sentence pairs, which is ideal for SMT decoding.
    Page 9, “Related Work”
  11. We enrich contexts of parallel sentence pairs with topic related monolingual data
    Page 9, “Conclusion and Future Work”


NIST

Appears in 9 sentences as: NIST (9)
In Learning Topic Representation for SMT with Neural Networks
  1. Experimental results show that our method significantly improves translation accuracy in the NIST Chinese-to-English translation task compared to a state-of-the-art baseline.
    Page 1, “Abstract”
  2. We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
    Page 2, “Introduction”
  3. The NIST 2003 dataset is the development data.
    Page 6, “Experiments”
  4. The testing data consists of NIST 2004, 2005, 2006 and 2008 datasets.
    Page 6, “Experiments”
  5. NIST 2004
    Page 7, “Experiments”
  6. NIST 2006
    Page 7, “Experiments”
  7. NIST 2005
    Page 7, “Experiments”
  8. NIST 2008
    Page 7, “Experiments”
  9. An example of translation rule disambiguation for a sentence from the NIST 2005 dataset is shown in Figure 4.
    Page 8, “Experiments”


SMT systems

Appears in 8 sentences as: SMT system (2) SMT systems (6)
In Learning Topic Representation for SMT with Neural Networks
  1. For example, translation sense disambiguation approaches (Carpuat and Wu, 2005; Carpuat and Wu, 2007) are proposed for phrase-based SMT systems.
    Page 1, “Introduction”
  2. Meanwhile, for hierarchical phrase-based or syntax-based SMT systems , there is also much work involving rich contexts to guide rule selection (He et al., 2008; Liu et al., 2008; Marton and Resnik, 2008; Xiong et al., 2009).
    Page 1, “Introduction”
  3. Although these methods are effective and proven successful in many SMT systems, they only leverage within-
    Page 1, “Introduction”
  4. In addition, contemporary SMT systems often work at the sentence level rather than the document level for efficiency.
    Page 1, “Introduction”
  5. This makes previous approaches inefficient when applying them in real-world commercial SMT systems.
    Page 2, “Introduction”
  6. For the SMT system, the best translation candidate ê is given by:
    Page 5, “Topic Similarity Model with Neural Network”
  7. This proves that bilingually induced topic representation with neural network helps the SMT system disambiguate translation candidates.
    Page 7, “Experiments”
  8. Experimental results show that our approach is promising for SMT systems to learn a better translation model.
    Page 9, “Conclusion and Future Work”
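
Item 6 introduces the decoding decision rule. Restated in the standard log-linear form implied by the feature list quoted under the translation model topic (a reconstruction, not necessarily the paper's exact equation):

\hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(f, e)

where the h_i are the standard and topic-related features and the λ_i their weights tuned by MERT.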


bag-of-words

Appears in 7 sentences as: bag-of-words (7)
In Learning Topic Representation for SMT with Neural Networks
  1. The levels inferred from the neural network correspond to distinct levels of concepts, where high-level representations are obtained from the low-level bag-of-words input.
    Page 2, “Introduction”
  2. The most relevant N documents d_f and d_e are retrieved and converted to high-dimensional bag-of-words inputs for f and e for the representation learning.
    Page 3, “Topic Similarity Model with Neural Network”
  3. Assuming that the input is an n-of-V binary vector X representing the bag-of-words (V is the vocabulary size), an auto-encoder consists of an encoding process g(X) and a decoding process h(X). The objective of the auto-encoder is to minimize the reconstruction error L(h(g(X)), X).
    Page 3, “Topic Similarity Model with Neural Network”
  4. In our task, for each sentence, we treat the retrieved N relevant documents as a single large document and convert it to a bag-of-words vector X in Figure 2.
    Page 3, “Topic Similarity Model with Neural Network”
  5. After the bag-of-words input has been transformed, it is fed into a subsequent layer to model the highly nonlinear relations among words:
    Page 3, “Topic Similarity Model with Neural Network”
  6. Figure 2: Denoising auto-encoder with a bag-of-words input.
    Page 4, “Topic Similarity Model with Neural Network”
  7. These documents are converted to a bag-of-words input and fed into neural networks.
    Page 9, “Conclusion and Future Work”
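
Items 3 to 6 describe the denoising auto-encoder of Figure 2 operating on an n-of-V bag-of-words vector. The sketch below is one minimal reading of that setup: corrupt the input, encode with g, decode with h, and measure the reconstruction error. The corruption scheme, tied weights, cross-entropy error, and toy sizes are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
V, L = 1000, 100                      # toy sizes; the paper uses V = 100,000

W = 0.01 * rng.standard_normal((L, V))
b, c = np.zeros(L), np.zeros(V)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_reconstruction_error(x, noise=0.3):
    """One denoising auto-encoder step on an n-of-V bag-of-words vector x.

    Corrupt the input by randomly dropping active words, encode with g,
    decode with h (tied weights assumed here), and return the cross-entropy
    reconstruction error L(h(g(x_corrupt)), x).
    """
    x_corrupt = x * (rng.random(V) >= noise)      # randomly drop some of the 1s
    z = sigmoid(W @ x_corrupt + b)                # encoding g(.)
    x_hat = sigmoid(W.T @ z + c)                  # decoding h(.)
    eps = 1e-9
    return float(-np.sum(x * np.log(x_hat + eps) +
                         (1 - x) * np.log(1 - x_hat + eps)))

x = (rng.random(V) < 0.02).astype(float)          # toy bag-of-words input
print("reconstruction error:", dae_reconstruction_error(x))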


topic modeling

Appears in 7 sentences as: Topic modeling (2) topic modeling (5)
In Learning Topic Representation for SMT with Neural Networks
  1. Topic modeling is a useful mechanism for discovering and characterizing various semantic concepts embedded in a collection of documents.
    Page 1, “Introduction”
  2. In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007).
    Page 1, “Introduction”
  3. Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected.
    Page 2, “Introduction”
  4. Topic modeling was first leveraged to improve SMT performance in (Zhao and Xing, 2006; Zhao and Xing, 2007).
    Page 8, “Related Work”
  5. Another direction of approaches leveraged topic modeling techniques for domain adaptation.
    Page 8, “Related Work”
  6. Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM.
    Page 8, “Related Work”
  7. Compared to document-level topic modeling which uses the topic of a document for all sentences within the document (Xiao et al., 2012), our contributions are:
    Page 9, “Related Work”
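
Item 3 describes enriching each sentence with topic-related monolingual documents retrieved by IR, using the sentence's content words as the query; elsewhere the excerpts mention a Lucene inverted index for this. The sketch below substitutes a simple scikit-learn TF-IDF retriever over a toy document collection, so it illustrates the idea rather than the actual setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy monolingual collection standing in for the Lucene-indexed documents.
documents = [
    "the central bank raised interest rates to curb inflation",
    "the software update will be distributed to all users next week",
    "the team delivered the report on economic growth and trade",
    "click send to deliver the email message to the mailing list",
]

sentence = "the company will distribute the new software to customers"

vectorizer = TfidfVectorizer(stop_words="english")   # crude content-word filter
doc_matrix = vectorizer.fit_transform(documents)
query_vec = vectorizer.transform([sentence])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
N = 2                                                # keep the N most relevant documents
top_n = scores.argsort()[::-1][:N]
print("retrieved documents:", [documents[i] for i in top_n])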


similarity scores

Appears in 7 sentences as: similarity score (3) similarity scores (4)
In Learning Topic Representation for SMT with Neural Networks
  1. The similarity scores are integrated into the standard log-linear model for making translation decisions.
    Page 4, “Topic Similarity Model with Neural Network”
  2. The similarity score of the representation pair (z_f, z_e) is defined as the cosine similarity of the two vectors:
    Page 4, “Topic Similarity Model with Neural Network”
  3. Since a parallel sentence pair should have the same topic, our goal is to maximize the similarity score between the source sentence and target sentence.
    Page 4, “Topic Similarity Model with Neural Network”
  4. We incorporate the learned topic similarity scores into the standard log-linear framework for SMT.
    Page 5, “Topic Similarity Model with Neural Network”
  5. Topic-related features: rule similarity scores (2 features), rule sensitivity scores (2 features).
    Page 5, “Topic Similarity Model with Neural Network”
  6. Because topic-specific rules usually have a larger sensitivity score, they can beat general rules when they obtain the same similarity score against the input sentence.
    Page 7, “Experiments”
  7. The similarity scores indicate that “deliver X” and “distribute X” are more appropriate to translate the sentence.
    Page 8, “Experiments”
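
Item 2 defines the similarity score as the cosine of the two topic vectors; written out with the notation used above:

\mathrm{sim}(z_f, z_e) = \frac{z_f \cdot z_e}{\lVert z_f \rVert \, \lVert z_e \rVert}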


BLEU

Appears in 6 sentences as: BLEU (7)
In Learning Topic Representation for SMT with Neural Networks
  1. The reported BLEU scores are averaged over 5 runs of MERT (Och, 2003).
    Page 6, “Experiments”
  2. We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets.
    Page 6, “Experiments”
  3. Figure 3: End-to-end translation results (BLEU%)
    Page 7, “Experiments”
  4. BLEU
    Page 7, “Experiments”
  5. Our method improves over the baseline by up to 0.86 BLEU points and by 0.76 BLEU points on average.
    Page 7, “Experiments”
  6. Finally, when all new features are integrated, the performance is the best, performing substantially better than (Xiao et al., 2012) by 0.39 BLEU points on average.
    Page 7, “Experiments”


sentence-level

Appears in 6 sentences as: sentence-level (6)
In Learning Topic Representation for SMT with Neural Networks
  1. In this case, people understand the meaning because of the IT topical context which goes beyond sentence-level analysis and requires more relevant knowledge.
    Page 1, “Introduction”
  2. This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible.
    Page 2, “Introduction”
  3. Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase.
    Page 2, “Background: Deep Learning”
  4. learn topic representations using sentence-level parallel data.
    Page 3, “Topic Similarity Model with Neural Network”
  5. our method is that it is applicable to both sentence-level and document-level SMT, since we do not place any restrictions on the input.
    Page 9, “Related Work”
  6. We directly optimized bilingual topic similarity in the deep learning framework with the help of sentence-level parallel data, so that the learned representation could be easily used in the SMT decoding procedure.
    Page 9, “Related Work”


hidden layers

Appears in 6 sentences as: hidden layer (2) hidden layers (4)
In Learning Topic Representation for SMT with Neural Networks
  1. Assuming that the dimension of g(X) is L, the linear layer forms an L × V matrix W which projects the n-of-V vector to an L-dimensional hidden layer.
    Page 3, “Topic Similarity Model with Neural Network”
  2. Training neural networks involves many factors such as the learning rate and the length of hidden layers.
    Page 4, “Topic Similarity Model with Neural Network”
  3. The vocabulary size for the input layer is 100,000, and we choose different lengths for the hidden layer as L = {100, 300, 600, 1000} in the experiments.
    Page 6, “Experiments”
  4. 4.3 Effect of retrieved documents and length of hidden layers
    Page 6, “Experiments”
  5. We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets.
    Page 6, “Experiments”
  6. Another important factor is the length of hidden layers L in the network.
    Page 7, “Experiments”
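
Item 1 describes the linear layer as an L × V matrix W projecting the n-of-V input onto an L-dimensional hidden layer, with V = 100,000 and L ∈ {100, 300, 600, 1000} in the experiments. The NumPy lines below just make those shapes concrete with smaller toy sizes; the tanh activation is an assumption.

import numpy as np

rng = np.random.default_rng(0)
V, L = 5000, 300              # toy sizes; the experiments use V = 100,000 and L in {100, 300, 600, 1000}

x = np.zeros(V)
x[rng.choice(V, size=40, replace=False)] = 1.0   # an n-of-V binary input (n = 40)

W = 0.01 * rng.standard_normal((L, V))           # the L x V projection matrix
b = np.zeros(L)

hidden = np.tanh(W @ x + b)                      # L-dimensional hidden layer
print(x.shape, W.shape, hidden.shape)            # (5000,) (300, 5000) (300,)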


deep learning

Appears in 6 sentences as: Deep learning (2) deep learning (4)
In Learning Topic Representation for SMT with Neural Networks
  1. Deep learning has been an active research topic in recent years and has triumphed in many machine learning research areas.
    Page 2, “Background: Deep Learning”
  2. When followed by fine-tuning in this parameter region, deep learning is able to achieve state-of-the-art performance in various research areas, including breakthrough results on the ImageNet dataset for object recognition (Krizhevsky et al., 2012), significant error reduction in speech recognition (Dahl et al., 2012), etc.
    Page 2, “Background: Deep Learning”
  3. Deep learning has also been successfully applied in a variety of NLP tasks such as part-of-speech tagging, chunking, named entity recognition, semantic role labeling (Collobert et al., 2011), parsing (Socher et al., 2011a), sentiment analysis (Socher et al., 2011b), etc.
    Page 2, “Background: Deep Learning”
  4. The auto-encoder (Bengio et al., 2006) is one of the basic building blocks of deep learning.
    Page 3, “Topic Similarity Model with Neural Network”
  5. In deep learning, this parameter is often tuned empirically with human effort.
    Page 7, “Experiments”
  6. We directly optimized bilingual topic similarity in the deep learning framework with the help of sentence-level parallel data, so that the learned representation could be easily used in the SMT decoding procedure.
    Page 9, “Related Work”


log-linear

Appears in 5 sentences as: log-linear (5)
In Learning Topic Representation for SMT with Neural Networks
  1. We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
    Page 2, “Introduction”
  2. The similarity scores are integrated into the standard log-linear model for making translation decisions.
    Page 4, “Topic Similarity Model with Neural Network”
  3. We incorporate the learned topic similarity scores into the standard log-linear framework for SMT.
    Page 5, “Topic Similarity Model with Neural Network”
  4. In addition to traditional SMT features, we add new topic-related features into the standard log-linear framework.
    Page 5, “Topic Similarity Model with Neural Network”
  5. We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012).
    Page 7, “Experiments”


LDA

Appears in 5 sentences as: LDA (5)
In Learning Topic Representation for SMT with Neural Networks
  1. In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007).
    Page 1, “Introduction”
  2. Although we can easily apply LDA at the
    Page 1, “Introduction”
  3. Additionally, our model can be discriminatively trained with a large number of training instances, without expensive sampling methods such as in LDA or HTMM, thus it is more practicable and scalable.
    Page 2, “Introduction”
  4. Experiments show that their approach not only achieved better translation performance but also provided a faster decoding speed compared with previous lexicon-based LDA methods.
    Page 8, “Related Work”
  5. Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM.
    Page 8, “Related Work”


phrase-based

Appears in 4 sentences as: phrase-based (4)
In Learning Topic Representation for SMT with Neural Networks
  1. For example, translation sense disambiguation approaches (Carpuat and Wu, 2005; Carpuat and Wu, 2007) are proposed for phrase-based SMT systems.
    Page 1, “Introduction”
  2. Meanwhile, for hierarchical phrase-based or syntax-based SMT systems, there is also much work involving rich contexts to guide rule selection (He et al., 2008; Liu et al., 2008; Marton and Resnik, 2008; Xiong et al., 2009).
    Page 1, “Introduction”
  3. In SMT training, an in-house hierarchical phrase-based SMT decoder is implemented for our experiments.
    Page 6, “Experiments”
  4. Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models, where the topic information of synchronous rules was directly inferred with the help of document-level information.
    Page 8, “Related Work”


significantly improves

Appears in 4 sentences as: significant improvement (1) significant improvements (1) significantly improves (2)
In Learning Topic Representation for SMT with Neural Networks
  1. Experimental results show that our method significantly improves translation accuracy in the NIST Chinese-to-English translation task compared to a state-of-the-art baseline.
    Page 1, “Abstract”
  2. Experimental results demonstrate that our model significantly improves translation
    Page 2, “Introduction”
  3. They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
    Page 8, “Related Work”
  4. It is a significant improvement over the state-of-the-art Hiero system, as well as a conventional LDA-based method.
    Page 9, “Conclusion and Future Work”


language model

Appears in 4 sentences as: language model (3) language modeling (2)
In Learning Topic Representation for SMT with Neural Networks
  1. Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
    Page 5, “Topic Similarity Model with Neural Network”
  2. An in-house language modeling toolkit is used to train the 5-gram language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995).
    Page 6, “Experiments”
  3. The English monolingual data used for language modeling is the same as in Table 1.
    Page 6, “Experiments”
  4. They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
    Page 8, “Related Work”


topic distributions

Appears in 4 sentences as: topic distributions (4)
In Learning Topic Representation for SMT with Neural Networks
  1. Since different sentences may have very similar topic distributions, we select negative instances that are dissimilar to the positive instances based on the following criteria:
    Page 4, “Topic Similarity Model with Neural Network”
  2. The PLDA toolkit (Liu et al., 2011) is used to infer topic distributions , which takes 34.5 hours to finish.
    Page 6, “Experiments”
  3. In contrast, with our neural network based approach, the learned topic distributions of “deliver X” or “distribute X” are more similar to the input sentence than those of “send X”, as shown in Figure 4.
    Page 8, “Experiments”
  4. (2007) used bilingual LSA to learn latent topic distributions across different languages and enforce one-to-one topic correspondence during model training.
    Page 8, “Related Work”


Machine Translation

Appears in 3 sentences as: Machine Translation (2) machine translation (1)
In Learning Topic Representation for SMT with Neural Networks
  1. Statistical Machine Translation (SMT) usually utilizes contextual information to disambiguate translation candidates.
    Page 1, “Abstract”
  2. Making translation decisions is a difficult task in many Statistical Machine Translation (SMT) systems.
    Page 1, “Introduction”
  3. We evaluate the performance of our neural network based topic similarity model on a Chinese-to-English machine translation task.
    Page 5, “Experiments”


log-linear model

Appears in 3 sentences as: log-linear model (3)
In Learning Topic Representation for SMT with Neural Networks
  1. We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
    Page 2, “Introduction”
  2. The similarity scores are integrated into the standard log-linear model for making translation decisions.
    Page 4, “Topic Similarity Model with Neural Network”
  3. We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012).
    Page 7, “Experiments”


content words

Appears in 3 sentences as: content words (3)
In Learning Topic Representation for SMT with Neural Networks
  1. Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected.
    Page 2, “Introduction”
  2. One problem with the auto-encoder is that it treats all words in the same way, making no distinction between function words and content words.
    Page 3, “Topic Similarity Model with Neural Network”
  3. For each positive instance (f, e), we select e′ which contains at least 30% different content words from e.
    Page 4, “Topic Similarity Model with Neural Network”
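
Item 3 gives the negative-instance criterion: a candidate e′ is kept only if at least 30% of its content words differ from those of e. The sketch below shows one possible reading of that criterion; the stop-word list and function names are hypothetical stand-ins for a real function-word filter.

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "that"}

def content_words(sentence):
    return {w for w in sentence.lower().split() if w not in STOP_WORDS}

def is_valid_negative(e, e_prime, threshold=0.30):
    """Accept e' as a negative instance for (f, e) if at least `threshold` of its
    content words do not occur in e (one interpretation of the 30% criterion)."""
    cw_e, cw_neg = content_words(e), content_words(e_prime)
    if not cw_neg:
        return False
    different = len(cw_neg - cw_e) / len(cw_neg)
    return different >= threshold

print(is_valid_negative("the bank raised interest rates",
                        "the committee approved the new budget plan"))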


translation probability

Appears in 3 sentences as: translation probabilities (1) translation probability (2)
In Learning Topic Representation for SMT with Neural Networks
  1. where the translation probability is given by:
    Page 5, “Topic Similarity Model with Neural Network”
  2. Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
    Page 5, “Topic Similarity Model with Neural Network”
  3. Although the translation probability of “send X” is much higher, it is inappropriate in this context since it is usually used in IT texts.
    Page 8, “Experiments”
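
Item 1 breaks off just before the formula. In the usual log-linear formulation suggested by the feature list in item 2, the translation probability would take the following form (a reconstruction, not necessarily the paper's exact equation):

P(e \mid f) = \frac{\exp\!\left(\sum_{i=1}^{M} \lambda_i \, h_i(f, e)\right)}{\sum_{e'} \exp\!\left(\sum_{i=1}^{M} \lambda_i \, h_i(f, e')\right)}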


translation quality

Appears in 3 sentences as: translation quality (3)
In Learning Topic Representation for SMT with Neural Networks
  1. The evaluation metric for the overall translation quality is case-insensitive BLEU4 (Papineni et al., 2002).
    Page 6, “Experiments”
  2. They reported extensive empirical analysis and improved word alignment accuracy as well as translation quality .
    Page 8, “Related Work”
  3. They estimated phrase-topic distributions in translation model adaptation and generated better translation quality.
    Page 8, “Related Work”


translation task

Appears in 3 sentences as: translation task (3)
In Learning Topic Representation for SMT with Neural Networks
  1. Experimental results show that our method significantly improves translation accuracy in the NIST Chinese-to-English translation task compared to a state-of-the-art baseline.
    Page 1, “Abstract”
  2. We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task .
    Page 2, “Introduction”
  3. We evaluate the performance of our neural network based topic similarity model on a Chinese-to-English machine translation task.
    Page 5, “Experiments”


word alignment

Appears in 3 sentences as: word alignment (3)
In Learning Topic Representation for SMT with Neural Networks
  1. using GIZA++ in both directions, and the grow-diag-final heuristic is used to refine the symmetric word alignment.
    Page 6, “Experiments”
  2. They proposed a bilingual topical admixture approach for word alignment and assumed that each word-pair follows a topic-
    Page 8, “Related Work”
  3. They reported extensive empirical analysis and improved word alignment accuracy as well as translation quality.
    Page 8, “Related Work”
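
Item 1 mentions running GIZA++ in both directions and refining the symmetric alignment with the grow-diag-final heuristic. The sketch below implements only a simplified "grow" step over the intersection and union of the two directional alignments (it omits the "final" step and other details of the real heuristic), so it is an illustration rather than the exact algorithm.

def grow_diag(src2tgt, tgt2src):
    """Simplified symmetrization: start from the intersection of the two
    directional alignments and grow it with neighbouring union links that
    cover a still-unaligned source or target word."""
    union = src2tgt | tgt2src
    alignment = src2tgt & tgt2src
    neighbours = [(-1, 0), (0, -1), (1, 0), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    grew = True
    while grew:
        grew = False
        for (i, j) in sorted(alignment):
            for di, dj in neighbours:
                cand = (i + di, j + dj)
                if cand in union and cand not in alignment:
                    src_free = all(a != cand[0] for a, _ in alignment)
                    tgt_free = all(b != cand[1] for _, b in alignment)
                    if src_free or tgt_free:
                        alignment.add(cand)
                        grew = True
    return alignment

# Toy directional alignments as sets of (source_index, target_index) links.
f2e = {(0, 0), (1, 1), (2, 3)}
e2f = {(0, 0), (1, 1), (2, 2), (3, 3)}
print(sorted(grow_diag(f2e, e2f)))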
