Bilingually-constrained Phrase Embeddings for Machine Translation
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing

Article Structure

Abstract

We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.

Introduction

Due to their powerful capacity for feature learning and representation, Deep (multilayer) Neural Networks (DNN) have achieved great success in speech and image processing (Kavukcuoglu et al., 2010; Krizhevsky et al., 2012; Dahl et al., 2012).

Related Work

Recently, phrase embedding has drawn more and more attention.

Bilingually-constrained Recursive Auto-encoders

This section introduces the Bilingually-constrained Recursive Auto-encoders (BRAE), which are inspired by two observations.

Experiments

With the semantic phrase embeddings and the vector space transformation function, we apply the BRAE to measure the semantic similarity between a source phrase and its translation candidates in the phrase-based SMT.

Discussions

5.1 Applications of the BRAE Model

Conclusions and Future Work

This paper has explored the bilingually-constrained recursive auto-encoders in learning phrase embeddings, which can distinguish phrases with different semantic meanings.

Topics

embeddings

Appears in 34 sentences as: embeddings (37)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
    Page 1, “Abstract”
  2. The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules.
    Page 1, “Introduction”
  3. (2011) make the phrase embeddings capture the sentiment information.
    Page 1, “Introduction”
  4. (2013a) enable the phrase embeddings to mainly capture the syntactic knowledge.
    Page 1, “Introduction”
  5. (2013) attempt to encode the reordering pattern in the phrase embeddings.
    Page 1, “Introduction”
  6. Kalchbrenner and Blunsom (2013) utilize a simple convolution model to generate phrase embeddings from word embeddings.
    Page 1, “Introduction”
  7. Obviously, these methods of learning phrase embeddings either focus on some aspects of the phrase (e.g.
    Page 1, “Introduction”
  8. Therefore, these phrase embeddings are not suitable to fully represent the phrasal translation units in SMT due to the lack of semantic meanings of the phrase.
    Page 1, “Introduction”
  9. Instead, we focus on learning phrase embeddings from the view of semantic meaning, so that our phrase embedding can fully represent the phrase and best fit the phrase-based SMT.
    Page 1, “Introduction”
  10. of its internal words, we propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings.
    Page 2, “Introduction”
  11. Thus, they can supervise each other to learn their semantic phrase embeddings.
    Page 2, “Introduction”

semantic similarities

Appears in 18 sentences as: Semantic Similarities (1) semantic similarities (10) semantic similarity (9) semantically similar (1)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates.
    Page 1, “Abstract”
  2. With the learned model, we can accurately measure the semantic similarity between a source phrase and a translation candidate.
    Page 2, “Introduction”
  3. Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase are in the same meaning.
    Page 2, “Introduction”
  4. In phrase table pruning, we discard the phrasal translation rules with low semantic similarity.
    Page 2, “Introduction”
  5. In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation candidate selection.
    Page 2, “Introduction”
  6. The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and in decoding with phrasal semantic similarities an improvement of up to 1.7 BLEU over the state-of-the-art baseline can be achieved.
    Page 2, “Introduction”
  7. With the semantic phrase embeddings and the vector space transformation function, we apply the BRAE to measure the semantic similarity between a source phrase and its translation candidates in the phrase-based SMT.
    Page 6, “Experiments”
  8. Two tasks are involved in the experiments: phrase table pruning that discards entries whose semantic similarity is very low and decoding with the phrasal semantic similarities as additional new features.
    Page 6, “Experiments”
  9. To avoid the situation in which all the translation candidates for a source phrase are pruned, we always keep the 10 best according to the semantic similarity.
    Page 7, “Experiments”
  10. intuitive because it is directly based on the semantic similarity.
    Page 8, “Experiments”
  11. 4.4 Decoding with Phrasal Semantic Similarities
    Page 8, “Experiments”
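
The pruning procedure described in items 4, 8 and 9 above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the phrase-table data structure, the helper name brae_similarity and the default threshold are assumptions; only the thresholding and the keep-the-10-best safeguard come from the excerpts.

    def prune_phrase_table(phrase_table, brae_similarity, threshold=0.7, keep_best=10):
        """Discard translation candidates whose semantic similarity to the source
        phrase falls below the threshold, but always keep the 10 best candidates
        for each source phrase so that no source phrase loses all its translations."""
        pruned = {}
        for src, candidates in phrase_table.items():
            # Score every candidate translation with the (assumed) BRAE similarity function.
            scored = sorted(((brae_similarity(src, tgt), tgt) for tgt in candidates),
                            reverse=True)
            pruned[src] = [tgt for rank, (score, tgt) in enumerate(scored)
                           if score >= threshold or rank < keep_best]
        return pruned

A default threshold of 0.7 mirrors the setting reported under the phrase table topic below, where 72% of the table is discarded with only 0.06 BLEU loss.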

phrase table

Appears in 15 sentences as: Phrase Table (1) phrase table (14)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates.
    Page 1, “Abstract”
  2. Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase are in the same meaning.
    Page 2, “Introduction”
  3. In phrase table pruning, we discard the phrasal translation rules with low semantic similarity.
    Page 2, “Introduction”
  4. The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and in decoding with phrasal semantic similarities an improvement of up to 1.7 BLEU over the state-of-the-art baseline can be achieved.
    Page 2, “Introduction”
  5. Two tasks are involved in the experiments: phrase table pruning that discards entries whose semantic similarity is very low and decoding with the phrasal semantic similarities as additional new features.
    Page 6, “Experiments”
  6. 4.3 Phrase Table Pruning
    Page 7, “Experiments”
  7. Pruning most of the phrase table without much impact on translation quality is very important for translation, especially in environments where memory and time constraints are imposed.
    Page 7, “Experiments”
  8. We can see a common phenomenon in both of the algorithms: for the first few thresholds, the phrase table becomes smaller and smaller while the translation quality is not much decreased, but the performance jumps a lot at a certain threshold (16 for Significance pruning, 0.8 for BRAE-based one).
    Page 7, “Experiments”
  9. Specifically, the Significance algorithm can safely discard 64% of the phrase table at its threshold 12 with only 0.1 BLEU loss in the overall test.
    Page 7, “Experiments”
  10. In contrast, our BRAE-based algorithm can remove 72% of the phrase table at its threshold 0.7 with only 0.06 BLEU loss in the overall evaluation.
    Page 7, “Experiments”
  11. When the two algorithms use a similar portion of the phrase table (35% in BRAE and 36% in Significance), the BRAE-based algorithm outperforms the Significance algorithm on all the test sets except MT04.
    Page 7, “Experiments”

word embedding

Appears in 14 sentences as: word embedding (7) word embeddings (7)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules.
    Page 1, “Introduction”
  2. Kalchbrenner and Blunsom (2013) utilize a simple convolution model to generate phrase embeddings from word embeddings.
    Page 1, “Introduction”
  3. One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013).
    Page 2, “Related Work”
  4. Instead, our bilingually-constrained recursive auto-encoders not only learn the composition mechanism of generating phrases from words, but also fine tune the word embeddings during the model training stage, so that we can induce the full information of the phrases and internal words.
    Page 2, “Related Work”
  5. After learning word embeddings with DNN (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013), each word in the vocabulary V corresponds to a vector x ∈ R^n, and all the vectors are stacked into an embedding matrix L ∈ R^(n×|V|).
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  6. Since word embeddings for the two languages are learned separately and lie in different vector spaces, we do not enforce the phrase embeddings in the two languages to be in the same semantic vector space.
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  7. θ_L: word embedding matrix L for two languages (Section 3.1.1);
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  8. For the word embedding matrix L_s, there are two choices.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
  9. Second, the word embedding matrix L_s is pre-trained with DNN (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013) using large-scale unlabeled monolingual data.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
  10. We prefer the second one, since this kind of word embedding has already encoded some semantics of the words.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
  11. In this work, we employ the toolkit Word2Vec (Mikolov et al., 2013) to pre-train the word embedding for the source and target languages.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
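
Item 5 above (with its garbled symbols restored as x ∈ R^n and L ∈ R^(n×|V|)) and item 7 of the next topic describe how a word vector is read out of the embedding matrix. A minimal numeric sketch; the toy dimensions are chosen here purely for illustration:

    import numpy as np

    n, vocab_size = 4, 5                       # assumed toy dimensions (n and |V|)
    L = 0.01 * np.random.randn(n, vocab_size)  # embedding matrix: one column per word

    i = 2                                      # index of some word in the vocabulary
    e_i = np.zeros(vocab_size)
    e_i[i] = 1.0                               # binary (one-hot) indicator vector
    x_i = L @ e_i                              # the word's embedding; identical to L[:, i]

In practice the lookup is plain column selection; the multiplication with the indicator vector is simply how the excerpt states it formally.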

vector representation

Appears in 13 sentences as: vector representation (6) Vector Representations (1) vector representations (5) vector represents (1)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
    Page 1, “Abstract”
  2. embedding, which converts a word into a dense, low dimensional, real-valued vector representation (Bengio et al., 2003; Bengio et al., 2006; Collobert and Weston, 2008; Mikolov et al., 2013).
    Page 1, “Introduction”
  3. Therefore, in order to successfully apply DNN to model the whole translation process, such as modelling the decoding process, learning compact vector representations for the basic phrasal translation units is the essential and fundamental work.
    Page 1, “Introduction”
  4. In contrast, our method attempts to learn the semantic vector representation for any phrase.
    Page 3, “Related Work”
  5. 3.1.1 Word Vector Representations
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  6. In phrase embedding using composition, the word vector representation is the basis and serves as the input to the neural network.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  7. Given a phrase which is an ordered list of m words, each word has an index i into the columns of the embedding matrix L. The index i is used to retrieve the word’s vector representation using a simple multiplication with a binary vector e_i which is zero in all positions except for the ith index:
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  8. The RAE learns the vector representation of the phrase by recursively combining two children vectors in a bottom-up manner (Socher et al., 2011).
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  9. To assess how well the parent’s vector represents its children, the standard auto-encoder reconstructs the children in a reconstruction layer:
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  10. We know from the semi-supervised phrase embedding that the learned vector representation can be well adapted to the given label.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  11. Therefore, we can imagine that learning semantic phrase embedding is reasonable if we are given gold vector representations of the phrases.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
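
Items 7–9 above describe the basic RAE unit: two child vectors are composed into a parent vector, and the parent is then asked to reconstruct its children, with the reconstruction error driving unsupervised training. A minimal sketch of one such unit; the tanh activation, the random initialization and the equal weighting of the two children's errors are assumptions, since the excerpts do not fix them:

    import numpy as np

    n = 4                                    # assumed embedding dimension
    W1 = 0.01 * np.random.randn(n, 2 * n)    # composition weights (parent from children)
    b1 = np.zeros(n)
    W2 = 0.01 * np.random.randn(2 * n, n)    # reconstruction weights (children from parent)
    b2 = np.zeros(2 * n)

    def rae_unit(c1, c2):
        """Compose two children into a parent and measure how well the parent
        reconstructs them; this reconstruction error is what training minimizes."""
        p = np.tanh(W1 @ np.concatenate([c1, c2]) + b1)   # parent vector
        c1_rec, c2_rec = np.split(W2 @ p + b2, 2)         # reconstructed children
        e_rec = 0.5 * (np.sum((c1 - c1_rec) ** 2) + np.sum((c2 - c2_rec) ** 2))
        return p, e_rec

Applying this unit repeatedly over adjacent nodes, bottom-up, yields the vector at the root, which serves as the phrase embedding.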

phrase pair

Appears in 13 sentences as: phrase pair (8) Phrase pairs (1) phrase pairs (5)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation candidate selection.
    Page 2, “Introduction”
  2. We can infer from this fact that if a model can learn the same embedding for any phrase pair sharing the same meaning, the learned embedding must encode the semantics of the phrases, and the corresponding model is what we desire.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  3. For a phrase pair (s, t), two kinds of errors are involved:
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  4. For the phrase pair (s, t), the joint error is:
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  5. The final BRAE objective over the phrase pairs training set (S, T) becomes:
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  6. To obtain high-quality bilingual phrase pairs to train our BRAE model, we perform forced decoding for the bilingual training sentences and collect the phrase pairs used.
    Page 7, “Experiments”
  7. After removing the duplicates, the remaining 1.12M bilingual phrase pairs (length ranging from 1 to 7) are obtained.
    Page 7, “Experiments”
  8. These algorithms are based on corpus statistics including co-occurrence statistics, phrase pair usage and composition information.
    Page 7, “Experiments”
  9. The higher the p-value, the more likely the phrase pair is to be spurious.
    Page 7, “Experiments”
  10. Our work has the same objective, but instead of using corpus statistics, we attempt to measure the quality of the phrase pair from the view of semantic meaning.
    Page 7, “Experiments”
  11. Given a phrase pair (s, t), the BRAE model first obtains their semantic phrase representations (p_s, p_t), and then transforms p_s into the target semantic space as p_s*, and p_t into the source semantic space as p_t*.
    Page 7, “Experiments”
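
Items 3–5 above break off exactly where the paper's equations would appear. A hedged reconstruction of their likely shape, consistent with the surrounding description (the interpolation weight α and the L2 regularizer with weight λ are assumptions rather than quotations): the joint error for a phrase pair (s, t) combines the reconstruction errors of the two monolingual auto-encoders with the bilingual semantic errors,

    E(s, t; \theta) = \alpha \, E_{rec}(s, t; \theta) + (1 - \alpha) \, E_{sem}(s, t; \theta),

and the final BRAE objective over the phrase pair training set (S, T) of N pairs adds a regularization term,

    J = \frac{1}{N} \sum_{(s,t) \in (S,T)} E(s, t; \theta) + \frac{\lambda}{2} \lVert \theta \rVert^2.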

recursive

Appears in 13 sentences as: Recursive (4) recursive (9)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
    Page 1, “Abstract”
  2. of its internal words, we propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings.
    Page 2, “Introduction”
  3. In our method, the standard recursive auto-encoder (RAE) pre-trains the phrase embedding with an unsupervised algorithm by minimizing the reconstruction error (Socher et al., 2010), while the bilingually-constrained model learns to fine-tune the phrase embedding by minimizing the semantic distance between translation equivalents and maximizing the semantic distance between non-translation pairs.
    Page 2, “Introduction”
  4. Instead, our bilingually-constrained recursive auto-encoders not only learn the composition mechanism of generating phrases from words, but also fine tune the word embeddings during the model training stage, so that we can induce the full information of the phrases and internal words.
    Page 2, “Related Work”
  5. The recursive auto-encoder is typically adopted to learn the way of composition (Socher et al., 2010; Socher et al., 2011; Socher et al., 2013a; Socher et al., 2013b; Li et al., 2013).
    Page 3, “Related Work”
  6. This section introduces the Bilingually-constrained Recursive Auto-encoders (BRAE), which are inspired by two observations.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  7. First, the recursive auto-encoder provides a reasonable composition mechanism to embed each phrase.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  8. Figure 2: A recursive auto-encoder for a four-word phrase.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  9. Accordingly, we propose the Bilingually-constrained Recursive Auto-encoders (BRAE),
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  10. Figure 4: An illustration of the bilingually-constrained recursive auto-encoders.
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  11. θ_rec: recursive auto-encoder parameter matrices W^(1), W^(2), and bias terms b^(1), b^(2) for two languages (Section 3.1.2);
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
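
Item 3 above describes the two training phases: unsupervised pre-training that minimizes reconstruction error, followed by bilingually constrained fine-tuning that pulls translation equivalents together and pushes non-translation pairs apart. A minimal sketch of the fine-tuning signal for one source phrase s, its translation t and a sampled non-translation t_neg; the max-margin form, the margin of 1.0 and the helper names embed_src, embed_tgt and transform_to_target are assumptions:

    import numpy as np

    def semantic_margin_error(embed_src, embed_tgt, transform_to_target,
                              s, t, t_neg, margin=1.0):
        """Fine-tuning error: after mapping into the target semantic space, the source
        phrase should be closer to its translation t than to the non-translation t_neg
        by at least the margin (zero error once the margin is satisfied)."""
        ps_star = transform_to_target(embed_src(s))          # source phrase in target space
        d_pos = 0.5 * np.sum((ps_star - embed_tgt(t)) ** 2)  # distance to the translation
        d_neg = 0.5 * np.sum((ps_star - embed_tgt(t_neg)) ** 2)
        return max(0.0, d_pos - d_neg + margin)

The same construction is applied in the reverse direction, transforming the target phrase into the source semantic space, so that the two sides supervise each other as stated in the embeddings topic above.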

semi-supervised

Appears in 11 sentences as: Semi-supervised (1) semi-supervised (10)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. This kind of semi-supervised phrase embedding is in fact performing phrase clustering with respect to the phrase label.
    Page 3, “Related Work”
  2. Obviously, these methods of semi-supervised phrase embedding do not fully address the semantic meaning of the phrases.
    Page 3, “Related Work”
  3. And the semi-supervised phrase embedding (Socher et al., 2011; Socher et al., 2013a; Li et al., 2013) further indicates that phrase embedding can be tuned with respect to the label.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  4. We will first briefly present the unsupervised phrase embedding, and then describe the semi-supervised framework.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  5. 3.2 Semi-supervised Phrase Embedding
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  6. Figure 3: An illustration of a semi-supervised RAE unit.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  7. Several researchers extend the original RAEs to a semi-supervised setting so that the induced phrase embedding can predict a target label, such as polarity in sentiment analysis (Socher et al., 2011), syntactic category in parsing (Socher et al., 2013a) and phrase reordering pattern in SMT (Li et al., 2013).
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  8. In the semi-supervised RAE for phrase embedding, the objective function over a (phrase, label) pair (x, t) includes the reconstruction error and the prediction error, as illustrated in Fig.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  9. We know from the semi-supervised phrase embedding that the learned vector representation can be well adapted to the given label.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  10. Like the semi-supervised RAE (Li et al., 2013), the parameters θ in our BRAE model can also be divided into three sets:
    Page 5, “Bilingually-constrained Recursive Auto-encoders”
  11. Assuming the target phrase representation p_t is available, the optimization of the source-side parameters is similar to that of the semi-supervised RAE.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
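
Item 8 above names the two components of the semi-supervised RAE objective without showing the formula. A plausible form, with the interpolation weight α as an assumption: for a (phrase, label) pair (x, t),

    E(x, t; \theta) = \alpha \, E_{rec}(x; \theta) + (1 - \alpha) \, E_{pred}(x, t; \theta),

where E_{rec} is the reconstruction error accumulated over the RAE tree and E_{pred} is the error of predicting the label t from the phrase vector.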

phrase-based

Appears in 7 sentences as: phrase-based (7)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. However, in the conventional (phrase-based) SMT, phrases are the basic translation units.
    Page 1, “Introduction”
  2. Instead, we focus on learning phrase embeddings from the view of semantic meaning, so that our phrase embedding can fully represent the phrase and best fit the phrase-based SMT.
    Page 1, “Introduction”
  3. With the semantic phrase embeddings and the vector space transformation function, we apply the BRAE to measure the semantic similarity between a source phrase and its translation candidates in the phrase-based SMT.
    Page 6, “Experiments”
  4. We have implemented a phrase-based translation system with a maximum entropy based reordering model using the bracketing transduction grammar (Wu, 1997; Xiong et al., 2006).
    Page 7, “Experiments”
  5. Typically, four translation probabilities are adopted in the phrase-based SMT, including phrase translation probability and lexical weights in both directions.
    Page 8, “Experiments”
  6. The semantic similarities in the two directions, Sim(p_s*, p_t) and Sim(p_t*, p_s), are integrated into our baseline phrase-based model.
    Page 8, “Experiments”
  7. As the semantic phrase embedding can fully represent the phrase, we can go a step further in the phrase-based SMT and feed the semantic phrase embeddings to DNN in order to model the whole translation process (e.g.
    Page 8, “Discussions”

BLEU

Appears in 6 sentences as: BLEU (6)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and in decoding with phrasal semantic similarities an improvement of up to 1.7 BLEU over the state-of-the-art baseline can be achieved.
    Page 2, “Introduction”
  2. (2013) also use bag-of-words but learn BLEU-sensitive phrase embeddings.
    Page 2, “Related Work”
  3. Case-insensitive BLEU is employed as the evaluation metric.
    Page 7, “Experiments”
  4. Specifically, the Significance algorithm can safely discard 64% of the phrase table at its threshold 12 with only 0.1 BLEU loss in the overall test.
    Page 7, “Experiments”
  5. In contrast, our BRAE-based algorithm can remove 72% of the phrase table at its threshold 0.7 with only 0.06 BLEU loss in the overall evaluation.
    Page 7, “Experiments”
  6. The largest improvement can be up to 1.7 BLEU score (MT06 for n = 50).
    Page 8, “Experiments”

cross-lingual

Appears in 5 sentences as: cross-lingual (6)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks (e.g.
    Page 2, “Introduction”
  2. cross-lingual question answering) and monolingual applications such as textual entailment, question answering and paraphrase detection.
    Page 2, “Introduction”
  3. Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks, such as cross-lingual question answering, since the semantic similarity between phrases in different languages can be calculated accurately.
    Page 8, “Discussions”
  4. In addition to the cross-lingual applications, we believe the BRAE model can be applied in many
    Page 8, “Discussions”
  5. 3) we will apply the BRAE model in other monolingual and cross-lingual tasks.
    Page 9, “Conclusions and Future Work”

semantic space

Appears in 4 sentences as: semantic space (4) semantic spaces (1)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. Furthermore, a transformation function between the Chinese and English semantic spaces can be learned as well.
    Page 2, “Introduction”
  2. Although we also follow the composition-based phrase embedding, we are the first to focus on the semantic meanings of the phrases and propose a bilingually-constrained model to induce the semantic information and learn the transformation from the semantic space of one language to that of the other.
    Page 3, “Related Work”
  3. Given a phrase pair (s, t), the BRAE model first obtains their semantic phrase representations (p_s, p_t), and then transforms p_s into the target semantic space as p_s*, and p_t into the source semantic space as p_t*.
    Page 7, “Experiments”
  4. the semantic space in one language to the other.
    Page 9, “Conclusions and Future Work”

translation probability

Appears in 4 sentences as: translation probabilities (2) translation probability (3)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. Besides using the semantic similarities to prune the phrase table, we also employ them as two informative features like the phrase translation probability to guide translation hypotheses selection during decoding.
    Page 8, “Experiments”
  2. Typically, four translation probabilities are adopted in the phrase-based SMT, including phrase translation probability and lexical weights in both directions.
    Page 8, “Experiments”
  3. The phrase translation probability is based on co-occurrence statistics, and the lexical weights consider the phrase as a bag-of-words.
    Page 8, “Experiments”
  4. Therefore, the semantic similarities computed using our BRAE model are complementary to the existing four translation probabilities.
    Page 8, “Experiments”
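
Items 1–4 above describe adding the two semantic similarities as features next to the four standard translation probabilities. A minimal sketch of how such features could be computed for a phrase pair; the cosine similarity, the helper names and the two transformation functions are assumptions (the excerpts only state that each phrase embedding is transformed into the other language's semantic space before comparison):

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    def semantic_features(p_s, p_t, to_target_space, to_source_space):
        """Two decoder features per phrase pair: Sim(p_s*, p_t) and Sim(p_t*, p_s)."""
        sim_s2t = cosine(to_target_space(p_s), p_t)   # source mapped into target space
        sim_t2s = cosine(to_source_space(p_t), p_s)   # target mapped into source space
        return sim_s2t, sim_t2s

These two scores would then enter the decoder as extra log-linear features alongside the phrase translation probabilities and lexical weights mentioned in item 2.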

translation quality

Appears in 4 sentences as: translation quality (4)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and in decoding with phrasal semantic similarities an improvement of up to 1.7 BLEU over the state-of-the-art baseline can be achieved.
    Page 2, “Introduction”
  2. Pruning most of the phrase table without much impact on translation quality is very important for translation, especially in environments where memory and time constraints are imposed.
    Page 7, “Experiments”
  3. We can see a common phenomenon in both of the algorithms: for the first few thresholds, the phrase table becomes smaller and smaller while the translation quality is not much decreased, but the performance jumps a lot at a certain threshold (16 for Significance pruning, 0.8 for BRAE-based one).
    Page 7, “Experiments”
  4. As shown in Table 2, no matter what n is, the BRAE model can significantly improve the translation quality in the overall test data.
    Page 8, “Experiments”

question answering

Appears in 3 sentences as: question answering (4)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. cross-lingual question answering) and monolingual applications such as textual entailment, question answering and paraphrase detection.
    Page 2, “Introduction”
  2. Besides SMT, the semantic phrase embeddings can be used in other cross-lingual tasks, such as cross-lingual question answering, since the semantic similarity between phrases in different languages can be calculated accurately.
    Page 8, “Discussions”
  3. monolingual NLP tasks which depend on good phrase representations or semantic similarity between phrases, such as named entity recognition, parsing, textual entailment, question answering and paraphrase detection.
    Page 9, “Discussions”

semantic representation

Appears in 3 sentences as: semantic representation (2) semantic representations (1)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. Fortunately, we know the fact that the two phrases should share the same semantic representation if they express the same meaning.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  2. The above equation also indicates that the source-side parameters θ_s can be optimized independently as long as the semantic representation p_t of the target phrase t is given to compute E_sem(s|t; θ) with Eq.
    Page 6, “Bilingually-constrained Recursive Auto-encoders”
  3. For example, as each node in the recursive auto-encoder shares the same weight matrix, the BRAE model would become weak at learning the semantic representations for long sentences with tens of words.
    Page 9, “Discussions”

objective function

Appears in 3 sentences as: Objective Function (1) objective function (2)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. After that, we introduce the BRAE in terms of its network structure, objective function and parameter inference.
    Page 3, “Bilingually-constrained Recursive Auto-encoders”
  2. In the semi-supervised RAE for phrase embedding, the objective function over a (phrase, label) pair (x, t) includes the reconstruction error and the prediction error, as illustrated in Fig.
    Page 4, “Bilingually-constrained Recursive Auto-encoders”
  3. 3.3.1 The Objective Function
    Page 5, “Bilingually-constrained Recursive Auto-encoders”

end-to-end

Appears in 3 sentences as: end-to-end (3)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates.
    Page 1, “Abstract”
  2. Accordingly, we evaluate the BRAE model on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to check whether a translation candidate and the source phrase are in the same meaning.
    Page 2, “Introduction”
  3. Two end-to-end SMT tasks are involved to test the power of the proposed model at learning the semantic phrase embeddings.
    Page 9, “Conclusions and Future Work”

bag-of-words

Appears in 3 sentences as: bag-of-words (3)
In Bilingually-constrained Phrase Embeddings for Machine Translation
  1. bag-of-words or indivisible n-gram).
    Page 1, “Introduction”
  2. One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013).
    Page 2, “Related Work”
  3. (2013) also use bag-of-words but learn BLEU-sensitive phrase embeddings.
    Page 2, “Related Work”
