Abstract | Monolingual translation probabilities have recently been introduced in retrieval models to solve the lexical gap problem. |
Abstract | We also show that the monolingual translation probabilities obtained (i) are comparable to traditional semantic relatedness measures and (ii) significantly improve the results over the query-likelihood and vector-space models for answer finding. |
Introduction | To do so, we compare translation probabilities with concept-vector-based semantic relatedness measures with respect to human relatedness rankings for reference word pairs. |
Introduction | Section 3 presents the monolingual parallel datasets we used for obtaining monolingual translation probabilities. |
Parallel Datasets | We used the GIZA++ SMT toolkit (Och and Ney, 2003) in order to obtain word-to-word translation probabilities from the parallel datasets described above. |
Parallel Datasets | The first method consists of a linear combination of the word-to-word translation probabilities after training: |
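The combination equation itself is not reproduced above. As a minimal sketch of such a linear combination (the whitespace-separated "source target probability" table format, the helper names, and the mixture weights are illustrative assumptions, not the paper's notation):

from collections import defaultdict

def load_word_ttable(path):
    # Load a word-to-word translation table; one "source target probability"
    # entry per line is assumed here purely for illustration.
    table = defaultdict(dict)
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            src, tgt, prob = line.split()
            table[src][tgt] = float(prob)
    return table

def combine_ttables(tables, weights):
    # Linear combination of translation probabilities:
    # P(t|s) = sum_k weight_k * P_k(t|s), with the weights summing to one.
    combined = defaultdict(lambda: defaultdict(float))
    for table, weight in zip(tables, weights):
        for src, row in table.items():
            for tgt, prob in row.items():
                combined[src][tgt] += weight * prob
    return combined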
Related Work | The main drawback lies in the availability of suitable training data for the translation probabilities. |
Related Work | The rationale behind translation-based retrieval models is that monolingual translation probabilities encode some form of semantic knowledge. |
Related Work | While classical measures of semantic relatedness have been extensively studied and compared, based on comparisons with human relatedness judgements or word-choice problems, there is no comparable intrinsic study of the relatedness measures obtained through word translation probabilities. |
Phrase Pair Refinement and Parameterization | This refinement can be applied before finding the phrase pair with maximum probability (Line 12 in Figure 2) so that duplicate words do not affect the calculation of the translation probability of the phrase pair. |
Phrase Pair Refinement and Parameterization | 5.2 Translation Probability Estimation |
Phrase Pair Refinement and Parameterization | It is well known that in phrase-based SMT there are four translation probabilities and a reordering probability for each phrase pair. |
Probabilistic Bilingual Lexicon Acquisition | After using the log-likelihood-ratios algorithm, we obtain a probabilistic bilingual lexicon with bidirectional translation probabilities from the out-of-domain data. |
Probabilistic Bilingual Lexicon Acquisition | It should be noted that there is no translation probability in this lexicon. |
Probabilistic Bilingual Lexicon Acquisition | In order to assign probabilities to each entry, we apply the Corpus Translation Probability used in (Wu et al., 2008): given in-domain source-language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon, and the in-domain target-language monolingual data (for language model estimation). |
Abstract | Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. |
Alignment | Therefore those long phrases are trained to fit only a few sentence pairs, strongly overestimating their translation probabilities and failing to generalize. |
Alignment | When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair. |
Introduction | The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. |
Introduction | In this work, we propose to directly train our phrase models by applying a forced alignment procedure in which we use the decoder to find a phrase alignment between source and target sentences of the training data and then update the phrase translation probabilities based on this alignment. |
Phrase Model Training | We have developed two different models for phrase translation probabilities which make use of the force-aligned training data. |
Phrase Model Training | The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment. |
Phrase Model Training | The translation probability of a phrase pair (f̃, ẽ) is estimated as |
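Written out, the relative-frequency estimate described above has the standard form (notation ours, with counts taken from the phrase-aligned training data):

\[ \hat{p}(\tilde{f} \mid \tilde{e}) \;=\; \frac{\mathrm{count}(\tilde{f}, \tilde{e})}{\sum_{\tilde{f}'} \mathrm{count}(\tilde{f}', \tilde{e})} \]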
Related Work | That in turn leads to over-fitting which shows in overly determinized estimates of the phrase translation probabilities. |
Related Work | They show that by applying a prior distribution over the phrase translation probabilities they can prevent over-fitting. |
Abstract | We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints. |
Conclusion | This is (1) accomplished by replacing monolingual MEANT’s context vector model with simple translation probabilities when computing similarities of semantic role fillers, and (2) further improved by incorporating BITG constraints for aligning the tokens in semantic role fillers. |
Introduction | XMEANT is obtained by (1) using simple lexical translation probabilities, instead of the monolingual context vector model used in MEANT, for computing the similarities of semantic role fillers, and (2) incorporating bracketing ITG constraints for word alignment within the semantic role fillers. |
Introduction | We therefore propose XMEANT, a cross-lingual MT evaluation metric, that modifies MEANT using (1) simple translation probabilities (in our experiments, |
Related Work | Apply the maximum weighted bipartite matching algorithm to align the semantic frames between the foreign input and MT output according to the lexical translation probabilities of the predicates. |
Related Work | For each pair of the aligned frames, apply the maximum weighted bipartite matching algorithm to align the arguments between the foreign input and MT output according to the aggregated phrasal translation probabilities of the role fillers. |
XMEANT: a cross-lingual MEANT | But whereas MEANT measures lexical similarity using a monolingual context vector model, XMEANT instead substitutes simple cross-lingual lexical translation probabilities . |
XMEANT: a cross-lingual MEANT | To aggregate individual lexical translation probabilities into phrasal similarities between cross-lingual semantic role fillers, we compared two natural approaches to generalizing MEANT’s method of comparing semantic parses, as described below. |
XMEANT: a cross-lingual MEANT | …aggregating lexical translation probabilities within semantic role filler phrases. |
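As one illustrative way to aggregate lexical translation probabilities into a phrase-level score (a sketch only, not necessarily XMEANT's exact formulation), each output token can be matched to its most probable counterpart in the input phrase and the resulting probabilities averaged:

def phrase_filler_similarity(src_tokens, tgt_tokens, ttable):
    # ttable[s][t] holds the lexical translation probability p(t|s);
    # unseen word pairs default to 0.0. Illustrative aggregation only.
    if not src_tokens or not tgt_tokens:
        return 0.0
    scores = []
    for t in tgt_tokens:
        best = max(ttable.get(s, {}).get(t, 0.0) for s in src_tokens)
        scores.append(best)
    return sum(scores) / len(tgt_tokens)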
Conclusion and Future Work | …succinct phrase table with more accurate translation probabilities. |
Conclusion and Future Work | In future work, we will also introduce incremental learning for phrase pair extraction inside a domain, which means using the current translation probabilities already obtained as the base measure of sampling parameters for the upcoming domain. |
Experiment | Pialign-adaptive: Alignment and phrase pair extraction are the same as in Pialign-batch, while translation probabilities are estimated by the adaptive method with monolingual topic information (Su et al., 2012). |
Experiment | The method established the relationship between the out-of-domain bilingual corpus and in-domain monolingual corpora via topic distribution to estimate the translation probability. |
Experiment | In the phrase table combination process, the translation probability of each phrase pair is estimated by the Hier-combin, and the other features are also linearly combined by averaging. |
Hierarchical Phrase Table Combination | For example, in Figure 2 (a), the i-th phrase pair ⟨e_i, f_i⟩ appears only in domain 1 and domain 2, so its translation probability can be calculated by substituting Equation (3) with Equation (2): |
Hierarchical Phrase Table Combination | Algorithm 1 (Translation Probabilities Estimation) outputs the translation probabilities for each phrase pair: for each phrase pair ⟨e_i, f_i⟩, initialize P(⟨e_i, f_i⟩) = 0 and w_i = 1, then iterate over each domain ⟨E_j, F_j⟩ with 1 ≤ j ≤ … |
Phrase Pair Extraction with Unsupervised Phrasal ITGs | Translation probabilities of ITG phrasal alignments… |
Experiments | Additionally, we adopt GIZA++ to obtain the word alignment of the in-domain parallel data and form the word translation probability table. |
Experiments | This table will be used to compute the translation probability of general-domain sentence pairs. |
Introduction | Meanwhile, the translation model measures the translation probability of a sentence pair and is used to verify the parallelism of the selected domain-relevant bitext. |
Related Work | The reason is that a sentence pair contains a source-language sentence and a target-language sentence, while the existing methods are incapable of evaluating the mutual translation probability of a sentence pair in the target domain. |
Training Data Selection Methods | However, in this paper, we adopt the translation model to evaluate the translation probability of a sentence pair and develop a simple but effective variant of the translation model to rank the sentence pairs in the general-domain corpus. |
Training Data Selection Methods | where P(e|f) is the translation model, which is IBM Model 1 in this paper; it represents the translation probability of the target-language sentence e conditioned on the source-language sentence f. l_e and l_f are the numbers of words in sentences e and f, respectively. |
Training Data Selection Methods | t(e_j|f_i) is the translation probability of word e_j conditioned on word f_i and is estimated from the small in-domain parallel data. |
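For reference, the IBM Model 1 score referred to above has the standard form (Brown et al., 1993), where f_0 denotes the NULL word; the variant used for data selection may additionally normalize for sentence length:

\[ P(e \mid f) \;=\; \frac{\epsilon}{(l_f + 1)^{\,l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i) \]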
Abstract | For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. |
Abstract | For effective optimization, we derive updating formulas of growth transformation (GT) for phrase and lexicon translation probabilities . |
Abstract | Then the phrase translation probabilities were estimated based on the phrase alignments. |
Substructure Spaces for BTKs | The lexical features with directions are defined as conditional feature functions based on the conditional lexical translation probabilities . |
Substructure Spaces for BTKs | where P(v|u) refers to the lexical translation probability from the source word u to the target word v within the subtree spans, while P(u|v) refers to that from target to source; in(S) refers to the word set for the internal span of the source subtree S, while in(T) refers to that of the target subtree T. |
Substructure Spaces for BTKs | Internal-External Lexical Features: These features are motivated by the fact that lexical translation probabilities within the translational equivalence tend to be high, and those of the non- |
Alignment Link Confidence Measure | which is defined as the word translation probability of the aligned word pair divided by the sum of the translation probabilities over all the target words in the sentence. |
Alignment Link Confidence Measure | Intuitively, the above link confidence definition compares the lexical translation probability of the aligned word pair with the translation probabilities of all the target words given the source word. |
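In symbols, the link confidence described above can be written as (notation assumed):

\[ c(a_{ij}) \;=\; \frac{p(t_j \mid s_i)}{\sum_{j'} p(t_{j'} \mid s_i)} \]

where s_i and t_j are the linked source and target words and the denominator sums over all target words in the sentence.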
Sentence Alignment Confidence Measure | When computing the source-to-target alignment posterior probability, the numerator is the sentence translation probability calculated according to the given alignment A: |
Sentence Alignment Confidence Measure | It is the product of lexical translation probabilities for the aligned word pairs. |
Sentence Alignment Confidence Measure | The denominator is the sentence translation probability summing over all possible alignments, which can be calculated similarly to IBM Model 1 in (Brown et al., 1994). |
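Putting the two parts together, the source-to-target alignment posterior sketched above is, with the denominator computed IBM-Model-1 style and the notation assumed:

\[ P(A \mid S, T) \;\approx\; \frac{\prod_{(i,j) \in A} p(t_j \mid s_i)}{\prod_{j=1}^{J} \sum_{i=0}^{I} p(t_j \mid s_i)} \]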
Introduction | It multiplies corresponding translation probabilities and lexical weights in source-pivot and pivot-target translation models to induce a new source-target phrase table. |
Pivot Methods for Phrase-based SMT | Based on these two models, we induce a source-target translation model, in which two important elements need to be induced: phrase translation probability and lexical weight. |
Pivot Methods for Phrase-based SMT | Phrase Translation Probability: We induce the phrase translation probability by assuming independence between the source and target phrases given the pivot phrase. |
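Under that independence assumption, the induced phrase translation probability takes the usual triangulation form, marginalizing over pivot phrases (notation ours):

\[ \phi(\bar{s} \mid \bar{t}) \;=\; \sum_{\bar{p}} \phi(\bar{s} \mid \bar{p}) \, \phi(\bar{p} \mid \bar{t}) \]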
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: the word alignment information a in a phrase pair (s̄, t̄) and the lexical translation probability w(s|t). |
Experiments | Here, s/t represent the source/target part of a rule in PTT, TRS, or PRS; and … are the translation probabilities and lexical weights of rules from PTT, TRS, and PRS. |
Fine-grained rule extraction | Maximum likelihood estimation is used to calculate the translation probabilities of each rule. |
Introduction | Table 1: Bidirectional translation probabilities of rules, denoted in the brackets, change when voice is attached to “killed”. |
Introduction | As will be verified by our experiments, we argue that the simple POS/phrasal tags are too coarse to reflect the accurate translation probabilities of the translation rules. |
Related Work | (2006) constructed a derivation-forest, in which composed rules were generated, unaligned words of foreign language were consistently attached, and the translation probabilities of rules were estimated by using Expectation-Maximization (EM) (Dempster et al., 1977) training. |
A Joint Model with Unlabeled Parallel Text | The intuition here is that if the translation probability of two sentences is high, the probability that they have the same sentiment label should be high as well. |
A Joint Model with Unlabeled Parallel Text | where p(a_i) is the translation probability of the i-th sentence pair in U; ȳ′ is the opposite of y′; the first term models the probability that the two sentences of the pair have the same label; and the second term models the probability that they have different labels. |
Experimental Setup 4.1 Data Sets and Preprocessing | Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities. |
Experimental Setup 4.1 Data Sets and Preprocessing | We first obtain translation probabilities for both directions (i.e. |
Results and Analysis | Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1). |
Experiments | Besides using the semantic similarities to prune the phrase table, we also employ them as two informative features, like the phrase translation probability, to guide translation hypothesis selection during decoding. |
Experiments | Typically, four translation probabilities are adopted in phrase-based SMT: the phrase translation probabilities and lexical weights in both directions. |
Experiments | The phrase translation probability is based on co-occurrence statistics, and the lexical weights treat the phrase as a bag of words. |
Decoding | In the topic-specific lexicon translation model, given a source document, it first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules correspondingly. |
Estimation | The process of rule-topic distribution estimation is analogous to the traditional estimation of rule translation probability (Chiang, 2007). |
Experiments | The lexicon translation probability is adapted by: |
Introduction | Such models first estimate word translation probabilities conditioned on topics, and then adapt lexical weights of phrases |
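A common form of such topic-conditioned adaptation, given here only as a plausible sketch rather than the paper's exact equation, mixes the topic-specific word translation probabilities according to the source document's topic distribution:

\[ p(e \mid f, d) \;\propto\; \sum_{z} p(e \mid f, z) \, p(z \mid d) \]

where d is the source document and z ranges over topics.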
Experimental Results | In estimating phrase translation probability, we use accumulated HMM-based phrase pair posteriors |
Experimental Results | as their ‘soft’ frequencies and then the final translation probability is the relative frequency. |
Introduction | Using relative frequency as translation probability is a common practice to measure goodness of a phrase pair. |
Introduction | The translation probability can also be discriminatively trained such as in Tillmann and Zhang (2006). |
Experiments | In this section, we propose a method to modify the translation probabilities of the t-table by interpolating the translation counts with transliteration counts. |
Experiments | We combine the transliteration probabilities with the translation probabilities of the IBM models and the HMM model. |
Experiments | The normal translation probability p_ta(f|e) of the word alignment models is computed with relative frequency estimates. |
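A minimal sketch of such an interpolation, assuming nested count dictionaries and a single interpolation weight lam (both are assumptions; the paper's exact combination may differ):

def interpolate_ttable(trans_counts, translit_counts, lam=0.5):
    # Combine translation counts with weighted transliteration counts and
    # renormalize to relative frequencies: p(f|e) = c(f,e) / sum_f' c(f',e).
    table = {}
    for e in set(trans_counts) | set(translit_counts):
        merged = {}
        fs = set(trans_counts.get(e, {})) | set(translit_counts.get(e, {}))
        for f in fs:
            merged[f] = (trans_counts.get(e, {}).get(f, 0.0)
                         + lam * translit_counts.get(e, {}).get(f, 0.0))
        total = sum(merged.values())
        if total > 0:
            table[e] = {f: c / total for f, c in merged.items()}
    return table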
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)), bidirectional lexical weighting (Lex(e|f) and Lex(f|e)), |
Input Features for DNN Feature Learning | where p(e_j|f_i) and p(f_i|e_j) represent the bidirectional word translation probabilities. |
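For reference, the lexical weighting of Koehn et al. (2003) combines these word translation probabilities over a word alignment a as

\[ \mathrm{Lex}(e \mid f, a) \;=\; \prod_{j=1}^{|e|} \frac{1}{|\{ i : (i, j) \in a \}|} \sum_{(i, j) \in a} p(e_j \mid f_i) \]

with unaligned target words paired with NULL; Lex(f|e) is defined symmetrically.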
Introduction | First, the original input features for the DBN feature learning are too simple: the limited 4 phrase features of each phrase pair, such as bidirectional phrase translation probabilities and bidirectional lexical weighting (Koehn et al., 2003), are a bottleneck for learning effective feature representations. |
Related Work | (2012) improved the translation quality of an n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. |
Head-Driven HPB Translation Model | P_hd-hr(t|s) and P_hd-hr(s|t), translation probabilities for HD-HRs; |
Head-Driven HPB Translation Model | P_lex(t|s) and P_lex(s|t), lexical translation probabilities for HD-HRs; |
Head-Driven HPB Translation Model | P_pty-hdhr = exp(−1), rule penalty for HD-HRs; P_nrr(t|s), translation probability for NRRs; |
Static MT Quality Estimation | 17 decoding features, including phrase translation probabilities (source-to-target and target-to-source), word translation probabilities (also in both directions), maxent probabilities, word count, phrase count, distortion… |
Static MT Quality Estimation | Footnote 1: The maxent probability is the translation probability |
Static MT Quality Estimation | The average translation probability of the phrase translation pairs in the final translation, which provides the overall translation quality on the phrase level. |
Experiments | Although the translation probability of “send X” is much higher, it is inappropriate in this context since it is usually used in IT texts. |
Topic Similarity Model with Neural Network | where the translation probability is given by: |
Topic Similarity Model with Neural Network | Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature). |
DNN for word alignment | where P_lex is the lexical translation probability and P_d is the jump distance distortion probability. |
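These two factors appear in the classic HMM alignment decomposition (Vogel et al., 1996), written in standard notation as

\[ P(t_1^J, a_1^J \mid s_1^I) \;=\; \prod_{j=1}^{J} P_{lex}(t_j \mid s_{a_j}) \, P_d(a_j - a_{j-1}) \]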
DNN for word alignment | In the classic HMM word alignment model, context is not considered in the lexical translation probability . |
Training | our model from raw sentence pairs, they are too computational demanding as the lexical translation probabilities must be computed from neural networks. |
Related Work | In principle, our architecture can support all mixture operations that (Razmara et al., 2012) describe, plus additional ones such as forms of instance weighting, which are not possible after the translation probabilities have been computed. |
Translation Model Architecture | Traditionally, the phrase translation probabilities p(s̄|t̄) and p(t̄|s̄) are estimated through unsmoothed maximum likelihood estimation (MLE). |
Translation Model Architecture | The word translation probabilities w(t_i|s_j) are de- |
Sentence Selection: Single Language Pair | Additionally, phrase translation probabilities need to be estimated accurately, which means that sentences offering phrases that occur rarely in the corpus are informative. |
Sentence Selection: Single Language Pair | estimating accurately the phrase translation probabilities. |
Sentence Selection: Single Language Pair | Smoothing techniques partly handle the accurate estimation of translation probabilities when the events occur rarely (indeed, this is the main reason for smoothing). |
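As a minimal illustration of such smoothing (add-alpha smoothing over the observed target candidates; the scheme and the constant are assumptions, not the paper's choice):

def add_alpha_ttable(cooc_counts, alpha=0.1):
    # Smoothed relative-frequency translation probabilities from
    # co-occurrence counts; cooc_counts[e][f] is count(e, f).
    table = {}
    for e, row in cooc_counts.items():
        total = sum(row.values()) + alpha * len(row)
        table[e] = {f: (c + alpha) / total for f, c in row.items()}
    return table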
Using Translation Probability | More specifically, by using translation probabilities, we can rewrite equations (11) and (12) as follows: |
Using Translation Probability | From Table 6, we see that the performance of our approach can be further boosted by using translation probabilities. |