Index of papers in Proc. ACL that mention
  • translation probabilities
Bernhard, Delphine and Gurevych, Iryna
Abstract
Monolingual translation probabilities have recently been introduced in retrieval models to solve the lexical gap problem.
Abstract
We also show that the monolingual translation probabilities obtained (i) are comparable to traditional semantic relatedness measures and (ii) significantly improve the results over the query likelihood and the vector-space model for answer finding.
Introduction
To do so, we compare translation probabilities with concept vector based semantic relatedness measures with respect to human relatedness rankings for reference word pairs.
Introduction
Section 3 presents the monolingual parallel datasets we used for obtaining monolingual translation probabilities.
Parallel Datasets
We used the GIZA++ SMT toolkit (Och and Ney, 2003) in order to obtain word-to-word translation probabilities from the parallel datasets described above.
Parallel Datasets
The first method consists in a linear combination of the word-to-word translation probabilities after training:
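As an illustration of how such a linear combination of word-to-word translation probabilities can enter a retrieval score, here is a minimal Python sketch of a translation-based query likelihood model; the mixing weight `lam`, the smoothing, and all names are assumptions for illustration, not the paper's exact formulation.

```python
from collections import Counter

def translation_likelihood(query, doc, t_prob, lam=0.5):
    """Score P(query | doc) using monolingual word-to-word translation probabilities.

    t_prob[(q, w)] approximates P(q | w), e.g. learned with GIZA++ on
    monolingual parallel data.  A hedged sketch, not the paper's model.
    """
    tf = Counter(doc)
    dlen = max(len(doc), 1)
    score = 1.0
    for q in query:
        # translate every document word w into the query word q
        p_trans = sum(t_prob.get((q, w), 0.0) * c / dlen for w, c in tf.items())
        p_direct = tf[q] / dlen
        score *= lam * p_direct + (1 - lam) * p_trans  # linear combination
    return score
```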
Related Work
The main drawback lies in the availability of suitable training data for the translation probabilities.
Related Work
The rationale behind translation-based retrieval models is that monolingual translation probabilities encode some form of semantic knowledge.
Related Work
While classical measures of semantic relatedness have been extensively studied and compared, based on comparisons with human relatedness judgements or word-choice problems, there is no comparable intrinsic study of the relatedness measures obtained through word translation probabilities.
translation probabilities is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Zong, Chengqing
Phrase Pair Refinement and Parameterization
This refinement can be applied before finding the phrase pair with maximum probability (Line 12 in Figure 2) so that duplicate words do not affect the calculation of the translation probability of a phrase pair.
Phrase Pair Refinement and Parameterization
5.2 Translation Probability Estimation
Phrase Pair Refinement and Parameterization
It is well known that in phrase-based SMT there are four translation probabilities and a reordering probability for each phrase pair.
Probabilistic Bilingual Lexicon Acquisition
After using the log-likelihood-ratios algorithm, we obtain a probabilistic bilingual lexicon with bidirectional translation probabilities from the out-of-domain data.
Probabilistic Bilingual Lexicon Acquisition
It should be noted that there is no translation probability in this lexicon.
Probabilistic Bilingual Lexicon Acquisition
In order to assign probabilities to each entry, we apply the Corpus Translation Probability used in (Wu et al., 2008): given in-domain source language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon, and the in-domain target language monolingual data (for language model estimation).
translation probabilities is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Abstract
Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
Alignment
Therefore those long phrases are trained to fit only a few sentence pairs, strongly overestimating their translation probabilities and failing to generalize.
Alignment
When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair.
Introduction
The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system.
Introduction
In this work, we propose to directly train our phrase models by applying a forced alignment procedure where we use the decoder to find a phrase alignment between source and target sentences of the training data and then update phrase translation probabilities based on this alignment.
Phrase Model Training
We have developed two different models for phrase translation probabilities which make use of the force-aligned training data.
Phrase Model Training
The simplest of our generative phrase models estimates phrase translation probabilities by their relative frequencies in the Viterbi alignment of the data, similar to the heuristic model but with counts from the phrase-aligned data produced in training rather than computed on the basis of a word alignment.
Phrase Model Training
The translation probability of a phrase pair (f̃, ẽ) is estimated as
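Since the snippet's formula is elided here, the following sketch shows the relative-frequency estimate described above, with counts taken from the phrase-aligned training data; this is a generic illustration of that description, not the authors' exact estimator.

```python
from collections import Counter, defaultdict

def relative_frequency_phrase_table(phrase_pairs):
    """Estimate p(f_phrase | e_phrase) as relative frequencies.

    phrase_pairs: list of (f_phrase, e_phrase) tuples read off the (Viterbi)
    phrase alignment produced by forced alignment of the training data.
    """
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[e][f] = count / e_counts[e]
    return dict(table)
```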
Related Work
That in turn leads to over-fitting which shows in overly determinized estimates of the phrase translation probabilities.
Related Work
They show that by applying a prior distribution over the phrase translation probabilities they can prevent over-fitting.
translation probabilities is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Lo, Chi-kiu and Beloucif, Meriem and Saers, Markus and Wu, Dekai
Abstract
We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints.
Conclusion
This is (1) accomplished by replacing monolingual MEANT’s context vector model with simple translation probabilities when computing similarities of semantic role fillers, and (2) further improved by incorporating BITG constraints for aligning the tokens in semantic role fillers.
Introduction
XMEANT is obtained by (1) using simple lexical translation probabilities, instead of the monolingual context vector model used in MEANT for computing the semantic role filler similarities, and (2) incorporating bracketing ITG constraints for word alignment within the semantic role fillers.
Introduction
We therefore propose XMEANT, a cross-lingual MT evaluation metric, that modifies MEANT using (1) simple translation probabilities (in our experiments,
Related Work
Apply the maximum weighted bipartite matching algorithm to align the semantic frames between the foreign input and MT output according to the lexical translation probabilities of the predicates.
Related Work
For each pair of the aligned frames, apply the maximum weighted bipartite matching algorithm to align the arguments between the foreign input and MT output according to the aggregated phrasal translation probabilities of the role fillers.
XMEANT: a cross-lingual MEANT
But whereas MEANT measures lexical similarity using a monolingual context vector model, XMEANT instead substitutes simple cross-lingual lexical translation probabilities .
XMEANT: a cross-lingual MEANT
To aggregate individual lexical translation probabilities into phrasal similarities between cross-lingual semantic role fillers, we compared two natural approaches to generalizing MEANT’s method of comparing semantic parses, as described below.
XMEANT: a cross-lingual MEANT
…lexical translation probabilities within semantic role filler phrases.
translation probabilities is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Conclusion and Future Work
…succinct phrase table with more accurate translation probabilities.
Conclusion and Future Work
In future work, we will also introduce incremental learning for phrase pair extraction inside a domain, which means using the current translation probabilities already obtained as the base measure of sampling parameters for the upcoming domain.
Experiment
Pialign-adaptive: Alignment and phrase pair extraction are the same as in Pialign-batch, while translation probabilities are estimated by the adaptive method with monolingual topic information (Su et al., 2012).
Experiment
The method established the relationship between the out-of-domain bilingual corpus and in-domain monolingual corpora via topic distribution to estimate the translation probability.
Experiment
In the phrase table combination process, the translation probability of each phrase pair is estimated by the Hier-combin, and the other features are also linearly combined by averaging.
Hierarchical Phrase Table Combination
For example, in Figure 2 (a), the i-th phrase pair ⟨e_i, f_i⟩ appears only in domain 1 and domain 2, so its translation probability can be calculated by substituting Equation (3) with Equation (2):
Hierarchical Phrase Table Combination
Algorithm 1 Translation Probabilities Estimation. Input: … Output: the translation probabilities for each phrase pair. 1: for all phrase pairs ⟨e_i, f_i⟩ do 2: initialize P(⟨e_i, f_i⟩) = 0 and w_i = 1 3: for all domains ⟨E_j, F_j⟩ such that 1 ≤ j ≤ …
Phrase Pair Extraction with Unsupervised Phrasal ITGs
Translation probabilities of ITG phrasal alignments…
translation probabilities is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Experiments
Additionally, we adopt GIZA++ to get the word alignment of in-domain parallel data and form the word translation probability table.
Experiments
This table will be used to compute the translation probability of general-domain sentence pairs.
Introduction
Meanwhile, the translation model measures the translation probability of a sentence pair and is used to verify the parallelism of the selected domain-relevant bitext.
Related Work
The reason is that a sentence pair contains a source language sentence and a target language sentence, while the existing methods are incapable of evaluating the mutual translation probability of a sentence pair in the target domain.
Training Data Selection Methods
However, in this paper, we adopt the translation model to evaluate the translation probability of a sentence pair and develop a simple but effective variant of the translation model to rank the sentence pairs in the general-domain corpus.
Training Data Selection Methods
where P(e|f) is the translation model, which is IBM Model 1 in this paper; it represents the translation probability of the target language sentence e conditioned on the source language sentence f. l_e and l_f are the numbers of words in sentences e and f, respectively.
Training Data Selection Methods
t(e_j|f_i) is the translation probability of word e_j conditioned on word f_i, and is estimated from the small in-domain parallel data.
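For concreteness, here is a minimal sketch of the IBM Model 1 sentence translation probability referred to in these snippets, with a NULL source word and the constant epsilon factor dropped; the flooring value and names are assumptions.

```python
import math

def ibm1_log_prob(e_sent, f_sent, t):
    """log P(e | f) under IBM Model 1.

    t[(e_word, f_word)] holds t(e | f), estimated from the small in-domain
    parallel data.  Unseen word pairs are floored to a tiny probability.
    """
    f_tokens = ["<null>"] + list(f_sent)
    log_p = -len(e_sent) * math.log(len(f_tokens))
    for e in e_sent:
        log_p += math.log(sum(t.get((e, f), 1e-12) for f in f_tokens))
    return log_p
```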
translation probabilities is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective.
Abstract
For effective optimization, we derive updating formulas of growth transformation (GT) for phrase and lexicon translation probabilities.
Abstract
Then the phrase translation probabilities were estimated based on the phrase alignments.
translation probabilities is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Substructure Spaces for BTKs
The lexical features with directions are defined as conditional feature functions based on the conditional lexical translation probabilities.
Substructure Spaces for BTKs
where P(v|u) refers to the lexical translation probability from the source word u to the target word v within the subtree spans, while P(u|v) refers to that from target to source; in(S) refers to the word set for the internal span of the source subtree S, while in(T) refers to that of the target subtree T.
Substructure Spaces for BTKs
Internal-External Lexical Features: These features are motivated by the fact that lexical translation probabilities within the translational equivalence tend to be high, and that of the non-…
translation probabilities is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei
Alignment Link Confidence Measure
which is defined as the word translation probability of the aligned word pair divided by the sum of the translation probabilities over all the target words in the sentence.
Alignment Link Confidence Measure
Intuitively, the above link confidence definition compares the lexical translation probability of the aligned word pair with the translation probabilities of all the target words given the source word.
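A small sketch of the link confidence as described in these two snippets: the lexical translation probability of the aligned pair, normalized by the sum of translation probabilities over all target words in the sentence. The direction of t(.|.) and the variable names are assumptions for illustration.

```python
def link_confidence(src_word, tgt_word, tgt_sentence, t):
    """Confidence of the alignment link src_word -> tgt_word.

    t[(tgt, src)] approximates the lexical translation probability t(tgt | src).
    """
    numerator = t.get((tgt_word, src_word), 0.0)
    denominator = sum(t.get((w, src_word), 0.0) for w in tgt_sentence)
    return numerator / denominator if denominator > 0 else 0.0
```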
Sentence Alignment Confidence Measure
When computing the source-to-target alignment posterior probability, the numerator is the sentence translation probability calculated according to the given alignment A:
Sentence Alignment Confidence Measure
It is the product of lexical translation probabilities for the aligned word pairs.
Sentence Alignment Confidence Measure
The denominator is the sentence translation probability summing over all possible alignments, which can be calculated similar to IBM Model 1 in (Brown et al., 1994)
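Putting the preceding snippets together, the alignment posterior can be sketched as the ratio of the aligned-pair product to the Model-1-style sum over all alignments. This is an illustrative reading of the description, with unaligned target words ignored in the numerator; names and flooring are assumptions.

```python
import math

def alignment_posterior_log(src, tgt, alignment, t):
    """log posterior of a given alignment (a list of (i, j) index pairs).

    Numerator: product of lexical translation probabilities of aligned pairs.
    Denominator: sum over all possible alignments, factored as in IBM Model 1.
    """
    num = sum(math.log(t.get((tgt[j], src[i]), 1e-12)) for i, j in alignment)
    den = sum(math.log(sum(t.get((w, s), 1e-12) for s in src)) for w in tgt)
    return num - den
```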
translation probabilities is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Introduction
It multiplies corresponding translation probabilities and lexical weights in source-pivot and pivot-target translation models to induce a new source-target phrase table.
Pivot Methods for Phrase-based SMT
Based on these two models, we induce a source-target translation model, in which two important elements need to be induced: phrase translation probability and lexical weight.
Pivot Methods for Phrase-based SMT
Phrase Translation Probability: We induce the phrase translation probability by assuming independence between the source and target phrases given the pivot phrase.
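Under the stated independence assumption, the induced probability is p(s̄|t̄) = Σ_p̄ p(s̄|p̄)·p(p̄|t̄). Below is a short sketch of that marginalization over the pivot phrase; the table layouts are assumptions for illustration, not the paper's data structures.

```python
from collections import defaultdict

def induce_pivot_phrase_table(src_given_piv, piv_given_tgt):
    """Induce p(s | t) from source-pivot and pivot-target phrase tables.

    src_given_piv[p][s] ~ p(s | p); piv_given_tgt[t][p] ~ p(p | t).
    """
    src_given_tgt = defaultdict(lambda: defaultdict(float))
    for t, pivots in piv_given_tgt.items():
        for p, p_pt in pivots.items():
            for s, p_sp in src_given_piv.get(p, {}).items():
                src_given_tgt[t][s] += p_sp * p_pt  # marginalize out the pivot
    return src_given_tgt
```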
Pivot Methods for Phrase-based SMT
…(2003), there are two important elements in the lexical weight: the word alignment information a in a phrase pair (s̄, t̄) and the lexical translation probability w(s|t).
translation probabilities is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Experiments
Here, s/t represent the source/target part of a rule in PTT, TRS, or PRS; and … are the translation probabilities and lexical weights of rules from PTT, TRS, and PRS.
Fine-grained rule extraction
Maximum likelihood estimation is used to calculate the translation probabilities of each rule.
Introduction
Table 1: Bidirectional translation probabilities of rules, denoted in the brackets, change when voice is attached to “killed”.
Introduction
As will be verified by our experiments, we argue that the simple POS/phrasal tags are too coarse to reflect the accurate translation probabilities of the translation rules.
Related Work
(2006) constructed a derivation-forest, in which composed rules were generated, unaligned words of foreign language were consistently attached, and the translation probabilities of rules were estimated by using Expectation-Maximization (EM) (Dempster et al., 1977) training.
translation probabilities is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
The intuition here is that if the translation probability of two sentences is high, the probability that they have the same sentiment label should be high as well.
A Joint Model with Unlabeled Parallel Text
where p(a_i) is the translation probability of the i-th sentence pair in U; ȳ^i is the opposite of y^i; the first term models the probability that the two sentences of the pair have the same label; and the second term models the probability that they have different labels.
Experimental Setup 4.1 Data Sets and Preprocessing
Because sentence pairs in the ISI corpus are quite noisy, we rely on Giza++ (Och and Ney, 2003) to obtain a new translation probability for each sentence pair, and select the 100,000 pairs with the highest translation probabilities.
Experimental Setup 4.1 Data Sets and Preprocessing
We first obtain translation probabilities for both directions (i.e.
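A possible realization of the selection step described in these snippets; combining the two directions by their product is an assumption here, since the text only says both directions are obtained.

```python
def select_top_pairs(pairs, p_fwd, p_bwd, k=100_000):
    """Keep the k sentence pairs with the highest translation probabilities.

    pairs: list of (src, tgt) sentence pairs; p_fwd[i] and p_bwd[i] are the
    forward and backward translation probabilities of pair i (e.g. from GIZA++).
    """
    ranked = sorted(range(len(pairs)),
                    key=lambda i: p_fwd[i] * p_bwd[i],
                    reverse=True)
    return [pairs[i] for i in ranked[:k]]
```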
Results and Analysis
Preliminary experiments showed that Equation 5 does not significantly improve the performance in our case, which is reasonable since we choose only sentence pairs with the highest translation probabilities to be our unlabeled data (see Section 4.1).
translation probabilities is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Experiments
Besides using the semantic similarities to prune the phrase table, we also employ them as two informative features, like the phrase translation probability, to guide translation hypothesis selection during decoding.
Experiments
Typically, four translation probabilities are adopted in phrase-based SMT, including the phrase translation probabilities and lexical weights in both directions.
Experiments
The phrase translation probability is based on co-occurrence statistics, and the lexical weights consider the phrase as a bag of words.
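The bag-of-words lexical weighting mentioned here is usually computed as in Koehn et al. (2003); the following sketch is that standard formulation, offered for illustration rather than as this paper's exact definition.

```python
def lexical_weight(f_phrase, e_phrase, alignment, w):
    """p_w(f_phrase | e_phrase, a): average word translation probability per link.

    alignment: set of (i, j) pairs linking f_phrase[i] to e_phrase[j];
    w[(f, e)] ~ w(f | e).  Unaligned f words fall back to a NULL token.
    """
    weight = 1.0
    for i, f in enumerate(f_phrase):
        links = [j for (i2, j) in alignment if i2 == i]
        if links:
            weight *= sum(w.get((f, e_phrase[j]), 0.0) for j in links) / len(links)
        else:
            weight *= w.get((f, "<null>"), 1e-6)
    return weight
```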
translation probabilities is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Decoding
In the topic-specific lexicon translation model, given a source document, it first calculates the topic-specific translation probability by normalizing the entire lexicon translation table, and then adapts the lexical weights of rules correspondingly.
Estimation
The process of rule-topic distribution estimation is analogous to the traditional estimation of rule translation probability (Chiang, 2007).
Experiments
The lexicon translation probability is adapted by:
Introduction
Such models first estimate word translation probabilities conditioned on topics, and then adapt lexical weights of phrases
translation probabilities is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
Experimental Results
In estimating phrase translation probability, we use accumulated HMM-based phrase pair posteriors
Experimental Results
as their ‘soft’ frequencies and then the final translation probability is the relative frequency.
Introduction
Using relative frequency as translation probability is a common practice to measure goodness of a phrase pair.
Introduction
The translation probability can also be discriminatively trained such as in Tillmann and Zhang (2006).
translation probabilities is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Experiments
In this section, we propose a method to modify the translation probabilities of the t-table by interpolating the translation counts with transliteration counts.
Experiments
We combine the transliteration probabilities with the translation probabilities of the IBM models and the HMM model.
Experiments
The normal translation probability p_ta(f|e) of the word alignment models is computed with relative frequency estimates.
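One way to realize the interpolation of translation counts with transliteration counts described above is at the count level, followed by relative-frequency renormalization; the weight `lam` and the count-level mixing are assumptions for illustration, not the paper's exact scheme.

```python
from collections import defaultdict

def interpolated_t_table(trans_counts, translit_counts, lam=0.5):
    """Build p(f | e) from interpolated translation and transliteration counts.

    trans_counts[(f, e)]: counts from the word alignment models;
    translit_counts[(f, e)]: counts derived from a transliteration module.
    """
    combined = defaultdict(float)
    for (f, e), c in trans_counts.items():
        combined[(f, e)] += lam * c
    for (f, e), c in translit_counts.items():
        combined[(f, e)] += (1 - lam) * c
    totals = defaultdict(float)
    for (_, e), c in combined.items():
        totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in combined.items()}
```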
translation probabilities is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Input Features for DNN Feature Learning
Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)) and bidirectional lexical weighting (Lex(e|f) and Lex(f|e)).
Input Features for DNN Feature Learning
where p(e_j|f_i) and p(f_i|e_j) represent the bidirectional word translation probabilities.
Introduction
First, the original input features for the DBN feature learning are too simple: only the limited 4 phrase features of each phrase pair, such as the bidirectional phrase translation probabilities and bidirectional lexical weights (Koehn et al., 2003), which is a bottleneck for learning effective feature representations.
Related Work
(2012) improved translation quality of n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations.
translation probabilities is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Tu, Zhaopeng and Zhou, Guodong and van Genabith, Josef
Head-Driven HPB Translation Model
• P_hd-hr(t|s) and P_hd-hr(s|t), translation probabilities for HD-HRs;
Head-Driven HPB Translation Model
• P_lex(t|s) and P_lex(s|t), lexical translation probabilities for HD-HRs;
Head-Driven HPB Translation Model
• exp(-1), a rule penalty for HD-HRs; P(t|s), the translation probability for NRRs;
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Static MT Quality Estimation
• 17 decoding features, including phrase translation probabilities (source-to-target and target-to-source), word translation probabilities (also in both directions), maxent probabilities, word count, phrase count, distortion…
Static MT Quality Estimation
The maxent probability is the translation probability…
Static MT Quality Estimation
• The average translation probability of the phrase translation pairs in the final translation, which reflects the overall translation quality at the phrase level.
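This sentence-level feature reduces to a simple average over the phrase pairs chosen by the decoder; a minimal sketch, with the phrase-table lookup as an assumed interface.

```python
def avg_phrase_translation_prob(used_phrase_pairs, phrase_table):
    """Average translation probability of the phrase pairs in a final translation.

    used_phrase_pairs: list of (src_phrase, tgt_phrase) used by the decoder;
    phrase_table[(src, tgt)] ~ p(tgt | src).
    """
    if not used_phrase_pairs:
        return 0.0
    probs = [phrase_table.get(pair, 0.0) for pair in used_phrase_pairs]
    return sum(probs) / len(probs)
```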
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
Although the translation probability of “send X” is much higher, it is inappropriate in this context since it is usually used in IT texts.
Topic Similarity Model with Neural Network
where the translation probability is given by:
Topic Similarity Model with Neural Network
Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
DNN for word alignment
where P_lex is the lexical translation probability and P_d is the jump distance distortion probability.
DNN for word alignment
In the classic HMM word alignment model, context is not considered in the lexical translation probability.
Training
…our model from raw sentence pairs, they are too computationally demanding, as the lexical translation probabilities must be computed from neural networks.
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Related Work
In principle, our architecture can support all mixture operations that (Razmara et al., 2012) describe, plus additional ones such as forms of instance weighting, which are not possible after the translation probabilities have been computed.
Translation Model Architecture
Traditionally, the phrase translation probabilities p(s̄|t̄) and p(t̄|s̄) are estimated through unsmoothed maximum likelihood estimation (MLE).
Translation Model Architecture
The word translation probabilities w(t_i|s_j) are de-…
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
Sentence Selection: Single Language Pair
Additionally, phrase translation probabilities need to be estimated accurately, which means sentences that offer phrases whose occurrences in the corpus were rare are informative.
Sentence Selection: Single Language Pair
…estimating accurately the phrase translation probabilities.
Sentence Selection: Single Language Pair
Smoothing techniques partly handle accurate estimation of translation probabilities when the events occur rarely (indeed it is the main reason for smoothing).
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Duan, Huizhong and Cao, Yunbo and Lin, Chin-Yew and Yu, Yong
Using Translation Probability
More specifically, by using translation probabilities, we can rewrite equations (11) and (12) as follows:
Using Translation Probability
From Table 6, we see that the performance of our approach can be further boosted by using translation probability .
translation probabilities is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: