Abstract | Monolingual translation probabilities have recently been introduced in retrieval models to solve the lexical gap problem. |
Abstract | We also show that the monolingual translation probabilities obtained (i) are comparable to traditional semantic relatedness measures and (ii) significantly improve the results over the query likelihood and the vector-space model for answer finding. |
Introduction | To do so, we compare translation probabilities with concept-vector-based semantic relatedness measures with respect to human relatedness rankings for reference word pairs. |
Introduction | Section 3 presents the monolingual parallel datasets we used to obtain monolingual translation probabilities. |
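A comparison against human relatedness rankings is usually reported as a rank correlation. The sketch below computes Spearman's rank correlation between relatedness scores derived from translation probabilities and human ratings for the same word pairs. The choice of Spearman's rho, the helper names, and the example scores are assumptions for illustration only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([0.12, 0.003, 0.08, 0.0005], [3.5, 1.0, 3.0, 0.5])` returns 1.0, because both (hypothetical) score lists order the four word pairs identically. Rank correlation is a natural fit here because translation probabilities and human ratings live on very different scales.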
Parallel Datasets | We used the GIZA++ SMT Toolkit (Och and Ney, 2003) to obtain word-to-word translation probabilities from the parallel datasets described above. |
Parallel Datasets | The first method consists of a linear combination of the word-to-word translation probabilities after training: |
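GIZA++ trains a cascade of alignment models. As a much simpler stand-in, the sketch below estimates word-to-word translation probabilities t(f | e) with plain IBM Model 1 EM. This is an illustrative assumption and not GIZA++'s implementation: it omits the NULL word, the HMM and higher IBM models, and any smoothing. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (target word f, source word e) -> t(f | e)
Table = Dict[Tuple[str, str], float]


def train_ibm1(pairs: List[Tuple[List[str], List[str]]], iterations: int = 10) -> Table:
    """Simplified IBM Model 1 EM (assumption: no NULL word, no higher models).

    Each pair is (source tokens, target tokens); for monolingual data both
    sides are in the same language, e.g. a question and its answer.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t: Table = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        counts: Table = defaultdict(float)             # expected counts c(f, e)
        totals: Dict[str, float] = defaultdict(float)  # expected counts c(e)
        # E-step: distribute each target token over the source tokens of its pair
        for src, tgt in pairs:
            for f in tgt:
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / z
                    counts[(f, e)] += delta
                    totals[e] += delta
        # M-step: renormalise expected counts per source word
        t = {(f, e): c / totals[e] for (f, e), c in counts.items()}
    return dict(t)
```

Every pair (f, e) that co-occurs in some training pair receives a positive expected count, so the table stays defined for all lookups made in later iterations.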
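The combination formula itself is not reproduced here. Assuming it is a weighted sum P(w | v) = Σ_k λ_k P_k(w | v) over translation tables P_k trained separately on each dataset, with non-negative weights λ_k that sum to 1, a sketch looks like this. The function name, the weight constraint, and the treatment of missing entries as zero are all assumptions.

```python
from typing import Dict, List, Tuple

# (target word, source word) -> P(target | source)
Table = Dict[Tuple[str, str], float]


def combine_tables(tables: List[Table], weights: List[float]) -> Table:
    """Assumed linear interpolation of separately trained translation tables."""
    if len(tables) != len(weights):
        raise ValueError("one weight per table is required")
    if any(wt < 0 for wt in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    combined: Table = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            # entries missing from a table contribute 0 to the weighted sum
            combined[pair] = combined.get(pair, 0.0) + weight * prob
    return combined
```

Because each input table is normalised per source word and the weights sum to 1, the combined table is normalised per source word as well.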
Related Work | The main drawback is the limited availability of suitable training data for the translation probabilities. |
Related Work | The rationale behind translation-based retrieval models is that monolingual translation probabilities encode some form of semantic knowledge. |
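To make this rationale concrete, here is a sketch of one common way translation probabilities enter a query-likelihood model: each query word w is generated either by "translating" a document word t, with probability P(w | t) · P_ml(t | d), or from a background collection model, mixed with weight λ:

log P(q | d) = Σ_{w ∈ q} log[ (1 − λ) Σ_{t ∈ d} P(w | t) P_ml(t | d) + λ P_ml(w | C) ]

The mixture form, the default λ = 0.2, and the function name are assumptions, not the exact retrieval model evaluated in this work.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def translation_query_loglik(
    query: List[str],
    doc: List[str],
    trans: Dict[Tuple[str, str], float],  # (query word w, doc word t) -> P(w | t)
    collection: Counter,                  # word counts over the whole collection
    lam: float = 0.2,                     # background weight (assumed value)
) -> float:
    doc_counts = Counter(doc)
    coll_len = sum(collection.values())
    score = 0.0
    for w in query:
        # translation component: sum_t P(w | t) * P_ml(t | d)
        p_trans = sum(trans.get((w, t), 0.0) * c / len(doc) for t, c in doc_counts.items())
        p_coll = collection[w] / coll_len  # background component P_ml(w | C)
        p = (1 - lam) * p_trans + lam * p_coll
        if p == 0.0:
            return float("-inf")  # query word explained by neither doc nor collection
        score += math.log(p)
    return score
```

A document containing "automobile" can now score well for the query word "car" even though the two never match lexically, which is exactly the lexical gap the translation probabilities are meant to bridge.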
Related Work | While classical measures of semantic relatedness have been extensively studied and compared, based on comparisons with human relatedness judgements or word-choice problems, there is no comparable intrinsic study of the relatedness measures obtained through word translation probabilities. |
Alignment Link Confidence Measure | The link confidence is defined as the word translation probability of the aligned word pair divided by the sum of the translation probabilities of all the target words in the sentence, given the same source word. |
Alignment Link Confidence Measure | Intuitively, the above link confidence definition compares the lexical translation probability of the aligned word pair with the translation probabilities of all the target words given the source word. |
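The link confidence definition can be written as LC(t_j | s_i) = p(t_j | s_i) / Σ_{j'} p(t_{j'} | s_i), where the sum runs over the target words of the sentence. A minimal sketch follows, assuming the lexical table is a dictionary keyed by (target word, source word) and that missing entries count as zero; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def link_confidence(
    src_word: str,
    tgt_word: str,
    tgt_sentence: List[str],
    trans: Dict[Tuple[str, str], float],  # (target word, source word) -> p(target | source)
) -> float:
    """Confidence of the link src_word -> tgt_word within one sentence pair."""
    # normaliser: translation probabilities of every target word given the same source word
    denom = sum(trans.get((t, src_word), 0.0) for t in tgt_sentence)
    if denom == 0.0:
        return 0.0  # no target word in the sentence is a known translation of src_word
    return trans.get((tgt_word, src_word), 0.0) / denom
```

By construction, the confidences of one source word's candidate links within a sentence sum to 1, so a link scores high only when its target word clearly outcompetes the other words in the sentence.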
Sentence Alignment Confidence Measure | When computing the source-to-target alignment posterior probability, the numerator is the sentence translation probability calculated according to the given alignment A. |
Sentence Alignment Confidence Measure | It is the product of lexical translation probabilities for the aligned word pairs. |
Sentence Alignment Confidence Measure | The denominator is the sentence translation probability summed over all possible alignments, which can be calculated as in IBM Model 1 (Brown et al., 1994). |
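Under IBM Model 1 the sum over all alignments factorises, because each target position picks its source word independently: Σ_A Π_j t(t_j | s_{a_j}) = Π_j Σ_i t(t_j | s_i), and the uniform alignment prior cancels between numerator and denominator. The sketch below uses this identity to compute the source-to-target alignment posterior P(A | S, T). It assumes a Model 1 lexicon keyed by (target word, source word) and, for brevity, ignores the NULL source word; the function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def alignment_posterior(
    src: List[str],
    tgt: List[str],
    alignment: List[int],                 # alignment[j] = index of the source word linked to tgt[j]
    trans: Dict[Tuple[str, str], float],  # (target word, source word) -> t(target | source)
) -> float:
    """P(A | S, T) under IBM Model 1, ignoring the NULL source word."""
    log_num = 0.0  # log of prod_j t(t_j | s_{a_j})      (numerator: the given alignment)
    log_den = 0.0  # log of prod_j sum_i t(t_j | s_i)    (denominator: all alignments)
    for j, t_word in enumerate(tgt):
        p_link = trans.get((t_word, src[alignment[j]]), 0.0)
        if p_link == 0.0:
            return 0.0
        p_all = sum(trans.get((t_word, s_word), 0.0) for s_word in src)
        log_num += math.log(p_link)
        log_den += math.log(p_all)
    return math.exp(log_num - log_den)
```

Working in log space keeps the products from underflowing on long sentences; the posteriors of all alignments of a sentence pair sum to 1.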
Introduction | It multiplies the corresponding translation probabilities and lexical weights of the source-pivot and pivot-target translation models to induce a new source-target phrase table. |
Pivot Methods for Phrase-based SMT | Based on these two models, we induce a source-target translation model, for which two important elements must be induced: the phrase translation probability and the lexical weight. |
Pivot Methods for Phrase-based SMT | Phrase Translation Probability. We induce the phrase translation probability by assuming that the source and target phrases are independent given the pivot phrase. |
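Under this independence assumption, marginalising over pivot phrases p̄ gives φ(s̄ | t̄) = Σ_p̄ φ(s̄ | p̄) φ(p̄ | t̄). Below is a sketch of this triangulation over two phrase tables stored as dictionaries keyed by (phrase, conditioning phrase); the table layout and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (phrase, conditioning phrase) -> probability, e.g. (s, p) -> phi(s | p)
PhraseTable = Dict[Tuple[str, str], float]


def triangulate(src_given_pivot: PhraseTable, pivot_given_tgt: PhraseTable) -> PhraseTable:
    """phi(s | t) = sum_p phi(s | p) * phi(p | t), assuming s is independent of t given p."""
    # index the source-pivot table by pivot phrase so the join is a lookup
    by_pivot: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (s, p), prob_sp in src_given_pivot.items():
        by_pivot[p].append((s, prob_sp))
    result: PhraseTable = defaultdict(float)
    for (p, t), prob_pt in pivot_given_tgt.items():
        # pivot phrases absent from the source-pivot table contribute nothing
        for s, prob_sp in by_pivot.get(p, []):
            result[(s, t)] += prob_sp * prob_pt
    return dict(result)
```

Source and target phrases that never share a pivot phrase get no entry, so the induced table can only be as complete as the overlap between the two pivot-side vocabularies.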
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: the word alignment information a in a phrase pair (s̄, t̄) and the lexical translation probability w(s|t). |
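For reference, the usual phrase-based lexical weight combines these two elements as p_w(s̄ | t̄, a) = Π_i (1 / |{j : (i, j) ∈ a}|) Σ_{(i, j) ∈ a} w(s_i | t_j), with unaligned source words scored against a NULL target word. The sketch below implements that formula; the NULL token spelling, the dictionary layout, and the function name are assumptions.

```python
from typing import Dict, List, Set, Tuple

NULL = "NULL"  # assumed spelling of the empty target word


def lexical_weight(
    src_phrase: List[str],
    tgt_phrase: List[str],
    alignment: Set[Tuple[int, int]],  # links (i, j): src_phrase[i] <-> tgt_phrase[j]
    w: Dict[Tuple[str, str], float],  # (source word, target word) -> w(s | t)
) -> float:
    """p_w(s | t, a): average w over each source word's links, then multiply."""
    weight = 1.0
    for i, s_word in enumerate(src_phrase):
        linked = [j for (ii, j) in alignment if ii == i]
        if linked:
            weight *= sum(w.get((s_word, tgt_phrase[j]), 0.0) for j in linked) / len(linked)
        else:
            # unaligned source words are scored against the NULL target word
            weight *= w.get((s_word, NULL), 0.0)
    return weight
```

For example, with links {(0, 0), (1, 0), (1, 1)} and the third source word unaligned, the weight is w(s_0|t_0) · ½[w(s_1|t_0) + w(s_1|t_1)] · w(s_2|NULL).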
Sentence Selection: Single Language Pair | Additionally, phrase translation probabilities must be estimated accurately, so sentences containing phrases that occur only rarely in the corpus are informative. |
Sentence Selection: Single Language Pair | estimating the phrase translation probabilities accurately. |
Sentence Selection: Single Language Pair | Smoothing techniques partly address the estimation of translation probabilities for rarely occurring events (indeed, this is the main reason for smoothing). |
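The selection criterion above (prefer sentences whose phrases are rare in the current corpus) can be sketched as a simple rarity score: the fraction of a candidate's n-grams that occur at most `threshold` times in the existing training data. This scoring function, its n-gram order, its threshold, and the helper names are hypothetical illustrations, not necessarily the selection method used in this work.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], max_n: int) -> Iterator[Tuple[str, ...]]:
    """All n-grams of length 1..max_n (treated here as candidate phrases)."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def rarity_score(sentence: List[str], corpus_counts: Counter,
                 max_n: int = 3, threshold: int = 1) -> float:
    """Fraction of the sentence's n-grams seen at most `threshold` times in the corpus."""
    grams = list(ngrams(sentence, max_n))
    if not grams:
        return 0.0
    rare = sum(1 for g in grams if corpus_counts[g] <= threshold)
    return rare / len(grams)


def select_sentences(candidates: List[List[str]], corpus: List[List[str]], k: int,
                     max_n: int = 3, threshold: int = 1) -> List[List[str]]:
    """Return the k candidates with the highest (hypothetical) rarity score."""
    counts = Counter(g for sent in corpus for g in ngrams(sent, max_n))
    ranked = sorted(candidates, key=lambda s: rarity_score(s, counts, max_n, threshold),
                    reverse=True)
    return ranked[:k]
```

Normalising by the number of n-grams keeps long sentences from winning merely because they contain more phrases; a real system would likely also weigh how useful each rare phrase is.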