Conclusions and future work | The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S).
Conclusions and future work | If we continue performing the CRF conversion to cover all N (N ≤ 2^k) segmentations, eventually we will get:
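The property being used here is that the CRF defines a proper conditional distribution: once the list of candidates covers every segmentation of S, the probabilities P(Ai|S) sum to one. A minimal sketch, using a made-up toy scorer in place of a trained CRF (both `all_segmentations` and `crf_probs` are illustrative names, not from the paper):

```python
import math
from itertools import product

def all_segmentations(word):
    """Enumerate every segmentation of `word` by placing a binary
    boundary tag after each of the first len(word)-1 characters."""
    n = len(word)
    for tags in product((0, 1), repeat=n - 1):
        seg, start = [], 0
        for i, t in enumerate(tags, start=1):
            if t:
                seg.append(word[start:i])
                start = i
        seg.append(word[start:])
        yield tuple(seg)

def crf_probs(word, score):
    """Toy stand-in for the CRF's conditional distribution: a softmax
    over per-segmentation scores, so that a candidate list covering
    every segmentation has probabilities P(Ai|S) summing to one."""
    segs = list(all_segmentations(word))
    z = sum(math.exp(score(s)) for s in segs)
    return {s: math.exp(score(s)) / z for s in segs}

# `score` is an arbitrary toy scorer, not a trained model.
probs = crf_probs("abc", score=lambda seg: -len(seg))
assert abs(sum(probs.values()) - 1.0) < 1e-9
```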
Joint optimization and its fast decoding algorithm | The joint optimization considers all the segmentation possibilities and sums the probability over all alternative segmentations that generate the same output.
Joint optimization and its fast decoding algorithm | However, exact inference by explicitly listing all possible candidates and summing over all possible segmentations is intractable, because the computational complexity grows exponentially with the length of the source word.
Joint optimization and its fast decoding algorithm | In the segmentation step, the number of possible segmentations is 2^N, where N is the length of the source word and 2 is the size of the tagging set.
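To make the blow-up concrete, a small sketch that counts candidates by brute-force enumeration, assuming a binary boundary tag between each pair of adjacent characters (a slight variant of the per-character tagging above, which yields 2^N tag sequences; the function name is ours):

```python
from itertools import product

def count_segmentations_by_enumeration(n):
    """Count the segmentations of a length-n word by explicitly
    enumerating the binary boundary tags between adjacent characters."""
    return sum(1 for _ in product((0, 1), repeat=n - 1))

# The count doubles with every extra character, so explicit enumeration
# quickly becomes infeasible: a 40-character word already has
# 2^39 (about 5.5e11) candidate segmentations.
for n in (4, 8, 12):
    assert count_segmentations_by_enumeration(n) == 2 ** (n - 1)
```

This is why the paper turns to a fast decoding algorithm rather than listing candidates explicitly.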
Alignment | However, DeNero et al. (2006) experienced similar over-fitting with short phrases: because the same word sequence can be segmented in different ways, specific segmentations are learned for specific training sentence pairs.
Introduction | Ideally, we would produce all possible segmentations and alignments during training. |
Related Work | When given a bilingual sentence pair, we can usually assume there are a number of equally correct phrase segmentations and corresponding alignments. |
Related Work | As a result of this ambiguity, different segmentations are selected for different examples during training.