Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method.

There are more than 6,000 languages in the world, and about 10 of them have more than 100 million native speakers.

2.1 CRF introduction

3.1 Joint optimization

The JSCM (joint source-channel model) represents how the source words and target names are generated simultaneously (Li et al., 2004):
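The equation announced by that sentence did not survive the extraction; a standard form of the n-gram joint source-channel model from Li et al. (2004) — reconstructed here, not copied from this paper — is:

```latex
P(S, T) = P(s_1, \dots, s_K, t_1, \dots, t_K)
        = \prod_{k=1}^{K} P\big(\langle s, t \rangle_k \mid \langle s, t \rangle_{k-n+1}^{\,k-1}\big)
```

where each ⟨s, t⟩ pair is a transliteration unit aligning a source chunk with a target chunk.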

We use several metrics from Li et al. (2009) to measure the performance of our system.

In this paper we have presented our new joint optimization method for a two-step CRF model and its fast decoding algorithm.

Appears in 34 sentences as: CRF (41) CRF” (1)

In *Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm*

- This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method. (Page 1, “Abstract”)
- In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language. (Page 1, “Abstract”)
- In the “NEWS 2009 Machine Transliteration Shared Task”, a new two-step CRF model for the transliteration task was proposed (Yang et al., 2009), in which the first step is to segment a word in the source language into character chunks and the second step is to perform a context-dependent mapping from each chunk into one written unit in the target language. (Page 2, “Introduction”)
- In this paper, we propose to jointly optimize a two-step CRF model. (Page 2, “Introduction”)
- The rest of this paper is organized as follows: Section 2 explains the two-step CRF method, followed by Section 3, which describes our joint optimization method and its fast decoding algorithm; Section 4 introduces a rapid implementation of a JSCM system in the weighted finite-state transducer (WFST) framework; and the last section reports the experimental results and conclusions. (Page 2, “Introduction”)
- 2.1 CRF introduction (Page 2, “Two-step CRF method”)
- CRF training is usually performed through the L-BFGS algorithm (Wallach, 2002) and decoding is performed by the Viterbi algorithm. (Page 2, “Two-step CRF method”)
- We formalize machine transliteration as a CRF tagging problem, as shown in Figure 2. (Page 2, “Two-step CRF method”)
- Figure 2: A pictorial description of a CRF segmenter and a CRF converter (Page 2, “Two-step CRF method”)
- 2.2 CRF segmenter (Page 2, “Two-step CRF method”)
- In the CRF, a feature function describes a co-occurrence relation, and it is usually a binary function, taking the value 1 when both an observation and a label transition are observed. (Page 2, “Two-step CRF method”)
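As a rough illustration of the two-step pipeline these sentences describe, the sketch below groups characters into chunks from a B/I tag sequence (step 1) and then maps each chunk to a target unit (step 2). The tag sequence and the chunk table are hand-written stand-ins for the two trained CRFs; all names and mappings here are my own toy example, not the paper's.

```python
def chunks_from_tags(word, tags):
    """Step 1 output: group characters into chunks from B/I tags
    (B marks the start of a chunk, I its continuation)."""
    chunks = []
    for ch, tag in zip(word, tags):
        if tag == "B" or not chunks:
            chunks.append(ch)
        else:
            chunks[-1] += ch
    return chunks

def convert_chunks(chunks, mapping):
    """Step 2: map each chunk to one written unit in the target language.
    A real system scores this with a second, context-dependent CRF;
    a plain dictionary lookup stands in here."""
    return "".join(mapping.get(c, "?") for c in chunks)

# Toy example: romaji -> katakana.
word = "haruto"
tags = ["B", "I", "B", "I", "B", "I"]           # ha | ru | to
mapping = {"ha": "ハ", "ru": "ル", "to": "ト"}  # illustrative chunk table

chunks = chunks_from_tags(word, tags)
print(chunks)                           # ['ha', 'ru', 'to']
print(convert_chunks(chunks, mapping))  # ハルト
```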


Appears in 7 sentences as: segmentations (7)

In *Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm*

- The joint optimization considers all the segmentation possibilities and sums the probability over all the alternative segmentations which generate the same output. (Page 3, “Joint optimization and its fast decoding algorithm”)
- However, exact inference by listing all possible candidates explicitly and summing over all possible segmentations is intractable, because of the exponential computational complexity with the source word’s increasing length. (Page 3, “Joint optimization and its fast decoding algorithm”)
- In the segmentation step, the number of possible segmentations is 2^N, where N is the length of the source word and 2 is the size of the tagging set. (Page 3, “Joint optimization and its fast decoding algorithm”)
- Is it really necessary to perform the second CRF for all the segmentations? (Page 3, “Joint optimization and its fast decoding algorithm”)
- If we can guarantee that, even after performing the 2nd CRF decoding for all the remaining segmentations Ak+1, Ak+2, ..., AN, the top-1 candidate does not change, then we can stop decoding. (Page 3, “Joint optimization and its fast decoding algorithm”)
- The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S). (Page 5, “Conclusions and future work”)
- If we continue performing the CRF conversion to cover all N (N ≥ k) segmentations, eventually we will get: (Page 5, “Conclusions and future work”)
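A minimal sketch of the early-stopping idea in these sentences: process segmentations in descending P(Ai|S), accumulate the summed score for each output T, and stop once the unconverted segmentation mass can no longer overturn the top-1 candidate. The bound used (each P(T|S, Ai) ≤ 1, so any candidate can gain at most the remaining mass) and all names are my own assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

def joint_decode(seg_probs, convert):
    """Summed two-step decoding with early stopping.
    seg_probs: [(segmentation, P(Ai|S))], sorted by probability, descending.
    convert:   segmentation -> [(T, P(T|S, Ai))]  (2nd-step CRF n-best list).
    """
    scores = defaultdict(float)
    remaining = sum(p for _, p in seg_probs)  # segmentation mass not yet converted
    for seg, p_a in seg_probs:
        remaining -= p_a
        for t, p_t in convert(seg):
            scores[t] += p_a * p_t            # sum over segmentations with same T
        ranked = sorted(scores.values(), reverse=True)
        best = ranked[0]
        second = ranked[1] if len(ranked) > 1 else 0.0
        # Any candidate (existing or new) can gain at most `remaining` more
        # mass, so the top-1 is already fixed once its lead exceeds it.
        if best - second > remaining:
            break
    return max(scores, key=scores.get)

# Toy run: after converting A1 alone, T1 leads T2 by 0.48 > 0.4 remaining,
# so decoding stops without ever converting A2.
seg_probs = [("A1", 0.6), ("A2", 0.4)]
nbest = {"A1": [("T1", 0.9), ("T2", 0.1)],
         "A2": [("T1", 0.5), ("T2", 0.5)]}
print(joint_decode(seg_probs, nbest.__getitem__))  # T1
```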


Appears in 4 sentences as: conditional probabilities (2) conditional probability (2)

In *Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm*

- From the two-step CRF model we get the conditional probability PCRF(T|S), and from the JSCM we get the joint probability P(S, T). (Page 4, “Experiments”)
- The conditional probability PJSCM(T|S) can be calculated as follows: (Page 4, “Experiments”)
- The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S). (Page 5, “Conclusions and future work”)
- The CRF conversion, given a segmentation Ai, provides a list of transliteration outputs T1, T2, ..., TM, with conditional probabilities P(T1|S, Ai), P(T2|S, Ai), ..., P(TM|S, Ai). (Page 5, “Conclusions and future work”)
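Put together, the quoted definitions imply the marginalized two-step probability, and the JSCM conditional follows by normalizing the joint model. The following is my reconstruction of the formulas announced above, not a quotation from the paper:

```latex
P_{\mathrm{CRF}}(T \mid S) = \sum_{i=1}^{N} P(A_i \mid S)\, P(T \mid S, A_i),
\qquad
P_{\mathrm{JSCM}}(T \mid S) = \frac{P(S, T)}{\sum_{T'} P(S, T')}
```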
