Experimental setup | We test our model on three corpora of bilingual parallel sentences: English-Korean, English-Urdu, and English-Chinese. |
Experimental setup | To obtain lexical alignments between the parallel sentences we employ GIZA++ (Och and Ney, 2003). |
Experimental setup | Then, for each pair of parallel sentences, we randomly sample an initial alignment tree for the two sampled trees. |
Introduction | Finally, parallel sentences are assembled from these generated part-of-speech sequences and word-level alignments. |
Model | We treat the part-of-speech tag sequences of parallel sentences, as well as their |
Model | Now we describe the stochastic process whereby the observed parallel sentences and their word-level alignments are generated, according to our model. |
Model | Then, each pair of word-aligned parallel sentences is generated through the following process: |
Related Work | Assuming that trees induced over parallel sentences have to exhibit certain structural regularities, Kuhn manually specifies a set of rules for determining when parsing decisions in the two languages are inconsistent with GIZA++ word-level alignments. |
Experimental Results | As we mentioned in Section 2, Shi et al. (2006) reported that they mined a total of 1,069,423 pairs of English-Chinese parallel sentences from bilingual web sites. |
Related Work | As far as we know, there is no publication available on mining parallel sentences directly from bilingual web pages. |
Related Work | Shi et al. (2006) mined a total of 1,069,423 pairs of English-Chinese parallel sentences. |