Experiments and Results | Statistics of corpora. “Ch” denotes Chinese, “En” denotes English, the “Sent.” row is the number of sentence pairs, the “word” row is the number of words, |
Experiments and Results | Figure 5 illustrates examples of pseudo-words of one Chinese-to-English sentence pair. |
Experiments and Results | SSP has a strong constraint that all parts of a sentence pair should be aligned, so the source sentence and the target sentence have the same length after merging words into |
Introduction | But the computational complexity is prohibitively high because of the exponentially large number of decompositions of a sentence pair into phrase pairs. |
Introduction | pressions by monotonically segmenting a given Spanish-English sentence pair into bilingual units, where a word aligner is also used. |
Searching for Pseudo-words | X and Y are the sentence pair and the multi-word pairs, respectively, in this bilingual scenario. |
Searching for Pseudo-words | Pseudo-word pairs of one sentence pair are such pairs that maximize the sum of Span-pairs’ bilingual sequence significances: $pwp_1^K = \operatorname*{argmax}_{span\text{-}pair_1^K} \sum_{k=1}^{K} Sig_{span\text{-}pair_k}$ (6) |
Searching for Pseudo-words | Searching for pseudo-word pairs $pwp_1^K$ is equal to bilingual segmentation of a sentence pair into optimal $Span\text{-}pair_1^K$. |
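The search in Equation (6) can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' algorithm: it assumes a strictly monotone segmentation (no reordering between span pairs), a hypothetical scoring function `sig` standing in for the bilingual sequence significance, and a hypothetical cap `max_len` on the span length.

```python
from functools import lru_cache


def best_monotone_segmentation(src, tgt, sig, max_len=3):
    """Return (score, span_pairs) maximizing the sum of sig over span pairs.

    Each span pair is (src_start, src_end, tgt_start, tgt_end), end exclusive.
    Assumption: span pairs cover both sentences completely and in the same order.
    """
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Both sentences fully consumed: a valid, empty remainder.
        if i == n and j == m:
            return 0.0, ()
        best = (neg_inf, ())
        for di in range(1, max_len + 1):
            for dj in range(1, max_len + 1):
                if i + di > n or j + dj > m:
                    continue
                rest_score, rest = solve(i + di, j + dj)
                if rest_score == neg_inf:
                    continue  # the remainder cannot be fully covered
                score = sig(src[i:i + di], tgt[j:j + dj]) + rest_score
                if score > best[0]:
                    best = (score, ((i, i + di, j, j + dj),) + rest)
        return best

    return solve(0, 0)


# Toy significance table (hypothetical values, for illustration only).
_table = {(("a",), ("x",)): 1.0, (("b", "c"), ("y",)): 5.0}


def toy_sig(s, t):
    return _table.get((tuple(s), tuple(t)), 0.0)


score, spans = best_monotone_segmentation(["a", "b", "c"], ["x", "y"], toy_sig)
```

In the toy call, the two source words "b c" are merged into one unit paired with "y", so both sides end up with the same number of units.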
Alignment | The idea of forced alignment is to perform a phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding. |
Alignment | Consequently, we can modify Equation 2 to define the best segmentation of a sentence pair as: |
Alignment | The training data that consists of N parallel sentence pairs $f_n$ and $e_n$ for n = 1, .
Introduction | In this method, all phrases of the sentence pair that match constraints given by the alignment are extracted. |
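The excerpt does not spell out the alignment constraints. A minimal sketch, assuming the commonly used consistency criterion (a phrase pair contains at least one alignment link, and no link connects a word inside the pair to a word outside it), could look as follows. The function name, the `max_len` cap, and the omission of unaligned-boundary extension are illustrative assumptions.

```python
def extract_phrase_pairs(src_len, tgt_len, links, max_len=4):
    """Enumerate span pairs (i1, i2, j1, j2), inclusive, consistent with links.

    links is a collection of (source_index, target_index) alignment links.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_len, tgt_len)):
                    inside = False
                    crossing = False
                    for s, t in links:
                        s_in = i1 <= s <= i2
                        t_in = j1 <= t <= j2
                        if s_in and t_in:
                            inside = True
                        elif s_in != t_in:
                            # One end inside the pair, the other outside.
                            crossing = True
                            break
                    if inside and not crossing:
                        pairs.append((i1, i2, j1, j2))
    return pairs


# Two-word sentences with a diagonal alignment.
phrase_pairs = extract_phrase_pairs(2, 2, {(0, 0), (1, 1)})
```

This brute-force enumeration is quartic in sentence length, which is acceptable for illustration but not for large corpora.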
Related Work | When given a bilingual sentence pair, we can usually assume there are a number of equally correct phrase segmentations and corresponding alignments. |
Bilingual subtree constraints | To solve the mapping problems, we use a bilingual corpus, which includes sentence pairs, to automatically generate the mapping rules. |
Bilingual subtree constraints | First, the sentence pairs are parsed by monolingual parsers on both sides. |
Bilingual subtree constraints | Figure 8 shows an example of a processed sentence pair that has tree structures on both sides and word alignment links. |
Experiments | Note that some sentence pairs were removed because they are not one-to-one aligned at the sentence level (Burkett and Klein, 2008; Huang et al., 2009). |
Experiments | Word alignments were generated from the Berkeley Aligner (Liang et al., 2006; DeNero and Klein, 2007) trained on a bilingual corpus having approximately 0.8M sentence pairs. |
Motivation | Suppose that we have an input sentence pair as shown in Figure 1, where the source sentence is in English, the target is in Chinese, the dashed undirected links are word alignment links, and the directed links between words indicate that they have a (candidate) dependency relation. |
Collocation Model | The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. |
Experiments on Word Alignment | To investigate the quality of the generated word alignments, we randomly selected a subset from the bilingual corpus as the test set, including 500 sentence pairs. |
Experiments on Word Alignment | (11), we also manually labeled a development set including 100 sentence pairs, in the same manner as the test set. |
Improving Statistical Bilingual Word Alignment | According to the BWA method, given a bilingual sentence pair $E = e_1^{l}$ and $F = f_1^{m}$, the optimal
Improving Statistical Bilingual Word Alignment | Thus, the collocation probability of the alignment sequence of a sentence pair can be calculated according to Eq. |
Improving Statistical Bilingual Word Alignment | model to calculate the word alignment probability of a sentence pair, as shown in Eq. |
Basics of ITG Parsing | Larger and larger span pairs are recursively built until the sentence pair is built. |
Basics of ITG Parsing | Figure 1(a) shows one possible derivation for a toy example sentence pair with three words in each sentence. |
Evaluation | The 491 sentence pairs in this dataset are adapted to our own Chinese word segmentation standard. |
Evaluation | 250 sentence pairs are used as training data and the other 241 are test data. |
The DITG Models | The MERT module for DITG takes the alignment F-score of a sentence pair as the performance measure. |
The DITG Models | Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of DITG-produced alignment. |
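The performance measure can be sketched as a balanced F-score over sets of alignment links. This is a minimal illustration: the excerpt does not state which F-score variant is used (for example, whether sure and possible links are distinguished), so the balanced form and the function name below are assumptions.

```python
def alignment_f_score(predicted, reference):
    """Balanced F-score between two collections of (source_index, target_index) links."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        # Covers empty inputs and disjoint link sets.
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Three predicted links, two of which are in the reference.
f = alignment_f_score({(0, 0), (1, 1), (2, 1)}, {(0, 0), (1, 1)})
```

In the example, precision is 2/3 and recall is 1, giving an F-score of 0.8.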
The DPDI Framework | Discriminative approaches to word alignment use manually annotated alignment for sentence pairs. |
The DPDI Framework | Discriminative pruning, however, handles not only a sentence pair but every possible span pair. |
Substructure Spaces for BTKs | Chinese / English; # of sentence pairs: 5000; Avg. |
Substructure Spaces for BTKs | We randomly select 300 bilingual sentence pairs from the Chinese-English FBIS corpus with length ≤ 30 on both the source and target sides. |
Substructure Spaces for BTKs | The selected plain sentence pairs are further parsed by the Stanford parser (Klein and Manning, 2003) on both the English and Chinese sides. |
Evaluation | This corpus consists of 1370 sentence pairs that were manually created from transcribed Broadcast News stories. |
Evaluation | a pattern of compression used many times in the BNC in sentence pairs such as “NPR’s Anne Garrels reports” / “Anne Garrels reports”. |
Sentence compression | An example sentence pair, which we use as a running example, is the following: |
The STSG Model | See Figure 1 for an example of how an STSG with these rules would operate in synchronously generating our example sentence pair. |
Experiments | It contains 239K sentence pairs with about 6.9M/8.9M words in Chinese/English. |
Experiments | The alignment matrices for sentence pairs are generated according to Liu et al. (2009). |
Projected Classification Instance | Suppose a bilingual sentence pair, composed of a source sentence e and its target translation f. $y_e$ is the parse tree of the source sentence. |