Background | In the hybrid model, given an input sentence, a lattice that consists of word-level and character-level nodes is constructed. |
Background | Word-level nodes correspond to words found in the system's word dictionary, and character-level nodes correspond to the individual characters of the sentence.
Background | In other words, we use word-level nodes to identify known words and character-level nodes to identify unknown words. |
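The hybrid lattice described above can be sketched as follows. This is an illustrative simplification, not the paper's exact design: the dictionary lookup, span limit, and node representation are all assumed for the example.

```python
def build_lattice(sentence, dictionary, max_word_len=4):
    """Return a list of (start, end, level) nodes over the sentence.

    Word-level nodes cover substrings found in the dictionary (known
    words); character-level nodes cover every single character, so any
    unknown word can still be assembled from character-level nodes.
    """
    nodes = []
    n = len(sentence)
    for i in range(n):
        # Character-level node: always available, handles unknown words.
        nodes.append((i, i + 1, "char"))
        # Word-level nodes: only for substrings listed in the dictionary.
        for j in range(i + 1, min(i + max_word_len, n) + 1):
            if sentence[i:j] in dictionary:
                nodes.append((i, j, "word"))
    return nodes
```

For the input "abcd" with dictionary {"ab", "cd"}, the lattice contains four character-level nodes plus word-level nodes spanning (0, 2) and (2, 4).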
Experiments | Since we were interested in finding an optimal combination of word-level and character-level nodes for training, we focused on tuning 7“. |
Policies for correct path selection | Ideally, we need to build a word-character hybrid model that effectively learns the characteristics of unknown words (with character-level nodes) as well as those of known words (with word-level nodes). |
Policies for correct path selection | If we select the correct path yt that corresponds to the annotated sentence, it will only consist of word-level nodes that do not allow learning for unknown words. |
Policies for correct path selection | We therefore need to choose character-level nodes as correct nodes instead of word-level nodes for some words. |
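One way to realize such a policy is to treat low-frequency annotated words as if they were unknown, mapping them to character-level correct nodes. The frequency-threshold rule below is an illustrative stand-in for the paper's actual selection policies:

```python
def select_correct_path(tokens, freq, threshold=2):
    """Map each annotated token to correct nodes: word-level nodes for
    frequent (known) words, character-level nodes for rare ones, so the
    model also receives character-level training signal for handling
    unknown words.  (Sketch: the threshold policy is assumed here.)
    """
    path = []
    for tok in tokens:
        if freq.get(tok, 0) >= threshold:
            path.append((tok, "word"))
        else:
            path.extend((ch, "char") for ch in tok)
    return path
```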
Training method | W0 <w0>, W1 <p0> (for word-level nodes)
Training method | Templates W0-W3 are basic word-level unigram features, where Length(w0) denotes the length of the word w0.
Training method | Templates B0-B9 are basic word-level bigram features. |
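The unigram and bigram templates can be sketched as feature-extraction functions. The concrete feature strings below are illustrative examples in the spirit of the W0-W3 and B0-B9 templates, not the paper's exact definitions:

```python
def unigram_features(w0, p0):
    """Basic word-level unigram templates (illustrative subset of W0-W3):
    surface form, POS tag, their combination, and the word's length."""
    return {
        "W0=" + w0,
        "W1=" + p0,
        "W2=" + w0 + "/" + p0,
        "W3=len:" + str(len(w0)),
    }

def bigram_features(w_prev, p_prev, w0, p0):
    """Basic word-level bigram templates (illustrative subset of B0-B9),
    combining the previous and current word/tag pair."""
    return {
        "B0=" + w_prev + "_" + w0,
        "B1=" + p_prev + "_" + p0,
        "B2=" + w_prev + "_" + p0,
        "B3=" + p_prev + "_" + w0,
    }
```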
Introduction | Word-level alignments are then drawn based on the tree alignment. |
Introduction | Finally, parallel sentences are assembled from these generated part-of-speech sequences and word-level alignments. |
Introduction | The model is trained using bilingual data with automatically induced word-level alignments, but is tested on purely monolingual data for each language. |
Model | word-level alignments, as observed data. |
Model | We obtain these word-level alignments using GIZA++ (Och and Ney, 2003). |
Model | Finally, word-level alignments are drawn based on the structure of the alignment tree. |
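Word-level alignments of this kind are commonly stored as index pairs. The parser below assumes the widely used "i-j" (Pharaoh-style) pair format that many pipelines convert GIZA++ output into; GIZA++'s native A3 format differs:

```python
def parse_alignment(line):
    """Parse word-level alignments in 'i-j' pair format,
    e.g. '0-0 1-2 2-1' -> {(0, 0), (1, 2), (2, 1)},
    where i indexes the source word and j the target word.
    """
    pairs = set()
    for tok in line.split():
        s, t = tok.split("-")
        pairs.add((int(s), int(t)))
    return pairs
```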
Related Work | Assuming that trees induced over parallel sentences have to exhibit certain structural regularities, Kuhn manually specifies a set of rules for determining when parsing decisions in the two languages are inconsistent with GIZA++ word-level alignments. |
Collaborative Decoding | • Word-level system combination (Rosti et al., 2007) of member decoders’ n-best outputs
Discussion | Word-level system combination (system combination hereafter) (Rosti et al., 2007; He et al., 2008) has been proven to be an effective way to improve machine translation quality by using outputs from multiple systems. |
Experiments | We also implemented the word-level system combination (Rosti et al., 2007) and the hypothesis selection method (Hildebrand and Vogel, 2008). |
Experiments | Word-level Comb 40.45/40.85 29.52/30.35 Hypo Selection 40.09/40.50 29.02/29.71
Introduction | Most of the work focused on seeking better word alignment for consensus-based confusion network decoding (Matusov et al., 2006) or word-level system combination (He et al., 2008; Ayan et al., 2008). |
Introduction | We also conduct extensive investigations of different co-decoding settings, and compare them with related methods such as word-level system combination and hypothesis selection from multiple n-best lists.
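Hypothesis selection from multiple n-best lists can be sketched as picking the candidate with the highest n-gram consensus against the other candidates, in a minimum-Bayes-risk style. This is a simplified illustration; the cited methods use richer features and model scores:

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def select_hypothesis(hyps, max_n=2):
    """Return the hypothesis whose n-grams overlap most with the other
    hypotheses (consensus-based selection sketch)."""
    def consensus(h):
        score = 0
        for other in hyps:
            if other is h:
                continue
            for n in range(1, max_n + 1):
                overlap = ngrams(h.split(), n) & ngrams(other.split(), n)
                score += sum(overlap.values())
        return score
    return max(hyps, key=consensus)
```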
Abstract | We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees. |
Approach | A parallel corpus is word-level aligned using an alignment toolkit (Graça et al., 2009) and the source (English) is parsed using a dependency parser (McDonald et al., 2005).
Introduction | For example, several early works (Yarowsky and Ngai, 2001; Yarowsky et al., 2001; Merlo et al., 2002) demonstrate transfer of shallow processing tools such as part-of-speech taggers and noun-phrase chunkers by using word-level alignment models (Brown et al., 1994; Och and Ney, 2000). |
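The transfer idea behind these works can be sketched as projecting annotations through word-level alignment pairs. The function below is an illustrative simplification: real projection systems add heuristics to cope with noisy and many-to-many alignments.

```python
def project_tags(source_tags, alignment):
    """Project part-of-speech tags from source to target words through
    word-level alignment pairs.

    source_tags: list of tags indexed by source word position.
    alignment: set of (source_index, target_index) pairs.
    Returns a dict mapping target word positions to projected tags.
    """
    target_tags = {}
    for s, t in sorted(alignment):
        target_tags[t] = source_tags[s]
    return target_tags
```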