Conclusion and Future work | The approach also exploits methods from statistical MT (word alignment) and therefore integrates techniques from statistical syntactic parsing, MT, and compositional semantics to produce an effective semantic parser.
Experimental Evaluation | We also evaluated the impact of the word alignment component by replacing GIZA++ with gold-standard word alignments manually annotated for the CLANG corpus.
Experimental Evaluation | The results consistently showed that, compared to using gold-standard word alignments, GIZA++ produced lower semantic parsing accuracy when given very little training data, but similar or better results when given sufficient training data (> 160 examples).
Experimental Evaluation | This suggests that, given sufficient data, Giza++ can produce effective word alignments, and that imperfect word alignments do not seriously impair our semantic parsers since the disambiguation model evaluates multiple possible interpretations of ambiguous words. |
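The gold-vs-automatic comparison above is usually quantified with alignment precision, recall, and alignment error rate (AER). A minimal sketch of those metrics, treating every gold link as "sure" (the simplified AER from Och and Ney, 2003); the link sets are invented for illustration, not taken from the CLANG experiments:

```python
def alignment_scores(predicted, gold):
    """Precision, recall, and AER over sets of (src, tgt) links.

    Simplified: all gold links are treated as 'sure', so
    AER = 1 - 2|A ∩ S| / (|A| + |S|).
    """
    inter = predicted & gold
    precision = len(inter) / len(predicted)
    recall = len(inter) / len(gold)
    aer = 1.0 - 2.0 * len(inter) / (len(predicted) + len(gold))
    return precision, recall, aer

# hypothetical link sets for one sentence pair
gold = {(0, 0), (1, 2), (2, 1)}
auto = {(0, 0), (1, 2), (2, 2)}
p, r, aer = alignment_scores(auto, gold)
```

With two of three links shared, precision and recall are both 2/3 and AER is 1/3.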
Introduction | The learning system first employs a word alignment method from statistical machine translation (GIZA++ (Och and Ney, 2003)) to acquire a semantic lexicon that maps words to logical predicates. |
Learning Semantic Knowledge | We use an approach based on Wong and Mooney (2006), which constructs word alignments between NL sentences and their MRs. |
Learning Semantic Knowledge | Normally, word alignment is used in statistical machine translation to match words in one NL to words in another; here it is used to align words with predicates based on a "parallel corpus" of NL sentences and MRs. We assume that each word alignment defines a possible mapping from words to predicates for building a SAPT and semantic derivation which compose the correct MR. A semantic lexicon and composition rules are then extracted directly from the
Learning Semantic Knowledge | Generation of word alignments for each training example proceeds as follows. |
Learning a Disambiguation Model | Here, unique word alignments are not required, and alternative interpretations compete for the best semantic parse. |
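The lexicon-acquisition step described above, mapping words to logical predicates from word alignments over a parallel corpus of sentences and MRs, can be sketched as follows. The toy corpus, predicate names, and the `acquire_lexicon` helper are invented for illustration; they are not the paper's actual data or procedure:

```python
from collections import Counter, defaultdict

def acquire_lexicon(parallel, min_count=1):
    """Count word/predicate co-occurrences from word alignments and
    keep, for each word, the predicate it is most often aligned to."""
    counts = defaultdict(Counter)
    for words, predicates, links in parallel:
        for wi, pi in links:          # links: (word index, predicate index)
            counts[words[wi]][predicates[pi]] += 1
    return {w: c.most_common(1)[0][0]
            for w, c in counts.items()
            if c.most_common(1)[0][1] >= min_count}

# one invented CLANG-style training example
corpus = [
    (["our", "player", "2", "has", "the", "ball"],
     ["bowner", "player", "our", "2"],
     [(1, 1), (0, 2), (2, 3), (5, 0)]),
]
lexicon = acquire_lexicon(corpus)
```

On this single example, "ball" maps to the predicate it was aligned with (`bowner`), while function words left unaligned ("has", "the") get no lexicon entry.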
Abstract | In this paper we present a confidence measure for word alignment based on the posterior probability of alignment links. |
Abstract | Based on these measures, we improve the alignment quality by selecting high confidence sentence alignments and alignment links from multiple word alignments of the same sentence pair. |
Abstract | Additionally, we remove low confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality and significantly reduces the phrase translation table size. |
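One simple way to realize the link-filtering idea above is to combine the per-link scores of two directional alignments into a crude posterior proxy and drop links that fall below a threshold. The geometric-mean combination, the scores, and the threshold here are invented for illustration and are not the paper's exact confidence definition:

```python
def filter_links(links_f2e, links_e2f, threshold=0.5):
    """Keep a link if the geometric mean of its two directional
    scores (a crude stand-in for the link posterior) clears the
    threshold.  Links missing from one direction score 0 there."""
    kept = {}
    for link in set(links_f2e) | set(links_e2f):
        s1 = links_f2e.get(link, 0.0)
        s2 = links_e2f.get(link, 0.0)
        confidence = (s1 * s2) ** 0.5
        if confidence >= threshold:
            kept[link] = confidence
    return kept

# hypothetical per-link scores from the two alignment directions
f2e = {(0, 0): 0.9, (1, 1): 0.8, (2, 3): 0.2}
e2f = {(0, 0): 0.95, (1, 1): 0.7, (2, 3): 0.1}
kept = filter_links(f2e, e2f)
```

The low-scoring link (2, 3) is removed, which is exactly the kind of pruning that shrinks the phrase table built on top of the alignment.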
Introduction | Many MT systems, such as statistical phrase-based and syntax-based systems, learn phrase translation pairs or translation rules from large amounts of bilingual data with word alignment.
Introduction | The quality of the parallel data and the word alignment have significant impacts on the learned translation models and ultimately the quality of translation output. |
Introduction | Given the huge amount of bilingual training data, word alignments are automatically generated using various algorithms (Brown et al., 1994; Vogel et al., 1996
Conclusions and Future Work | Although its greater sensitivity to word alignment errors enables SncTSSG to capture additional noncontiguous language phenomena, it also induces many redundant noncontiguous rules.
Experiments | We extract the tree sequence pairs based on the m-to-n word alignments produced by GIZA++.
Experiments | The STSSG, or any model based on contiguous translational equivalence, is unable to obtain the corresponding target output for this idiom word via the noncontiguous word alignment, and treats it as an out-of-vocabulary (OOV) word.
Experiments | By contrast, the SncTSSG-based model can capture the noncontiguous tree sequence pair consistent with the word alignment and provide a reasonable target translation.
Introduction | (2006) report statistical evidence that discontinuities are very useful for translational equivalence analysis using binary branching structures under word alignment and parse tree constraints.
Tree Sequence Pair Extraction | Data structure: p[j1, j2] to store tree sequence pairs covering source span [j1, j2] 1: foreach source span [j1, j2], do 2: find a target span [i1, i2] with minimal length covering all the target words aligned to [j1, j2] 3: if all the target words in [i1, i2] are aligned with source words only in [j1, j2], then 4: pair each source tree sequence covering [j1, j2] with those in target covering [i1, i2] as a contiguous tree sequence pair
Tree Sequence Pair Extraction | 7: create sub-span set s([i1,i2]) to cover all the target words aligned to [j1,j2] |
Tree Sequence Pair Extraction | 13: find a source span [j1, j2] with minimal length covering all the source words aligned to [i1, i2]
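Steps 2–3 of the extraction procedure above, finding the minimal target span covering everything aligned to a source span and checking that nothing in it aligns back outside, are the standard alignment-consistency check. A sketch with invented function and variable names:

```python
def minimal_target_span(links, j1, j2):
    """Smallest target span covering every target word aligned to
    source span [j1, j2] (inclusive indices); None if nothing is
    aligned."""
    tgt = [i for j, i in links if j1 <= j <= j2]
    return (min(tgt), max(tgt)) if tgt else None

def is_contiguous_pair(links, j1, j2):
    """True iff every target word in the minimal span aligns only to
    source words inside [j1, j2] (the consistency check of step 3)."""
    span = minimal_target_span(links, j1, j2)
    if span is None:
        return False
    i1, i2 = span
    return all(j1 <= j <= j2 for j, i in links if i1 <= i <= i2)

# hypothetical alignment links: (source index, target index)
links = [(0, 0), (1, 2), (2, 1), (3, 3)]
```

Here source span [1, 2] yields a consistent (extractable) pair, while span [0, 1] does not, because its minimal target span also covers a word aligned to source position 2.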
Experiments | We obtained word alignments of the training data by first running GIZA++ (Och and Ney, 2003) and then applying the refinement rule “grow-diag-final-and” (Koehn et al., 2003). |
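The symmetrization idea behind "grow-diag-final-and" can be sketched as follows. This is a deliberately simplified grow-diag pass: the full heuristic additionally restricts growth to uncovered words and has "final"/"final-and" steps for leftover links, all omitted here; the link sets are invented:

```python
def grow_diag(f2e, e2f):
    """Simplified grow-diag symmetrization: start from the
    intersection of the two directional link sets, then repeatedly
    add union links that neighbor (incl. diagonally) a link already
    in the alignment."""
    union = set(f2e) | set(e2f)
    alignment = set(f2e) & set(e2f)
    changed = True
    while changed:
        changed = False
        for j, i in sorted(union - alignment):
            neighbors = {(j + dj, i + di)
                         for dj in (-1, 0, 1) for di in (-1, 0, 1)
                         if (dj, di) != (0, 0)}
            if neighbors & alignment:
                alignment.add((j, i))
                changed = True
    return alignment

f2e = {(0, 0), (1, 1), (2, 2)}   # hypothetical source-to-target links
e2f = {(0, 0), (1, 1), (1, 2)}   # hypothetical target-to-source links
sym = grow_diag(f2e, e2f)
```

Starting from the intersection {(0,0), (1,1)}, both remaining union links are adjacent to accepted links and get pulled in, giving a many-to-many result.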
Introduction | The solid lines denote hyperedges and the dashed lines denote word alignments.
Model | The solid lines denote hyperedges and the dashed lines denote word alignments between the two forests. |
Rule Extraction | By constructing a theory that gives formal semantics to word alignments, Galley et al.
Rule Extraction | Their GHKM procedure draws connections among word alignments , derivations, and rules. |
Rule Extraction | They first identify the tree nodes that subsume tree-string pairs consistent with word alignments and then extract rules from these nodes. |
Approach | First, parser and word alignment errors cause much of the transferred information to be wrong. |
Experiments | For both corpora, we performed word alignments with the open source PostCAT (Graça et al., 2009) toolkit.
Experiments | Preliminary experiments showed that our word alignments were not always appropriate for syntactic transfer, even when they were correct for translation. |
Introduction | Nevertheless, several challenges to accurate training and evaluation from aligned bitext remain: (1) partial word alignment due to non-literal or distant translation; (2) errors in word alignments and source language parses; (3) grammatical annotation choices that differ across languages and linguistic theories (e.g., how to analyze auxiliary verbs, conjunctions).
Related Work | (2005) found that transferring dependencies directly was not sufficient to get a parser with reasonable performance, even when both the source language parses and the word alignments are performed by hand. |
Experiments | To obtain word-level alignments, we ran GIZA++ (Och and Ney, 2000) on the remaining corpus in both directions, and applied the “grow-diag-final” refinement rule (Koehn et al., 2005) to produce the final many-to-many word alignments.
Introduction | According to the word alignments, we define bracketable and unbracketable instances.
The Acquisition of Bracketing Instances | Let c and e be the source sentence and the target sentence, W be the word alignment between them, and T be the parse tree of c. We define a binary bracketing instance as a tuple (b, τ(c_{i..j}), τ(c_{j+1..k}), τ(c_{i..k})), where b ∈ {bracketable, unbracketable}, c_{i..j} and c_{j+1..k} are two neighboring source phrases, and τ(T, s) (τ(s) for short) is a subtree function which returns the minimal subtree covering the source sequence s from the source parse tree T. Note that τ(c_{i..k}) includes both τ(c_{i..j}) and τ(c_{j+1..k}).
The Acquisition of Bracketing Instances | 1: Input: sentence pair (c, e), the parse tree T of c and the word alignment W between c and e 2: B := ∅ 3: for each (i, j, k) ∈ c do 4: if there exist a target phrase e_{p..q} aligned to c_{i..j} and e_{p'..q'} aligned to c_{j+1..k}, then
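The bracketable/unbracketable decision in the procedure above reduces to checking whether each of the two neighboring source spans, and their concatenation, maps to an alignment-consistent target phrase under W. A sketch with invented helper names and an invented alignment:

```python
def target_phrase(W, i, j):
    """Minimal target span for source span [i, j] under alignment W,
    or None if some covered target word aligns outside [i, j]."""
    tgt = [t for s, t in W if i <= s <= j]
    if not tgt:
        return None
    lo, hi = min(tgt), max(tgt)
    if any(not (i <= s <= j) for s, t in W if lo <= t <= hi):
        return None
    return lo, hi

def bracketing_label(W, i, j, k):
    """'bracketable' iff spans [i, j], [j+1, k], and [i, k] each
    align to a consistent target phrase."""
    spans = [(i, j), (j + 1, k), (i, k)]
    ok = all(target_phrase(W, a, b) is not None for a, b in spans)
    return "bracketable" if ok else "unbracketable"

W = [(0, 0), (1, 2), (2, 1)]   # hypothetical alignment links
```

With this W, splitting [0, 2] after position 0 is bracketable, while splitting after position 1 is not, since span [0, 1] has no consistent target phrase.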
Experimental Setup | Additionally, we compute a word alignment score to investigate the extent to which the input text is used to construct correct analyses. |
Results | The word alignment results from Table 2 indicate that the learners are mapping the correct words to actions for documents that are successfully completed. |
Results | For example, the models that perform best in the Windows domain achieve nearly perfect word alignment scores. |
Experiments | Many of the remaining errors are due to the garbage collection phenomenon familiar from word alignment models (Moore, 2004; Liang et al., 2006). |
Generative Model | The alignment aspect of our model is similar to the HMM model for word alignment (Ney and Vogel, 1996). |
Generative Model | (2008) perform joint segmentation and word alignment for machine translation, but the nature of that task is different from ours. |
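The HMM word alignment model mentioned above treats each target word as emitted from one source position, with transition scores that depend only on the jump width between consecutive aligned positions. A toy Viterbi decoder in that spirit; the translation table and jump distribution are invented, and NULL alignments and normalization are omitted:

```python
import math

def viterbi_alignment(tgt_words, src_words, t_prob, p_jump):
    """Viterbi decoding for a toy HMM aligner: returns one source
    position per target word, maximizing emission t_prob[f][e] times
    jump-width transition scores p_jump(d)."""
    J = len(src_words)
    delta = [math.log(t_prob[tgt_words[0]][src_words[i]]) for i in range(J)]
    back = []
    for f in tgt_words[1:]:
        new_delta, ptr = [], []
        for i in range(J):
            k = max(range(J), key=lambda k: delta[k] + math.log(p_jump(i - k)))
            new_delta.append(delta[k] + math.log(p_jump(i - k))
                             + math.log(t_prob[f][src_words[i]]))
            ptr.append(k)
        delta = new_delta
        back.append(ptr)
    i = max(range(J), key=lambda k: delta[k])   # best final state, then backtrace
    alignment = [i]
    for ptr in reversed(back):
        i = ptr[i]
        alignment.append(i)
    return list(reversed(alignment))

# invented toy parameters, not estimated from data
src = ["das", "Haus"]
tgt = ["the", "house"]
t_prob = {"the": {"das": 0.9, "Haus": 0.1},
          "house": {"das": 0.1, "Haus": 0.9}}
p_jump = lambda d: {0: 0.3, 1: 0.6}.get(d, 0.1)
align = viterbi_alignment(tgt, src, t_prob, p_jump)
```

The favored forward jump of 1 plus the emission table yields the monotone alignment [0, 1].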
Pivot Methods for Phrase-based SMT | (2003), there are two important elements in the lexical weight: the word alignment information a in a phrase pair (s̄, t̄) and the lexical translation probability w(s|t).
Pivot Methods for Phrase-based SMT | Let a1 and a2 represent the word alignment information inside the phrase pairs (s̄, p̄) and (p̄, t̄).
Pivot Methods for Phrase-based SMT | Based on the induced word alignment information, we estimate the co-occurring frequencies of word pairs directly from the induced phrase
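The lexical weight referred to above, in the standard formulation of Koehn et al. (2003), multiplies per-source-word averages of word translation probabilities over the alignment links inside the phrase pair. A sketch with an invented toy probability table; NULL-alignment handling is omitted:

```python
def lexical_weight(src, tgt, a, w):
    """Lexical weight p_w(src | tgt, a): for each source word,
    average w(s, t) over its alignment links, then multiply the
    per-word averages together."""
    score = 1.0
    for i, s in enumerate(src):
        links = [j for si, j in a if si == i]
        score *= sum(w[(s, tgt[j])] for j in links) / len(links)
    return score

src, tgt = ["maison", "bleue"], ["blue", "house"]
a = [(0, 1), (1, 0)]                      # (source index, target index)
w = {("maison", "house"): 0.8, ("bleue", "blue"): 0.7}
pw = lexical_weight(src, tgt, a, w)
```

Each source word has exactly one link here, so the weight is simply 0.8 × 0.7 = 0.56; with multiple links per word the inner sum would average them.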
Experiment | GIZA++ (Och and Ney, 2003) and the “grow-diag-final-and” heuristic are used to generate m-to-n word alignments.
Experiment | This is mainly because tree sequence rules are sensitive only to the word alignment, while tree rules, even when extracted from a forest (as in FT2S), are also constrained by the syntactic structure given by the parser.
Forest-based tree sequence to string model | Given a source forest F and target translation TS as well as the word alignment A, our translation model is formulated as: