Abstract | It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data.
Distortion Model for Phrase-Based SMT | One of the reasons for this difference is the relative word order between words. |
Distortion Model for Phrase-Based SMT | Thus, considering relative word order is important. |
Distortion Model for Phrase-Based SMT | In (d) and (e) in Figure 2, the word (kare) at the CP and the word order between katta and karita are the same. |
Introduction | Estimating appropriate word order in a target language is one of the most difficult problems for statistical machine translation (SMT). |
Introduction | This is particularly true when translating between languages with widely different word orders.
Introduction | It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. |
Abstract | Previous work has shown that using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference.
Abstract | In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. |
Integration into SMT system | The integrated method takes the original source sentence e as input, and the ranking model generates a reordered e′ as a word order reference
Integration into SMT system | Essentially they are soft constraints to encourage the decoder to choose translations with word order similar to the prediction of ranking reorder model. |
Introduction | Long-distance word reordering between language pairs with substantial word order difference, such as Japanese with Subject-Object-Verb (SOV) structure and English with Subject-Verb-Object (SVO) structure, is generally viewed beyond the scope of the phrase-based systems discussed above, because of either distortion limits or lack of discriminative features for modeling.
Introduction | The other is called syntax pre-reordering — an approach that re-positions source words to approximate target language word order as much as possible based on the features from source syntactic parse trees. |
Introduction | the word order in the target language.
Ranking Model Training | For a sentence pair (e, f, a) with syntax tree Te on the source side, we need to determine which reordered tree Te′ best represents the word order in the target sentence f. For a tree node n in Te, if its children align to disjoint target spans, we can simply arrange them in the order of their corresponding target
Ranking Model Training | Due to this overlapping, it becomes unclear which permutation of “Problem” and “with latter procedure” is a better match of the target phrase; we need a better metric to measure word order similarity between reordered source and target sentences. |
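The arrangement step described above (ordering a node's children by the target positions they align to, which is well defined only when the aligned spans are disjoint) can be sketched as follows; the data structures and function names are illustrative, not the paper's implementation:

```python
# Sketch of the arrangement step: order a node's children by the target
# positions they align to. Data structures are illustrative assumptions.

def spans_disjoint(children, alignments):
    """True if the children's aligned target spans do not overlap."""
    spans = sorted((min(t), max(t)) for t in
                   (alignments.get(c) for c in children) if t)
    return all(a[1] < b[0] for a, b in zip(spans, spans[1:]))

def reorder_children(children, alignments):
    """alignments: child -> set of aligned target word indices. Sort the
    children by earliest aligned target position; unaligned children last."""
    def key(child):
        tgt = alignments.get(child)
        return min(tgt) if tgt else float("inf")
    return sorted(children, key=key)
```

When spans overlap, as in the "Problem" / "with latter procedure" example, the sorted order is no longer uniquely justified, which is exactly why a similarity metric over reordered source and target is needed.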
Word Reordering as Syntax Tree Node Ranking | Given a source-side parse tree Te, the task of word reordering is to transform Te to Te′, so that e′ can match the word order in the target language as much as possible.
Word Reordering as Syntax Tree Node Ranking | parse tree, we can obtain the same word order as the Japanese translation.
BLEU and PORT | Word ordering measures for MT compare two permutations of the original source-language word sequence: the permutation represented by the sequence of corresponding words in the MT output, and the permutation in the reference. |
BLEU and PORT | (10) and the word ordering measure v are combined in a harmonic mean: PORT = 2 / (1/Qmean(N) + 1/v^a) (17). Here a is a free parameter that is tuned on held-out data.
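As a rough illustration of how an ordering score can be combined with a quality score this way: the snippet below uses a Kendall-tau-style pairwise agreement as a stand-in for v (the paper defines v differently) and combines it with Qmean via the harmonic mean of eq. (17); the default a = 0.25 is an arbitrary placeholder, since a is tuned on held-out data.

```python
# Illustrative combination of Qmean with an ordering score v as in eq. (17).
# ordering_v is a Kendall-tau-style stand-in, NOT the paper's definition.
from itertools import combinations

def ordering_v(perm_hyp, perm_ref):
    """Fraction of word pairs whose relative order agrees between the two
    permutations of the source words (both are lists of source indices)."""
    pairs = list(combinations(perm_ref, 2))
    if not pairs:
        return 1.0
    agree = sum((perm_hyp.index(i) < perm_hyp.index(j)) ==
                (perm_ref.index(i) < perm_ref.index(j))
                for i, j in pairs)
    return agree / len(pairs)

def port(qmean, v, a=0.25):
    """PORT = 2 / (1/Qmean + 1/v^a): harmonic mean of Qmean and v**a."""
    return 2.0 / (1.0 / qmean + 1.0 / v ** a)
```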
Conclusions | PORT incorporates precision, recall, strict brevity penalty and strict redundancy penalty, plus a new word ordering measure v. As an evaluation metric, PORT performed better than BLEU at the system level and the segment level, and it was competitive with or slightly superior to METEOR at the segment level. |
Experiments | Most WMT submissions involve language pairs with similar word order, so the ordering factor v in PORT won’t play a big role.
Experiments | PORT differs from BLEU partly in modeling long-distance reordering more accurately; English and French have similar word order, but the other two language pairs don’t.
Experiments | The results in Section 3.3 (below) for Qmean, a version of PORT without the word ordering factor v, suggest v may be defined suboptimally for French-English.
Introduction | This expression is then further combined with a new measure of word ordering, v, designed to reflect long-distance as well as short-distance word reordering (BLEU only reflects short-distance reordering).
Abstract | We extend a syntactic surface realisation system, which can be trained to choose among word order variants, such that the candidate set includes active and passive variants. |
Abstract | This allows us to study the interaction of voice and word order alternations in realistic German corpus data. |
Generation Architecture | An f-structure abstracts away from word order, i.e.
Introduction | This paper presents work on modelling the usage of voice and word order alternations in a free word order language.
Introduction | Thus it has been demonstrated that for free word order languages like German, word order prediction quality can be improved with carefully designed, linguistically informed models capturing information-structural strategies (Filippova and Strube, 2007; Cahill and Riester, 2009). |
Introduction | Quite obviously, word order is only one of the means at a speaker’s disposal for expressing some content in a contextually appropriate form; we add systematic alternations like the voice alternation (active vs. passive) to the picture. |
Related Work | Interestingly, the properties that have been used to model argument alternations in strict word order languages like English have been identified as factors that influence word order in free word order languages like German, see Filippova and Strube (2007) for a number of pointers. |
Related Work | Cahill and Riester (2009) implement a model for German word order variation that approximates the information status of constituents through morphological features like definiteness, pronominalisation etc. |
Related Work | We are not aware of any corpus-based generation studies investigating how these properties relate to argument alternations in free word order languages. |
Introduction | Previous work on English, a language with relatively fixed word order, has identified factors that contribute to local coherence, such as the grammatical roles associated with the entities.
Introduction | For instance, freer-word-order languages exhibit word order patterns which are dependent on discourse factors relating to information structure, in addition to the grammatical roles of nominal arguments of the main verb. |
Introduction | We thus expect word order information to be particularly important in these languages in discourse analysis, which includes coherence modelling. |
Compositional distributional semantics | For instance, symmetric operations like vector addition are insensitive to syntactic structure, therefore meaning differences encoded in word order |
Evaluation | (2013) at the TFDS workshop (tfds below) was specifically designed to test compositional methods for their sensitivity to word order and the semantic effect of determiners. |
Evaluation | The foils have high lexical overlap with the targets but very different meanings, due to different determiners and/or word order.
Evaluation | In the tfds task, not surprisingly the add and mult models, lacking determiner representations and being order-insensitive, fail to distinguish between true paraphrases and foils (indeed, for the mult model foils are significantly closer to the targets than the paraphrases, probably because the latter have lower content word overlap than the foils, which often differ in word order and determiners only).
Conclusion | We have presented a data-driven approach for investigating generation architectures that address discourse-level reference and sentence-level syntax and word order.
Experiments | The error propagation effects that we find in the first and second pipeline architecture clearly show that decisions at the levels of syntax, reference and word order interact, otherwise their predic- |
Experiments | Table 4 shows the performance of the REG module on varying input layers, providing a more detailed analysis of the interaction between RE, syntax and word order.
Experiments | These results strengthen the evidence from the previous experiment that decisions at the level of syntax, reference and word order are interleaved. |
Generation Systems | REG is carried out prior to surface realization such that the RE component does not have access to surface syntax or word order whereas the SYN component has access to fully specified RE slots. |
Generation Systems | In this case, REG has access to surface syntax without word order but the surface realization is trained and applied on trees with underspecified RE slots. |
Introduction | Our main goal is to investigate how different architectural setups account for interactions between generation decisions at the level of referring expressions (REs), syntax and word order.
Related Work | Zarrieß et al. (2012) have recently argued that the good performance of these linguistically motivated word order models, which exploit morpho-syntactic features of noun phrases (i.e.
Background | The current libraries include analyses of major constituent word order (SOV, SVO, etc.), sentential negation, coordination, and yes-no question formation.
Background | Perhaps the most striking feature of Wambaya is its word order : it is a radically non-configurational language with a second position auxiliary/clitic cluster. |
Background | That is, aside from the constraint that verbal clauses require a clitic cluster (marking subject and object agreement and tense, aspect and mood) in second position, the word order is otherwise free, to the point that noun phrases can be noncontiguous, with head nouns and their modifiers separated by unrelated words. |
Evaluation of Grammar Matrix | The customization-provided Wambaya-specific type definitions for word order, lexical types, and coordination constructions were used for inspiration, but most needed fairly extensive modification.
Evaluation of Grammar Matrix | This is particularly unsurprising for basic word order, where the closest available option (“free word order”) was taken, in the absence of a prepackaged analysis of non-configurationality and second-position phenomena.
Evaluation of Grammar Matrix | This again suggests that lexical v. phrasal amalgamation should be encoded in the libraries, and selected according to the word order pattern of the language. |
Introduction | The LinGO Grammar Matrix (Bender et al., 2002; Bender and Flickinger, 2005; Drellishak and Bender, 2005) is a toolkit for reducing the cost of creating broad-coverage precision grammars by prepackaging both a cross-linguistic core grammar and a series of libraries of analyses of cross-linguistically variable phenomena, such as major-constituent word order or question formation.
Wambaya grammar | • Word order: second position clitic cluster, otherwise free word order, discontinuous noun phrases
Introduction | 1998) are marked (attribute tfa), together with so-called deep word order reflected by the order of nodes in the annotation (attribute deepord).
Phenomena and Requirements | In the underlying word order, nodes representing the quasi-focus, although they are contextually bound, are placed to the right of their governing node.
Phenomena and Requirements | The position of rhematizers in the surface word order is quite loose; however, they almost always stand right before the expressions they rhematize, i.e.
Phenomena and Requirements | the node representing the rhematizer) is placed as the closest left brother (in the underlying word order) of the first node of the expression that is in its scope.
Asymmetries in IS | Even if some of the sentences we are learning from are marked in terms of word order, the ratios allow us to still learn the predominant order, since the marked order should occur much less frequently and the ratio will remain low.
Discussion | Pulman (1997) also uses information about parallelism to predict word order . |
Discussion | and Klabunde (2000) describe a sentence planner for German that annotates the propositional input with discourse-related features in order to determine the focus, and thus influence word order and accentuation. |
Generation Ranking | If we limit ourselves to single sentences, the task for the model is then to choose the string that is closest to the “default” expected word order (i.e. |
Introduction | There are many factors that influence word order, e.g. humanness, definiteness, linear order of grammatical functions, givenness, focus, constituent weight.
Introduction | It is common knowledge that information status (henceforth, IS) has a strong influence on syntax and word order; for instance, in inversions, where the subject follows some preposed element, Birner (1994) reports that the preposed element must not be newer in the discourse than the subject.
Introduction | We believe, however, that despite this shortcoming, we can still take advantage of some of the insights gained from looking at the influence of IS on word order . |
Conclusions | For example, Chinese-align had fewer problems with word order, and most of those were due to local word-order problems.
Results | Word order mixed up. |
Results | Garbled word order was chosen for 21-24% of the target-language system Who/What errors, but only 9% of the source-language system Who/What errors.
Results | The source-language word order problems tended to be local, within-phrase errors (e.g., “the dispute over frozen funds” was translated as “the freezing of disputes”).
Abstract | In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
Dependency-based Pre-ordering Rule Set | If the reordering produced a Chinese phrase that had a closer word order to that of the English one, this structure would be a candidate pre-ordering rule. |
Dependency-based Pre-ordering Rule Set | In this example, with the application of an nsubj:rcmod rule, the phrase can be translated into “a senior official close to Sharon say”, which has a word order very close to English.
Experiments | A bilingual speaker of Chinese and English looked at an original Chinese phrase and the pre-ordered one with their corresponding English phrase and judged whether the pre-ordering obtained a Chinese phrase that had a closer word order to the English one. |
Introduction | The reason for this is that there are great differences in their word orders . |
Introduction | Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language. |
Abstract | Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. |
Introduction | Dealing with word order differences between source and target languages presents a significant challenge for machine translation systems. |
Introduction | Recently, approaches that address the problem of word order differences between the source and target language without requiring a high quality source or target parser have been proposed (DeNero and Uszkoreit, 2011; Visweswariah et al., 2011; Neubig et al., 2012). |
Related work | Dealing with the problem of handling word order differences in machine translation has recently received much attention. |
Reordering issues in Urdu-English translation | In this section we describe the main sources of word order differences between Urdu and English since this is the language pair we experiment with in this paper. |
Reordering issues in Urdu-English translation | The typical word order in Urdu is Subject-Object-Verb unlike English in which the order is Subject-Verb-Object. |
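As a toy illustration of pre-ordering for an SOV-to-SVO pair such as Urdu-English: the sketch below moves a clause-final verb to directly after the first noun, given POS tags. This is only a hand-written placeholder rule with invented tag names; actual pre-ordering systems derive their rules from parse trees or learn them from alignments.

```python
# Toy SOV -> SVO pre-ordering on POS-tagged tokens. The tags "N"/"V" and
# the single-clause, verb-final assumption are simplifications.

def preorder_sov_to_svo(tagged):
    """tagged: list of (word, tag) pairs in SOV order with one clause-final
    verb. Moves the verb to directly after the first noun (the subject)."""
    verb_idxs = [i for i, (_, t) in enumerate(tagged) if t == "V"]
    noun_idxs = [i for i, (_, t) in enumerate(tagged) if t == "N"]
    if not verb_idxs or not noun_idxs:
        return list(tagged)
    verb = tagged[verb_idxs[-1]]
    rest = [tok for i, tok in enumerate(tagged) if i != verb_idxs[-1]]
    rest.insert(noun_idxs[0] + 1, verb)  # valid while the verb is clause-final
    return rest
```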
Abstract | They show that MST parsing is almost as accurate as cubic-time dependency parsing in the case of English, and that it is more accurate with free word order languages. |
Conclusion and future work | It would also be interesting to apply these models to target languages that have free word order, which would presumably benefit more from the flexibility of non-projective dependency models.
Dependency parsing for machine translation | Finally, dependency models are more flexible and account for (non-projective) head-modifier relations that CFG models fail to represent adequately, which is problematic with certain types of grammatical constructions and with free word order languages, |
Dependency parsing for machine translation | It is also linguistically desirable in the case of free word order languages such as Czech, Dutch, and German. |
Dependency parsing for machine translation | with relatively rigid word order such as English, there may be some concern that searching the space of non-projective dependency trees, which is considerably larger than the space of projective dependency trees, would yield poor performance. |
Related work | (2007) sidestep the need to operate large-scale word order changes during decoding (and thus lessening the need for syntactic decoding) by rearranging input words in the training data to match the syntactic structure of the target language. |
Transitions for Dependency Parsing | This has the effect that the order of the nodes i and j in the appended list Σ + B is reversed compared to the original word order in the sentence.
Transitions for Dependency Parsing | It is important to note that SWAP is only permissible when the two nodes on top of the stack are in the original word order, which prevents the same two nodes from being swapped more than once, and when the leftmost node i is distinct from the root node 0.
Transitions for Dependency Parsing | LEFT-ARC_l or RIGHT-ARC_l to subtrees whose yields are not adjacent according to the original word order.
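A minimal sketch of the SWAP transition and its precondition as described above (nodes numbered by original word order, with 0 the artificial root); the configuration is simplified to a stack and buffer of node indices:

```python
# Sketch of the SWAP transition (after Nivre, 2009). Nodes are numbered by
# original word order; 0 is the artificial root. Names are illustrative.

def can_swap(stack):
    """SWAP is permissible only if the two topmost stack nodes are still in
    the original word order (i < j) and the lower one is not the root."""
    if len(stack) < 2:
        return False
    i, j = stack[-2], stack[-1]
    return 0 < i < j

def swap(stack, buffer):
    """Move the second-topmost node back to the front of the buffer, which
    reverses the order of i and j in the appended list stack + buffer."""
    assert can_swap(stack)
    i = stack.pop(-2)
    buffer.insert(0, i)
    return stack, buffer
```

Since SWAP requires i < j, swapping the same two nodes twice is impossible: after one swap they meet again in reversed order and the precondition fails.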
Conclusions and future plans | We will also address the problem of tailoring automatic evaluation measures to Russian — accounting for complex morphology and free word order . |
Conclusions and future plans | While the campaign was based exclusively on data in one language direction, the correlation results for automatic MT quality measures should be applicable to other languages with free word order and complex morphology. |
Introduction | One of the main challenges in developing MT systems for Russian and for evaluating them is the need to deal with its free word order and complex morphology. |
Results | While TER and GTM are known to provide better correlation with post-editing efforts for English (O’Brien, 2011), free word order and greater data sparseness on the sentence level makes TER much less reliable for Russian. |
Methods | However, another view of the DPK is possible by thinking of it as cheaply calculating rule production similarity by taking advantage of relatively strict English word ordering . |
Methods | This means, for example, that if the rule production NP → NN JJ DT were ever found in a tree, to DPK it would be indistinguishable from the common production NP → DT JJ NN, despite having inverted word order, and thus would have a maximal similarity score.
Methods | SST and PTK would assign this pair a much lower score for having completely different ordering, but we suggest that cases such as these are very rare due to the relatively strict word ordering of English. |
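The order-insensitivity described here can be sketched by comparing rule productions with their right-hand sides treated as multisets; `production_key` and `dpk_match` are illustrative names, and the 0/1 match is a toy stand-in for the kernel's per-production similarity:

```python
# Order-insensitive production matching: the RHS is compared as a multiset,
# so NP -> NN JJ DT collapses with NP -> DT JJ NN. Names are illustrative.

def production_key(lhs, rhs):
    """Key under which productions differing only in RHS order collide."""
    return (lhs, tuple(sorted(rhs)))

def dpk_match(prod_a, prod_b):
    """1.0 if the two (lhs, rhs) productions match up to RHS order."""
    return 1.0 if production_key(*prod_a) == production_key(*prod_b) else 0.0
```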
Properties of the Sentence Model | As regards the other neural sentence models, the class of NBoW models is by definition insensitive to word order . |
Properties of the Sentence Model | A sentence model based on a recurrent neural network is sensitive to word order , but it has a bias towards the latest words that it takes as input (Mikolov et al., 2011). |
Properties of the Sentence Model | Similarly, a recursive neural network is sensitive to word order but has a bias towards the topmost nodes in the tree; shallower trees mitigate this effect to some extent (Socher et al., 2013a). |
A Latent Variable Parser | However, topological fields explain a higher level of structure pertaining to clause-level word order , and we hypothesize that lexicalization is unlikely to be helpful. |
Introduction | Topic focus ordering and word order constraints that are sensitive to phenomena other than grammatical function produce discontinuous constituents, which are not naturally modelled by projective (i.e., without crossing branches) phrase structure trees. |
Introduction | Hockenmaier (2006) has translated the German TIGER corpus (Brants et al., 2002) into a CCG-based treebank to model word order variations in German.
Topological Field Model of German | Topological fields are useful, because while Germanic word order is relatively free with respect to grammatical functions, the order of the topological fields is strict and unvarying. |
Experimental Results | This is hardly surprising, given the relatively small training set, and that the “the most difficult languages are those that combine a relatively free word order with a high degree of inflection”, as observed at the recent dependency parsing shared task (Nivre et al., 2007); both of these are characteristics of Latin. |
Introduction | Such a dilemma is not uncommon in languages with relatively free word order . |
Previous Work | Most previous work in morphological disambiguation, even when applied on morphologically complex languages with relatively free word order, |
Previous Work | However, it does have a relatively free word order , and is also highly inflected, with each word having up to nine morphological attributes, listed in Table 2. |
CMSMs Encode Vector Space Models | (2008) use permutations on vectors to account for word order . |
Compositionality and Matrices | Thereby, they realize a multiset (or bag-of-words) semantics that makes them insensitive to structural differences of phrases conveyed through word order . |
Introduction | This requires novel modeling paradigms, as most VSMs have been predominantly used for meaning representation of single words and the key problem of common bag-of-words-based VSMs is that word order information and thereby the structure of the language is lost. |
Related Work | Among the early attempts to provide more compelling combinatory functions to capture word order information and the non-commutativity of linguistic compositional operation in VSMs is the work of Kintsch (2001) who is using a more sophisticated addition function to model predicate-argument structures in VSMs. |
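The contrast between order-insensitive addition and permutation-based composition can be sketched as follows; the toy one-hot vectors and the cyclic-shift scheme are illustrative assumptions in the spirit of the permutation approach, not any of the cited models:

```python
# Bag-of-words addition is commutative and therefore order-insensitive;
# permuting each word vector by its sentence position (here: a cyclic
# shift of the dimensions) before summing makes composition order-sensitive.

def add(vectors):
    """Component-wise sum (order-insensitive bag-of-words composition)."""
    return [sum(dims) for dims in zip(*vectors)]

def shift(vec, k):
    """Cyclic permutation of the vector's dimensions by k positions."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)

def compose_with_order(vectors):
    """Shift each word's vector by its sentence position, then sum."""
    return add([shift(v, pos) for pos, v in enumerate(vectors)])
```

With one-hot vectors for "dog", "bites", "man", plain addition gives the same vector for "dog bites man" and "man bites dog", while the shifted composition distinguishes them.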
Introduction | The generative capacity of CCG is commonly attributed to its flexible composition rules, which allow it to model more complex word orders than context-free grammar can.
Introduction | This means that word order in CCG cannot be fully lexicalized with the current formal tools; some ordering constraints must be specified via language-specific combination rules and not in lexicon entries. |
Multi-Modal CCG | Slash types make the derivation process sensitive to word order by restricting the use of compositions to categories with the appropriate type, and the transformation rules permute the order of the words in the string. |
The Generative Capacity of Pure CCG | Consider the examples in Figure 2: It is because we cannot ensure that the bs finish combining with the other bs before combining with the cs that the undesirable word order in Figure 2b has a derivation. |
Conclusion and Outlook | Our data analysis has confirmed that each of the feature groups contributes to the overall success of the RTE metric, and that its gains come from its better success at abstracting away from valid variation (such as word order or lexical substitution), while still detecting major semantic divergences. |
EXpt. 1: Predicting Absolute Scores | The first example (top) shows a good translation that is erroneously assigned a low score by METEORR because (a) it cannot align fact and reality (METEORR aligns only synonyms) and (b) it punishes the change of word order through its “penalty” term. |
Expt. 2: Predicting Pairwise Preferences | The human rater’s favorite translation deviates considerably from the reference in lexical choice, syntactic structure, and word order, for which it is punished by MTR (rank 3/5).
Introduction | (2006) have identified a number of problems with BLEU and related n-gram-based scores: (1) BLEU-like metrics are unreliable at the level of individual sentences due to data sparsity; (2) BLEU metrics can be “gamed” by permuting word order ; (3) for some corpora and languages, the correlation to human ratings is very low even at the system level; (4) scores are biased towards statistical MT; (5) the quality gap between MT and human translations is not reflected in equally large BLEU differences. |
Discussion | Take the first line of Table 9 as an example: the paraphrased sentence (Chinese, glossed word by word as “How many / cigarettes / can / duty-free / take / NULL”) is not a fluent Chinese sentence; however, the rearranged word order is closer to English, which finally results in a much better translation.
Forward-Translation vs. Back-Translation | Two possible reasons may explain this phenomenon: (1) in the first round of translation T0 → S1, some target word orders are preserved due to the reordering failure, and these preserved orders lead to a better result in the second round of translation; (2) the text generated by an MT system is more likely to be matched by the reversed but homologous MT system.
Related Work | (2008) use grammars to paraphrase the source side of training data, covering aspects like word order and minor lexical variations (tenses etc.) |
Current practice in summary evaluation | rely solely on the surface sequence of words to determine similarity between summaries, but delves into what could be called a shallow semantic structure, comprising thematic roles such as subject and object, it is likely to notice identity of meaning where such identity is obscured by variations in word order . |
Discussion and future work | As a result, then, a lot of effort was put into developing metrics that can identify similar content despite non-similar form, which naturally led to the application of linguistically-oriented approaches that look beyond surface word order . |
Lexical-Functional Grammar and the LFG parser | C-structure represents the word order of the surface string and the hierarchical organisation of phrases in terms of trees. |
Discussion | The categories were: function word drop, content word drop, syntactic error (with a reasonable meaning), semantic error (regardless of syntax), word order issues, and function word mistranslation and “hallucination”. |
Discussion | noticeably had more word order and excess/wrong function word issues in the basic feature setting than any optimizer.
Discussion | However, RM seemed to benefit the most from the sparse features, as its bad word order rate dropped close to MIRA, and its excess/wrong function word rate dropped below that of MIRA with sparse features (MIRA’s rate actually doubled from its basic feature set).
Abstract | This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. |
TT-MCTAG 3.1 Introduction to TT-MCTAG | TT-MCTAG were introduced to deal with free word order phenomena in languages such as German. |
TT-MCTAG 3.1 Introduction to TT-MCTAG | Such a language has been proposed as an abstract description of the scrambling phenomenon as found in German and other free word order languages, |
Model Transfer | Word order information constitutes an implicit group that is always available. |
Related Work | This makes it hard to account for phenomena that are expressed differently in the languages considered, for example the syntactic function of a certain word may be indicated by a preposition, inflection or word order, depending on the language.
Results | may be partly attributed to the fact that the mapping is derived from the same corpus as the evaluation data — Europarl (Koehn, 2005) — and partly by the similarity between English and French in terms of word order , usage of articles and prepositions. |
Discussion and Future Work | The representations offer information about dependency relations as well as word order , constituency and part-of-speech. |
ParGram and its Feature Space | In contrast, c-structures encode language particular differences in linear word order , surface morphological vs. syntactic structures, and constituency (Dalrymple, 2001). |
ParGram and its Feature Space | The left/upper c- and f-structures show the parse from the English ParGram grammar, the right/lower ones from the Urdu ParGram grammar. The c-structures encode linear word order and constituency and thus look very different; e.g., the English structure is rather hierarchical while the Urdu structure is flat (Urdu is a free word-order language with no evidence for a VP; Butt (1995)).
Background | This model improved upon the state-of-the-art in terms of automatic evaluation scores on held-out test data, but nevertheless an error analysis revealed a surprising number of word order, function word and inflection errors.
Background | To improve word ordering decisions, White & Rajkumar (2012) demonstrated that incorporating a feature into the ranker inspired by Gibson’s (2000) dependency locality theory can deliver statistically significant improvements in automatic evaluation scores, better match the distributional characteristics of sentence orderings, and significantly reduce the number of serious ordering errors (some involving vicious ambiguities) as confirmed by a targeted human evaluation. |
Simple Reranking | Using dependencies allowed us to measure parse accuracy independently of word order . |
Introduction | Experimental evidence suggests that they do: 21-month-olds mistakenly interpreted word order in sentences such as “The girl and the boy kradded” as conveying agent-patient roles (Gertner and Fisher, 2006). |
Testing SRL Performance | We compared the behavior of noun pattern features to another simple representation of word order , position relative to the verb (Veeros). |
Testing SRL Performance | (2) Because NounPat features represent word order solely in terms of a sequence of nouns, an SRL equipped with these features will make the errors predicted by the structure-mapping account and documented in children (Gertner and Fisher, 2006). |
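A sketch of the two feature representations contrasted here, with illustrative tag sets and feature names: noun-pattern features see only the sequence of nouns, so an intransitive clause with a conjoined subject ("the girl and the boy kradded") gets the same features as a transitive agent-patient frame ("the girl kradded the boy"), while verb-relative position tells the two apart.

```python
# Two simplified word-order representations for SRL features (tag sets
# "D"/"N"/"C"/"V" and feature names are illustrative assumptions).

def noun_pattern_features(tags):
    """Map each noun's index to '<k>_of_<n>': its rank among the nouns,
    ignoring every non-noun word in between."""
    nouns = [i for i, t in enumerate(tags) if t == "N"]
    return {i: f"{k}_of_{len(nouns)}" for k, i in enumerate(nouns, 1)}

def verb_relative_features(tags):
    """Map each noun's index to its position relative to the verb."""
    v = tags.index("V")
    return {i: ("before_verb" if i < v else "after_verb")
            for i, t in enumerate(tags) if t == "N"}
```

For "the girl and the boy kradded" (tags D N C D N V) and "the girl kradded the boy" (D N V D N), the noun patterns coincide (1_of_2, 2_of_2), which is exactly the confusion predicted by the structure-mapping account; the verb-relative features differ.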
Introduction | Reordering models in statistical machine translation (SMT) model the word order difference when translating from one language to another. |
Related Work | Syntax-based reordering: Some previous work pre-ordered words in the source sentences, so that the word order of source and target sentences is similar. |
Related Work | (2012) obtained word order by using a reranking approach to reposition nodes in syntactic parse trees. |
Extensions of SemPOS | One of the major drawbacks of SemPOS is that it completely ignores word order . |
Extensions of SemPOS | This is too coarse even for languages with relatively free word order like Czech. |
Problems of BLEU | This focus goes directly against the properties of Czech: relatively free word order allows many permutations of words and rich morphology renders many valid word forms not confirmed by the reference. These problems are to some extent mitigated if several reference translations are available, but this is often not the case.