Abstract | This paper describes log-linear models for a general-purpose sentence realizer based on dependency structures. |
Abstract | Then the best linearizations compatible with the relative order are selected by log-linear models. |
Abstract | The log-linear models incorporate three types of feature functions: dependency relations, surface words, and headwords.
Introduction | The other is the log-linear model with different syntactic and semantic features (Velldal and Oepen, 2005; Nakanishi et al., 2005; Cahill et al., 2007).
Introduction | Compared with the n-gram model, the log-linear model is more powerful in that it can easily integrate a variety of features and tune their weights to maximize the probability.
Introduction | This paper presents a general-purpose realizer based on log-linear models for directly linearizing dependency relations given dependency structures. |
Log-linear Models | We use log-linear models for selecting the sequence with the highest probability from all the possible linearizations of a subtree. |
Log-linear Models | 4.1 The Log-linear Model |
Log-linear Models | Log-linear models employ a set of feature functions to describe properties of the data, and a set of learned weights to determine the contribution of each feature. |
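Log-linear Models | As a minimal sketch of this selection step (the feature names, weights, and function signatures below are illustrative assumptions, not the paper's actual feature set), the following Python snippet scores each candidate linearization of a subtree with a weighted feature sum, normalizes the scores, and returns the most probable one:

```python
import math

def score(features, weights):
    """Linear score: dot product of feature counts and learned weights."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def best_linearization(candidates, feature_fn, weights):
    """Pick the candidate word order with the highest log-linear probability.

    candidates : list of possible linearizations (word sequences) of a subtree
    feature_fn : maps a linearization to a dict of feature counts
                 (e.g. dependency-relation, surface-word, headword features)
    weights    : dict of learned feature weights
    """
    scores = [score(feature_fn(c), weights) for c in candidates]
    z = sum(math.exp(s) for s in scores)        # partition function
    probs = [math.exp(s) / z for s in scores]   # normalized probabilities
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs[best]
```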
Abstract | Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
Introduction | Log-linear models (a.k.a. maximum entropy models) are one of the most widely used probabilistic models in the field of natural language processing (NLP).
Introduction | Log-linear models have a major advantage over other |
Introduction | Kazama and Tsujii (2003) describe a method for training an L1-regularized log-linear model with a bound-constrained version of the BFGS algorithm (Nocedal, 1980).
Log-Linear Models | In this section, we briefly describe log-linear models used in NLP tasks and L1 regularization. |
Log-Linear Models | A log-linear model defines the following probabilistic distribution over possible structure y for input x: |
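Log-Linear Models | The distribution itself is not reproduced in this excerpt; its standard form, with feature vector f(x, y) and weight vector w, is:

```latex
p(y \mid x; \mathbf{w}) = \frac{\exp\left(\mathbf{w} \cdot \mathbf{f}(x, y)\right)}{\sum_{y' \in \mathcal{Y}(x)} \exp\left(\mathbf{w} \cdot \mathbf{f}(x, y')\right)}
```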
Log-Linear Models | The weights of the features in a log-linear model are optimized in such a way that they maximize the regularized conditional log-likelihood of the training data: |
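Log-Linear Models | The objective is likewise missing from the excerpt; a standard L1-regularized conditional log-likelihood over training pairs (x_i, y_i), with regularization constant C (symbol assumed here), is:

```latex
\mathcal{L}(\mathbf{w}) = \sum_{i} \log p(y_i \mid x_i; \mathbf{w}) - C \sum_{k} |w_k|
```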
Abstract | We investigate the influence of information status (IS) on constituent order in German, and integrate our findings into a log-linear surface realisation ranking model. |
Abstract | We build a log-linear model that incorporates these asymmetries for ranking German string realisations from input LFG F-structures.
Conclusions | By calculating strong asymmetries between pairs of IS labels, and establishing the most frequent syntactic characteristics of these asymmetries, we designed a new set of features for a log-linear ranking model. |
Generation Ranking | (2007), a log-linear model based on the Lexical Functional Grammar (LFG) Framework (Kaplan and Bresnan, 1982). |
Generation Ranking | (2007) describe a log-linear model that uses linguistically motivated features and improves over a simple trigram language model baseline. |
Generation Ranking | We take this log-linear model as our starting point.
Generation Ranking Experiments | These are all automatically removed from the list of features to give a total of 130 new features for the log-linear ranking model. |
Generation Ranking Experiments | We train the log-linear ranking model on 7759 F-structures from the TIGER treebank. |
Generation Ranking Experiments | We tune the parameters of the log-linear model on a small development set of 63 sentences, and carry out the final evaluation on 261 unseen sentences. |
Product of Experts | These features have to be included in estimating pkn-d, which has log-linear component models (Eq. |
Product of Experts | For these bigram or trigram overlap features, a similar log-linear model has to be normalized with a partition function, which considers the (unnormalized) scores of all possible target sentences, given the source sentence. |
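Product of Experts | A sketch of that normalization (the score function s(e, f) and candidate set E(f) are notational assumptions, not the paper's symbols): the partition function sums the unnormalized scores of all target-sentence candidates for the source sentence:

```latex
P(e \mid f) = \frac{\exp\left(s(e, f)\right)}{Z(f)}, \qquad Z(f) = \sum_{e' \in E(f)} \exp\left(s(e', f)\right)
```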
QG for Paraphrase Modeling | We use log-linear models three times: for the configuration, the lexical semantics class, and the word.
QG for Paraphrase Modeling | (2007), we employ a 14-feature log-linear model over all logically possible combinations of the 14 WordNet relations (Miller, 1995). Similarly to Eq. 14, we normalize this log-linear model based on the set of relations that are nonempty in WordNet for the word in question.
A Log-Linear Model for Actions | Given a state s = (E, d, j, W), the space of possible next actions is defined by enumerating subspans of unused words in the current sentence (i.e., subspans of the jth sentence of d not in W), and the possible commands and parameters in environment state E. We model the policy distribution p(a|s; θ) over this action space in a log-linear fashion (Della Pietra et al., 1997; Lafferty et al., 2001), giving us the flexibility to incorporate a diverse range of features.
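A Log-Linear Model for Actions | In the usual log-linear parameterization (the feature function φ(s, a) is assumed here for illustration), such a policy takes the form:

```latex
p(a \mid s; \theta) = \frac{\exp\left(\theta \cdot \phi(s, a)\right)}{\sum_{a'} \exp\left(\theta \cdot \phi(s, a')\right)}
```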
Abstract | We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. |
Introduction | Our policy is modeled in a log-linear fashion, allowing us to incorporate features of both the instruction text and the environment. |
Reinforcement Learning | which is the derivative of a log-linear distribution. |
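Reinforcement Learning | For reference, the derivative of a log-linear distribution is the familiar difference between observed and expected features (same illustrative notation as above):

```latex
\nabla_{\theta} \log p(a \mid s; \theta) = \phi(s, a) - \sum_{a'} p(a' \mid s; \theta)\, \phi(s, a')
```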
Collaborative Decoding | In our work, any Maximum A Posteriori (MAP) SMT model with a log-linear formulation (Och, 2002) can serve as a baseline model.
Collaborative Decoding | The requirement for a log-linear model aims to provide a natural way to integrate the new co-decoding features. |
Collaborative Decoding | Referring to the log-linear model formulation, the translation posterior P(e'|dk) can be computed as: |
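Collaborative Decoding | A hedged sketch of that computation, assuming feature functions h_m with weights λ_m and writing H_k for the hypothesis space explored by decoder d_k (these symbols are illustrative, not the paper's own notation):

```latex
P(e' \mid d_k) = \frac{\exp\left(\sum_{m} \lambda_m h_m(e')\right)}{\sum_{e'' \in \mathcal{H}_k} \exp\left(\sum_{m} \lambda_m h_m(e'')\right)}
```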
Conclusion | In this paper, we present a framework of collaborative decoding, in which multiple MT decoders are coordinated to search for better translations by re-ranking partial hypotheses using augmented log-linear models with translation consensus-based features.
Analysis | We want to further study what happens after we integrate the constraint feature (our SDB model and Marton and Resnik’s XP+) into the log-linear translation model.
Introduction | These constituent matching/violation counts are used as a feature in the decoder’s log-linear model and their weights are tuned via minimum error rate training (MERT) (Och, 2003).
Introduction | Similar to previous methods, our SDB model is integrated into the decoder’s log-linear model as a feature so that we can inherit the idea of soft constraints. |
The Syntax-Driven Bracketing Model 3.1 The Model | We add a new feature into the log-linear translation model: P_SDB(b | T, …). This feature is computed by the SDB model described in equation (3) or equation (4), which estimates the probability that a source span is to be translated as a unit within particular syntactic contexts.
Consensus Decoding Algorithms | The distribution P(e|f) can be induced from a translation system’s features and weights by exponentiating with base b to form a log-linear model:
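Consensus Decoding Algorithms | Written out with feature functions h(e, f) and weight vector λ (illustrative symbols), and the base b that is tuned in the experiments below, the induced distribution is:

```latex
P(e \mid f) = \frac{b^{\,\lambda \cdot h(e, f)}}{\sum_{e'} b^{\,\lambda \cdot h(e', f)}}
```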
Experimental Results | The log-linear model weights were trained using MIRA, a margin-based optimization procedure that accommodates many features (Crammer and Singer, 2003; Chiang et al., 2008). |
Experimental Results | We tuned b, the base of the log-linear model, to optimize consensus decoding performance. |