Abstract | To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. |
Introduction | 3 A Maximum Entropy Based Shift-Reduce Parsing Model |
Introduction | We propose a maximum entropy model to resolve the conflicts for “h+h”: |
Introduction | 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model. |
Abstract | Then we propose two novel methods to handle the two PAS ambiguities for SMT accordingly: 1) inside context integration; 2) a novel maximum entropy PAS disambiguation (MEPD) model. |
Conclusion and Future Work | As for the MEPD model, we design a maximum entropy model for each ambiguous source-side PAS. |
Introduction | As to the role ambiguity, we design a novel maximum entropy PAS disambiguation (MEPD) model to combine various context features, such as context words of PAS. |
Maximum Entropy PAS Disambiguation (MEPD) Model | In order to handle the role ambiguities, in this section, we concentrate on utilizing a maximum entropy model to incorporate the context information for PAS disambiguation. |
Maximum Entropy PAS Disambiguation (MEPD) Model | The maximum entropy model is the classical way to handle this problem: |
Maximum Entropy PAS Disambiguation (MEPD) Model | We train a maximum entropy classifier for each Sp via the off-the-shelf MaxEnt toolkit3. |
PAS-based Translation Framework | Thus to overcome this problem, we design two novel methods to cope with the PAS ambiguities: inside-context integration and a maximum entropy PAS disambiguation (MEPD) model. |
Related Work | (2010) designed maximum entropy (ME) classifiers to perform better rule selection for the hierarchical phrase-based model and the tree-to-string model, respectively. |
Conclusions | Next, as the learnt maximum entropy models show, the hedge classification task reduces to a lookup for single keywords or phrases and to the evaluation of the text based on the most relevant cue alone. |
Methods | We chose not to add any weighting of features (by frequency or importance) and for the Maximum Entropy Model classifier we included binary data about whether single features occurred in the given context or not. |
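A minimal sketch of such unweighted binary occurrence features (the function and feature names below are hypothetical, not from the cited system):

```python
def binary_features(context_tokens, vocabulary):
    """Map a context to binary indicator features: 1 if the vocabulary
    item occurs anywhere in the context, 0 otherwise.
    No frequency or importance weighting is applied."""
    present = set(context_tokens)
    return {f"has({w})": (1 if w in present else 0) for w in vocabulary}

# "may" and "suggest" occur in the context; "prove" does not.
feats = binary_features(["the", "results", "may", "suggest"],
                        ["may", "suggest", "prove"])
```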
Methods | 2.4 Maximum Entropy Classifier |
Methods | Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features). |
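As a sketch only (toy data and illustrative feature names, not any of the implementations cited here), a minimal binary maximum entropy classifier trained by gradient ascent on the conditional log-likelihood might look like:

```python
import math

def train_maxent(data, epochs=200, lr=0.5):
    """Minimal binary maximum entropy model: gradient ascent on the
    conditional log-likelihood of P(class | features).
    `data` is a list of (feature_set, label) pairs with labels 0/1."""
    weights = {}
    for _ in range(epochs):
        for feats, y in data:
            z = sum(weights.get(f, 0.0) for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))   # model's P(y=1 | feats)
            for f in feats:                  # gradient is (y - p) per firing feature
                weights[f] = weights.get(f, 0.0) + lr * (y - p)
    return weights

def predict(weights, feats):
    """Return P(y=1 | feats) under the trained model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

# Toy hedge-detection data: "may"/"suggest" mark speculative sentences.
train = [({"may", "suggest"}, 1), ({"may"}, 1),
         ({"show", "prove"}, 0), ({"prove"}, 0)]
w = train_maxent(train)
```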
Results | This shows that the Maximum Entropy Model in this situation could not learn any meaningful hypothesis from the co-occurrence of individually weak keywords. |
Results | Here we decided not to check whether these keywords made sense in scientific texts or not, but instead left this task to the maximum entropy classifier, and added only those keywords that were found reliable enough to predict spec label alone by the maxent model trained on the training dataset. |
Results | The majority of these phrases were found to be reliable enough for our maximum entropy model to predict a speculative class based on that single feature. |
Parser Restriction | Consequently, we developed a Maximum Entropy model for supertagging using the OpenNLP implementation.2 Similarly to Zhang and Kordoni (2006), we took training data from the gold-standard lexical types in the treebank associated with ERG (in our case, the July-07 version). |
Parser Restriction | We held back the jh5 section of the treebank for testing the Maximum Entropy model. |
Parser Restriction | Again, the lexical items that were to be restricted were controlled by a threshold, in this case the probability given by the maximum entropy model. |
Unknown Word Handling | The same maximum entropy tagger used in Section 3 was used and each open class word was tagged with its most likely lexical type, as predicted by the maximum entropy model. |
Unknown Word Handling | Again it is clear that the use of POS tags as features obviously improves the maximum entropy model, since this second model has almost 10% better coverage on our unseen texts. |
Discussion | To validate this conjecture on our translation test data, we compare the reordering performance among the MR08 system, the improved systems and the maximum entropy classifiers. |
Discussion | Then we evaluate the automatic reordering outputs generated from both our translation systems and maximum entropy classifiers. |
Discussion | Potential improvement analysis: Table 7 also shows that our current maximum entropy classifiers have room for improvement, especially for semantic reordering. |
Related Work | Ge (2010) presented a syntax-driven maximum entropy reordering model that predicted the source word translation order. |
Unified Linguistic Reordering Models | In order to predict either the leftmost or rightmost reordering type for two adjacent constituents, we use a maximum entropy classifier to estimate the probability of the reordering type rt ∈ {M, DM, S, DS} as follows: |
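The formula introduced by "as follows" did not survive extraction; a hedged reconstruction of the standard conditional maximum entropy form (the constituent-pair notation A, B is an assumption):

```latex
p(rt \mid A, B) =
\frac{\exp\bigl(\sum_{k} \lambda_k f_k(rt, A, B)\bigr)}
     {\sum_{rt' \in \{M,\,DM,\,S,\,DS\}} \exp\bigl(\sum_{k} \lambda_k f_k(rt', A, B)\bigr)}
```

where the f_k are binary features over the reordering type and the two adjacent constituents, and the λ_k are their learned weights.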
Unified Linguistic Reordering Models | For each pair of constituents, it first extracts its leftmost and rightmost reordering types (line 6) and then gets their respective probabilities returned by the maximum entropy classifiers defined in Section 3.1 |
Generative state tracking | In this work we will use maximum entropy models. |
Generative state tracking | 4.1 Maximum entropy models |
Generative state tracking | The maximum entropy framework (Berger et al. |
Abstract | We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds of thousands of features in total. |
Conclusion | The straightforward maximum entropy reranking approach is able to significantly outperform previous computational approaches by allowing for a richer model of the prenominal modifier ordering process. |
Introduction | By mapping a set of features across the training data and using a maximum entropy reranking model, we can learn optimal weights for these features and then order each set of modifiers in the test data according to our features and the learned weights. |
Introduction | In Section 3 we present the details of our maximum entropy reranking approach. |
Model | At test time, we choose an ordering x ∈ π(B) using a maximum entropy reranking approach (Collins and Koo, 2005). |
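A minimal sketch of this test-time choice, assuming a log-linear scorer over permutations with illustrative bigram-order features (all names and weights below are hypothetical):

```python
import itertools

def rerank(modifiers, weights, extract):
    """Score every permutation of the modifier set under a log-linear
    maximum entropy model and return the highest-scoring ordering.
    Since the model is log-linear, argmax over summed feature weights
    is equivalent to argmax over probabilities."""
    def score(order):
        return sum(weights.get(f, 0.0) for f in extract(order))
    return max(itertools.permutations(modifiers), key=score)

# Illustrative features: one indicator per adjacent modifier bigram.
def extract(order):
    return [f"{a}<{b}" for a, b in zip(order, order[1:])]

# Toy weights preferring "big" before "red".
weights = {"big<red": 1.0, "red<big": -1.0}
best = rerank(["red", "big"], weights, extract)
```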
Related Work | In this next section, we describe our maximum entropy reranking approach that tries to develop a more comprehensive model of the modifier ordering process to avoid the sparsity issues that previous approaches faced. |
Abstract | We introduce a novel algorithm that is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation. |
Conclusion | The algorithm we have proposed is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation. |
Method | Letters: A maximum entropy model is built for all unknown tokens in order to estimate their tag distribution. |
Method | For each possible such segmentation, the full feature vector is constructed, and submitted to the Maximum Entropy model. |
Method | To address this lack of precision, we learn a maximum entropy model on the basis of the following binary features: one feature for each pattern listed in column Formation of Table 3 (40 distinct patterns) and one feature for “no pattern”. |
Evaluation | For classification, we use a maximum entropy model (Berger et al., 1996), from the logistic regression package in Weka (Witten and Frank, 2005), with all default parameter settings. |
Evaluation | Note that our maximum entropy classifier actually produces a probability of non-referentiality, which is thresholded at 50% to make a classification. |
Results | When we inspect the probabilities produced by the maximum entropy classifier (Section 4.2), we see only a weak bias for the non-referential class on these examples, reflecting our classifier’s uncertainty. |
Results | The suitability of this kind of approach to correcting some of our system’s errors is especially obvious when we inspect the probabilities of the maximum entropy model’s output decisions on the Test-200 set. |
Results | Where the maximum entropy classifier makes mistakes, it does so with less confidence than when it classifies correct examples. |
Related Work | Additionally, as our tagger employs maximum entropy modeling, it is able to take into account a greater variety of contextual features, including those derived from parent nodes. |
The Approach | 3.2 Maximum Entropy Hypertagging |
The Approach | The resulting contextual features and gold-standard supertag for each predication were then used to train a maximum entropy classifier model. |
The Approach | Maximum entropy models describe a set of probability distributions of the form: |
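The distributional form referred to here did not survive extraction; the standard log-linear form (symbols are assumed, not taken from the cited paper) is:

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Bigl(\sum_{i} \lambda_i f_i(x, y)\Bigr),
\qquad
Z(x) = \sum_{y'} \exp\Bigl(\sum_{i} \lambda_i f_i(x, y')\Bigr)
```

where the f_i are feature functions over the observation x and label y, the λ_i their weights, and Z(x) the normalizing partition function.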
Discussion | Par in Column 4 is the number of parameters in the trained maximum entropy model, which indicates the model complexity. |
Feature Construction | A maximum entropy multi-class classifier is trained and tested on the generated relation instances. |
Feature Construction | To implement the maximum entropy model, the toolkit provided by Le (2004) is employed. |
Introduction | We apply these approaches in a maximum entropy based system to extract relations from the ACE 2005 corpus. |
Related Work | The TRE systems use techniques such as: Rules (Regulars, Patterns and Propositions) (Miller et al., 1998), Kernel method (Zhang et al., 2006b; Zelenko et al., 2003), Belief network (Roth and Yih, 2002), Linear programming (Roth and Yih, 2007), Maximum entropy (Kambhatla, 2004) or SVM (GuoDong et al., 2005). |
Introduction | This is done using a maximum entropy model (call it MAXENT). |
Introduction | Then, the remaining constituents are ordered using a second maximum entropy model (MAXENT2). |
Introduction | The maximum entropy models for both steps rely on the following features: |
Conclusions | Empirically, we show that the proposed measure, based on maximum entropy classification, satisfies the design constraints of an objective measure to a high degree. |
Experimental Setup | 5.3.4 Maximum Entropy Model Classifier |
Experimental Setup | We used the maximum entropy classifier implementation in the MaxEnt toolkit4. |
Experimental Setup | One straightforward way of using the maximum entropy classifier’s prediction for our case is to directly use its predicted score level: 1, 2, 3 or 4. |
Models for Measuring Grammatical Competence | This is done by resorting to a maximum entropy model based approach, to which we turn next. |
Experiment | Following the description in (Lu et al., 2011), we remove neutral sentences and keep only high-confidence positive and negative sentences as predicted by a maximum entropy classifier trained on the labeled data. |
Experiment | This model uses English labeled data and Chinese labeled data to obtain initial parameters for two maximum entropy classifiers (for English documents and Chinese documents), and then conducts EM-iterations to update the parameters to gradually improve the agreement of the two monolingual classifiers on the unlabeled parallel sentences. |
Related Work | (2002) compare the performance of three commonly used machine learning models (Naive Bayes, Maximum Entropy and SVM). |
Related Work | They propose a method of training two classifiers based on maximum entropy formulation to maximize their prediction agreement on the parallel corpus. |
Abstract | The proposed sense-based translation model enables the decoder to select appropriate translations for source words according to the inferred senses for these words using maximum entropy classifiers. |
Conclusion | We incorporate these learned word senses as translation evidences into maximum entropy classifiers which form the |
Experiments | Our baseline system is a state-of-the-art SMT system which adapts Bracketing Transduction Grammars (Wu, 1997) to phrasal translation and equips itself with a maximum entropy based reordering model (Xiong et al., 2006). |
Introduction | In order to incorporate word senses into SMT, we propose a sense-based translation model that is built on maximum entropy classifiers. |
Experiment | The L-BFGS method (Liu and Nocedal, 1989) was used to estimate the weight parameters of maximum entropy models. |
Experiment | The maximum entropy method with Gaussian prior smoothing was used to estimate the model parameters. |
Proposed Method | In this work, we use the maximum entropy method (Berger et al., 1996) as a discriminative machine learning method. |
Proposed Method | The reason for this is that a model based on the maximum entropy method can calculate probabilities. |
Argument Reordering Model | After all features are extracted, we use the maximum entropy toolkit in Section 3.3 to train the maximum entropy classifier as formulated in Eq. |
Predicate Translation Model | The essential component of our model is a maximum entropy classifier p_t(e|C(v)) that predicts the target translation e for a verbal predicate v given its surrounding context C(v). |
Predicate Translation Model | This will increase the number of classes to be predicted by the maximum entropy classifier. |
Predicate Translation Model | Using these events, we train one maximum entropy classifier per verbal predicate (16,121 verbs in total) via the off-the-shelf MaxEnt toolkit3. |
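A sketch of the one-classifier-per-predicate setup (the grouping step only; `train_fn` stands in for any maximum entropy training routine, and the majority-vote trainer below is a toy placeholder, not the cited toolkit):

```python
from collections import defaultdict

def train_per_predicate(events, train_fn):
    """Group (predicate, features, translation) events by predicate and
    train one classifier per verbal predicate.
    Returns a dict mapping each predicate to its trained model."""
    grouped = defaultdict(list)
    for predicate, feats, translation in events:
        grouped[predicate].append((feats, translation))
    return {pred: train_fn(data) for pred, data in grouped.items()}

# Toy stand-in trainer: just memorises the majority translation label.
def majority_trainer(data):
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

events = [("give", {"next:to"}, "t1"),
          ("give", {"next:to"}, "t1"),
          ("run",  {"prev:fast"}, "t2")]
models = train_per_predicate(events, majority_trainer)
```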
Abstract | We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. |
Conclusions and Future Work | In this paper, we have presented a maximum entropy based approach to automatically detect errors in translation hypotheses generated by SMT |
Error Detection with a Maximum Entropy Model | For classification, we employ the maximum entropy model (Berger et al., 1996) to predict whether a word w is correct or incorrect given its feature vector. |
Introduction | We integrate two sets of linguistic features into a maximum entropy (MaxEnt) model and develop a MaxEnt-based binary classifier to predict the category (correct or incorrect) for each word in a generated target sentence. |
Discussion | In the experiment described in Section 5, we used the linguistic information provided by humans as features for the maximum entropy method. |
Experiment | Here, we used the maximum entropy method tool (Zhang, 2008) with the default options except “-i 2000.” |
Linefeed Insertion Technique | These probabilities are estimated by the maximum entropy method. |
Linefeed Insertion Technique | 4.2 Features on Maximum Entropy Method |
Abstract | Methods like Maximum Entropy and Conditional Random Fields make use of features for the training purpose. |
Abstract | The feature reduction techniques lead to a substantial performance improvement over baseline Maximum Entropy technique. |
Introduction | In their Maximum Entropy (MaxEnt) based approach for Hindi NER development, Saha et al. |
Maximum Entropy Based Model for Hindi NER | The Maximum Entropy (MaxEnt) principle is a commonly used technique which provides the probability that a token belongs to a given class. |
Sentence Completion via Language Modeling | 3.2 Maximum Entropy Class-Based N-gram Language Model |
Sentence Completion via Language Modeling | The key ideas are the modeling of word n-gram probabilities with a maximum entropy model, and the use of word-class information in the definition of the features. |
Sentence Completion via Language Modeling | Both components are themselves maximum entropy n-gram models in which the probability of a word or class label l given history h is determined by (1/Z) exp(Σ_k λ_k f_k(h, l)). The features f_k(h, l) used are the presence of various patterns in the concatenation of hl, for example whether a particular suffix is present in hl. |
Comparative Study | 4.2 Maximum entropy reordering model |
Comparative Study | (Zens and Ney, 2006) proposed a maximum entropy classifier to predict the orientation of the next phrase given the current phrase. |
Introduction | The classifier can be trained with maximum likelihood, like the Moses lexicalized reordering (Koehn et al., 2007) and hierarchical lexicalized reordering models (Galley and Manning, 2008), or be trained under the maximum entropy framework (Zens and Ney, 2006). |
A Joint Model with Unlabeled Parallel Text | Maximum entropy (MaxEnt) models1 have been widely used in many NLP tasks (Berger et al., 1996; Ratnaparkhi, 1997; Smith, 2006). |
Introduction | maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)). |
Related Work | Among the popular semi-supervised methods (e.g., EM on Naive Bayes (Nigam et al., 2000), co-training (Blum and Mitchell, 1998), transductive SVMs (Joachims, 1999b), and co-regularization (Sindhwani et al., 2005; Amini et al., 2010)), our approach employs the EM algorithm, extending it to the bilingual case based on maximum entropy. |
Our Method | We have replaced KNN by other classifiers, such as those based on Maximum Entropy and Support Vector Machines, respectively. |
Our Method | Similarly, to study the effectiveness of the CRF model, it is replaced by alternatives, such as the HMM labeler and a beam search plus a maximum entropy based classifier. |
Related Work | Other methods, such as classification based on Maximum Entropy models and sequential application of Perceptron or Winnow (Collins, 2002), are also practiced. |
Distant Supervision | To increase coverage, we train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003). |
Features | Following previous sequence classification work with Maximum Entropy models (e.g., (Ratnaparkhi, 1996)), we use selected features of adjacent sentences. |
Sentiment Relevance | We divide both the SR and P&L corpora into training (50%) and test sets (50%) and train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003) with bag-of-word features. |
Conclusions and Future Work | In this paper, we presented a novel structured approach to EC prediction, which utilizes a maximum entropy model with various syntactic features and shows significantly higher accuracy than the state-of-the-art approaches. |
Experimental Results | We parse our test set with a maximum entropy based statistical parser (Ratnaparkhi, 1997) first. |
Related Work | A maximum entropy model is utilized to predict the tags, but different types of ECs are not distinguished. |
Automated Classification | We use a Maximum Entropy (Berger et al., 1996) classifier with a large number of boolean features, some of which are novel (e.g., the inclusion of words from WordNet definitions). |
Automated Classification | Maximum Entropy classifiers have been effective on a variety of NLP problems including preposition sense disambiguation (Ye and Baldwin, 2007), which is somewhat similar to noun compound interpretation. |
Automated Classification | The results for these runs using the Maximum Entropy classifier are presented in Table 4. |
Identification and Labeling Models | As in previous approaches to SRL, Brutus uses a two-stage pipeline of maximum entropy classifiers. |
Introduction | For the identification and labeling steps, we train a maximum entropy classifier (Berger et al., 1996) over sections 02-21 of a version of the CCGbank corpus (Hockenmaier and Steedman, 2007) that has been augmented by projecting the Propbank semantic annotations (Boxwell and White, 2008). |
Results | G&H use a generative model with a back-off lattice, whereas we use a maximum entropy classifier. |
A Classifier for Merging Basic-Edits | To predict whether two basic-edits address the same writing problem more discriminatively, we train a Maximum Entropy binary classifier based on features extracted from relevant contexts for the basic edits. |
Experimental Setup | MaxEntMerger: We use the Maximum Entropy classifier to predict whether we should merge the two edits, as described in Section 3.4. |
Experimental Setup | We use a Maximum Entropy classifier along with features suggested by Swanson and Yamangil for this task. |