Conclusions | Next, as the learnt maximum entropy models show, the hedge classification task reduces to a lookup of single keywords or phrases and to evaluating the text on the basis of the most relevant cue alone. |
Methods | We chose not to weight features (by frequency or importance); for the Maximum Entropy Model classifier, we used binary features indicating whether or not each single feature occurred in the given context. |
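A minimal sketch of such unweighted binary occurrence features; the cue list and token list here are invented for illustration, not taken from the paper:

```python
def binary_features(context_tokens, cue_list):
    """Unweighted binary feature map: 1 if a cue keyword occurs in the
    context, 0 otherwise. No frequency or importance weighting is applied."""
    present = set(context_tokens)
    return {cue: int(cue in present) for cue in cue_list}
```

Such a map can be fed directly to a maximum entropy classifier as an indicator-feature vector.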
Methods | 2.4 Maximum Entropy Classifier |
Methods | Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features). |
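As a hedged illustration of this idea, a conditional maximum entropy model can be fit by gradient ascent on the conditional log-likelihood of the classes given the features; the toy feature matrix, labels, learning rate, and epoch count below are all invented for the sketch:

```python
import numpy as np

def train_maxent(X, y, n_classes, lr=0.5, epochs=200):
    """Fit p(c|x) proportional to exp(w_c . x) by gradient ascent on the
    conditional log-likelihood (a bare-bones maxent trainer)."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for _ in range(epochs):
        scores = X @ W.T                               # (n, n_classes)
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient = observed feature counts minus expected feature counts.
        onehot = np.eye(n_classes)[y]
        W += lr * ((onehot - probs).T @ X) / n
    return W

def predict_proba(W, X):
    """Conditional class probabilities under the trained model."""
    scores = X @ W.T
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage: two classes distinguished by binary indicator features.
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])   # columns: bias, cue_A, cue_B
y = np.array([0, 0, 1, 1])
W = train_maxent(X, y, n_classes=2)
probs = predict_proba(W, X)
```

The gradient here is the classic maxent update: the difference between observed and model-expected feature counts.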
Results | This shows that the Maximum Entropy Model in this situation could not learn any meaningful hypothesis from the co-occurrence of individually weak keywords. |
Results | Here we decided not to check whether these keywords made sense in scientific texts; instead, we left this task to the maximum entropy classifier and added only those keywords that the maxent model, trained on the training dataset, found reliable enough to predict the spec label on their own. |
Results | The majority of these phrases were found to be reliable enough for our maximum entropy model to predict a speculative class based on that single feature. |
Parser Restriction | Consequently, we developed a Maximum Entropy model for supertagging using the OpenNLP implementation.2 Similarly to Zhang and Kordoni (2006), we took training data from the gold-standard lexical types in the treebank associated with the ERG (in our case, the July-07 version). |
Parser Restriction | We held back the jh5 section of the treebank for testing the Maximum Entropy model. |
Parser Restriction | Again, the lexical items that were to be restricted were controlled by a threshold, in this case the probability given by the maximum entropy model. |
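A hedged sketch of this thresholding step; the data layout, threshold value, and lexical-type names below are illustrative assumptions, not details from the paper:

```python
def restrict_items(supertag_probs, threshold=0.9):
    """Restrict a lexical item to its predicted lexical type only when the
    maxent probability clears the threshold; items below the threshold are
    left unrestricted (i.e. the parser considers all their types)."""
    restricted = {}
    for word, (lex_type, prob) in supertag_probs.items():
        if prob >= threshold:
            restricted[word] = lex_type
    return restricted
```

Raising the threshold trades coverage of restrictions for confidence in each one.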
Unknown Word Handling | The same maximum entropy tagger used in Section 3 was used and each open class word was tagged with its most likely lexical type, as predicted by the maximum entropy model. |
Unknown Word Handling | Again it is clear that the use of POS tags as features improves the maximum entropy model, since this second model has almost 10% better coverage on our unseen texts. |
Abstract | We introduce a novel algorithm that is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation. |
Conclusion | The algorithm we have proposed is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation. |
Method | Letters: A maximum entropy model is built for all unknown tokens in order to estimate their tag distribution. |
Method | For each possible such segmentation, the full feature vector is constructed, and submitted to the Maximum Entropy model. |
Method | To address this lack of precision, we learn a maximum entropy model on the basis of the following binary features: one feature for each pattern listed in column Formation of Table 3 (40 distinct patterns) and one feature for “no pattern”. |
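A minimal sketch of this feature scheme; the three regular expressions stand in for the 40 formation patterns of Table 3, which are not reproduced here:

```python
import re

# Illustrative stand-ins for the formation patterns of Table 3.
FORMATION_PATTERNS = [re.compile(p) for p in (r"ion$", r"ize$", r"^re")]

def pattern_features(token, patterns=FORMATION_PATTERNS):
    """One binary feature per formation pattern, plus a trailing
    'no pattern' indicator that fires when none of them match."""
    fired = [1 if p.search(token) else 0 for p in patterns]
    fired.append(int(not any(fired)))
    return fired
```

The final "no pattern" indicator lets the model assign explicit weight to the absence of any formation pattern.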
Evaluation | For classification, we use a maximum entropy model (Berger et al., 1996), from the logistic regression package in Weka (Witten and Frank, 2005), with all default parameter settings. |
Evaluation | Note that our maximum entropy classifier actually produces a probability of non-referentiality, which is thresholded at 50% to make a classification. |
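The thresholding step amounts to the following one-liner; the function and label names are illustrative, not the authors' code:

```python
def classify(prob_non_referential, threshold=0.5):
    """Threshold the classifier's probability of non-referentiality at 50%
    to obtain a hard label."""
    return "non-referential" if prob_non_referential >= threshold else "referential"
```

Keeping the raw probability around (rather than only the label) is what enables the confidence analysis discussed in the results.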
Results | When we inspect the probabilities produced by the maximum entropy classifier (Section 4.2), we see only a weak bias for the non-referential class on these examples, reflecting our classifier’s uncertainty. |
Results | The suitability of this kind of approach to correcting some of our system’s errors is especially obvious when we inspect the probabilities of the maximum entropy model’s output decisions on the Test-200 set. |
Results | Where the maximum entropy classifier makes mistakes, it does so with less confidence than when it classifies correct examples. |
Related Work | Additionally, as our tagger employs maximum entropy modeling, it is able to take into account a greater variety of contextual features, including those derived from parent nodes. |
The Approach | 3.2 Maximum Entropy Hypertagging |
The Approach | The resulting contextual features and gold-standard supertag for each predication were then used to train a maximum entropy classifier model. |
The Approach | Maximum entropy models describe a set of probability distributions of the form: |
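The formula following this colon appears to have been lost in extraction; the standard log-linear form, consistent with Berger et al. (1996) as cited earlier, is:

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Bigl(\sum_i \lambda_i f_i(x, y)\Bigr),
\qquad
Z(x) = \sum_{y'} \exp\Bigl(\sum_i \lambda_i f_i(x, y')\Bigr)
```

where the $f_i$ are (typically binary) feature functions over context–class pairs and the $\lambda_i$ are their learned weights.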
Abstract | Methods like Maximum Entropy and Conditional Random Fields make use of features for training. |
Abstract | The feature reduction techniques lead to a substantial performance improvement over the baseline Maximum Entropy technique. |
Introduction | In their Maximum Entropy (MaxEnt) based approach for Hindi NER development, Saha et al. |
Maximum Entropy Based Model for Hindi NER | The Maximum Entropy (MaxEnt) principle is a commonly used technique that provides the probability of a token belonging to a class. |