Index of papers in Proc. ACL 2008 that mention
  • maximum entropy
Szarvas, György
Conclusions
Next, as the learnt maximum entropy models show, the hedge classification task reduces to a lookup of single keywords or phrases and to evaluating the text based on the most relevant cue alone.
Methods
We chose not to add any weighting of features (by frequency or importance) and for the Maximum Entropy Model classifier we included binary data about whether single features occurred in the given context or not.
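A minimal sketch of this binary occurrence encoding, assuming a hypothetical cue list and toy sentences; sklearn's LogisticRegression stands in for the Maximum Entropy Model classifier:

```python
# Each candidate cue contributes one binary feature: did it occur in
# the given context or not. No frequency or importance weighting.
from sklearn.linear_model import LogisticRegression

CUES = ["may", "suggest", "possible", "likely"]  # hypothetical cue list

def binary_features(sentence):
    tokens = set(sentence.lower().split())
    return [1 if cue in tokens else 0 for cue in CUES]

train_sents = ["These results suggest a possible role .",
               "The protein binds DNA ."]          # toy examples
train_labels = [1, 0]  # 1 = speculative, 0 = non-speculative

X = [binary_features(s) for s in train_sents]
clf = LogisticRegression().fit(X, train_labels)
print(clf.predict_proba([binary_features("It may be required .")]))
```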
Methods
2.4 Maximum Entropy Classifier
Methods
Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features).
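In symbols (the standard formulation from Berger et al., 1996, not quoted from this paper), training selects the weights that maximize the conditional log-likelihood of the observed (observation, class) pairs:

```latex
\[
  \lambda^{*} \;=\; \arg\max_{\lambda}
      \sum_{(x,\,c)} \log p_{\lambda}(c \mid x)
\]
```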
Results
This shows that the Maximum Entropy Model in this situation could not learn any meaningful hypothesis from the co-occurrence of individually weak keywords.
Results
Here we decided not to check whether these keywords made sense in scientific texts or not; instead, we left this task to the maximum entropy classifier and added only those keywords that the maxent model, trained on the training dataset, found reliable enough to predict the spec label on their own.
Results
The majority of these phrases were found to be reliable enough for our maximum entropy model to predict a speculative class based on that single feature.
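A minimal sketch of that single-feature reliability test, reusing the clf and CUES from the sketch above; the 0.95 threshold is an invented value, not the paper's:

```python
def reliable_cues(clf, cues, threshold=0.95):
    """Keep a cue only if the trained model predicts the speculative
    class from that single feature firing alone."""
    kept = []
    for i, cue in enumerate(cues):
        x = [0] * len(cues)
        x[i] = 1                                  # the cue fires alone
        if clf.predict_proba([x])[0][1] >= threshold:
            kept.append(cue)
    return kept

# e.g. reliable_cues(clf, CUES)
```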
maximum entropy is mentioned in 8 sentences in this paper.
Dridan, Rebecca and Kordoni, Valia and Nicholson, Jeremy
Parser Restriction
Consequently, we developed a Maximum Entropy model for supertagging using the OpenNLP implementation. Similarly to Zhang and Kordoni (2006), we took training data from the gold-standard lexical types in the treebank associated with the ERG (in our case, the July-07 version).
Parser Restriction
We held back the jh5 section of the treebank for testing the Maximum Entropy model.
Parser Restriction
Again, the lexical items that were to be restricted were controlled by a threshold, in this case the probability given by the maximum entropy model.
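An illustrative sketch of this probability-threshold restriction; the lexical type names and threshold are invented, and the fallback to the single most likely type is an assumption rather than the paper's stated behaviour:

```python
def restrict_lexical_types(type_probs, threshold=0.01):
    """Keep only lexical types whose supertagger probability clears
    the threshold; fall back to the most likely type if none does."""
    kept = [t for t, p in type_probs.items() if p >= threshold]
    return kept or [max(type_probs, key=type_probs.get)]

# restrict_lexical_types({"n_-_c_le": 0.72, "v_np_le": 0.25,
#                         "aj_-_i_le": 0.03}, threshold=0.05)
# -> ["n_-_c_le", "v_np_le"]
```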
Unknown Word Handling
The same maximum entropy tagger used in Section 3 was used and each open class word was tagged with its most likely lexical type, as predicted by the maximum entropy model.
Unknown Word Handling
Again it is clear that the use of POS tags as features improves the maximum entropy model, since this second model has almost 10% better coverage on our unseen texts.
maximum entropy is mentioned in 7 sentences in this paper.
Adler, Meni and Goldberg, Yoav and Gabay, David and Elhadad, Michael
Abstract
We introduce a novel algorithm that is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation.
Conclusion
The algorithm we have proposed is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation.
Method
Letters: A maximum entropy model is built for all unknown tokens in order to estimate their tag distribution.
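A minimal sketch of a letters model of this kind, assuming character prefix/suffix features; the toy words and tags are invented, and sklearn's LogisticRegression stands in for the maxent learner:

```python
# Train a maxent classifier over letter features of known words, then
# use it to estimate a tag distribution for an unknown token.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def letter_features(word):
    feats = {}
    for n in (1, 2, 3):
        feats["pre%d=%s" % (n, word[:n])] = True
        feats["suf%d=%s" % (n, word[-n:])] = True
    return feats

known = [("walked", "VERB"), ("talked", "VERB"), ("table", "NOUN")]
model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit([letter_features(w) for w, _ in known],
          [tag for _, tag in known])

probs = model.predict_proba([letter_features("jumped")])[0]
print(dict(zip(model.classes_, probs)))   # tag distribution
```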
Method
For each possible such segmentation, the full feature vector is constructed, and submitted to the Maximum Entropy model.
Method
To address this lack of precision, we learn a maximum entropy model on the basis of the following binary features: one feature for each pattern listed in column Formation of Table 3 (40 distinct patterns) and one feature for “no pattern”.
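A hedged sketch of these binary pattern features; the regexes below are invented stand-ins for the 40 formation patterns of Table 3, which the excerpt does not reproduce:

```python
import re

PATTERNS = {                     # illustrative, not the paper's Table 3
    "prefix_m": re.compile(r"^m"),
    "suffix_ut": re.compile(r"ut$"),
}

def pattern_features(token):
    """One binary feature per formation pattern, plus 'no pattern'
    when none of them matches."""
    feats = {name: int(bool(p.search(token)))
             for name, p in PATTERNS.items()}
    feats["no_pattern"] = int(not any(feats.values()))
    return feats
```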
maximum entropy is mentioned in 6 sentences in this paper.
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Evaluation
For classification, we use a maximum entropy model (Berger et al., 1996), from the logistic regression package in Weka (Witten and Frank, 2005), with all default parameter settings.
Evaluation
Note that our maximum entropy classifier actually produces a probability of non-referentiality, which is thresholded at 50% to make a classification.
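A minimal sketch of that decision rule, assuming clf is a fitted sklearn-style probabilistic classifier whose class 1 is the non-referential class:

```python
def classify_nonreferential(clf, feats, threshold=0.5):
    p_nonref = clf.predict_proba([feats])[0][1]  # P(non-referential)
    label = "non-referential" if p_nonref >= threshold else "referential"
    return label, p_nonref
```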
Results
When we inspect the probabilities produced by the maximum entropy classifier (Section 4.2), we see only a weak bias for the non-referential class on these examples, reflecting our classifier’s uncertainty.
Results
The suitability of this kind of approach to correcting some of our system’s errors is especially obvious when we inspect the probabilities of the maximum entropy model’s output decisions on the Test-200 set.
Results
Where the maximum entropy classifier makes mistakes, it does so with less confidence than when it classifies correct examples.
maximum entropy is mentioned in 5 sentences in this paper.
Espinosa, Dominic and White, Michael and Mehay, Dennis
Related Work
Additionally, as our tagger employs maximum entropy modeling, it is able to take into account a greater variety of contextual features, including those derived from parent nodes.
The Approach
3.2 Maximum Entropy Hypertagging
The Approach
The resulting contextual features and gold-standard supertag for each predication were then used to train a maximum entropy classifier model.
The Approach
Maximum entropy models describe a set of probability distributions of the form:
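The excerpt ends where the paper's equation begins; the standard log-linear form (from Berger et al., 1996, supplied here rather than copied from this paper) is:

```latex
\[
  p(y \mid x) \;=\; \frac{1}{Z(x)}
      \exp\Big( \sum_{i} \lambda_{i} f_{i}(x, y) \Big),
  \qquad
  Z(x) \;=\; \sum_{y'} \exp\Big( \sum_{i} \lambda_{i} f_{i}(x, y') \Big)
\]
```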
maximum entropy is mentioned in 5 sentences in this paper.
Saha, Sujan Kumar and Mitra, Pabitra and Sarkar, Sudeshna
Abstract
Methods like Maximum Entropy and Conditional Random Fields make use of features during training.
Abstract
The feature reduction techniques lead to a substantial performance improvement over the baseline Maximum Entropy technique.
Introduction
In their Maximum Entropy (MaxEnt) based approach for Hindi NER development, Saha et al.
Maximum Entropy Based Model for Hindi NER
The Maximum Entropy (MaxEnt) principle is a commonly used technique that provides the probability of a token belonging to a class.
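A toy illustration of that class-membership probability, assuming model is a DictVectorizer-plus-LogisticRegression pipeline like the letters-model sketch above; the feature names and output numbers are made up:

```python
def tag_distribution(model, token_feats):
    """Return the MaxEnt model's class distribution for one token."""
    probs = model.predict_proba([token_feats])[0]
    return dict(zip(model.classes_, probs))

# tag_distribution(ner_model, {"word=kolkata": True, "prev=in": True,
#                              "is_capitalised": True})
# might return {"LOC": 0.81, "ORG": 0.12, "O": 0.07}  (made-up numbers)
```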
maximum entropy is mentioned in 4 sentences in this paper.