Index of papers in Proc. ACL that mention
  • MaxEnt
Saha, Sujan Kumar and Mitra, Pabitra and Sarkar, Sudeshna
Introduction
based Feature Reduction for MaxEnt
Introduction
In their Maximum Entropy (MaxEnt) based approach for Hindi NER development, Saha et al. (2008) also observed that the performance of the MaxEnt based model often decreases when a huge number of features is used in the model.
Maximum Entropy Based Model for Hindi NER
The Maximum Entropy (MaxEnt) principle is a commonly used technique which provides the probability that a token belongs to a class.
Maximum Entropy Based Model for Hindi NER
MaxEnt computes the probability p(o|h) for any o from the space of all possible outcomes O, and for every h from the space of all possible histories H. In NER, history can be viewed as all information derivable from the training corpus relative to the current token.
Maximum Entropy Based Model for Hindi NER
The computation of the probability p(o|h) of an outcome for a token in MaxEnt depends on a set of features that are helpful in making predictions about the outcome.
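A minimal Python sketch of the log-linear computation these excerpts describe: active features of the history h are combined with learned weights to produce p(o|h) over all outcomes. The feature names, weights, and tags below are invented for illustration and are not taken from the paper.

import math

def maxent_probability(history_features, weights, outcomes):
    """Compute p(o|h) for every outcome o under a log-linear (MaxEnt) model.

    history_features: active feature predicates derived from the history h
    weights: dict mapping (feature, outcome) pairs to learned weights
    outcomes: all possible outcomes O (e.g. NER tags)
    """
    scores = {o: sum(weights.get((f, o), 0.0) for f in history_features) for o in outcomes}
    z = sum(math.exp(s) for s in scores.values())  # normalizer Z(h)
    return {o: math.exp(s) / z for o, s in scores.items()}

# Illustrative call with hypothetical Hindi NER features and weights.
h = {"prev_word=shri", "is_capitalized", "suffix=pur"}
w = {("prev_word=shri", "PERSON"): 1.2, ("suffix=pur", "LOCATION"): 0.9}
print(maxent_probability(h, w, ["PERSON", "LOCATION", "OTHER"]))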
MaxEnt is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min
Decoding with Sense-Based Translation Model
MaxEnt classifiers
Decoding with Sense-Based Translation Model
Once we get sense clusters for word tokens in test sentences, we load pre-trained MaxEnt classifiers of the corresponding word types.
Experiments
We trained our MaxEnt classifiers with the off-the-shelf MaxEnt tool. We performed 100 iterations of the L-BFGS algorithm implemented in the training toolkit on the collected training events from the sense-annotated data as described in Section 3.2.
Experiments
It took an average of 57.5 seconds for training a Maxent classifier.
Sense-Based Translation Model
entropy (MaxEnt) based classifier that is used to predict the translation probability p(é|C(c)).
Sense-Based Translation Model
The MaxEnt classifier can be formulated as follows.
Sense-Based Translation Model
This is not an issue for the MaxEnt classifier as it can deal with arbitrary overlapping features (Berger et al., 1996).
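As a rough stand-in for the training setup these excerpts describe (an off-the-shelf MaxEnt tool run for 100 L-BFGS iterations over overlapping features), here is a hedged Python sketch using scikit-learn rather than the toolkit cited in the paper; the training events and feature names are invented.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each training event: overlapping context features of a source word paired with
# its observed translation (the outcome). Data is purely illustrative.
events = [
    ({"word=bank", "prev=river", "sense_cluster=WATER"}, "rive"),
    ({"word=bank", "prev=central", "sense_cluster=FINANCE"}, "banque"),
]
vec = DictVectorizer()
X = vec.fit_transform([{f: 1 for f in feats} for feats, _ in events])
y = [label for _, label in events]

# Multinomial logistic regression trained with L-BFGS is the standard MaxEnt setup;
# max_iter=100 mirrors the 100 iterations mentioned above.
clf = LogisticRegression(solver="lbfgs", max_iter=100)
clf.fit(X, y)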
MaxEnt is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei
Alignment Link Confidence Measure
We combine the HMM alignment, the BM alignment and the MaxEnt alignment (ME) using the above link selection algorithm.
Alignment Link Confidence Measure
Figure 3 shows such an example, where alignment errors in the MaxEnt alignment are shown with dotted lines.
Improved MaxEnt Aligner with Confidence-based Link Filtering
In addition to the alignment combination, we also improve the performance of the MaxEnt aligner through confidence-based alignment link filtering.
Improved MaxEnt Aligner with Confidence-based Link Filtering
Here we select the MaxEnt aligner because it has
Introduction
In section 4 we show how to improve MaxEnt word alignment quality by removing low-confidence alignment links, which also leads to improved translation quality, as shown in section 5.
Related Work
Regarding word alignment combination, in addition to the commonly used "intersection-union-refine" approach (Och and Ney, 2003), (Ayan and Dorr, 2006b) and (Ayan et al., 2005) combined alignment links from multiple word alignments based on a set of linguistic and alignment features within the MaxEnt framework or a neural net model.
Sentence Alignment Confidence Measure
HMM    54.72  -0.710
BM     62.53  -0.699
MaxEnt 69.26  -0.699
Sentence Alignment Confidence Measure
We randomly selected 512 Chinese-English (CE) sentence pairs and generated word alignment using the MaxEnt aligner (Ittycheriah and Roukos, 2005).
Sentence Alignment Confidence Measure
For each sentence pair in the CE test set, we calculate the confidence scores of the HMM alignment, the Block Model alignment and the MaxEnt alignment, then select the alignment with the highest confidence score.
Translation
We extract phrase translation tables from the baseline MaxEnt word alignment as well as the alignment with confidence-based link filtering, then translate the test set with each phrase translation table.
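A schematic Python sketch of the two operations described in these excerpts: choosing, per sentence pair, the candidate alignment with the highest confidence score, and dropping individual links whose confidence falls below a threshold. The data structures, score values, and threshold are placeholders, not the paper's.

def select_alignment(candidates):
    """candidates: dict mapping aligner name ('HMM', 'BM', 'MaxEnt') to a pair
    (alignment_links, sentence_confidence). Returns the highest-confidence alignment."""
    best = max(candidates, key=lambda name: candidates[name][1])
    return best, candidates[best][0]

def filter_links(links, link_confidence, threshold=0.5):
    """Keep only alignment links whose confidence exceeds the threshold.
    links: iterable of (src_index, tgt_index) pairs;
    link_confidence: dict from link to a confidence value."""
    return [link for link in links if link_confidence.get(link, 0.0) > threshold]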
MaxEnt is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Bhat, Suma and Xue, Huichao and Yoon, Su-Youn
Experimental Setup
Subsequently, the feature extraction stage (a VSM or a MaxEnt model as the case may be) generates the syntactic complexity feature which is then incorporated in a multiple linear regression model to generate a score.
Experimental Setup
We used the maximum entropy classifier implementation in the MaxEnt toolkit.
Experimental Setup
The results that follow are based on MaxEnt classifier’s parameter settings initialized to zero.
Models for Measuring Grammatical Competence
The inductive classifier we use here is the maximum-entropy model (MaxEnt) which has been used to solve several statistical natural language processing problems with much success (Berger et al., 1996; Borthwick et al., 1998; Borthwick, 1999; Pang et al., 2002; Klein et al., 2003; Rosenfeld, 2005).
Models for Measuring Grammatical Competence
The productive feature engineering aspects of incorporating features into the discriminative MaxEnt classifier motivate the model choice for the problem at hand.
Models for Measuring Grammatical Competence
In particular, the ability of the MaxEnt model’s estimation routine to handle overlapping (correlated) features makes it directly applicable to address the first limitation of the VSM model.
MaxEnt is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Liu, Jenny and Haghighi, Aria
Analysis
MAXENT seems to outperform the CLASS BASED baseline because it learns more from the training data.
Analysis
[Figure legend: MaxEnt vs. ClassBased]
Experiments
To evaluate our system (MAXENT) and our baselines, we partitioned the corpora into training and testing data.
Experiments
For each NP in the test data, we generated a set of modifiers and looked at the predicted orderings of the MAXENT, CLASS BASED, and GOOGLE N-GRAM methods.
Results
The MAXENT model consistently outperforms CLASS BASED across all test corpora and sequence lengths for both tokens and types, except when testing on the Brown and Switchboard corpora for modifier sequences of length 5, for which neither approach is able to make any correct predictions.
Results
MAXENT also outperforms the GOOGLE N-GRAM baseline for almost all test corpora and sequence lengths.
Results
For the Switchboard test corpus token and type accuracies, the GOOGLE N-GRAM baseline is more accurate than MAXENT for sequences of length 2 and overall, but the accuracy of MAXENT is competitive with that of GOOGLE N-GRAM.
MaxEnt is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
Maximum entropy (MaxEnt) models have been widely used in many NLP tasks (Berger et al., 1996; Ratnaparkhi, 1997; Smith, 2006).
A Joint Model with Unlabeled Parallel Text
With MaxEnt, we learn from the input data:
A Joint Model with Unlabeled Parallel Text
When λ is 0, the algorithm ignores the unlabeled data and degenerates to two MaxEnt models trained on only the labeled data.
Experimental Setup 4.1 Data Sets and Preprocessing
MaxEnt: This method learns a MaxEnt classifier for each language given the monolingual labeled data; the unlabeled data is not used.
Results and Analysis
By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
Results and Analysis
Significance is tested using paired t-tests with p<0.05; separate markers denote statistical significance compared to the corresponding performance of MaxEnt, of SVM, and of Co-SVM.
Results and Analysis
When λ is set to 0, the joint model degenerates to two MaxEnt models trained with only the labeled data.
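A toy Python sketch of the degenerate case noted in these excerpts: a joint objective that sums two monolingual MaxEnt (logistic) losses with a λ-weighted agreement term over unlabeled parallel pairs, so that λ = 0 recovers two independently trained classifiers. The squared-disagreement term and all names here are simplified assumptions, not the paper's exact formulation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_objective(w_en, w_cn, labeled_en, labeled_cn, parallel_unlabeled, lam):
    """Two monolingual MaxEnt negative log-likelihoods plus a lambda-weighted
    disagreement penalty on unlabeled parallel pairs (binary sentiment case)."""
    def nll(w, data):
        # standard logistic negative log-likelihood over (feature_vector, label in {0, 1})
        return -sum(y * np.log(sigmoid(w @ x)) + (1 - y) * np.log(1.0 - sigmoid(w @ x))
                    for x, y in data)
    disagreement = sum((sigmoid(w_en @ x_en) - sigmoid(w_cn @ x_cn)) ** 2
                       for x_en, x_cn in parallel_unlabeled)
    # With lam == 0 this reduces to two independent MaxEnt objectives.
    return nll(w_en, labeled_en) + nll(w_cn, labeled_cn) + lam * disagreement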
MaxEnt is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael
Experiments and Results
The MaxEnt classifiers are also from the Stanford toolkit, and both the document and year mention classifiers use its default settings (quadratic prior).
Experiments and Results
MaxEnt Unigram is our new discriminative model for this task.
Experiments and Results
MaxEnt Time is the discriminative model with rich time features (but not NER) as described in Section 3.3.2 (Time+NER includes NER).
Learning Time Constraints
Figure 2: Distribution over years for a single document as output by a MaxEnt classifier.
Learning Time Constraints
We train a MaxEnt model on each year mention, to be described next.
Learning Time Constraints
We use a MaxEnt classifier trained on the individual year mentions.
Timestamp Classifiers
We used a MaxEnt model and evaluated with the same filtering methods based
Timestamp Classifiers
Ultimately, this MaxEnt model vastly outperforms these NLLR models.
Timestamp Classifiers
The above language modeling and MaxEnt approaches are token-based classifiers that one could apply to any topic classification domain.
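The timestamp classifiers described above are token-based; a rough Python stand-in using scikit-learn (not the Stanford classifier the paper actually uses), with entirely invented documents and years, might look like this:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: documents paired with the year they are about (invented).
docs  = ["the gulf war began", "the treaty was signed in geneva", "the dot com bubble burst"]
years = [1991, 1993, 2000]

vec = CountVectorizer()
clf = LogisticRegression(max_iter=200)   # multinomial MaxEnt over candidate years
clf.fit(vec.fit_transform(docs), years)

# Distribution over years for a new document, in the spirit of Figure 2.
probs = clf.predict_proba(vec.transform(["troops crossed the border during the gulf war"]))[0]
print(dict(zip(clf.classes_, probs.round(3))))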
MaxEnt is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Scheible, Christian and Schütze, Hinrich
Distant Supervision
To increase coverage, we train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003).
Distant Supervision
The MaxEnt model achieves an F1 of 61.2% on the SR corpus (Table 3, line 2).
Distant Supervision
As described in Section 4, each document is represented as a graph of sentences and weights between sentences and source/sink nodes representing SR/SNR are set to the confidence values obtained from the distantly trained MaxEnt classifier.
Sentiment Relevance
We divide both the SR and P&L corpora into training (50%) and test sets (50%) and train a Maximum Entropy (MaxEnt) classifier (Manning and Klein, 2003) with bag-of-word features.
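A schematic Python sketch of the graph construction described above, using networkx as a stand-in: each sentence is connected to source/sink nodes representing SR/SNR with capacities taken from the distantly trained MaxEnt classifier's confidences; a minimum cut is one standard way to read labels off such a graph. The sentence-to-sentence edges and all numbers are illustrative assumptions, not the paper's settings.

import networkx as nx

def label_by_mincut(sentence_conf, neighbor_weight=0.5):
    """sentence_conf: dict mapping sentence id to the MaxEnt confidence that the
    sentence is sentiment relevant (SR). Returns the set of ids labeled SR."""
    g = nx.DiGraph()
    ids = list(sentence_conf)
    for i, conf in sentence_conf.items():
        g.add_edge("SR", i, capacity=conf)          # weight to the SR source node
        g.add_edge(i, "SNR", capacity=1.0 - conf)   # weight to the SNR sink node
    for a, b in zip(ids, ids[1:]):                  # neighbouring sentences prefer the same label
        g.add_edge(a, b, capacity=neighbor_weight)
        g.add_edge(b, a, capacity=neighbor_weight)
    _, (sr_side, _) = nx.minimum_cut(g, "SR", "SNR")
    return sr_side - {"SR"}

print(label_by_mincut({1: 0.9, 2: 0.7, 3: 0.2}))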
MaxEnt is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Zhao, Bing and Lee, Young-Suk and Luo, Xiaoqiang and Li, Liu
Discussions and Conclusions
We achieved a high accuracy of 84.7% for predicting such boundaries using a MaxEnt model on machine-parsed trees.
Elementary Trees to String Grammar
During training, we label nodes with translation boundaries, as one additional function tag; during decoding, we employ the MaxEnt model to predict the translation boundary label probability for each span associated with a subgraph y, and discourage derivations accordingly for using nonterminals over the non-translation boundary span.
Experiments
To learn our MaxEnt models defined in § 3.3, we collect the events during extracting elm2str grammar in training time, and learn the model using improved iterative scaling.
Experiments
There are 16 thousand human parse trees with human alignment; an additional 1 thousand human-parsed and aligned sentence pairs are used as an unseen test set to verify our MaxEnt models and parsers.
Experiments
It showed that our MaxEnt model is very accurate using human trees: 94.5% accuracy, and about 84.7% accuracy when using machine-parsed trees.
Introduction
The boundary cases were not addressed in the previous literature for trees, and here we include them in our feature sets for learning a MaxEnt model to predict the transformations.
Introduction
The rest of the paper is organized as follows: in section 2, we analyze the projectable structures using human aligned and parsed data, to identify the problems for SCFG in general; in section 3, our proposed approach is explained in detail, including the statistical operators using a MaxEnt model; in section 4, we illustrate the integration of the proposed approach in our decoder; in section 5, we present experimental results; in section 6, we conclude with discussions and future work.
MaxEnt is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Szarvas, Gy"orgy
Methods
This is performed by weighting features to maximise the likelihood of the data and, for each instance, decisions are made based on the features present at that point; thus maxent classification is quite suitable for our purposes.
Methods
As feature weights are mutually estimated, the maxent classifier is capable of taking feature dependence into account.
Methods
By downweighting such features, maxent is capable of modelling to a certain extent the special characteristics which arise from the automatic or weakly supervised training data acquisition procedure.
Results
Here we decided not to check whether these keywords made sense in scientific texts or not, but instead left this task to the maximum entropy classifier, and added only those keywords that were found reliable enough to predict spec label alone by the maxent model trained on the training dataset.
Results
This 54-keyword maxent classifier got an Fβ=1(spec) score of 79.73%.
Results
We manually examined all keywords that had a P(spec) > 0.5 given as a standalone instance for our maxent model, and constructed a dictionary of hedge cues from the promising candidates.
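A small Python sketch of the keyword-vetting step these excerpts describe: each candidate keyword is presented to the trained maxent model as a standalone instance, and only those whose predicted P(spec) exceeds 0.5 enter the hedge-cue dictionary. The classifier interface below is a generic placeholder, not the paper's implementation.

def build_hedge_cue_dictionary(candidate_keywords, spec_classifier, threshold=0.5):
    """Keep candidate hedge cues that the maxent model labels speculative on their own."""
    dictionary = []
    for keyword in candidate_keywords:
        # Present the keyword as a one-feature standalone instance and read off P(spec).
        p_spec = spec_classifier.predict_proba({"kw=" + keyword: 1})
        if p_spec > threshold:
            dictionary.append(keyword)
    return dictionary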
MaxEnt is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Error Detection with a Maximum Entropy Model
We tune our model feature weights using an off-the-shelf MaxEnt toolkit (Zhang, 2004).
Error Detection with a Maximum Entropy Model
During test, if the probability p(correct|φ) is larger than p(incorrect|φ) according to the trained MaxEnt model, the word is labeled as correct, otherwise incorrect.
Experiments
Starting with MaxEnt models with a single linguistic feature or a word posterior probability based feature, we incorporated additional features incrementally by combining features together.
Experiments
We conducted three groups of experiments using the MaxEnt based error detection model with various feature combinations.
Experiments
Using discrete word posterior probabilities as features in the MaxEnt based error detection model is marginally better than word posterior probability thresholding in terms of CER, but obtains a 13.79% relative improvement in F measure.
Introduction
We integrate two sets of linguistic features into a maximum entropy (MaxEnt) model and develop a MaxEnt-based binary classifier to predict the category (correct or incorrect) for each word in a generated target sentence.
MaxEnt is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Chen, Liwei and Feng, Yansong and Huang, Songfang and Qin, Yong and Zhao, Dongyan
Conclusions
Furthermore, our framework is scalable for other local sentence level extractors in addition to the MaxEnt model.
Experiments
Our ILP model and its variants all outperform Mintz++ in precision in both datasets, indicating that our approach helps filter out incorrect predictions from the output of the MaxEnt model.
Experiments
However, in the Riedel’s dataset, Mintz++, the MaxEnt relation extractor, does not perform well, and our framework cannot improve its performance.
Experiments
Hence, our framework does not perform well due to the poor performance of the MaxEnt extractor and the lack of clues.
The Framework
By adopting ILP, we can combine the local information, including MaxEnt confidence scores, and the implicit relation backgrounds that are embedded into global consistencies of the entity tuples.
MaxEnt is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Consensus Decoding Algorithms
The standard Viterbi decoding objective is to find e* = argmax_e λ · θ(f, e).
Consensus Decoding Algorithms
ẽ = argmax_e E_{P(e'|f)}[S(e; e')] = argmax_e Σ_{e'} P(e'|f) · S(e; e')
Consensus Decoding Algorithms
argmax_{e∈E} E_{P(e'|f)}[S(e; e')] = argmax_e Σ_{e'∈E} P(e'|f) · Σ_j ω_j(e) · φ_j(e')
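Following the reconstructed equations above, a short Python sketch of consensus decoding over an n-best list: pick the hypothesis whose expected similarity under the model posterior is highest. The unigram-overlap similarity and all data below are placeholders for whatever S(e; e') the system actually uses.

def consensus_decode(hypotheses, posterior, similarity):
    """Return argmax_e sum_{e'} P(e'|f) * S(e, e') over the hypothesis list."""
    def expected_similarity(e):
        return sum(posterior[e_prime] * similarity(e, e_prime) for e_prime in hypotheses)
    return max(hypotheses, key=expected_similarity)

# Toy usage with unigram overlap standing in for the similarity measure S.
def unigram_overlap(e, e_prime):
    return len(set(e.split()) & set(e_prime.split()))

hyps = ["the cat sat", "a cat sat", "the dog ran"]
post = {"the cat sat": 0.5, "a cat sat": 0.3, "the dog ran": 0.2}
print(consensus_decode(hyps, post, unigram_overlap))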
MaxEnt is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Doyle, Gabriel and Bicknell, Klinton and Levy, Roger
Experiment
To establish performance for the phonological standard, we use the IBPOT learner to find constraint weights but do not update M. The resultant learner is essentially MaxEnt OT with the weights estimated through Metropolis sampling instead of gradient ascent.
Introduction
We consider this question by examining the dominant framework in modern phonology, Optimality Theory (Prince and Smolensky, 1993, OT), implemented in a log-linear framework, MaxEnt OT (Goldwater and Johnson, 2003), with output forms' probabilities based on a weighted sum of constraint violations.
Phonology and Optimality Theory 2.1 OT structure
In IBPOT, we use the log-linear EVAL developed by Goldwater and Johnson (2003) in their MaxEnt OT system.
Phonology and Optimality Theory 2.1 OT structure
MEOT also is motivated by the general MaxEnt framework, whereas most other OT formulations are ad hoc constructions specific to phonology.
Phonology and Optimality Theory 2.1 OT structure
In MaxEnt OT, each constraint has a weight, and the candidates' scores are the sums of the weights of violated constraints.
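A minimal Python illustration of the MaxEnt OT EVAL these excerpts describe: a candidate's penalty is the weighted sum of its constraint violations, and candidate probabilities come from exponentiating and normalizing the negated penalties. The constraints, weights, and violation counts are invented for the example.

import math

def maxent_ot_distribution(candidates, weights):
    """candidates: dict mapping each output candidate to its violation counts per
    constraint; weights: dict of non-negative constraint weights. Returns each
    candidate's probability under a MaxEnt OT grammar."""
    penalties = {cand: sum(weights.get(c, 0.0) * v for c, v in violations.items())
                 for cand, violations in candidates.items()}
    z = sum(math.exp(-p) for p in penalties.values())
    return {cand: math.exp(-p) / z for cand, p in penalties.items()}

# Illustrative candidates for a hypothetical final-devoicing input /bed/.
cands = {"bed": {"*VoicedCoda": 1}, "bet": {"Ident(voice)": 1}}
print(maxent_ot_distribution(cands, {"*VoicedCoda": 2.0, "Ident(voice)": 1.0}))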
MaxEnt is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Generating reference reordering from parallel sentences
This model was significantly better than the MaxEnt aligner (Ittycheriah and Roukos, 2005) and is also flexible in the sense that it allows for arbitrary features to be introduced while still keeping training and decoding tractable by using a greedy decoding algorithm that explores potential alignments in a small neighborhood of the current alignment.
Generating reference reordering from parallel sentences
The model thus needs a reasonably good initial alignment to start with for which we use the MaxEnt aligner (Ittycheriah and Roukos, 2005) as in McCarley et al.
Results and Discussions
None - 35.5
Manual 180K 52.5
MaxEnt 70.0 3.9M 49.5
Results and Discussions
We see that the quality of the alignments matters a great deal to the reordering model; using MaxEnt alignments causes a degradation in performance over just using a small set of manual word alignments.
MaxEnt is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni
Experiment
Method      P      R      F      OOV-R
Stanford    0.861  0.853  0.857  0.639
ICTCLAS     0.812  0.861  0.836  0.602
Li-Sun      0.707  0.820  0.760  0.734
Maxent      0.868  0.844  0.856  0.760
No-punc     0.865  0.829  0.846  0.760
No-balance  0.869  0.877  0.873  0.757
Our method  0.875  0.875  0.875  0.773
Experiment
Maxent only uses the PKU data for training, with neither punctuation information nor self-training framework incorporated.
Experiment
The comparison of Maxent and No-punctuation
MaxEnt is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Document-specific MT System
alignment (HMM (Vogel et al., 1996) and MaxEnt (Ittycheriah and Roukos, 2005) alignment models), phrase pair extraction, MT model training (Ittycheriah and Roukos, 2007) and LM model training.
Related Work
Target part-of-speech and null dependency link are exploited in a MaxEnt classifier to improve the MT quality estimation (Xiong et al., 2010).
Static MT Quality Estimation
• 17 decoding features, including phrase translation probabilities (source-to-target and target-to-source), word translation probabilities (also in both directions), maxent probabilities, word count, phrase count, distortion
Static MT Quality Estimation
The maxent probability is the translation probability
MaxEnt is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Discussion
It shows that 1) as expected, our classifiers do worse on the harder semantic reordering prediction than on syntactic reordering prediction; 2) thanks to the high accuracy obtained by the maxent classifiers, integrating either the syntactic or the semantic reordering constraints results in better reordering performance from both syntactic and semantic perspectives; 3) in terms of the mutual impact, the syntactic reordering models help improve semantic reordering more than the semantic reordering
Discussion
                   Syntactic      Semantic
                   l-m    rm      l-m    rm
MR08               75.0   78.0    66.3   68.5
+syn-reorder       78.4   80.9    69.0   70.2
+sem-reorder       76.0   78.8    70.7   72.7
+both              78.6   81.7    70.6   72.1
Maxent Classifier  80.7   85.6    70.9   73.5
Experiments
syntactic parsing and semantic role labeling on the Chinese sentences, then train the models by using the MaxEnt toolkit with an L1 regularizer (Tsuruoka et al., 2009). Table 3 shows the reordering type distribution over the training data.
Related Work
Marton and Resnik (2008) employed soft syntactic constraints with weighted binary features and no MaxEnt model.
MaxEnt is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pershina, Maria and Min, Bonan and Xu, Wei and Grishman, Ralph
Available at http://nlp.stanford.edu/software/mimlre.shtml.
[Figure legend: Guided DS, Semi-MIML, DS+upsampling, MaxEnt]
Available at http://nlp.stanford.edu/software/mimlre.shtml.
Our baselines: 1) MaxEnt is a supervised maximum entropy baseline trained on human-labeled data; 2) DS+upsampling is an upsampling experiment, where MIML was trained on a mix of distantly-labeled and human-labeled data; 3) Semi-MIML is a recent semi-supervised extension.
The Challenge
We experimentally tested alternative feature sets by building supervised Maximum Entropy (MaxEnt) models using the hand-labeled data (Table 3), and selected an effective combination of three features from the full feature set used by Surdeanu et al. (2011):
The Challenge
Table 3: Performance of a MaxEnt model trained on hand-labeled data using all features (Surdeanu et al., 2011) vs. using a subset of two (types of entities, dependency path) or three (adding a span word) features, evaluated on the test set.
MaxEnt is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang
Introduction
                  BLEU     TER
standard          34.79    56.93
+ depLM           35.29*   56.17**
+ maxent          35.40**  56.09**
+ depLM & maxent  35.71**  55.87**
Introduction
Adding the dependency language model ("depLM") and the maximum entropy shift-reduce parsing model ("maxent") significantly improves BLEU and TER on the development set, both separately and jointly.
MaxEnt is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
MultiLayer Context Model - MCM
and predicted dialog act by argmax_a P(a|u, d*):
MultiLayer Context Model - MCM
a* = argmax_a [...]  (6)
MultiLayer Context Model - MCM
For each segment wuj in u, its predicted slot is determined by argmax_s P(sj|wuj, d*, sj-1):
MaxEnt is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Abstract
In this paper we present a comprehensive treatment of ECs by first recovering them with a structured MaxEnt model with a rich set of syntactic and lexical features, and then incorporating the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features.
Chinese Empty Category Prediction
We propose a structured MaxEnt model for predicting ECs.
Chinese Empty Category Prediction
(1) is the familiar log-linear (or MaxEnt) model, where fk(ei-1, T, ei) is the feature function and
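For orientation, the usual log-linear parameterization behind such a structured MaxEnt model can be written as below; the symbols follow the excerpt, but the exact conditioning and normalization in the paper's equation (1) may differ.

P(e_i \mid e_{i-1}, T) = \frac{\exp\left(\sum_k \lambda_k \, f_k(e_{i-1}, T, e_i)\right)}{\sum_{e'} \exp\left(\sum_k \lambda_k \, f_k(e_{i-1}, T, e')\right)}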
MaxEnt is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Joshi, Aditya and Mishra, Abhijit and Senthamilselvan, Nivvedan and Bhattacharyya, Pushpak
Discussion
We use three sentiment classification techniques: Naïve Bayes, MaxEnt and SVM with unigrams, bigrams and trigrams as features.
Discussion
MaxEnt (Movie)    -0.29 (72.17)
MaxEnt (Twitter)  -0.26 (71.68)
SVM (Movie)       -0.24 (66.27)
SVM (Twitter)     -0.19 (73.15)
Discussion
MaxEnt has the highest negative correlation of -0.29 and -0.26.
MaxEnt is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Experiments
MAXENT (unigram, bigram, trigram, POS, chunks, NE, supertags): 92.6
Experiments
MAXENT (unigram, bigram, trigram, POS, wh-word, head word): 93.6
Experiments
SVM 81.6 | BINB 82.7 | MAXENT 83.0 | MAX-TDNN 78.8 | NBOW 80.9 | DCNN 87.4
MaxEnt is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: