Abstract | In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. |
Abstract | This model can be trained through nearly the same means as logistic regression, and retains its efficiency on high-dimensional datasets. |
Introduction | In this work we argue that incorrect examples should be explicitly modelled during training, and present a simple extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. |
Introduction | It has a convex objective, is well-suited to high-dimensional data, and can be efficiently trained with minimal changes to the logistic regression pipeline. |
Introduction | In experiments on a large, noisy NER dataset, we find that this method can provide an improvement over standard logistic regression when annotation errors are present. |
Model | Recall that in binary logistic regression, the probability of an example x being positive is modeled as |
Model | Upon writing the objective in this way, we immediately see that it is convex, just as standard L1-penalized logistic regression is convex. |
Related Work | Bootkrajang and Kaban (2012) present an extension of logistic regression that models annotation errors through flipping probabilities. |
Related Work | In this work we adapt the technique to logistic regression. |
Related Work | To the best of our knowledge, we are the first to experiment with adding ‘shift parameters’ to logistic regression and demonstrate that the model is especially well-suited to the type of high-dimensional, noisy datasets commonly used in NLP. |
Empirical Evaluation | We can see that the chain based models, Linear Chain Markov Model (LCMM) and Global Chain Model (GCM), outperform the unstructured models, namely Logistic regression (LR) and Decision Trees (J48). |
Intervention Prediction Models | 3.2 Logistic Regression (LR) |
Intervention Prediction Models | Our first attempt at solving this problem involved training a logistic regression for the binary prediction task which models P(r|t). |
Intervention Prediction Models | Our logistic regression model uses the following two types of features: Thread only features and Aggregated post features. |
Introduction | The first uses a logistic regression model that primarily incorporates high level information about threads and posts. |
Generalized Additive Models | A well known GLM in the NLP community is logistic regression (which may alternatively be derived as a maximum entropy classifier). |
Generalized Additive Models | In logistic regression, the response is assumed to be Binomial and the chosen link function is the logit transformation, |
Generalized Additive Models | Our first new approach for handling differences in transcripts is an extension of the logistic regression model previously used in data fusion work (Savoy et al., 1988). |
Previous Work | Perhaps the most closely related proposal, using logistic regression, was made first by Savoy et al. |
Previous Work | Logistic regression is one example from the broad class of models which GAMs encompass. |
Previous Work | Unlike GAMs in their full generality however, logistic regression imposes a comparatively high degree of linearity in the model structure. |
Set Expansion | The second method uses a standard machine learning algorithm, logistic regression. |
Set Expansion | Thus, logistic regression using Moby and no active learning is L(M,—). |
Set Expansion | For logistic regression, we set the regularization penalty σ² to 1, based on qualitative analysis during development (before seeing the test data). |
Query Classification | For each query class, we train a logistic regression classifier (Vapnik 1999) with L2 regularization. |
Query Classification | Given an input x, represented as a vector of m features (x1, x2, ..., xm), a logistic regression classifier with parameter vector w = (w1, w2, ..., wm) computes the posterior probability of the output y, which is either 1 or -1, as $P(y \mid x) = \frac{1}{1 + \exp(-y\, w^{\top} x)}$ |
Query Classification | We made a small modification to the objective function for logistic regression to take into account the prior distribution and to use 50% as a uniform decision boundary for all the classes. |
Analysis and discussion | The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001). |
Analysis and discussion | Figure 5 plots the relationship between the response entropy and the accuracy of our decision procedure, along with a fitted logistic regression model using response entropy to predict whether our system’s inference was correct. |
Methods | A logistic regression model can capture these facts. |
Methods | tract ages from the positive and negative snippets obtained, and we fit a logistic regression to these data. |
Methods | The logistic regression thus has only one factor — the unit of measure (age in the case of little kids). |
Selectional branching | This can be expressed as a logistic regression: |
Selectional branching | Algorithm 2 shows our adaptation of ADAGRAD with logistic regression for multi-class classification. |
Selectional branching | Note that when used with logistic regression, ADAGRAD takes a regular gradient instead of a subgradient for updating weights. |
Problem Definition | 4.2.2 Logistic Regression |
Problem Definition | We implemented Logistic Regression using L2-Normalization, finding this to outperform L1-Normalized and non-normalized versions. |
Problem Definition | The strength of the normalization in the logistic regression required cross-validation, which we limited to 20 values logarithmically spaced between $10^{-4}$ and $10^{4}$. |
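A grid of 20 logarithmically spaced values over this range can be sketched as follows. This is only an illustration: the helper name `log_spaced` is hypothetical, and the source does not say how the authors generated their grid or which parameterization of the normalization strength they searched over.

```python
def log_spaced(low_exp, high_exp, count):
    """Return `count` values spaced evenly in log10 between 10**low_exp and 10**high_exp."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (high_exp - low_exp) / (count - 1)
    return [10.0 ** (low_exp + i * step) for i in range(count)]


# Hypothetical grid of candidate normalization strengths for cross-validation.
grid = log_spaced(-4, 4, 20)
for value in grid:
    print(f"{value:.6g}")
```

Each candidate would then be scored by cross-validation and the best-scoring value kept.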
Automatically Identifying Biased Language | We trained a logistic regression model on a feature vector for every word that appears in the NPOV sentences from the training set, with the bias-inducing words as the positive class, and all the other words as the negative class. |
Automatically Identifying Biased Language | The types of features used in the logistic regression model are listed in Table 3, together with their value space. |
Automatically Identifying Biased Language | Logistic regression model that only uses the features based on Liu et al.’s (2005) lexicons of positive and negative words (i.e., features 26-29). |
Conclusions | However, our logistic regression model reveals that epistemological and other features can usefully augment the traditional sentiment and subjectivity features for addressing the difficult task of identifying the bias-inducing word in a biased sentence. |
Data | Deduplication removes 8.5% of articles.5 For topic filtering, we apply a series of keyword filters to remove sports and finance news, and also apply a text classifier for diplomatic and military news, trained on several hundred manually labeled news articles (using L1-regularized logistic regression with unigram and bigram features). |
Experiments | We also create a baseline L1-regularized logistic regression that uses normalized dependency path counts as the features (10,457 features). |
Experiments | The verb-path logistic regression performs strongly at AUC 0.62; it outperforms all of the vanilla frame models. |
Experiments | Green line is the verb-path logistic regression baseline. |
Experiments | (2013) use logistic regression). |
Model and Feature Extraction | In addition, decision-tree classifiers learn nonlinear responses to inputs and often outperform logistic regression (Perlich et al., 2003).9 Our random forest classifier models the probability that the input syntactic relation is metaphorical. |
Model and Feature Extraction | (2013), we use a logistic regression classifier to propagate abstractness and imageability scores from MRC ratings to all words for which we have vector space representations. |
Model and Feature Extraction | 9 In our experiments, the random forest model slightly outperformed logistic regression and SVM classifiers. |
Experiments | • LR1, our most basic logistic regression baseline, uses only bag of words (BOW) features. |
Experiments | • LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2). |
Experiments | The RNN framework, adding phrase-level data, and initializing with word2vec all improve performance over logistic regression baselines. |
Data | With this featurization and training data, we train a binary logistic regression classifier with ℓ1 regularization (where negative examples comprise all character entities in the previous 100 words not labeled as the true antecedent). |
Model | Since each multiplicand involves a binary prediction, we can avoid partition functions and use the classic binary logistic regression.7 We have converted the V-way multiclass logistic regression problem of Eq. |
Model | 7 Recall that logistic regression lets $P_{LR}(y = 1 \mid x, \beta) = \mathrm{logit}^{-1}(x^{\top}\beta) = 1/(1 + \exp(-x^{\top}\beta))$ for binary dependent variable $y$, independent variables $x$, and coefficients $\beta$. |
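The binary logistic regression probability recalled in this footnote, the inverse logit of the inner product of the independent variables and the coefficients, can be sketched in a few lines. The function names `inv_logit` and `p_lr` are illustrative choices and do not come from the source.

```python
import math


def inv_logit(z):
    """Inverse logit 1 / (1 + exp(-z)), computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def p_lr(x, beta):
    """P(y = 1 | x, beta) for binary logistic regression."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = sum(xi * bi for xi, bi in zip(x, beta))
    return inv_logit(z)


# The inner product here is 1.0 * 0.5 + 2.0 * (-0.25) = 0, so the probability is 0.5.
print(p_lr([1.0, 2.0], [0.5, -0.25]))
```

Branching on the sign of `z` avoids overflow in `math.exp` for inputs of large magnitude.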
Model | This equates to solving 4V L1-regularized logistic regressions (see Eq. |
Experiments | For multi-class classification, one possible extension is to use a multinomial logistic regression model for categorical variables Y by using topic representations Z as input features. |
Experiments | In fact, this is harder than the multinomial Bayesian logistic regression, which can be done via a coordinate strategy (Polson et al., 2012). |
Introduction | More specifically, we extend Polson’s method for Bayesian logistic regression (Polson et al., 2012) to the generalized logistic supervised topic models, which are much more challeng- |
Logistic Supervised Topic Models | Moreover, the latent variables Z make the inference problem harder than that of Bayesian logistic regression models (Chen et al., 1999; Meyer and Laud, 2002; Polson et al., 2012). |
Discussion and future work | While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results. |
Related Work | These features were obtained with the Linguistic Inquiry and Word Count (LIWC) tool and used in a logistic regression classifier which achieved, on average, 61% accuracy on test data. |
Results | We evaluate five classifiers: logistic regression (LR), a multilayer perceptron (MLP), naive Bayes (NB), a random forest (RF), and a support vector machine (SVM). |
Results | Here, naive Bayes, which assumes conditional independence of the features, and logistic regression, which has a linear decision boundary, are baselines. |
Experimental Results | Two representative methods were used as baselines: the generative model proposed by (Brill and Moore, 2000) referred to as generative and the logistic regression model proposed by (Okazaki et al., 2008) |
Experimental Results | When using their method for ranking, we used outputs of the logistic regression model as rank scores. |
Introduction | (2008) proposed using a logistic regression model for approximate dictionary matching. |
Related Work | (2008) utilized substring substitution rules and incorporated the rules into a L1-regularized logistic regression model. |
Analysis using a joint model | To model data with a binary dependent variable, a logistic regression model is an appropriate choice. |
Analysis using a joint model | In logistic regression, we model the log odds as a linear combination of feature values $x_i$. |
Analysis using a joint model | Standard logistic regression models assume that all categorical features are fixed effects, meaning that all possible values for these features are known in advance, and each value may have an arbitrarily different effect on the outcome. |
Cue Discovery for Content Selection | Before describing extensions to the baseline logistic regression model, we define notation. |
Cue Discovery for Content Selection | We define classifiers as functions $f(x) \to y \in Y$; in practice, we use logistic regression via LibLINEAR (Fan et al., 2008). |
Experimental Results | We compare our methods against baselines including a majority baseline, a baseline logistic regression classifier with L2 regularized features, and two common ensemble methods, AdaBoost (Freund and Schapire, 1996) and bagging (Breiman, 1996) with logistic regression base classifiers5. |
Definitions and Related Work | predicting asker-rated quality of answers was evaluated by using a logistic regression model. |
Experimental Results | We adapt the Support Vector Machine (SVM) and Logistic Regression (LR) which have been reported to be effective for classification and the Linear CRF (LCRF) which is used to summarize ordinary text documents in (Shen et al., 2007) as baselines for comparison. |
Introduction | The experimental results show that the proposed model improves performance significantly (in terms of precision, recall and F1 measures, as well as the ROUGE-1, ROUGE-2 and ROUGE-L measures) compared to state-of-the-art methods such as Support Vector Machines (SVM), Logistic Regression (LR) and Linear CRF (LCRF) (Shen et al., 2007). |
Experiments and Results | We ran all of our experiments in Weka (Hall et al., 2009) using logistic regression over 10 runs of 10-fold cross-validation. |
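The protocol of 10 runs of 10-fold cross-validation can be illustrated with a small index generator. This sketch is not Weka's implementation: it uses plain unstratified shuffling, and the function name `repeated_kfold` and the seed handling are assumptions made for illustration.

```python
import random


def repeated_kfold(n_items, n_folds=10, n_runs=10, seed=0):
    """Yield (run, fold, train_indices, test_indices) for repeated k-fold CV."""
    if n_items < n_folds:
        raise ValueError("need at least one item per fold")
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_items))
        rng.shuffle(order)  # a fresh random partition for every run
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            # Training set is every fold except the held-out one.
            train_idx = [j for f, fold in enumerate(folds) if f != k for j in fold]
            yield run, k, train_idx, test_idx


# 10 runs x 10 folds gives 100 train/test splits over a toy dataset of 25 items.
splits = list(repeated_kfold(25))
print(len(splits))
```

A classifier would be trained and scored once per split, and the scores averaged over all 100 splits.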
Experiments and Results | We use logistic regression as our classifier because it has been shown that logistic regression typically has lower asymptotic error than naive Bayes for multiple classification tasks as well as for text classification (Ng and Jordan, 2002). |
Experiments and Results | We experimented with an SVM classifier and found logistic regression to do slightly better. |
Domain Adaptation | In our experiments, we used L2-regularised logistic regression. |
Experiments and Results | The L-BFGS (Liu and Nocedal, 1989) method is used to train the CRF and logistic regression models. |
Experiments and Results | Specifically, in POS tagging, a CRF trained on source domain labeled sentences is applied to target domain test sentences, whereas in sentiment classification, a logistic regression classifier trained using source domain labeled reviews is applied to the target domain test reviews. |
ConceptResolver | Both the string similarity classifier and the relation classifier are trained using L2-regularized logistic regression. |
ConceptResolver | As we trained both classifiers using logistic regression, we have models for the probabilities P(Y|X1) and P(Y|X2). |
ConceptResolver | (typically poorly calibrated) probability estimates of logistic regression. |
Experiments | To train our system on binarized data, we replaced the ℓ2-regularized linear regression model with an ℓ2-regularized logistic regression and used Kendall’s τ rank correlation between the predicted probabilities of the positive class and the binary gold standard labels as the grid search metric (§3.1) instead of Pearson’s r. |
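A rank correlation between predicted probabilities and binary gold labels can be computed as sketched below. Binary labels produce many tied pairs, so this sketch uses the tie-corrected tau-b variant. That choice is an assumption, since the source does not say which variant of Kendall's rank correlation was used, and the function name `kendall_tau_b` is illustrative.

```python
import math


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences, O(n^2) pair scan."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one sequence is constant
    return (concordant - discordant) / denom


# Toy example: predicted probabilities of the positive class vs. binary gold labels.
probs = [0.1, 0.2, 0.8, 0.9]
gold = [0, 0, 1, 1]
print(kendall_tau_b(probs, gold))
```

In a grid search, each hyperparameter setting would be scored with this value on held-out data and the highest-scoring setting kept.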
Experiments | Therefore, we used the same learning algorithms as for our system (i.e., ridge regression for the ordinal task and logistic regression for the binary task).14 |
Experiments | 14 In preliminary experiments, we observed little difference in performance between logistic regression and the original support vector classifier used by the system from Post (2011). |
Analysis | To verify the quality of our subjectivity features, we measured their performance as predictors in a logistic regression classifying the 3,250 adjectives labelled as subjective or not in the Wilson et al. |
CONC (noun concreteness) | (logistic regression with 10-fold cross-validation), we test whether our lexical representations based on subjectivity and concreteness convey sufficient information to perform the same classification. |
CONC (noun concreteness) | We again aim to classify the remaining 211 intersective and 93 sub-sective pairs with a logistic regression . |
Abstract | In the Markov logic framework, logistic regression-based data-driven user intention modeling is introduced, and human dialog knowledge is organized into two layers, domain knowledge and discourse knowledge, which are then integrated with the data-driven model at generation time. |
Overall architecture | The formulas for user intention modeling based on logistic regression |
Overall architecture | A logistic regression model is used for the statistical user intention model in Markov logic. |
Bilingual Lexicon Extraction | model (Lin), a generalized linear model which is the logistic regression model (Logit), and nonlinear regression models such as the polynomial regression model (Polyn) of order n. Given an input vector $x \in \mathbb{R}^m$, where $x_1, \ldots, x_m$ represent features, we find a prediction $\hat{y} \in \mathbb{R}^m$ for the co-occurrence count of a couple of words $y \in \mathbb{R}$ using one of the regression models presented below: |
Experiments and Results | We contrast the simple linear regression model (Lin) with the second and the third order polynomial regressions (Poly2 and Poly3) and the logistic regression model (Logit). |
Experiments and Results | This suggests that both linear and polynomial regressions are suitable as a preprocessing step of the standard approach, while the logistic regression seems to be inappropriate according to the results shown in Table 6. |
Application of the vector-space model | This measure of density was used as a factor in logistic regression to predict the first occurrence of a verb in the construction, coded as the binary variable OCCURRENCE, set to 1 for the first period in which the verb is attested in the construction, and to 0 for all preceding periods (later periods were discarded). |
Application of the vector-space model | Table 1: Summary of logistic regression results for different values of N. Model formula: OCCURRENCE ~ DENSITY + (1 + DENSITY|VERB). |
Conclusion | Using multidimensional scaling and logistic regression, it was shown that the occurrence of new items throughout the history of the construction can be predicted by the density of the semantic space in the neighborhood of these items in prior usage. |
Available at http://nlp.stanford.edu/software/mimlre.shtml. | form logistic regression baseline. |
Introduction | While prior work employed tens of thousands of human labeled examples (Zhang et al., 2012) and only got a 6.5% increase in F-score over a logistic regression baseline, our approach uses much less labeled data (about 1/8) but achieves much higher improvement on performance over stronger baselines. |
Training | 2 All classifiers are implemented using L2-regularized logistic regression with the Stanford CoreNLP package. |
Experimental Evaluation | We use logistic regression (LR) with L2 regularization (Fan et al., 2008) and the SVMWWW (SVM) system (Joachims, 2007) with its default settings as the classifiers. |
Proposed Tri-Training Algorithm | Many classification algorithms give such scores, e.g., SVM and logistic regression. |
Related Work | On developing effective learning techniques, supervised classification has been the dominant approach, e.g., neural networks (Graham et al., 2005; Zheng et al., 2006), decision tree (Uzuner and Katz, 2005; Zhao and Zobel, 2005), logistic regression (Madigan et al., 2005), SVM (Diederich et al., 2000; Gamon 2004; Li et al., 2006; Kim et al., 2011), etc. |
Abstract | Furthermore, using a product of experts (Hinton, 2002), we combine the model with a complementary logistic regression model based on state-of-the-art lexical overlap features. |
Introduction | We use a product of experts (Hinton, 2002) to bring together a logistic regression classifier built from n-gram overlap features and our syntactic model. |
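For a binary label, a product of experts multiplies the experts' probabilities for each outcome and renormalizes. The sketch below shows only that unweighted form; how the source actually combines its logistic regression classifier and syntactic model (for example with weights or a different normalization) is not stated here, and the function name `product_of_experts` is an assumption.

```python
def product_of_experts(p_experts):
    """Combine per-expert P(y = 1) values by multiplying and renormalizing."""
    pos = 1.0  # unnormalized score for y = 1
    neg = 1.0  # unnormalized score for y = 0
    for p in p_experts:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each expert must give a probability in [0, 1]")
        pos *= p
        neg *= 1.0 - p
    total = pos + neg
    if total == 0.0:
        raise ValueError("experts contradict each other with certainty")
    return pos / total


# An uninformative expert (0.5) leaves the other expert's estimate unchanged,
# while two agreeing experts reinforce each other.
print(product_of_experts([0.5, 0.8]))
print(product_of_experts([0.9, 0.9]))
```

Because every expert must assign non-negligible probability to an outcome for it to survive the product, any single expert can effectively veto a label.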
Product of Experts | Probabilistic Lexical Overlap Model We devised a logistic regression (LR) model incorporating 18 simple features, computed directly from s1 and s2, without modeling any hidden correspondence. |
Classification results | We used the 192 sets from multi-document summarization DUC evaluations in 2002 (55 generic sets), 2003 (30 generic summary sets and 7 viewpoint sets) and 2004 (50 generic and 50 biography sets) to train and test a logistic regression classifier. |
Classification results | Table 6: Logistic regression classification results (accuracy, precision, recall and F-measure) for balanced data of 100-word summaries from DUC’02 through DUC’04. |
Conclusions | Experiments with a logistic regression classifier based on the features further confirm that input cohesiveness is predictive of the difficulty it will pose to automatic summarizers. |