Abstract | We applied an error detection and correction technique to the positive and negative documents classified by Support Vector Machines (SVM). |
Framework of the System | As error candidates, we focus on support vectors (SVs) extracted from the training documents by SVM. |
Framework of the System | Training by SVM is performed to find the optimal hyperplane consisting of SVs, and only the SVs affect the performance. |
Framework of the System | We set these selected documents to negative training documents (N1), and apply SVM to learn classifiers. |
Introduction | uses soft-margin SVM as the underlying classifiers (Liu et al., 2003). |
Introduction | They reported that the results were comparable to the current state-of-the-art biased SVM method. |
Introduction | Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data. |
Background | Support vector machines (SVM) is a popular classification algorithm that attempts to find a decision boundary between classes that is the farthest from any point in the training dataset. |
Background | Given labeled training data (x_t, y_t), t = 1, ..., N, where x_t ∈ R^M and y_t ∈ {1, −1}, SVM tries to find a separating hyperplane with the maximum margin (Platt, 1998). |
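The max-margin idea above can be made concrete with a short sketch. This is an illustrative scikit-learn example on synthetic data (not from any of the papers excerpted here): a linear SVM is fit and its support vectors, the only training points that determine the hyperplane, are inspected.

```python
# Illustrative sketch (synthetic data): fit a linear max-margin classifier
# and inspect its support vectors with scikit-learn.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two well-separated clusters in R^2, labels in {1, -1}.
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors define the separating hyperplane; removing
# any non-SV training point leaves the solution unchanged.
print(len(clf.support_vectors_), "support vectors out of", len(X))
```

This also illustrates the claim above that "only the SVs affect the performance": retraining on `clf.support_vectors_` alone would recover the same hyperplane.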
Experiments | SVM was chosen as the classification algorithm as it was shown that it performs well in text classification tasks (Joachims, 1998; Yang and Liu, 1999) and it is robust to overfitting (Sebastiani, 2002). |
Experiments | Accordingly, the raw text of the reports and topic vectors are compiled into individual files with their corresponding outcomes in ARFF format and then classified with SVM. |
Related Work | tor classification results with SVM; however, (Sriurai, 2011) uses a fixed number of topics, whereas we evaluated different numbers of topics since typically this is not known in advance. |
Results | Classification results using ATC and SVM are shown in Figures 2, 3, and 4 for precision, recall, and f-score respectively. |
Results | Best classification performance was achieved with 15 topics for ATC and 100 topics for SVM . |
Results | For smaller numbers of topics, ATC performed better than SVM. |
Discussion | Whilst the thresholding and simplify everything methods were not significantly different from each other, the SVM method was significantly different from the other two (p < 0.001). |
Discussion | This can be seen in the slightly lower recall, yet higher precision attained by the SVM . |
Discussion | This indicates that the SVM was better at distinguishing between complex and simple words, but also wrongly identified many CWs. |
Experimental Design | Support vector machines (SVM) are statistical classifiers which use labelled training data to predict the class of unseen inputs. |
Experimental Design | The training data consist of several features which the SVM uses to distinguish between classes. |
Experimental Design | The SVM was chosen as it has been used elsewhere for similar tasks (Gasperin et al., 2009; Hancke et al., 2012; Jauhar and Specia, 2012). |
Results | To analyse the features of the SVM , the correlation coefficient between each feature vector and the vector of feature labels was calculated. |
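The feature analysis described above can be sketched in a few lines. This is a hedged, synthetic illustration (the feature names are hypothetical stand-ins for the paper's actual features): compute the Pearson correlation between each feature column and the label vector.

```python
# Hedged sketch (synthetic data, illustrative feature names): correlate
# each feature vector with the vector of labels to see which features
# carry signal for the classifier.
import numpy as np

rng = np.random.RandomState(42)
labels = rng.randint(0, 2, size=200)        # e.g., 0 = simple word, 1 = complex word
word_length = labels * 3 + rng.randn(200)   # informative feature (by construction)
frequency = rng.randn(200)                  # uninformative feature (by construction)
features = np.column_stack([word_length, frequency])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is the
# correlation between the labels and the given feature column.
corrs = [np.corrcoef(labels, features[:, j])[0, 1] for j in range(features.shape[1])]
print(corrs)
```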
Experiment | We re-implemented Xia and Wong's (2008) extended Support Vector Machine (SVM)-based microtext IWR system to compare with our method. |
Experiment | Both the SVM and DT models are provided by the Weka3 (Hall et al., 2009) toolkit, using its default configuration. |
Experiment | Adapted SVM for Joint Classification. |
Discussion and future work | While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results. |
Related Work | Two classifiers, Naïve Bayes (NB) and a support vector machine (SVM), were applied to the tokenized and stemmed statements to obtain best classification accuracies of 70% (abortion topic, NB), 67.4% (death penalty topic, NB), and 77% (friend description, SVM), where the baseline was taken to be 50%. |
Related Work | The authors note this as well by demonstrating significantly lower results of 59.8% for NB and 57.8% for SVM when cross-topic classification is performed by training each classifier on two topics and testing on the third. |
Results | We evaluate five classifiers: logistic regression (LR), a multilayer perceptron (MLP), naïve Bayes (NB), a random forest (RF), and a support vector machine (SVM). |
Results | The SVM is a parametric binary classifier that provides highly nonlinear decision boundaries given particular kernels. |
Results | The SVM classifier |
Conclusion | Model                          MAE     RMSE
             μ                              0.5596  0.7053
             μ_A                            0.5184  0.6367
             μ_S                            0.5888  0.7588
             μ_T                            0.6300  0.8270
             Pooled SVM                     0.5823  0.7472
             Independent_A SVM              0.5058  0.6351
             EasyAdapt SVM                  0.7027  0.8816
             SINGLE-TASK LEARNING
             Independent_A                  0.5091  0.6362
             Independent_S                  0.5980  0.7729
             Pooled                         0.5834  0.7494
             Pooled & {N}                   0.4932  0.6275
             MULTITASK LEARNING: Annotator
             Combined_A                     0.4815  0.6174
             Combined_A & {N}               0.4909  0.6268
             Combined+_A                    0.4855  0.6203
             Combined+_A & {N}              0.4833  0.6102
             MULTITASK LEARNING: Translation system
             Combined_S                     0.5825  0.7482
             MULTITASK LEARNING: Sentence pair
             Combined_T                     0.5813  0.7410
             MULTITASK LEARNING: Combinations
             Combined_{A,S}                 0.4988  0.6490
             Combined_{A,S} & {N_{A,S}}     0.4707  0.6003
             Combined+_{A,S}                0.4772  0.6094
             Combined_{A,S,T}               0.4588  0.5852
             Combined_{A,S,T} & {N_{A,S}}   0.4723  0.6023 |
Gaussian Process Regression | In typical usage, the kernel hyperparameters for an SVM are fit using held-out estimation, which is inefficient and often involves tying together parameters to limit the search complexity (e.g., using a single scale parameter in the squared exponential). |
Gaussian Process Regression | Multiple-kernel learning (Gönen and Alpaydın, 2011) goes some way to addressing this problem within the SVM framework; however, this technique is limited to reweighting linear combinations of kernels and has high computational complexity. |
Multitask Quality Estimation 4.1 Experimental Setup | Baselines: The baselines use the SVM regression algorithm with a radial basis function kernel and parameters γ, ε, and C optimised through grid search and 5-fold cross-validation on the training set. |
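The baseline setup above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions: the data and parameter grids are synthetic placeholders, not the WMT QE features or the grids actually searched.

```python
# Minimal sketch (synthetic data, illustrative grids): epsilon-SVR with an
# RBF kernel, tuning gamma, epsilon and C by grid search with 5-fold CV.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                              # stand-in QE feature vectors
y = X @ [1.0, -2.0, 0.5] + 0.1 * rng.randn(100)   # noisy stand-in quality scores

param_grid = {
    "gamma": [0.01, 0.1, 1.0],
    "epsilon": [0.01, 0.1],
    "C": [1.0, 10.0],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_)
```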
Multitask Quality Estimation 4.1 Experimental Setup | μ 0.8279 0.9899; SVM 0.6889 0.8201 |
Multitask Quality Estimation 4.1 Experimental Setup | μ is a baseline which predicts the training mean, SVM uses the same system as the WMT12 QE task, and the remainder are GP regression models with different kernels (all include additive noise). |
Discussion | We present here our model of text classification and compare it with SVM and KNN on two datasets. |
Discussion | Moreover, the QC performs well in text classification compared with SVM and KNN and outperforms them on small-scale training sets. |
Experiment | We compared the performance of QC with several classical classification methods, including the Support Vector Machine (SVM) and K-nearest neighbor (KNN). |
Experiment | We randomly selected training samples from the training pool ten times to train the QC, SVM, and KNN classifiers, respectively, and then verified the three trained classifiers on the testing sets; the results are illustrated in Figure 4. |
Experiment | We noted that the QC performed better than both KNN and SVM on small-scale training sets, when the number of training samples is less than 50. |
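The evaluation protocol above (ten random draws of a small training set, scored on a held-out test set) can be sketched as follows. This is a hedged illustration on synthetic data; QC itself is not reproduced here, only the SVM and KNN sides of the comparison.

```python
# Sketch of the protocol (synthetic data; QC not reproduced): repeatedly
# sample small training sets from a pool, train SVM and KNN, and score
# them on a fixed held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_pool, y_pool = X[:500], y[:500]
X_test, y_test = X[500:], y[500:]

rng = np.random.RandomState(0)
scores = {"SVM": [], "KNN": []}
for _ in range(10):  # ten random draws, as in the protocol above
    idx = rng.choice(len(X_pool), size=50, replace=False)  # small-scale training set
    for name, clf in [("SVM", SVC()), ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        clf.fit(X_pool[idx], y_pool[idx])
        scores[name].append(clf.score(X_test, y_test))

print({k: round(float(np.mean(v)), 3) for k, v in scores.items()})
```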
Introduction | We focus on large-margin methods such as SVM (Joachims, 1998) and passive-aggressive algorithms such as MIRA. |
The Relative Margin Machine in SMT | It is maximized by minimizing the norm in SVM, or analogously, the proximity constraint in MIRA: argmin_w ‖w − w_t‖². |
The Relative Margin Machine in SMT | RMM was introduced as a generalization over SVM that incorporates both the margin constraint |
The Relative Margin Machine in SMT | Nonetheless, since structured RMM is a generalization of Structured SVM, which shares its underlying objective with MIRA, our intuition is that SMT should be able to benefit as well. |
Experiments and evaluation | As mentioned before, this classifier is a linear SVM . |
Improving a distributional thesaurus | More precisely, we follow (Lee and Ng, 2002), a reference work for WSD, by adopting a Support Vector Machine (SVM) classifier with a linear kernel and three kinds of features for characterizing each considered occurrence. |
Improving a distributional thesaurus | For the second type of features, we take more precisely the POS of the three words before E and those of the three words after E. Each pair {POS, position} corresponds to a binary feature for the SVM classifier. |
Improving a distributional thesaurus | Each instance of the 11 types of collocations is represented by a tuple ⟨lemma1, position1, lemma2, position2⟩ and leads to a binary feature for the SVM classifier. |
Abstract | For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. |
Feature extraction | To align or map source and target terms we use an SVM binary classifier (Joachims, 2002) with a linear kernel and the trade-off between training error and margin set to c = 10. |
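A rough analogue of this classifier setup can be written in scikit-learn. This is an assumption-laden sketch: the paper uses SVMlight, the data here are synthetic stand-ins for term-pair features, and only the linear kernel and the cost setting C = 10 are taken from the description above.

```python
# Rough scikit-learn analogue (an assumption; the paper uses SVMlight):
# a binary linear-kernel SVM with the error/margin trade-off C = 10.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
# Toy stand-in for source/target term-pair feature vectors.
X = np.vstack([rng.randn(30, 5) + 1.0, rng.randn(30, 5) - 1.0])
y = np.array([1] * 30 + [0] * 30)  # 1 = aligned term pair, 0 = not aligned

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print(clf.predict(X[:2]))
```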
Method | For classification purposes we use an SVM binary classifier. |
Predicting politeness | The BOW classifier is an SVM using a unigram feature representation. We consider this to be a strong baseline for this new |
Predicting politeness | is an SVM using the linguistic features listed in Table 3 in addition to the unigram features. |
Predicting politeness | For new requests, we use class probability estimates obtained by fitting a logistic regression model to the output of the SVM (Witten and Frank, 2005) as predicted politeness scores (with values between 0 and 1; henceforth politeness, by abuse of language). |
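The scoring step above (Platt-style scaling) can be sketched directly: fit a logistic regression to the SVM's decision values so that new inputs get probability-like scores in [0, 1]. The data below are synthetic, and the feature setup is a hypothetical stand-in for the politeness features.

```python
# Hedged sketch (synthetic data): logistic regression fit to an SVM's
# decision values, mapping raw margins to scores in [0, 1].
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 4) + 1.0, rng.randn(50, 4) - 1.0])
y = np.array([1] * 50 + [0] * 50)  # 1 = polite, 0 = impolite

svm = LinearSVC(C=1.0).fit(X, y)
margins = svm.decision_function(X).reshape(-1, 1)
calibrator = LogisticRegression().fit(margins, y)

# Probability-like politeness scores for new requests.
scores = calibrator.predict_proba(svm.decision_function(X[:3]).reshape(-1, 1))[:, 1]
print(scores)
```

In practice scikit-learn's `CalibratedClassifierCV` wraps this same idea with proper cross-fitting, which avoids calibrating on the SVM's own training data as this toy version does.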
Discussion | As before, the linguistic, acoustic, and visual features are averaged over the entire video, and we use an SVM classifier in tenfold cross validation experiments. |
Experiments and Results | We use the entire set of 412 utterances and run tenfold cross-validations using an SVM classifier, as implemented in the Weka toolkit. In line with previous work on emotion recognition in speech (Haq and Jackson, 2009; Anagnostopoulos and Vovoli, 2010), where utterances are selected in a speaker-dependent manner (i.e., utterances from the same speaker are included in both training and test), as well as work on sentence-level opinion classification, where document boundaries are not considered in the split between the training and test sets (Wilson et al., 2004; Wiegand and Klakow, 2009), the training/test split for each fold is performed at the utterance level regardless of the video the utterances belong to. |
Multimodal Sentiment Analysis | These simple weighted unigram features have been successfully used in the past to build sentiment classifiers on text, and in conjunction with Support Vector Machines (SVM) have been shown to lead to state-of-the-art performance (Maas et al., 2011). |
A Machine Learning based approach | We use the SVM classifier with features generated using the following steps. |
Conclusions and Future Work | This ontology guides a rule based approach to thwarting detection, and also provides features for an SVM based learning system. |
Results | We used the CVX library in Matlab to solve the optimization problem for learning weights and the LIBSVM library to implement the SVM classifier. |
Attribute-based Classification | We used an L2-regularized L2-loss linear SVM (Fan et al., 2008) to learn the attribute predictions. |
Attribute-based Classification | data was randomly split into a training and validation set of equal size in order to find the optimal cost parameter C. The final SVM for the attribute was trained on the entire training data, i.e., on all positive and negative examples. |
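The model-selection procedure above can be sketched as follows. This is an illustrative version under stated assumptions: scikit-learn's `LinearSVC` (whose default squared-hinge loss with L2 penalty matches an L2-regularized L2-loss linear SVM, via the same liblinear solver cited above), synthetic data, and a hypothetical grid of C values.

```python
# Illustrative sketch (synthetic data, hypothetical C grid): choose the
# cost parameter C on an equal-size train/validation split, then retrain
# the final attribute classifier on the entire training data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 10) + 0.5, rng.randn(100, 10) - 0.5])
y = np.array([1] * 100 + [0] * 100)  # attribute present / absent

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LinearSVC(penalty="l2", loss="squared_hinge", C=C).fit(X_tr, y_tr)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

# Final SVM trained on all positive and negative examples.
final = LinearSVC(penalty="l2", loss="squared_hinge", C=best_C).fit(X, y)
print(best_C, round(best_acc, 3))
```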
Attribute-based Classification | The SVM learners used the four different feature types proposed in Farhadi et al. |
Experiments | For gLDA, we learn a binary linear SVM on its topic representations using SVMLight (Joachims, 1999). |
Experiments | The results of DiscLDA (Lacoste-Jullien et al., 2009) and linear SVM on raw bag-of-words features were reported in (Zhu et al., 2012). |
Experiments | The fact that gLDA+SVM performs better than the standard gSLDA is due to the same reason: the SVM part of gLDA+SVM can capture the supervision information well and learn a classifier that predicts accurately, while standard sLDA cannot balance the influence of supervision well. |