Introduction | We formulate syllabification as a tagging problem, and learn a discriminative tagger from labeled data using a structured support vector machine ( SVM ) (Tsochantaridis et al., 2004). |
Structured SVMs | A structured support vector machine ( SVM ) is a large-margin training method that can learn to predict structured outputs, such as tag sequences or parse trees, instead of performing binary classification (Tsochantaridis et al., 2004). |
Structured SVMs | We employ a structured SVM that predicts tag sequences, called an SVM Hidden Markov Model, or SVM-HMM. |
Structured SVMs | This approach can be considered an SVM because the model parameters are trained discriminatively to separate correct tag sequences from incorrect ones by as large a margin as possible. |
Syllabification with Structured SVMs | The SVM framework is less restrictive: we can include o as an emission feature, but we can also include features indicating that the preceding and following letters are m and r, respectively. |
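The idea of combining an emission-style feature for the current letter with features for its neighbours can be sketched as follows. This is only an illustration: the feature-name format, the boundary symbols `<s>` and `</s>`, and the example word "memory" are assumptions, not taken from the source.

```python
from typing import Dict, List


def letter_features(word: str, i: int) -> Dict[str, int]:
    """Binary features for the letter at position i (feature names are hypothetical)."""
    prev_letter = word[i - 1] if i > 0 else "<s>"
    next_letter = word[i + 1] if i + 1 < len(word) else "</s>"
    return {
        f"cur={word[i]}": 1,       # emission-style feature: the letter itself
        f"prev={prev_letter}": 1,  # context feature: letter to the left
        f"next={next_letter}": 1,  # context feature: letter to the right
    }


def word_features(word: str) -> List[Dict[str, int]]:
    """One feature dictionary per letter of the word."""
    return [letter_features(word, i) for i in range(len(word))]


feats = word_features("memory")
print(feats[3])  # features for the fourth letter, which sits between "m" and "r"
```

A generative HMM would only see the `cur=` feature; a discriminative tagger can weight all three.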
Experiments | The baselines are standard squared-loss linear regression, linear kernel SVM, and nonlinear (Gaussian) kernel SVM . |
Experiments | We use the Statistical Toolbox’s linear regression implementation in Matlab, and LibSVM (Chang and Lin, 2011) for training and testing the SVM models. |
Experiments | The hyperparameter C in the linear SVM, and the γ and C hyperparameters in the Gaussian SVM, are tuned on the training set using 10-fold cross-validation. |
Introduction | Our results significantly outperform standard linear regression and strong SVM baselines. |
Related Work | (2003) are among the first to study SVM and text mining methods in the market prediction domain, where they align financial news articles with multiple time series to simulate the 33 stocks in the Hong Kong Hang Seng Index. |
Related Work | (2009) model the SEC-mandated annual reports, and perform linear SVM regression with an ε-insensitive loss function to predict the measured volatility. |
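For reference, the ε-insensitive loss used in SVM regression ignores errors smaller than ε and grows linearly beyond that. A minimal sketch; the default value of `epsilon` is an arbitrary assumption for illustration.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Return max(0, |y_true - y_pred| - epsilon).

    Predictions within epsilon of the target incur no loss at all.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(epsilon_insensitive_loss(1.0, 1.05))  # inside the epsilon tube
print(epsilon_insensitive_loss(1.0, 1.5))   # outside the tube
```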
Related Work | Traditional discriminative models, such as linear regression and linear SVM , have been very popular in various text regression tasks, such as predicting movie revenues from reviews (Joshi et al., 2010), understanding the geographic lexical variation (Eisenstein et al., 2010), and predicting food prices from menus (Chahuneau et al., 2012). |
Experimental Evaluation | We use logistic regression (LR) with L2 regularization (Fan et al., 2008) and the SVMWWW ( SVM ) system (Joachims, 2007) with its default settings as the classifiers. |
Experimental Evaluation | It self-trains two classifiers from the character 3-gram, lexical, and syntactic views using CNG and SVM classifiers (Kourtis and Stamatatos, 2011). |
Experimental Evaluation | The original method applied only CNG and SVM on the character n-gram view. |
Introduction | However, the self-training method in (Kourtis and Stamatatos, 2011) uses two classifiers (CNG and SVM ) on one view. |
Proposed Tri-Training Algorithm | Many classification algorithms give such scores, e.g., SVM and logistic regression. |
Related Work | On developing effective learning techniques, supervised classification has been the dominant approach, e.g., neural networks (Graham et al., 2005; Zheng et al., 2006), decision trees (Uzuner and Katz, 2005; Zhao and Zobel, 2005), logistic regression (Madigan et al., 2005), SVM (Diederich et al., 2000; Gamon, 2004; Li et al., 2006; Kim et al., 2011), etc. |
Abstract | However, by using an SVM ranker to combine the realizer’s model score together with features from multiple parsers, including ones designed to make the ranker more robust to parsing mistakes, we show that significant increases in BLEU scores can be achieved. |
Abstract | Moreover, via a targeted manual analysis, we demonstrate that the SVM reranker frequently manages to avoid vicious ambiguities, while its ranking errors tend to affect fluency much more often than adequacy. |
Introduction | Consequently, we examine two reranking strategies, one a simple baseline approach and the other using an SVM reranker (Joachims, 2002). |
Introduction | Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature. |
Introduction | With the SVM reranker, we obtain a significant improvement in BLEU scores over |
Reranking with SVMs 4.1 Methods | Similarly, we conjectured that large differences in the realizer’s perceptron model score may more reliably reflect human fluency preferences than small ones, and thus we combined this score with features for parser accuracy in an SVM ranker. |
Reranking with SVMs 4.1 Methods | Additionally, given that parsers may more reliably recover some kinds of dependencies than others, we included features for each dependency type, so that the SVM ranker might learn how to weight them appropriately. |
Reranking with SVMs 4.1 Methods | We trained the SVM ranker (Joachims, 2002) with a linear kernel and chose the hyper-parameter c, which tunes the tradeoff between training error and margin, with 6-fold cross-validation on the devset. |
Discussion and future work | RF MLP RF MLP RF SVM |
Discussion and future work | While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results. |
Related Work | Two classifiers, Naive Bayes (NB) and a support vector machine (SVM), were applied on the tokenized and stemmed statements to obtain best classification accuracies of 70% (abortion topic, NB), 67.4% (death penalty topic, NB), and 77% (friend description, SVM ), where the baseline was taken to be 50%. |
Related Work | The authors note this as well by demonstrating significantly lower results of 59.8% for NB and 57.8% for SVM when cross-topic classification is performed by training each classifier on two topics and testing on the third. |
Results | We evaluate five classifiers: logistic regression (LR), a multilayer perceptron (MLP), naive Bayes (NB), a random forest (RF), and a support vector machine ( SVM ). |
Results | The SVM is a parametric binary classifier that provides highly nonlinear decision boundaries given particular kernels. |
Results | The SVM classifier |
Experiment | We re-implemented Xia and Wong (2008)’s extended Support Vector Machine ( SVM ) based microtext IWR system to compare with our method. |
Experiment | Both the SVM and DT models are provided by the Weka3 (Hall et al., 2009) toolkit, using its default configuration. |
Experiment | Adapted SVM for Joint Classification. |
Discussion | Whilst the thresholding and simplify everything methods were not significantly different from each other, the SVM method was significantly different from the other two (p < 0.001). |
Discussion | This can be seen in the slightly lower recall, yet higher precision attained by the SVM . |
Discussion | This indicates that the SVM was better at distinguishing between complex and simple words, but also wrongly identified many CWs. |
Experimental Design | Support vector machines ( SVM ) are statistical classifiers which use labelled training data to predict the class of unseen inputs. |
Experimental Design | The training data consist of several features which the SVM uses to distinguish between classes. |
Experimental Design | The SVM was chosen as it has been used elsewhere for similar tasks (Gasperin et al., 2009; Hancke et al., 2012; Jauhar and Specia, 2012). |
Results | Everything Thresholding SVM |
Results | To analyse the features of the SVM , the correlation coefficient between each feature vector and the vector of feature labels was calculated. |
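A feature-versus-label correlation of this kind can be computed as below. The source does not say which coefficient was used; this sketch assumes Pearson's r, and returning 0.0 for a constant column is a convention chosen here, not something the source specifies.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between one feature column and the label vector."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one feature (e.g. word length) against binary labels.
feature = [3.0, 9.0, 4.0, 11.0]
labels = [0, 1, 0, 1]
print(pearson(feature, labels))
```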
Background | Support vector machines ( SVM ) is a popular classification algorithm that attempts to find a decision boundary between classes that is the farthest from any point in the training dataset. |
Background | Given labeled training data $(x_t, y_t), t = 1, \ldots, N$, where $x_t \in \mathbb{R}^M$ and $y_t \in \{1, -1\}$, SVM tries to find a separating hyperplane with the maximum margin (Platt, 1998). |
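To make "maximum margin" concrete, the sketch below computes the geometric margin of a given hyperplane (w, b) on labeled points: the smallest signed distance of any point to the hyperplane. An SVM searches for the (w, b) that maximizes this quantity; the code only evaluates it. The function name and the toy data are assumptions for illustration.

```python
import math
from typing import Sequence


def geometric_margin(
    w: Sequence[float],
    b: float,
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Smallest value of y * (w . x + b) / ||w|| over the data set.

    A positive result means every point is on the correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


points = [(2.0, 0.0), (-2.0, 0.0)]
labels = [1, -1]
print(geometric_margin((1.0, 0.0), 0.0, points, labels))
```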
Experiments | SVM was chosen as the classification algorithm as it was shown that it performs well in text classification tasks (Joachims, 1998; Yang and Liu, 1999) and it is robust to overfitting (Sebastiani, 2002). |
Experiments | Accordingly, the raw text of the reports and topic vectors are compiled into individual files with their corresponding outcomes in ARFF and then classified with SVM . |
Related Work | tor classification results with SVM ; however, (Sriurai, 2011) uses a fixed number of topics, whereas we evaluated different numbers of topics, since typically this is not known in advance. |
Results | Classification results using ATC and SVM are shown in Figures 2, 3, and 4 for precision, recall, and f-score respectively. |
Results | Best classification performance was achieved with 15 topics for ATC and 100 topics for SVM . |
Results | For smaller numbers of topics, ATC performed better than SVM . |
Abstract | We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines ( SVM ). |
Framework of the System | As error candidates, we focus on support vectors (SVs) extracted from the training documents by SVM . |
Framework of the System | Training by SVM is performed to find the optimal hyperplane consisting of SVs, and only the SVs affect the performance. |
Framework of the System | We set these selected documents to negative training documents (N1), and apply SVM to learn classifiers. |
Introduction | uses soft-margin SVM as the underlying classifiers (Liu et al., 2003). |
Introduction | They reported that the results were comparable to the current state-of-the-art biased SVM method. |
Introduction | Like much previous work on semi-supervised ML, we apply SVM to the positive and unlabeled data, and add the classification results to the training data. |
Prediction Experiments | In the first experiment, we compare the prediction accuracy of our SME model to a widely used discriminative learner in NLP: the linear kernel support vector machine ( SVM ). |
Prediction Experiments | In the second experiment, in addition to the linear kernel SVM , we also compare our SME model to a state-of-the-art sparse generative model of text (Eisenstein et al., 2011a), and vary the size of input vocabulary W exponentially from $2^9$ to the full size of our training vocabulary. |
Prediction Experiments | We use threefold cross-validation to infer the learning rate 6 and cost C hyperpriors in the SME and SVM model respectively. |
Related Work | Traditional discriminative methods, such as support vector machine ( SVM ) and logistic regression, have been very popular in various text categorization tasks (Joachims, 1998; Wang and McKeown, 2010) in the past decades. |
Related Work | For example, SVM does not have latent variables to model the subtle differences and interactions of features from different domains (e.g. |
Experiment | MT-SVM: We translate the English labeled data to Chinese using Google Translate and use the translation results to train the SVM classifier for Chinese. |
Experiment | SVM: We train an SVM classifier on the Chinese labeled data. |
Experiment | First, two monolingual SVM classifiers are trained on English labeled data and Chinese data translated from English labeled data. |
Introduction | The experiment results show that CLMM yields 71% in accuracy when no Chinese labeled data are used, which significantly improves Chinese sentiment classification and is superior to the SVM and co-training based methods. |
Introduction | When Chinese labeled data are employed, CLMM yields 83% in accuracy, which is remarkably better than the SVM and achieves state-of-the-art performance. |
Related Work | (2002) compare the performance of three commonly used machine learning models (Naive Bayes, Maximum Entropy and SVM ). |
Related Work | Gamon (2004) shows that introducing deeper linguistic features into SVM can help to improve the performance. |
Related Work | English Labeled data are first translated to Chinese, and then two SVM classifiers are trained on English and Chinese labeled data respectively. |
Answer Grading System | Using each of these as features, we use Support Vector Machines ( SVM ) to produce a combined real-number grade. |
Answer Grading System | The weight vector u is trained to optimize performance in two scenarios: Regression: An SVM model for regression (SVR) is trained using as target function the grades assigned by the instructors. |
Answer Grading System | Ranking: An SVM model for ranking (SVMRank) is trained using as ranking pairs all pairs of student answers $(A_s, A_t)$ such that $\mathrm{grade}(A_i, A_s) > \mathrm{grade}(A_i, A_t)$, where $A_i$ is the corresponding instructor answer. |
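The construction of ranking pairs for one instructor answer can be sketched as follows: every ordered pair of student answers in which the first received a strictly higher grade becomes a training pair, and ties produce no pair. The answer identifiers and grades are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def ranking_pairs(grades: Dict[str, float]) -> List[Tuple[str, str]]:
    """All (better, worse) pairs of student answers for one question.

    `grades` maps a student-answer id to the grade it received against
    the same instructor answer.
    """
    return [(s, t) for s, t in permutations(grades, 2) if grades[s] > grades[t]]


pairs = ranking_pairs({"a1": 5.0, "a2": 3.0, "a3": 3.0})
print(pairs)  # a1 outranks a2 and a3; a2 and a3 are tied, so no pair
```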
Discussion and Conclusions | The correlation for the BOW-only SVM model for SVMRank improved upon the best BOW feature |
Discussion and Conclusions | Likewise, using the BOW-only SVM model for SVR reduces the RMSE by .022 overall compared to the best BOW feature. |
Results | 5.4 SVM Score Grading |
Results | The SVM components of the system are run on the full dataset using a 12-fold cross validation. |
Results | Both SVM models are trained using a linear kernel. Results from both the SVR and the SVMRank implementations are reported in Table 7 along with a selection of other measures. |
Discussion | As expected, the structural methods on either skewed or flattened hierarchies are not significantly better than the flat SVM . |
Discussion | For the flattened hierarchy of 15 leaf genres the maximal accuracy is 54.2% vs. 52.4% for the flat SVM (Figure 3), a nonsignificant improvement. |
Experiments | As a baseline we use the accuracy achieved by a standard "flat" SVM. |
Experiments | A standard flat SVM achieves an accuracy of 64.4% whereas the best structural SVM based on Lin’s information content distance measure (IC-lin-word-bnc) achieves 68.8% accuracy, significantly better at the 1% level. |
Experiments | Table 1 summarizes the best performing measures that all outperform the flat SVM at the 1% level. |
Genre Distance Measures | The structural SVM (Section 2) requires a distance measure h between two genres. |
Structural SVMs | To strengthen the constraints, the zero value on the right hand side of the inequality for the flat SVM can be replaced by a positive value, corresponding to a distance measure h(yi, m) between two genre classes, leading to the following constraint: |
Empirical Evaluation 4.1 Evaluation Setup | SVM(CN): This method applies the inductive SVM with only Chinese features for sentiment classification in the Chinese view. |
Empirical Evaluation 4.1 Evaluation Setup | SVM(EN): This method applies the inductive SVM with only English features for sentiment classification in the English view. |
Empirical Evaluation 4.1 Evaluation Setup | SVM(ENCNI): This method applies the inductive SVM with both English and Chinese features for sentiment classification in the two views. |
Introduction | SVM , NB), and the classification performance is far from satisfactory because of the language gap between the original language and the translated language. |
Introduction | The SVM classifier is adopted as the basic classifier in the proposed approach. |
Related Work 2.1 Sentiment Classification | Standard Naive Bayes and SVM classifiers have been applied for subjectivity classification in Romanian (Mihalcea et al., 2007; Banea et al., 2008), and the results show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language. |
Related Work 2.1 Sentiment Classification | To date, many semi-supervised learning algorithms have been developed for addressing the cross-domain text classification problem by transferring knowledge across domains, including Transductive SVM (Joachims, 1999), EM (Nigam et al., 2000), the EM-based Naive Bayes classifier (Dai et al., 2007a), Topic-bridged PLSA (Xue et al., 2008), Co-Clustering based classification (Dai et al., 2007b), and a two-stage approach (Jiang and Zhai, 2007). |
The Co-Training Approach | Typical text classifiers include Support Vector Machine ( SVM ), Naive Bayes (NB), Maximum Entropy (ME), K-Nearest Neighbor (KNN), etc. |
The Co-Training Approach | In this study, we adopt the widely-used SVM classifier (Joachims, 2002). |
The Co-Training Approach | as two sets of vectors in a feature space, SVM constructs a separating hyperplane in the space by maximizing the margin between the two data sets. |
Abstract | We represent words as sequences of substrings, and use the substrings as features in a Support Vector Machine ( SVM ) ranker, which is trained to rank possible stress patterns. |
Automatic Stress Prediction | We use a support vector machine ( SVM ) to rank the possible patterns for each sequence (Section 3.2). |
Automatic Stress Prediction | These units are used to define the features and outputs used by the SVM ranker. |
Automatic Stress Prediction | The SVM can thus generalize from observed words to similarly-spelled, unseen examples. |
Introduction | We divide each word into a sequence of substrings, and use these substrings as features for a Support Vector Machine ( SVM ) ranker. |
Introduction | The task of the SVM is to rank the true stress pattern above the small number of acceptable alternatives. |
Introduction | The SVM ranker achieves an exceptional 96.2% word accuracy on the challenging task of predicting the full stress pattern in English. |
Evaluation | Transductive SVM . |
Evaluation | Specifically, we begin by training an inductive SVM on one labeled example from each class, iteratively labeling the most uncertain unlabeled point on each side of the hyperplane and retraining the SVM until 100 points are labeled. |
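One step of this active-learning loop, picking the most uncertain unlabeled point on each side of the hyperplane, can be sketched as below. The sketch assumes that "most uncertain" means the smallest absolute decision value on each side; the document ids and scores are hypothetical, and training the SVM itself is out of scope here.

```python
from typing import Dict, Optional, Tuple


def most_uncertain_per_side(
    scores: Dict[str, float],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (positive-side pick, negative-side pick).

    `scores` maps an unlabeled example id to its signed SVM decision value.
    On each side of the hyperplane the example closest to it is chosen;
    a side with no examples yields None.
    """
    positive = [(s, i) for i, s in scores.items() if s >= 0]
    negative = [(-s, i) for i, s in scores.items() if s < 0]
    pos_pick = min(positive)[1] if positive else None
    neg_pick = min(negative)[1] if negative else None
    return pos_pick, neg_pick


scores = {"d1": 0.2, "d2": 1.5, "d3": -0.1, "d4": -2.0}
print(most_uncertain_per_side(scores))
```

In the full procedure, the two picks would be labeled, added to the training set, and the SVM retrained, repeating until the labeling budget is used up.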
Evaluation | Finally, we train a transductive SVM on the 100 labeled points and the remaining 1900 unlabeled points, obtaining the results in row 3 of Table 1. |
Our Approach | Specifically, we train a discriminative classifier using the support vector machine ( SVM ) learning algorithm (Joachims, 1999) on the set of unambiguous reviews, and then apply the resulting classifier to all the reviews in the training folds that are not seeds. |
Our Approach | As our weakly supervised learner, we employ a transductive SVM . |
Our Approach | Hence, instead of training just one SVM classifier, we aim to reduce classification errors by training an ensemble of five classifiers, each of which uses all 100 manually labeled reviews and a different subset of the 500 automatically labeled reviews. |
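The training sets for such an ensemble can be assembled as in the sketch below. The source does not say how the five subsets of automatically labeled reviews are chosen; this sketch assumes five disjoint, equally sized chunks, which is only one possible reading.

```python
from typing import List, Sequence


def ensemble_training_sets(
    manual: Sequence[int], automatic: Sequence[int], n_members: int = 5
) -> List[List[int]]:
    """Build one training set per ensemble member.

    Every member gets all manually labeled items plus its own chunk of the
    automatically labeled items (assumption: disjoint, equal-size chunks;
    any remainder after integer division is left unused).
    """
    chunk = len(automatic) // n_members
    return [
        list(manual) + list(automatic[k * chunk:(k + 1) * chunk])
        for k in range(n_members)
    ]


manual_ids = list(range(100))          # hypothetical ids of manually labeled reviews
automatic_ids = list(range(100, 600))  # hypothetical ids of automatically labeled reviews
training_sets = ensemble_training_sets(manual_ids, automatic_ids)
print([len(s) for s in training_sets])
```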
Data and Task | Dolan and Brockett (2005) remark that this corpus was created semiautomatically by first training an SVM classifier on a disjoint annotated 10,000 sentence pair dataset and then applying the SVM on an unseen 49,375 sentence pair corpus, with its output probabilities skewed towards over-identification, i.e., towards generating some false paraphrases. |
Experimental Evaluation | The SVM was trained to classify positive and negative examples of paraphrase using SVMlight (Joachims, 1999). Metaparameters, tuned on the development data, were the regularization constant and the degree of the polynomial kernel (chosen in $[10^{-5}, 10^{2}]$ and 1-5, respectively). |
Experimental Evaluation | It is unsurprising that the SVM performs very well on the MSRPC because of the corpus creation process (see Sec. |
Experimental Evaluation | 4) where an SVM was applied as well, with very similar features and a skewed decision process (Dolan and Brockett, 2005). |
Product of Experts | LR (like the QG) provides a probability distribution, but uses surface features (like the SVM ). |
Product of Experts | 2; this model is on par with the SVM , though trading recall in favor of precision. |
Product of Experts | We view it as a probabilistic simulation of the SVM more suitable for combination with the QG. |
Experiments | Columns: Features, Model, MRR, Top1, Top5. Baseline: 42.3%, 32.7%, 54.5%. QTCF: SVM 51.9%, 44.6%, 63.4%; SSL 49.5%, 43.1%, 60.9%. LexSem: SVM 48.2%, 40.6%, 61.4%; SSL 47.9%, 40.1%, 58.4%. QComp: SVM 54.2%, 47.5%, 64.3%; SSL 51.9%, 45.5%, 62.4%. |
Experiments | We performed manual iterative parameter optimization during training based on prediction accuracy to find the best k-nearest parameter for SSL, i.e., $k = \{3, 5, 10, 20, 50\}$, and the best $C = \{10^{-2}, \ldots, 10^{2}\}$ and $\gamma = \{2^{-2}, \ldots, 2^{3}\}$ for RBF kernel SVM . |
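A candidate grid for the RBF kernel SVM can be enumerated as below. Two assumptions are made: that the garbled ranges in the extract read as a cost parameter C from 10^-2 to 10^2 and a kernel width γ from 2^-2 to 2^3, and that the grid steps through integer exponents. Neither is confirmed by the source.

```python
from itertools import product

# Assumed grids: integer exponents between the stated end points.
C_values = [10.0 ** e for e in range(-2, 3)]     # 10^-2 ... 10^2
gamma_values = [2.0 ** e for e in range(-2, 4)]  # 2^-2 ... 2^3

# Every (C, gamma) combination to be scored by prediction accuracy.
grid = list(product(C_values, gamma_values))
print(len(grid), grid[0], grid[-1])
```

Each pair would then be used to train and evaluate one model, keeping the pair with the best accuracy.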
Experiments | We applied SVM and our graph based SSL method with no summarization to learn models using labeled training and testing datasets. |
Feature Extraction for Entailment | The QC model is trained via support vector machines ( SVM ) (Vapnik, 1995) considering different features such as semantic headword feature based on variation of Collins rules, hypernym extraction via Lesk word disambiguation (Lesk, 1988), regular expressions for wh-word indicators, n-grams, word-shapes(capitals), etc. |
Graph Summarization | Using a separate learner, e.g., SVM (Vapnik, 1995), we obtain predicted outputs, Y5 = (33f, ..., 39214) of X 5 and append observed labels Y5 = Y5 U YL. |
Domain Adaptation in Sentiment Research | They applied an out-of-domain-trained SVM classifier to label examples from the target domain and then retrained the classifier using these new examples. |
Domain Adaptation in Sentiment Research | Depending on the similarity between domains, this method brought up to 15% gain compared to the baseline SVM . |
Experiments | Dataset: Movie, News, Blogs, PRs. Dataset size: 1066, 800, 800, 1200. Unigrams: SVM 68.5, 61.5, 63.85, 76.9; NB 60.2, 59.5, 60.5, 74.25; nb features 5410, 4544, 3615, 2832. Bigrams: SVM 59.9, 63.2, 61.5, 75.9; NB 57.0, 58.4, 59.5, 67.8; nb features 16286, 14633, 15182, 12951. Trigrams: SVM 54.3, 55.4, 52.7, 64.4; NB 53.3, 57.0, 56.0, 69.7; nb features 20837, 18738, 19847, 19132. |
Experiments | Table 4: Accuracy of SVM with unigram model |
Experiments | results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naive Bayes and SVM is statistically significant at α = 0.01, but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naive Bayes is most pronounced when unigrams are used (α = 0.025). |
Factors Affecting System Performance | To our knowledge, the only work that describes the application of statistical classifiers ( SVM ) to sentence-level sentiment classification is (Gamon and Aue, 2005). |
Integrating the Corpus-based and Dictionary-based Approaches | Using then an SVM meta-classifier trained on a small number of target domain examples to combine the nine base classifiers, they obtained a statistically significant improvement on out-of-domain texts from book reviews, knowledge-base feedback, and product support services survey data. |
Lexicon-Based Approach | The baseline performance of the Lexicon-Based System (LBS) described above is presented in Table 5, along with the performance results of the in-domain- and out-of-domain-trained SVM classifier. |
Lexicon-Based Approach | Columns: Movies, News, Blogs, PRs. LBS: 57.5, 62.3, 63.3, 59.3. SVM in-dom.: 68.5, 61.5, 63.85, 76.9. SVM out-of-dom. |
Building a Discourse Parser | [Figure: system pipeline. SVM Training; Feature Extraction; Classification with SVM Models (Binary and Multiclass); Scored RS sub-trees; Bottom-up Tree Construction] |
Building a Discourse Parser | Support Vector Machines (SVM) (Vapnik, 1995) are used to model classifiers S and L. SVM refers to a set of supervised learning algorithms that are based on margin maximization. |
Building a Discourse Parser | This makes SVM well-fitted to treat classification problems involving relatively large feature spaces such as ours ($\approx 10^5$ features). |
Evaluation | 4.2 Raw SVM Classification |
Evaluation | Although our final goal is to achieve good performance on the entire tree-building task, a useful intermediate evaluation of our system can be conducted by measuring raw performance of SVM classifiers. |
Evaluation | Table 1: SVM Classifier performance. |
Features | Instrumental to our system’s performance is the choice of a set of salient characteristics (“features”) to be used as input to the SVM algorithm for training and classification. |
Conclusion | Columns: Model, MAE, RMSE. μ: 0.5596, 0.7053; μ_A: 0.5184, 0.6367; μ_S: 0.5888, 0.7588; μ_T: 0.6300, 0.8270; Pooled SVM: 0.5823, 0.7472; Independent_A SVM: 0.5058, 0.6351; EasyAdapt SVM: 0.7027, 0.8816. SINGLE-TASK LEARNING: Independent_A: 0.5091, 0.6362; Independent_S: 0.5980, 0.7729; Pooled: 0.5834, 0.7494; Pooled & {N}: 0.4932, 0.6275. MULTITASK LEARNING: Annotator: Combined_A: 0.4815, 0.6174; Combined_A & {N}: 0.4909, 0.6268; Combined+_A: 0.4855, 0.6203; Combined+_A & {N}: 0.4833, 0.6102. MULTITASK LEARNING: Translation system: Combined_S: 0.5825, 0.7482. MULTITASK LEARNING: Sentence pair: Combined_T: 0.5813, 0.7410. MULTITASK LEARNING: Combinations: Combined_{A,S}: 0.4988, 0.6490; Combined_{A,S} & {N_{A,S}}: 0.4707, 0.6003; Combined+_{A,S}: 0.4772, 0.6094; Combined_{A,S,T}: 0.4588, 0.5852; Combined_{A,S,T} & {N_{A,S}}: 0.4723, 0.6023. |
Gaussian Process Regression | In typical usage, the kernel hyperparameters for an SVM are fit using held-out estimation, which is inefficient and often involves tying together parameters to limit the search complexity (e.g., using a single scale parameter in the squared exponential). |
Gaussian Process Regression | Multiple-kernel learning (Gönen and Alpaydın, 2011) goes some way to addressing this problem within the SVM framework; however, this technique is limited to reweighting linear combinations of kernels and has high computational complexity. |
Multitask Quality Estimation 4.1 Experimental Setup | Baselines: The baselines use the SVM regression algorithm with a radial basis function kernel and parameters γ, ε and C optimised through grid search and 5-fold cross-validation on the training set. |
Multitask Quality Estimation 4.1 Experimental Setup | μ 0.8279 0.9899 SVM 0.6889 0.8201 |
Multitask Quality Estimation 4.1 Experimental Setup | μ is a baseline which predicts the training mean, SVM uses the same system as the WMT12 QE task, and the remainder are GP regression models with different kernels (all include additive noise). |
Authorship Attribution With LOWBOW Representations | For both types of representations we consider an SVM classifier under the one-vs-all formulation for facing the AA problem. |
Authorship Attribution With LOWBOW Representations | We consider SVM as base classifier because this method has proved to be very effective in a large number of applications, including AA (Houvardas and Stamatatos, 2006; Plakias and Stamatatos, 2008b; Plakias and Stamatatos, 2008a); further, since SVMs are kernel-based methods, they allow us to use local histograms for AA by considering kernels that work over sets of histograms. |
Authorship Attribution With LOWBOW Representations | We build a multiclass SVM classifier by considering the pairs of patterns-outputs associated to documents-authors. |
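Prediction under a one-vs-all formulation can be sketched as follows: one binary scorer per author, with the document assigned to the author whose scorer returns the highest value. The sketch uses linear scorers for brevity, whereas the source works with kernels over histograms; the weights and the two-dimensional features are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LinearModel = Tuple[Sequence[float], float]  # (weight vector, bias)


def one_vs_all_predict(x: Sequence[float], models: Dict[str, LinearModel]) -> str:
    """Return the author whose binary scorer gives the highest score for x."""

    def score(model: LinearModel) -> float:
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return max(models, key=lambda author: score(models[author]))


# Hypothetical per-author scorers.
models = {
    "BL": ((1.0, 0.0), 0.0),
    "JM": ((0.0, 1.0), 0.0),
}
print(one_vs_all_predict((2.0, 0.5), models))
```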
Experiments and Results | All our experiments use the SVM implementation provided by Canu et al. |
Experiments and Results | Columns show the true author for test documents and rows show the authors predicted by the SVM . |
Experiments and Results | The SVM with BOW representation of character n-grams achieved recognition rates of 40% and 50% for BL and JM respectively. |
Related Work | applied to this problem, including support vector machine ( SVM ) classifiers (Houvardas and Stamatatos, 2006) and variants thereon (Plakias and Stamatatos, 2008b; Plakias and Stamatatos, 2008a), neural networks (Tearle et al., 2008), Bayesian classifiers (Coyotl-Morales et al., 2006), decision tree methods (Koppel et al., 2009) and similarity based techniques (Keselj et al., 2003; Lambers and Veenman, 2009; Stamatatos, 2009b; Koppel et al., 2009). |
Related Work | In this work, we chose an SVM classifier as it has reported acceptable performance in AA and because it will allow us to directly compare results with previous work that has used this same classifier. |
Evaluation | We compared the Gibbs sampling compressor (GS) against a version of maximum a posteriori EM (with Dirichlet parameter greater than 1) and a discriminative STSG based on SVM training (Cohn and Lapata, 2008) ( SVM ). |
Evaluation | EM is a natural benchmark, while SVM is also appropriate since it can be taken as the state of the art for our task. |
Evaluation | Nonetheless, because the comparison system is a generalization of the extractive SVM compressor of Cohn and Lapata (2007), we do not expect that the results would differ qualitatively. |
Introduction | We achieve substantial improvements against a number of baselines including EM, support vector machine ( SVM ) based discriminative training, and variational Bayes (VB). |
Large-Margin Learning Framework | As we will see, it is possible to learn {Wm} using standard support vector machine ( SVM ) training (holding A fixed), and then make a simple gradient-based update to A (holding {Wm} fixed). |
Large-Margin Learning Framework | As is standard in the multi-class linear SVM (Crammer and Singer, 2001), we can solve the problem defined in Equation 6 via Lagrangian optimization: |
Large-Margin Learning Framework | If A is fixed, then the optimization problem is equivalent to a standard multi-class SVM , in the transformed feature space f (vi; A). |
Corpus Details | As our reference algorithm, we used the current state-of-the-art system developed by Boulis and Ostendorf (2005) using unigram and bigram features in an SVM framework. |
Corpus Details | Table 1: Top 20 ngram features for gender, ranked by the weights assigned by the linear SVM model |
Corpus Details | After extracting the ngrams, an SVM model was trained via the SVMlight toolkit (Joachims, 1999) using the linear kernel with the default toolkit settings. |
Context and Answer Detection | SVM , can be employed, where each pair of question and candidate context will be treated as an instance. |
Experiments | Columns: Model, Prec(%), Rec(%), F1(%). Context Detection: SVM 75.27, 68.80, 71.32; C4.5 70.16, 64.30, 67.21; L-CRF 75.75, 72.84, 74.45. Answer Detection: SVM 73.31, 47.35, 57.52; C4.5 65.36, 46.55, 54.37; L-CRF 63.92, 58.74, 61.22. |
Experiments | This experiment evaluates the Linear CRF model (Section 3.1) for context and answer detection by comparing it with SVM and C4.5 (Quinlan, 1993). |
Experiments | For SVM , we use SVMlight (Joachims, 1999). |
Introduction | Experimental results show that 1) Linear CRFs outperform SVM and decision tree in both context and answer detection; 2) Skip-chain CRFs outperform Linear CRFs for answer finding, which demonstrates that context improves answer finding; 3) 2D CRF model improves the performance of Linear CRFs and the combination of 2D CRFs and Skip-chain CRFs achieves better performance for context detection. |
Related Work | (2007) used SVM to extract input-reply pairs from forums for chatbot knowledge. |
Experimental Setup | Baselines: To evaluate the performance of our relation extraction, we compare against an SVM classifier trained on the Gold Relations. |
Experimental Setup | We test the SVM baseline in a leave-one-out fashion. |
Experimental Setup | [Figure legend: Model F-score; SVM F-score; All-text F-score] |
Introduction | Our results demonstrate the strength of our relation extraction technique — while using planning feedback as its only source of supervision, it achieves a precondition relation extraction accuracy on par with that of a supervised SVM baseline. |
Results | We also show the performance of the supervised SVM baseline. |
Results | Feature Analysis Figure 7 shows the top five positive features for our model and the SVM baseline. |
Results | Figure 7: The top five positive features on words and dependency types learned by our model (above) and by SVM (below) for precondition prediction. |
Conclusion and Future Work | Unlike previous proposed approaches, we introduce a convex objective for the semi-supervised learning algorithm by combining a convex structured SVM loss and a convex least square loss. |
Introduction | In particular, they present an algorithm for multi-class unsupervised and semi-supervised SVM learning, which relaxes the original non-convex objective into a close convex approximation, thereby allowing a global solution to be obtained. |
Introduction | More specifically, for the loss on the unlabeled data part, we substitute the original unsupervised structured SVM loss with a least squares loss, but keep constraints on the inferred prediction targets, which avoids trivialization. |
Introduction | ing semi-supervised convex objective to dependency parsing, and obtain significant improvement over the corresponding supervised structured SVM . |
Semi-supervised Convex Training for Structured SVM | Although semi-supervised structured SVM learning has been an active research area, semi-supervised structured SVMs have not been used in many real applications to date. |
Semi-supervised Convex Training for Structured SVM | By combining the convex structured SVM loss on labeled data (shown in Equation (5)) and the convex least squares loss on unlabeled data (shown in Equation (8)), we obtain a semi-supervised structured large margin loss |
Semi-supervised Structured Large Margin Objective | The objective of standard semi-supervised structured SVM is a combination of structured large margin losses on both labeled and unlabeled data. |
Abbreviation Disambiguation | : Maximum Entropy, SVM and C50. |
Abstract | An accuracy of 96.09% has been achieved by SVM . |
Experiments | Several well-known supervised ML methods have been selected: artificial neural networks (ANN), Naïve Bayes (NB), Support Vector Machines ( SVM ) and J48 (Witten and Frank, 1999), an improved variant of the C4.5 decision tree induction. |
Experiments | Table 2 shows that SVM achieved the best result with 96.09% accuracy. |
Experiments | ants ML Method ANN NB SVM J48 |
Experimental Results | We adapt the Support Vector Machine ( SVM ) and Logistic Regression (LR) which have been reported to be effective for classification and the Linear CRF (LCRF) which is used to summarize ordinary text documents in (Shen et al., 2007) as baselines for comparison. |
Experimental Results | Table 2 shows that our general CRF model based on question segmentation with group L1 regularization outperforms the baselines significantly in all three measures (gCRF-QS-l1 is 13.99% better than SVM in precision, 9.77% better in recall and 11.72% better in F1 score). |
Experimental Results | We note that both SVM and LR, |
Introduction | The experimental results show that the proposed model improves the performance significantly (in terms of precision, recall and F1 measures) as well as the ROUGE-1, ROUGE-2 and ROUGE-L measures as compared to the state-of-the-art methods, such as Support Vector Machines ( SVM ), Logistic Regression (LR) and Linear CRF (LCRF) (Shen et al., 2007). |
Experiments | We also compare the results of SSMFLK with those of two supervised classification methods: Support Vector Machine ( SVM ) and Naive Bayes. |
Experiments | [Figure legend: Consistency Method; Harmonic-CMN; Green Function; SVM ] |
Experiments | [Figure legend: SSMFLK; Consistency Method; Harmonic-CMN; Green Function; SVM ; Naive Bayes] |
Related Work | Most work in machine learning literature on utilizing labeled features has focused on using them to generate weakly labeled examples that are then used for standard supervised learning: (Schapire et al., 2002) propose one such framework for boosting logistic regression; (Wu and Srihari, 2004) build a modified SVM and (Liu et al., 2004) use a combination of clustering and EM based methods to instantiate similar frameworks. |
Experimental Setup 4.1 Data Sets and Preprocessing | SVM: This method learns an SVM classifier for each language given the monolingual labeled data; the unlabeled data is not used. |
Experimental Setup 4.1 Data Sets and Preprocessing | Monolingual TSVM (TSVM-M): This method learns two transductive SVM (TSVM) classifiers given the monolingual labeled data and the monolingual unlabeled data for each language. |
Experimental Setup 4.1 Data Sets and Preprocessing | First, two monolingual SVM classifiers are built based on only the corresponding labeled data, and then they are bootstrapped by adding the most confident predicted examples from the unlabeled data into the training set. |
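The bootstrapping step described above (train on labeled data, then repeatedly move the most confidently predicted unlabeled examples into the training set) can be sketched as follows. This is a single-view simplification for illustration only: the toy centroid learner stands in for the SVM, and the names `train_centroid`, `predict_with_confidence`, `bootstrap`, `rounds` and `per_round` are hypothetical, not taken from the excerpted paper.

```python
def train_centroid(examples):
    """Toy stand-in for SVM training: store the mean of a 1-D feature per class."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - model[ranked[1]]) - abs(x - model[best])
    return best, margin


def bootstrap(labeled, unlabeled, rounds=3, per_round=2):
    """Self-training loop: add the most confident predictions to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train_centroid(labeled)
        # Most confident unlabeled points first.
        ranked = sorted(pool, key=lambda x: predict_with_confidence(model, x)[1], reverse=True)
        for x in ranked[:per_round]:
            labeled.append((x, predict_with_confidence(model, x)[0]))
            pool.remove(x)
    return train_centroid(labeled), labeled
```

In a two-language co-training setting, each classifier's confident predictions would instead be added to the other classifier's training set; the loop structure stays the same.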
Introduction | maximum entropy and SVM classifiers) as well as two alternative methods for leveraging unlabeled data (transductive SVMs (Joachims, 1999b) and co-training (Blum and Mitchell, 1998)). |
Results and Analysis | Among the baselines, the best is Co-SVM; TSVMs do not always improve performance using the unlabeled data compared to the standalone SVM ; and TSVM-B outperforms TSVM-M except for Chinese in the second setting. |
Results and Analysis | 8Significance is tested using paired t-tests with p<0.05: denotes statistical significance compared to the corresponding performance of MaxEnt; * denotes statistical significance compared to SVM ; and r denotes statistical significance compared to Co-SVM. |
Discussion | We present here our model of text classification and compare it with SVM and KNN on two datasets. |
Discussion | Moreover, the QC performs well in text classification compared with SVM and KNN and outperforms them on small-scale training sets. |
Experiment | We compared the performance of QC with several classical classification methods, including Support Vector Machine ( SVM ) and K-nearest neighbor (KNN). |
Experiment | We randomly selected training samples from the training pool ten times to train QC, SVM , and KNN classifier respectively and then verified the three trained classifiers on the testing sets, the results of which are illustrated in Figure 4. |
Experiment | We noted that the QC performed better than both KNN and SVM on small-scale training sets, when the number of training samples is less than 50. |
Experiments | Methods: We evaluated the overall performance relative to the common SVM bag of words approach that can be ubiquitously found in text mining literature. |
Experiments | SVM-TF: Uses a bag of words SVM with term frequency weights. |
Experiments | SVM-Delta-IDF: Uses a bag of words SVM classification with TF.Delta-IDF weights (Formula 2) in the feature vectors before training or testing an SVM . |
Related Work | (2012) propose an algorithm which first trains individual SVM classifiers on several small, class-balanced, random subsets of the dataset, and then reclassifies each training instance using a majority vote of these individual classifiers. |
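A minimal sketch of the two building blocks mentioned above: drawing class-balanced random subsets and combining per-classifier predictions by majority vote. The classifier training itself is omitted, and the function names and the `per_class` and `seed` parameters are illustrative assumptions rather than details from the cited work.

```python
import random
from collections import Counter


def balanced_subsets(examples, n_subsets, per_class, seed=0):
    """Draw n_subsets random subsets, each with up to per_class examples per label."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in examples:
        by_class.setdefault(y, []).append((x, y))
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for items in by_class.values():
            subset.extend(rng.sample(items, min(per_class, len(items))))
        subsets.append(subset)
    return subsets


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

Each subset would be used to train one classifier, and `majority_vote` would then be applied to the list of labels those classifiers assign to a given training instance.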
Experiments | 10We use SVMlight (Joachims, 1999) to train our linear SVM classifiers |
Experiments | For SVM , models trained on POS and LIWC features achieve even lower accuracy than Unigram. |
Experiments | tive model, SAGE achieve much better results than SVM , and is around 0.65 accurate in the cross-domain task. |
Feature-based Additive Model | If we instead use SVM , for example, we would have to train classifiers one by one (due to the distinct features from different sources) to draw conclusions regarding the differences between Turker vs Expert vs truthful reviews, positive expert vs negative expert reviews, or reviews from different domains. |
Introduction | In the examples in Table 1, we trained a linear SVM classifier on Ott’s Chicago-hotel dataset on unigram features and tested it on a couple of different domains (the details of data acquisition are illustrated in Section 3). |
Introduction | Table 1: SVM performance on datasets for a classifier trained on Chicago hotel review based on Unigram feature. |
Unsupervised Mining of Personal and Impersonal Views | We apply both support vector machine ( SVM ) and Maximum Entropy (ME) algorithms with the help of the SVM-light4 and Mallet5 tools. |
Unsupervised Mining of Personal and Impersonal Views | We find that ME performs slightly better than SVM on the average. |
Unsupervised Mining of Personal and Impersonal Views | Transductive SVM , which seeks the largest separation between labeled and unlabeled data through regularization (Joachims, 1999). |
Parameter Estimation Models | Continuous parameters are modeled with a linear regression model (LR), an M5’ model tree (M5), and a model based on support vector machines with a linear kernel ( SVM ). |
Parameter Estimation Models | We test a Naive Bayes classifier (NB), a J48 decision tree (J48), a nearest-neighbor classifier using one neighbor (NN), a Java implementation of the RIPPER rule-based learner (JRIP), the AdaBoost boosting algorithm (ADA), and a support vector machines classifier with a linear kernel ( SVM ). |
Parameter Estimation Models | Figure 3: SVM model with a linear kernel predicting the CONTENT POLARITY parameter. |
Abstract | Input: - L, labeled data set - U, unlabeled data set - n, batch size Output: - SVM , classifier Repeat: 1. |
Abstract | Train a single classifier SVM on L 2. |
Abstract | The objective is to learn SVM classifiers in both languages, denoted as SVMC and SVMe respectively, in a BAL fashion to improve their classification performance. |
Experiments and evaluation | As mentioned before, this classifier is a linear SVM . |
Improving a distributional thesaurus | More precisely, we follow (Lee and Ng, 2002), a reference work for WSD, by adopting a Support Vector Machines ( SVM ) classifier with a linear kernel and three kinds of features for characterizing each considered occur- |
Improving a distributional thesaurus | For the second type of features, we take more precisely the POS of the three words before E and those of the three words after E. Each pair {POS, position} corresponds to a binary feature for the SVM classifier. |
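As a hedged illustration of the {POS, position} features described above, the sketch below turns the POS tags of the three words on each side of a target occurrence into a set of binary feature names. The function name and the `POS[offset]=tag` naming scheme are assumptions made for this example.

```python
def pos_position_features(pos_tags, target_index, window=3):
    """Return binary feature names pairing each neighbouring POS tag with its offset."""
    features = set()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(pos_tags):
            features.add(f"POS[{offset:+d}]={pos_tags[i]}")
    return features
```

Each name in the returned set corresponds to one dimension of the classifier's input vector, set to 1 when present and 0 otherwise.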
Improving a distributional thesaurus | Each instance of the 11 types of collocations is represented by a tuple ⟨lemma1, position1, lemma2, position2⟩ and leads to a binary feature for the SVM classifier. |
Introduction | We focus on large-margin methods such as SVM (Joachims, 1998) and passive-aggressive algorithms such as MIRA. |
The Relative Margin Machine in SMT | It is maximized by minimizing the norm in SVM , or analogously, the proximity constraint in MIRA: $\arg\min_{w} \|w - w_t\|^2$. |
The Relative Margin Machine in SMT | RMM was introduced as a generalization over SVM that incorporates both the margin constraint |
The Relative Margin Machine in SMT | Nonetheless, since structured RMM is a generalization of Structured SVM , which shares its underlying objective with MIRA, our intuition is that SMT should be able to benefit as well. |
Experiments | In SVM implementations, the tradeoff parameter between training error and margin was set to 1 for all experiments. |
Experiments | We compare our approaches to three state-of-the-art approaches including SVM with convolution tree kernels (Collins and Duffy, 2001), linear regression and SVM with linear kernels (Scholkopf and Smola, 2002). |
Experiments | The SVM with linear kernels and the linear regression model used the same features as the manifold models. |
Experiment | 4.3 Classification task — SVM |
Experiment | 4.3.1 Experimental Setting To test our SVM classifier, we perform the classification task. |
Experiment | Table 3: Average tenfold cross-validation accuracies of polarity classification task with SVM . |
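Tenfold cross-validation, as used for the accuracies reported above, can be sketched with a small index generator. This is a generic illustration, not the excerpted paper's setup; in particular the round-robin fold assignment and the function name are assumptions.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k folds, assigned round-robin."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

The reported figure is then the mean of the k per-fold accuracies, each measured on the held-out fold only.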
Term Weighting and Sentiment Analysis | Specifically, we explore the statistical term weighting features of the word generation model with Support Vector machine ( SVM ), faithfully reproducing previous work as closely as possible (Pang et al., 2002). |
Experiments | We experiment with two machine-learning approaches: Naive Bayes and SVM . |
Experiments | We report the n-gram values for which the best results are obtained and the hyperparameters for SVM , c and γ. |
Experiments | The SVM produces better results for all languages except Portuguese, where the accuracy is equal. |
Our Approach | For SVM , we use the wrapper provided by Weka for LibSVM (Chang and Lin, 2011). |
Adaptive Pattern-based Bilingual Data Mining | Next, in the pattern learning module, those translation snippet pairs are used to find candidate patterns and then an SVM classifier is built to select the most useful patterns shared by most translation pairs in the whole text. |
Adaptive Pattern-based Bilingual Data Mining | After all pattern candidates are extracted, an SVM classifier is used to select the good ones: |
Adaptive Pattern-based Bilingual Data Mining | In this SVM model, each pattern candidate pi has the following four features: |
Overview of the Proposed Approach | Then an SVM classifier is trained to select good patterns from all extracted pattern candidates. |
Experiments and Results | We employ an SVM coreference resolver trained and tested on ACE 2005 with 79.5% Precision, 66.7% Recall and 72.5% F1 to label coreference mentions of the same named entity in an article. |
Incorporating Structural Syntactic Information | And thus an SVM classifier can be learned and then used for recognition. |
Introduction | Section 4 introduces the frame work for discourse recognition, as well as the baseline feature space and the SVM classifier. |
The Recognition Framework | The classifier learned by SVM is: |
The Recognition Framework | One advantage of SVM is that we can use tree kernel approach to capture syntactic parse tree information in a particular high-dimension space. |
Approach | We use Support Vector Machines ( SVM ) with linear kernel as our classifier. |
Approach | We use SVM with linear kernel as our classifier. |
Evaluation | Sentence Filtering Evaluation: We used Support Vector Machines ( SVM ) with linear kernel as our classifier. |
Evaluation | Sentence Classification Evaluation: We used SVM in this step as well. |
Evaluation | Author Name Replacement Evaluation: The classifier used in this task is also SVM . |
Experience Detection | While we tested several classifiers, we chose to use two different classifiers based on SVM and Logistic Regression for the final experimental results because they showed the best performance. |
Experience Detection | Logistic Feature Regression SVM |
Experience Detection | Logistic Feature Regression SVM |
Lexicon Construction | ME SVM Prec. |
Experiments | NB 41.0, 81.8; BiNB 41.9, 83.1; SVM 40.7, 79.4; RecNTN 45.7, 85.4; Max-TDNN 37.4, 77.1; NBoW 42.4, 80.5; DCNN 48.5, 86.8 |
Experiments | SVM is a support vector machine with unigram and bigram features. |
Experiments | head word, parser SVM hypernyms, WordNet |
Substructure Spaces for BTKs | In the 1st phase, a kernel based classifier, SVM in our study, is employed to classify each candidate subtree pair as aligned or unaligned. |
Substructure Spaces for BTKs | Since SVM is a large margin based discriminative classifier rather than a probabilistic model, we introduce a sigmoid function to convert the distance against the hyperplane to a posterior alignment probability as follows: |
Substructure Spaces for BTKs | We use SVM with binary classes as the classifier. |
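Mapping an SVM decision value (the signed distance from the hyperplane) to a value in [0, 1] with a sigmoid can be sketched as below. The excerpt does not preserve the exact formula the authors used, so this shows only the plain logistic function as an assumption; a fitted variant such as Platt scaling would add learned slope and offset parameters.

```python
import math


def sigmoid(distance):
    """Map a signed distance from the hyperplane to (0, 1), in a numerically stable way."""
    if distance >= 0:
        return 1.0 / (1.0 + math.exp(-distance))
    e = math.exp(distance)
    return e / (1.0 + e)
```

Points on the hyperplane map to 0.5, and points far on the positive or negative side approach 1 or 0 respectively.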
Experimental Setup | We learned the feature weights with a linear SVM , using the software SVM-OOPS (Woodsend and Gondzio, 2009). |
Experimental Setup | For each phrase, features were extracted and salience scores calculated from the feature weights determined through SVM training. |
Experimental Setup | The distance from the SVM hyperplane represents the salience score. |
Experimental Evaluation and Discussion | We set the degree of the kernels to 3 since cubic kernels with SVM have proved effective for Japanese dependency parsing (Kudo and Matsumoto, 2000; Kudo and Matsumoto, 2002). |
Experimental Evaluation and Discussion | Stopping Criteria It is known that increment rate of the number of support vectors in SVM indicates saturation of accuracy improvement during iterations of active learning (Schohn and Cohn, 2000). |
Experimental Evaluation and Discussion | It is interesting to examine whether the observation for SVM is also useful for support vectors7 of the averaged perceptron. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | For all experiments we used a linear SVM kernel.15 |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | Table 4 shows that of the highest-weighted SVM features learned when training models for HOW questions on YA and Bio, many are shared (e.g., 56.5% of the features in the top half of both DPMs are shared), suggesting that a core set of discourse features may be of utility across domains. |
CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | Table 4: Percentage of top features with the highest SVM weights that are shared between Bio HOW and YA models. |
Experimental Settings | The PZD system relies on a set of SVM classifiers trained using morphological and lexical features. |
Experimental Settings | The SVM classifiers are built using Yamcha (Kudo and Matsumoto, 2003). |
Experimental Settings | Simple features are used directly by the PZD SVM models, whereas Binned features’ (numerical) values are reduced to a small, labeled category set whose labels are used as model features. |
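The binning of numerical feature values into a small labeled category set, as described above, might look like the following sketch. The boundaries, labels and function name are invented for illustration; the excerpt does not say which bins the PZD system actually uses.

```python
import bisect


def bin_label(value, boundaries, labels):
    """Return the category label for value, given sorted bin boundaries.

    labels must have exactly one more entry than boundaries.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need len(boundaries) + 1 labels")
    return labels[bisect.bisect_right(boundaries, value)]
```

For example, with boundaries `[0.1, 0.5]` and labels `["low", "mid", "high"]`, a value of 0.05 falls in `"low"` and a value of 0.7 in `"high"`; the label, not the raw number, then becomes the model feature.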
Approach Overview | In each of the first two steps, a binary SVM classifier is built to perform the classification. |
Experiments | In the experiments, we consider the positive and negative tweets annotated by humans as subjective tweets (i.e., positive instances in the SVM classifiers), which amount to 727 tweets. |
Related Work | According to the experimental results, machine learning based classifiers outperform the unsupervised approach, where the best performance is achieved by the SVM classifier with unigram presences as features. |
Related Work | In contrast, (Barbosa and Feng, 2010) propose a two-step approach to classify the sentiments of tweets using SVM classifiers with abstract features. |
Experiments and Results | fied; (2) for the identified query pairs, there should be sufficient statistics of associated clickthrough data; (3) The click frequency should be well distributed at both sides so that the preference order between bilingual document pairs can be derived for SVM learning. |
Introduction | For both languages, we achieve significant improvements over monolingual Ranking SVM (RSVM) baselines (Herbrich et al., 2000; Joachims, 2002), which exploit a variety of monolingual features. |
Learning to Rank Using Bilingual Information | We resort to Ranking SVM (RSVM) (Herbrich et al., 2000; Joachims, 2002) learning for classification on pairs of instances. |
Learning to Rank Using Bilingual Information | The problem is to solve SVM objective: rrgn + A 21.2]. |
Related Work | (2) SVM : The ngram features and Support Vector Machine are widely used baseline methods to build sentiment classifiers (Pang et al., 2002). |
Related Work | LibLinear is used to train the SVM classifier. |
Related Work | (3) NBSVM: NBSVM (Wang and Manning, 2012) is a state-of-the-art performer on many sentiment classification datasets, which trades off between Naive Bayes and NB-enhanced SVM . |
Abstract | We applied a modified version of HITS algorithm and an SVM classifier trained with pseudo-relevant data for article analysis. |
Disputant relation-based method | As for the rest of the sentences, a similarity analysis is conducted with an SVM classifier. |
Disputant relation-based method | where SU: number of all sentences of the article Qi: number of quotes from the side i. Qij: number of quotes from either side i or j. Si: number of sentences classified to i by SVM . |
Introduction | We applied a modified version of HITS algorithm to identify the key opponents of an issue, and used disputant extraction techniques combined with an SVM classifier for article analysis. |
Results and Discussion | We compared the performance of the DPLVM with the CRFs and other baseline systems, including the heuristic system (Heu), the HMM model, and the SVM model described in S08, i.e., Sun et al. |
Results and Discussion | The SVM method described by Sun et al. |
Results and Discussion | In general, the results indicate that all of the sequential labeling models outperformed the SVM regression model with less training time.3 In the SVM regression approach, a large number of negative examples are explicitly generated for the training, which slowed the process. |
Conclusion | Rows give # Pos/Neg in parentheses, then the Lexicon and SVM scores. Hownet (627/1,038): Lexicon 0.737, SVM 0.756; Hownet+NW (743/1,150): Lexicon 0.770, SVM 0.779; Hownet+T100 (679/1,172): Lexicon 0.761, SVM 0.774; cptHownet (138/125): Lexicon 0.738, SVM 0.758; cptHownet+NW (254/237): Lexicon 0.774, SVM 0.782; cptHownet+T100 (190/159): Lexicon 0.764, SVM 0.775 |
Experiment | The second model is an SVM model in which opinion words are used as features, and 5-fold cross validation is conducted. |
Experiment | 5 This is not necessary for the SVM model. |
Related Work | The TRE systems use techniques such as: Rules (Regulars, Patterns and Propositions) (Miller et al., 1998), Kernel method (Zhang et al., 2006b; Zelenko et al., 2003), Belief network (Roth and Yih, 2002), Linear programming (Roth and Yih, 2007), Maximum entropy (Kambhatla, 2004) or SVM (GuoDong et al., 2005). |
Related Work | (2005) introduced a feature based method, which utilized lexicon information around entities and was evaluated on Winnow and SVM classifiers. |
Related Work | For each type of these relations, an SVM was trained and tested independently. |
Discussion | We use three sentiment classification techniques: Naïve Bayes, MaxEnt and SVM with unigrams, bigrams and trigrams as features. |
Discussion | 7http://scikit-learn.org/stable/ 8In case of SVM , the probability of predicted class is computed as given in Platt (1999). |
Discussion | MaxEnt (Movie) -0.29 (72.17); MaxEnt (Twitter) -0.26 (71.68); SVM (Movie) -0.24 (66.27); SVM (Twitter) -0.19 (73.15) |
Event Causality Extraction Method | An event causality candidate is given a causality score CScore, which is the SVM score (distance from the hyperplane) that is normalized to [0,1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score. |
Experiments | (2011): CEAuns is an unsupervised method that uses CEA to rank event causality candidates, and CEAsup is a supervised method using SVM and the CEA features, whose ranking is based on the SVM scores. |
Experiments | The baselines are as follows: Csuns is an unsupervised method that uses Cs for ranking, and Cssup is a supervised method using SVM with Cs as the only feature that uses SVM scores for ranking. |
Multilingual Subjectivity System | Previous studies have found that, among several ML-based approaches, the SVM classifier generally performs well in many subjectivity analysis tasks (Pang et al., 2002; Banea et al., 2008). |
Multilingual Subjectivity System | An SVM score (a margin or the distance from a learned decision boundary) with a positive value predicts the input as being subjective, and negative value as objective. |
Multilingual Subjectivity System | The second and the third approaches are carried out as follows: Corpus-based (T-CB): We translate the MPQA corpus into the target languages sentence by sentence using a web-based service.6 Using the same method for S-CB, we train an SVM model for each language with the translated training corpora. |
Empirical Evaluation | SVM ) should be separately created with regards to distinct features. |
Empirical Evaluation | We utilised SVMrank as an implementation of the Ranking SVM algorithm, in which the parameter c was set as 1.0 and the remaining parameters were set to their defaults. |
Reference Resolution using Extra-linguistic Information | Although the work by Denis and Baldridge (2008) uses Maximum Entropy to create their ranking-based model, we adopt the Ranking SVM algorithm (Joachims, 2002), which learns a weight vector to rank candidates for a given partial ranking of each referent. |
Predicting Direction of Power | Handling of undefined values for features in SVM is not straightforward. |
Predicting Direction of Power | Most SVM implementations assume the value of 0 by default in such cases, conflating them |
Predicting Direction of Power | Since we use a quadratic kernel, we expect the SVM to pick up the interaction between each feature and its indicator feature. |
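One way to keep undefined feature values distinct from genuine zeros, in the spirit of the indicator features mentioned above, is to emit a (value, is-defined) pair per feature. The encoding below is a sketch under that assumption; the function name and the use of `None` for undefined values are choices made for this example.

```python
def encode_with_indicators(features, names):
    """Encode each feature as [value, indicator]; undefined features become [0.0, 0.0]."""
    vector = []
    for name in names:
        value = features.get(name)
        if value is None:
            vector.extend([0.0, 0.0])  # undefined: zero value, indicator off
        else:
            vector.extend([float(value), 1.0])  # defined: real value, indicator on
    return vector
```

With a quadratic kernel, the product of a value and its indicator appears among the implicit features, which is what lets the model treat a defined zero differently from a missing value.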
Comparing the two Datasets | SMO is an implementation of Support Vector Machines ( SVM ), rules.JRip is the RIPPER algorithm, and bayes.NaiveBayes is a Naive Bayes classifier. |
Comparing the two Datasets | In task C, the SVM algorithm was also the best performing algorithm among those that were also tried on the English data, but decision trees produced even better results here. |
Comparing the two Datasets | The results are: in task A the lazy.KStar classifier scored 58.6%, and the SVM classifier scored 75.5% in task B and 59.4% in task C, with trees . |
Previous work | She also exploited a semi-supervised approach using Laplacian SVM classification on a small set of examples. |
Prosodic event detection method | Our previous supervised learning approach (Jeon and Liu, 2009) showed that a combined model using Neural Network (NN) classifier for acoustic-prosodic evidence and Support Vector Machine ( SVM ) classifier for syntactic-prosodic evidence performed better than other classifiers. |
Prosodic event detection method | We therefore use NN and SVM in this study. |
Introduction | Finally, new relation instances are extracted using kernel based classifiers, e.g., the SVM classifier. |
Introduction | We apply the one vs. others strategy for multiple classification using SVM . |
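The one vs. others strategy trains one binary scorer per class and predicts the class whose scorer returns the highest value. The sketch below assumes simple linear scorers given as (weights, bias) pairs purely for illustration; the excerpted work uses kernel-based classifiers, and the function name is hypothetical.

```python
def one_vs_others_predict(x, models):
    """Pick the label whose binary scorer gives x the highest score.

    models maps each label to a (weights, bias) pair for a linear scorer.
    """
    def score(weights, bias):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return max(models, key=lambda label: score(*models[label]))
```

Taking the maximum score, rather than requiring exactly one scorer to fire, resolves cases where several binary classifiers (or none) claim the instance.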
Introduction | For SVM training, the parameter C is set to 2.4 for all experiments, and the tree kernel parameter λ is tuned to 0.2 for FTK and 0.4 (the optimal parameter setting used in Qian et al. |
Related work | In particular, starting from EDUs, at each step of the tree-building, a binary SVM classifier is first applied to determine which pair of adjacent discourse constituents should be merged to form a larger span, and another multi-class SVM classifier is then applied to assign the type of discourse relation that holds between the chosen pair. |
Related work | Also, the employment of SVM classifiers allows the incorporation of rich features for better data representation (Feng and Hirst, 2012). |
Related work | However, HILDA’s approach also has obvious weakness: the greedy algorithm may lead to poor performance due to local optima, and more importantly, the SVM classifiers are not well-suited for solving structural problems due to the difficulty of taking context into account. |
Experimental setup | We use a Ranking SVM (SVMrank (Joachims, 2002)) to score summaries using our features. |
Experimental setup | The Ranking SVM seeks to minimize the number of discordant pairs (pairs in which the gold standard has x1 ranked strictly higher than x2, but the learner ranks x2 strictly higher than x1). |
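The discordant-pair criterion described above can be made concrete with a small counting function. This is an illustrative sketch: it takes gold ranks and learner scores as parallel lists where larger means ranked higher, and the function name is an assumption.

```python
from itertools import combinations


def count_discordant(gold, predicted):
    """Count pairs ranked strictly one way by gold and strictly the other way by the learner."""
    discordant = 0
    for i, j in combinations(range(len(gold)), 2):
        if gold[i] == gold[j]:
            continue  # gold expresses no strict preference for this pair
        higher, lower = (i, j) if gold[i] > gold[j] else (j, i)
        if predicted[lower] > predicted[higher]:
            discordant += 1
    return discordant
```

A perfect ranker yields zero discordant pairs; a fully reversed ranking of n strictly ordered items yields n(n-1)/2.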
Experimental setup | For system-level evaluation, we treat the real-valued output of the SVM ranker for each summary as the linguistic quality score. |
Experiments | the target-independent (SVM-indep) and target-dependent features and uses SVM as the classifier. |
Experiments | SVM-conn: The words, punctuations, emoti-cons, and #hashtags included in the converted dependency tree are used as the features for SVM . |
Experiments | AdaRNN-comb: We combine the root vectors obtained by AdaRNN-Wfli with the uni/bi-gram features, and they are fed into an SVM classifier. |
Machine Learning with Edit-Turn-Pairs | Columns: Baseline, R. Forest, SVM . Accuracy: .799 ±.031, .866 ±.026†, .858 ±.027†. F1mac.: NaN, .789 ±.032, .763 ±.033. Precisionmac. |
Machine Learning with Edit-Turn-Pairs | A reduction of the feature set as judged by a χ² ranker improved the results for both Random Forest as well as the SVM , so we limited our feature set to the 100 best features. |
Machine Learning with Edit-Turn-Pairs | In a 10-fold cross-validation experiment, we tested a Random Forest classifier (Breiman, 2001) and an SVM (Platt, 1998) with polynomial kernel. |
Approach | In its basic form, a binary SVM classifier learns a linear threshold function that discriminates data points of two categories. |
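A linear threshold function of the kind described above classifies a point by the sign of w·x + b; dividing by the norm of w gives the signed distance from the hyperplane, which several of the excerpts use as a score. The sketch below assumes the weights and bias have already been learned, and the function names are illustrative.

```python
import math


def decision_value(weights, bias, x):
    """Compute w.x + b for a linear classifier."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1


def signed_distance(weights, bias, x):
    """Return the signed Euclidean distance from x to the separating hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return decision_value(weights, bias, x) / norm
```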
Evaluation | We trained an SVM regression model with our full set of feature types and compared it to the SVM rank preference model. |
Introduction | In this paper, we report experiments on rank preference Support Vector Machines (SVMs) trained on a relatively small amount of data, on identification of appropriate feature types derived automatically from generic text processing tools, on comparison with a regression SVM model, and on the robustness of the best model to ‘outlier’ texts. |
Soft Constraints in Dual Decomposition | All we need to employ the structured perceptron algorithm (Collins, 2002) or the structured SVM algorithm (Tsochantaridis et al., 2004) is a black-box procedure for performing MAP inference in the structured linear model given an arbitrary cost vector. |
Soft Constraints in Dual Decomposition | This can be ensured by simple modifications of the perceptron and subgradient descent optimization of the structured SVM objective simply by truncating c coordinate-wise to be nonnegative at every learning iteration. |
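The coordinate-wise truncation mentioned above amounts to a projected update: take the usual (sub)gradient step, then clip each coordinate of c at zero. The sketch below shows only that step; the function name, the fixed `learning_rate` and the list representation of c are assumptions made for illustration.

```python
def projected_step(c, subgradient, learning_rate):
    """Take one subgradient step on c, then truncate each coordinate to be nonnegative."""
    return [max(0.0, ci - learning_rate * gi) for ci, gi in zip(c, subgradient)]
```

Coordinates that the unconstrained step would push below zero are held at exactly zero, so c stays in the nonnegative orthant after every iteration.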
Soft Constraints in Dual Decomposition | A similar analysis holds for the structured SVM ap- |
Experiments | For gLDA, we learn a binary linear SVM on its topic representations using SVMLight (Joachims, 1999). |
Experiments | The results of DiscLDA (Lacoste-Jullien et al., 2009) and linear SVM on raw bag-of-words features were reported in (Zhu et al., 2012). |
Experiments | The fact that gLDA+SVM performs better than the standard gSLDA is due to the same reason, since the SVM part of gLDA+SVM can well capture the supervision information to learn a classifier for good prediction, while standard sLDA can’t well-balance the influence of supervision. |
Attribute-based Classification | We used an L2-regularized L2-loss linear SVM (Fan et al., 2008) to learn the attribute predictions. |
Attribute-based Classification | data was randomly split into a training and validation set of equal size in order to find the optimal cost parameter C. The final SVM for the attribute was trained on the entire training data, i.e., on all positive and negative examples. |
Attribute-based Classification | The SVM learners used the four different feature types proposed in Farhadi et al. |
A Machine Learning based approach | We use the SVM classifier with features generated using the following steps. |
Conclusions and Future Work | This ontology guides a rule based approach to thwarting detection, and also provides features for an SVM based learning system. |
Results | We used the CVX3 library in Matlab to solve the optimization problem for learning weights and the LIBSVM4 library to implement the SVM classifier. |
Discussion | As before, the linguistic, acoustic, and visual features are averaged over the entire video, and we use an SVM classifier in tenfold cross validation experiments. |
Experiments and Results | We use the entire set of 412 utterances and run ten fold cross validations using an SVM classifier, as implemented in the Weka toolkit.5 In line with previous work on emotion recognition in speech (Haq and Jackson, 2009; Anagnostopoulos and Vovoli, 2010) where utterances are selected in a speaker dependent manner (i.e., utterances from the same speaker are included in both training and test), as well as work on sentence-level opinion classification where document boundaries are not considered in the split performed between the training and test sets (Wilson et al., 2004; Wiegand and Klakow, 2009), the training/test split for each fold is performed at utterance level regardless of the video they belong to. |
Multimodal Sentiment Analysis | These simple weighted unigram features have been successfully used in the past to build sentiment classifiers on text, and in conjunction with Support Vector Machines ( SVM ) have been shown to lead to state-of-the-art performance (Maas et al., 2011). |
Structured Learning | We use a soft-margin support vector machine ( SVM ) (Vapnik, 1998) objective over the full structured output space (Taskar et al., 2003; Tsochantaridis et al., 2004) of extractive and compressive summaries: |
Structured Learning | In our application, this approach efficiently solves the structured SVM training problem up to some specified tolerance ε. |
Structured Learning | Thus, if loss-augmented prediction turns up no new constraints on a given iteration, the current solution to the reduced problem, w and ξ, is the solution to the full SVM training problem. |
Abstract | Oberlander and Nowson explore using a Naive Bayes and an SVM classifier to perform binary classification of text on each personality dimension. |
Abstract | The results of the SVM classifier, shown in line (1) of Table 2, were fairly poor. |
Abstract | Training a multiclass SVM on the binned n-gram features from (5) produces 51.6% cross-validation accuracy on training data and 44.4% accuracy on the weighted test set (both numbers should be compared to a 33% baseline). |
Predicting politeness | The BOW classifier is an SVM using a unigram feature representation. We consider this to be a strong baseline for this new |
Predicting politeness | is an SVM using the linguistic features listed in Table 3 in addition to the unigram features. |
Predicting politeness | For new requests, we use class probability estimates obtained by fitting a logistic regression model to the output of the SVM (Witten and Frank, 2005) as predicted politeness scores (with values between 0 and 1; henceforth politeness, by abuse of language). |
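Turning raw SVM outputs into scores between 0 and 1 by fitting a logistic regression to them can be sketched roughly as below. This is an illustration only: it fits a one-dimensional sigmoid `p = 1 / (1 + exp(-(a*s + b)))` by plain gradient descent on the log loss, which is an assumed stand-in for whatever fitting routine the cited toolkit uses. The function names, learning rate, and epoch count are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid(scores: Sequence[float], labels: Sequence[int],
                lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(y=1 | s) = sigmoid(a*s + b) to SVM decision values.

    `scores` are raw SVM outputs; `labels` are 0/1 class labels.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = _sigmoid(a * s + b)
            grad_a += (p - y) * s  # d(log loss)/da
            grad_b += (p - y)      # d(log loss)/db
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def politeness_score(svm_output: float, a: float, b: float) -> float:
    """Map one raw SVM output to a probability-like score in (0, 1)."""
    return _sigmoid(a * svm_output + b)
```

In practice the sigmoid would be fitted on held-out decision values rather than on the data the SVM was trained on, since training-set margins tend to be overconfident.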
Abstract | For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. |
Feature extraction | To align or map source and target terms we use an SVM binary classifier (Joachims, 2002) with a linear kernel and the parameter that trades off training error against margin set to c = 10. |
Method | For classification purposes we use an SVM binary classifier. |
Experiments and Results | We experimented with an SVM classifier and found logistic regression to do slightly better. |
Related Work | They use an SVM classifier with only n-grams as features. |
Related Work | Nowson et al. (2006) employed dictionary and n-gram based content analysis and achieved 91.5% accuracy using an SVM classifier. |
Experiments | This uses a set of binary SVM classifiers, one for each verb class (frame) 73. |
Experiments | In the classification phase the binary classifiers are applied by (i) only considering classes that are compatible with the target verbs; and (ii) selecting the class associated with the maximum positive SVM margin. |
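The two-step decision rule for the one-classifier-per-class setup described above can be sketched as follows. The function name and the dictionary-based interface are hypothetical. Returning `None` when no compatible class has a positive margin is an assumption, since the source does not say how that case is handled.

```python
from typing import Dict, Optional, Set


def select_class(margins: Dict[str, float], compatible: Set[str]) -> Optional[str]:
    """Pick the compatible class whose binary SVM gives the largest positive margin.

    `margins` maps each class (frame) to the margin of its binary classifier;
    `compatible` is the set of classes allowed for the target verb.
    """
    # (i) Only consider classes compatible with the target verb.
    candidates = {c: m for c, m in margins.items() if c in compatible}
    # (ii) Keep positive margins and select the maximum.
    positive = {c: m for c, m in candidates.items() if m > 0}
    if not positive:
        return None  # assumed fallback: no class is predicted
    return max(positive, key=lambda c: positive[c])
```

For example, with margins `{"a": 0.5, "b": 1.2, "c": 2.0}` and compatible classes `{"a", "b"}`, class `"c"` is excluded in step (i) despite its larger margin, and the rule selects `"b"`.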
Model Analysis and Discussion | In line with the method discussed in (Pighin and Moschitti, 2009b), these fragments are extracted as they appear in most of the support vectors selected during SVM training. |