Index of papers in Proc. ACL 2014 that mention

**SVM**

1. A Semiparametric Gaussian Copula Regression Model for Predicting Financial Risks from Earnings Calls

Experiments | The baselines are standard squared-loss linear regression, linear kernel SVM, and nonlinear (Gaussian) kernel SVM . |

Experiments | We use the Statistical Toolbox’s linear regression implementation in Matlab, and LibSVM (Chang and Lin, 2011) for training and testing the SVM models. |

Experiments | The hyperparameter C in linear SVM, and the γ and C hyperparameters in Gaussian SVM are tuned on the training set using 10-fold cross-validation. |
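
The tuning recipe in these excerpts (C for the linear SVM; C and γ for the Gaussian SVM, each chosen by 10-fold cross-validation) can be sketched with scikit-learn, which wraps LibSVM. This is a loose sketch, not the paper's Matlab/LibSVM setup; the parameter grids and synthetic data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data standing in for the papers' feature vectors.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Linear SVM: tune only C, by 10-fold cross-validation.
linear_search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=10,
)

# Gaussian (RBF) SVM: tune C and gamma jointly.
rbf_search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=10,
)

linear_search.fit(X, y)
rbf_search.fit(X, y)
```

After fitting, `linear_search.best_params_` and `rbf_search.best_params_` hold the cross-validated choices.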

Introduction | Our results significantly outperform standard linear regression and strong SVM baselines. |

Related Work | (2003) are among the first to study SVM and text mining methods in the market prediction domain, where they align financial news articles with multiple time series to simulate the 33 stocks in the Hong Kong Hang Seng Index. |

Related Work | (2009) model the SEC-mandated annual reports, and perform linear SVM regression with an ε-insensitive loss function to predict the measured volatility. |

Related Work | Traditional discriminative models, such as linear regression and linear SVM , have been very popular in various text regression tasks, such as predicting movie revenues from reviews (Joshi et al., 2010), understanding the geographic lexical variation (Eisenstein et al., 2010), and predicting food prices from menus (Chahuneau et al., 2012). |

SVM is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

- SVM (15)
- regression model (13)
- linear regression (11)

Abstract | However, by using an SVM ranker to combine the realizer’s model score together with features from multiple parsers, including ones designed to make the ranker more robust to parsing mistakes, we show that significant increases in BLEU scores can be achieved. |

Abstract | Moreover, via a targeted manual analysis, we demonstrate that the SVM reranker frequently manages to avoid vicious ambiguities, while its ranking errors tend to affect fluency much more often than adequacy. |

Introduction | Consequently, we examine two reranking strategies, one a simple baseline approach and the other using an SVM reranker (Joachims, 2002). |

Introduction | Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature. |

Introduction | With the SVM reranker, we obtain a significant improvement in BLEU scores over |

Reranking with SVMs 4.1 Methods | Similarly, we conjectured that large differences in the realizer’s perceptron model score may more reliably reflect human fluency preferences than small ones, and thus we combined this score with features for parser accuracy in an SVM ranker. |

Reranking with SVMs 4.1 Methods | Additionally, given that parsers may more reliably recover some kinds of dependencies than others, we included features for each dependency type, so that the SVM ranker might learn how to weight them appropriately. |

Reranking with SVMs 4.1 Methods | We trained the SVM ranker (Joachims, 2002) with a linear kernel and chose the hyper-parameter c, which tunes the tradeoff between training error and margin, with 6-fold cross-validation on the devset. |
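
The ranking approach cited here (Joachims, 2002) reduces ranking to classification over feature differences: within each query, every (better, worse) candidate pair becomes a difference vector. A minimal sketch of that reduction, with toy data; the `to_pairwise` helper is invented for illustration, scikit-learn's `LinearSVC` stands in for SVMrank, and its `C` plays the role of the paper's c.

```python
import numpy as np
from sklearn.svm import LinearSVC

def to_pairwise(X, relevance, groups):
    """Turn per-query preferences into signed difference vectors."""
    diffs, labels = [], []
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        for i in idx:
            for j in idx:
                if relevance[i] > relevance[j]:
                    diffs.append(X[i] - X[j]); labels.append(1)
                    diffs.append(X[j] - X[i]); labels.append(-1)
    return np.array(diffs), np.array(labels)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))             # 12 candidate realizations, 4 features
relevance = np.array([2, 1, 0] * 4)      # higher = preferred realization
groups = np.repeat(np.arange(4), 3)      # 4 "queries" of 3 candidates each

D, y = to_pairwise(X, relevance, groups)
ranker = LinearSVC(C=1.0).fit(D, y)      # C tunes the error/margin tradeoff
scores = X @ ranker.coef_.ravel()        # rank candidates by this score
```

At prediction time the learned weight vector scores each candidate directly; sorting by `scores` within a group yields the ranking.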

SVM is mentioned in 23 sentences in this paper.

Topics mentioned in this paper:

- perceptron (29)
- SVM (23)
- BLEU (20)

Experimental Evaluation | We use logistic regression (LR) with L2 regularization (Fan et al., 2008) and the SVMWWW ( SVM ) system (Joachims, 2007) with its default settings as the classifiers. |

Experimental Evaluation | It self-trains two classifiers from the character 3- gram, lexical, and syntactic views using CNG and SVM classifiers (Kourtis and Stamatatos, 2011). |

Experimental Evaluation | The original method applied only CNG and SVM on the character n-gram view. |

Introduction | However, the self-training method in (Kourtis and Stamatatos, 2011) uses two classifiers (CNG and SVM ) on one view. |

Proposed Tri-Training Algorithm | Many classification algorithms give such scores, e.g., SVM and logistic regression. |
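
Both algorithms named in this excerpt expose a per-example confidence score of the kind the tri-training selection step needs: logistic regression via predicted probabilities, and an SVM via its (unnormalized) distance from the hyperplane. A small sketch on synthetic data, which is an assumption rather than this paper's setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=5, random_state=1)

lr = LogisticRegression().fit(X, y)
svm = LinearSVC().fit(X, y)

# LR: probability of the predicted class (in [0.5, 1] for binary problems).
lr_conf = lr.predict_proba(X).max(axis=1)

# SVM: absolute distance from the hyperplane (>= 0, larger = more confident).
svm_conf = abs(svm.decision_function(X))
```

Either score can then be thresholded or top-k selected when adding unlabeled examples to the training pool.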

Related Work | On developing effective learning techniques, supervised classification has been the dominant approach, e.g., neural networks (Graham et al., 2005; Zheng et al., 2006), decision tree (Uzuner and Katz, 2005; Zhao and Zobel, 2005), logistic regression (Madigan et al., 2005), SVM (Diederich et al., 2000; Gamon 2004; Li et al., 2006; Kim et al., 2011), etc. |

SVM is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

- SVM (13)
- n-grams (5)
- logistic regression (3)

Large-Margin Learning Framework | As we will see, it is possible to learn {Wm} using standard support vector machine ( SVM ) training (holding A fixed), and then make a simple gradient-based update to A (holding {Wm} fixed). |

Large-Margin Learning Framework | As is standard in the multi-class linear SVM (Crammer and Singer, 2001), we can solve the problem defined in Equation 6 via Lagrangian optimization: |

Large-Margin Learning Framework | If A is fixed, then the optimization problem is equivalent to a standard multi-class SVM , in the transformed feature space f (vi; A). |
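
The alternating scheme described in these excerpts can be sketched as: hold A fixed and train a multi-class SVM in the transformed space f(v; A), then update A and refit. In this sketch f(v; A) = vA is assumed linear, scikit-learn's one-vs-rest `LinearSVC` stands in for the Crammer-Singer multi-class SVM, and the A-update is a placeholder hill-climbing move, not the paper's gradient step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=150, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
rng = np.random.default_rng(0)
A = np.eye(8)                       # feature transformation, updated between SVM fits

def fit_and_score(A):
    # Standard multi-class SVM in the transformed feature space X @ A.
    svm = LinearSVC().fit(X @ A, y)
    return svm, svm.score(X @ A, y)

svm, acc = fit_and_score(A)
for _ in range(5):                  # alternate: SVM step, then A step
    A_new = A + 0.05 * rng.normal(size=A.shape)   # placeholder for the gradient update
    svm_new, acc_new = fit_and_score(A_new)
    if acc_new > acc:               # accept only improving moves
        A, svm, acc = A_new, svm_new, acc_new
```

The key structural point survives the simplifications: with A frozen, the inner problem is an ordinary multi-class SVM.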

SVM is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

- discourse parsing (19)
- shift-reduce (16)
- EDUs (10)

Experiments | We use SVMlight (Joachims, 1999) to train our linear SVM classifiers |

Experiments | For SVM , models trained on POS and LIWC features achieve even lower accuracy than Unigram. |

Experiments | As a generative model, SAGE achieves much better results than SVM , and is around 0.65 accurate in the cross-domain task. |

Feature-based Additive Model | If we instead use SVM , for example, we would have to train classifiers one by one (due to the distinct features from different sources) to draw conclusions regarding the differences between Turker vs Expert vs truthful reviews, positive expert vs negative expert reviews, or reviews from different domains. |

Introduction | In the examples in Table 1, we trained a linear SVM classifier on Ott’s Chicago-hotel dataset on unigram features and tested it on a couple of different domains (the details of data acquisition are illustrated in Section 3). |

Introduction | Table 1: SVM performance on datasets for a classifier trained on Chicago hotel review based on Unigram feature. |

SVM is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- Turker (19)
- Unigram (9)
- gold-standard (8)

Experiments | Methods: We evaluated the overall performance relative to the common SVM bag of words approach that can be ubiquitously found in text mining literature. |

Experiments | SVM-TF: Uses a bag of words SVM with term frequency weights. |

Experiments | SVM-Delta-IDF: Uses a bag of words SVM classification with TF.Delta-IDF weights (Formula 2) in the feature vectors before training or testing an SVM . |
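
The TF.Delta-IDF weighting named above replaces plain IDF with the difference of per-class IDFs, so terms skewed toward one class get large-magnitude weights. A sketch in the spirit of Delta-IDF (Martineau and Finin, 2009), with add-one smoothing and a sign convention where positive-leaning terms get positive weight; the tiny two-class corpus is invented, and the exact formula here is an assumption rather than the paper's "Formula 2".

```python
import math
from collections import Counter

pos_docs = [["great", "plot", "great"], ["great", "acting"]]
neg_docs = [["dull", "plot"], ["dull", "dull", "acting"]]

def doc_freq(docs):
    # Number of documents each term appears in.
    return Counter(term for doc in docs for term in set(doc))

df_pos, df_neg = doc_freq(pos_docs), doc_freq(neg_docs)

def delta_idf(t):
    # Difference of smoothed per-class IDFs; positive-class terms come out > 0.
    return (math.log2((len(neg_docs) + 1) / (df_neg[t] + 1))
            - math.log2((len(pos_docs) + 1) / (df_pos[t] + 1)))

def tf_delta_idf(doc):
    tf = Counter(doc)
    return {t: tf[t] * delta_idf(t) for t in tf}

weights = tf_delta_idf(["great", "dull", "plot"])
```

Terms that are equally frequent in both classes (like "plot" here) get weight near zero, which is exactly the property that makes the weighting discriminative.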

Related Work | (2012) propose an algorithm which first trains individual SVM classifiers on several small, class-balanced, random subsets of the dataset, and then reclassifies each training instance using a majority vote of these individual classifiers. |

SVM is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- ground truth (9)
- F1 Score (8)
- feature weighting (7)

Experiments | We experiment with two machine-learning approaches: Naive Bayes and SVM . |

Experiments | We report the n-gram values for which the best results are obtained and the hyperparameters for SVM , c and γ. |

Experiments | The SVM produces better results for all languages except Portuguese, where the accuracy is equal. |

Our Approach | For SVM , we use the wrapper provided by Weka for LibSVM (Chang and Lin, 2011). |

SVM is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- edit distance (5)
- SVM (5)
- learning algorithms (4)

Experiments | In SVM implementations, the tradeoff parameter between training error and margin was set to 1 for all experiments. |

Experiments | We compare our approaches to three state-of-the-art approaches including SVM with convolution tree kernels (Collins and Duffy, 2001), linear regression and SVM with linear kernels (Scholkopf and Smola, 2002). |

Experiments | The SVM with linear kernels and the linear regression model used the same features as the manifold models. |

SVM is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- relation extraction (22)
- unlabeled data (12)
- knowledge base (8)

Abstract | Input: L, labeled data set; U, unlabeled data set; n, batch size. Output: SVM classifier. Repeat: 1. |

Abstract | Train a single classifier SVM on L 2. |

Abstract | The objective is to learn SVM classifiers in both languages, denoted as SVM_C and SVM_E respectively, in a BAL fashion to improve their classification performance. |

SVM is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- relation instances (19)
- relation extraction (16)
- parallel corpora (11)

Related Work | (2) SVM : The ngram features and Support Vector Machine are widely used baseline methods to build sentiment classifiers (Pang et al., 2002). |

Related Work | LibLinear is used to train the SVM classifier. |

Related Work | (3) NBSVM: NBSVM (Wang and Manning, 2012) is a state-of-the-art performer on many sentiment classification datasets, which trades off between Naive Bayes and NB-enhanced SVM . |

SVM is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- sentiment classification (52)
- word embedding (46)
- neural networks (16)

CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | For all experiments we used a linear SVM kernel. |

CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | Table 4 shows that of the highest-weighted SVM features learned when training models for HOW questions on YA and Bio, many are shared (e.g., 56.5% of the features in the top half of both DPMs are shared), suggesting that a core set of discourse features may be of utility across domains. |

CR + LS + DMM + DPM 39.32* +24% 47.86* +20% | Table 4: Percentage of top features with the highest SVM weights that are shared between Bio HOW and YA models. |

SVM is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- reranking (24)
- cosine similarity (12)
- lexical semantic (12)

Experiments | NB 41.0 81.8; BiNB 41.9 83.1; SVM 40.7 79.4; RecNTN 45.7 85.4; MAX-TDNN 37.4 77.1; NBOW 42.4 80.5; DCNN 48.5 86.8 |

Experiments | SVM is a support vector machine with unigram and bigram features. |
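
The baseline just described, unigram and bigram features fed to a support vector machine, can be sketched in a few lines; the four-sentence corpus and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["a gripping , funny film",
         "dull and lifeless",
         "funny and warm",
         "a dull film"]
labels = [1, 0, 1, 0]   # 1 = positive sentiment, 0 = negative

# ngram_range=(1, 2) extracts both unigrams and bigrams.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

pred = clf.predict(["a funny film"])
```

The pipeline keeps vectorization and classification together, so unseen text is featurized with the training vocabulary automatically.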

Experiments | SVM features include: head word, parser, hypernyms, WordNet |

SVM is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- Neural Network (17)
- n-grams (11)
- unigram (9)

Discussion | We use three sentiment classification techniques: Naïve Bayes, MaxEnt and SVM with unigrams, bigrams and trigrams as features. |

Discussion | http://scikit-learn.org/stable/ ; In case of SVM , the probability of the predicted class is computed as given in Platt (1999). |

Discussion | MaxEnt (Movie) -0.29 (72.17); MaxEnt (Twitter) -0.26 (71.68); SVM (Movie) -0.24 (66.27); SVM (Twitter) -0.19 (73.15) |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- human annotator (5)
- confidence score (4)
- MaxEnt (3)

Introduction | Finally, new relation instances are extracted using kernel based classifiers, e. g., the SVM classifier. |

Introduction | We apply the one vs. others strategy for multiple classification using SVM . |

Introduction | For SVM training, the parameter C is set to 2.4 for all experiments, and the tree kernel parameter λ is tuned to 0.2 for FTK and 0.4 (the optimal parameter setting used in Qian et al. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- tree kernel (28)
- context information (10)
- relation extraction (9)

Predicting Direction of Power | Handling of undefined values for features in SVM is not straightforward. |

Predicting Direction of Power | Most SVM implementations assume the value of 0 by default in such cases, conflating them |

Predicting Direction of Power | Since we use a quadratic kernel, we expect the SVM to pick up the interaction between each feature and its indicator feature. |
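
One workable encoding for the undefined-value problem described in these excerpts: keep the default 0 for a missing value, but add a paired indicator feature marking whether the value was defined, so a kernel that models feature interactions (such as the quadratic kernel mentioned) can separate "missing" from a genuine 0. The helper below is a hypothetical sketch, not the paper's implementation.

```python
import numpy as np

def with_indicators(rows, n_features):
    """rows: list of dicts {feature_index: value}; absent keys are undefined."""
    out = np.zeros((len(rows), 2 * n_features))
    for r, row in enumerate(rows):
        for j in range(n_features):
            if j in row:
                out[r, j] = row[j]
                out[r, n_features + j] = 1.0   # defined-indicator column
    return out

# Row 0: feature 0 is a genuine 0.  Row 1: feature 0 is undefined.
# Their raw values are identical; the indicator column tells them apart.
X = with_indicators([{0: 0.0, 1: 3.5}, {1: 3.5}], n_features=2)
```

A quadratic kernel can then learn terms like value × indicator, which is the interaction the excerpt expects the SVM to pick up.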

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- feature set (6)
- manual annotations (6)
- structural feature (5)

Soft Constraints in Dual Decomposition | All we need to employ the structured perceptron algorithm (Collins, 2002) or the structured SVM algorithm (Tsochantaridis et al., 2004) is a black-box procedure for performing MAP inference in the structured linear model given an arbitrary cost vector. |

Soft Constraints in Dual Decomposition | This can be ensured by simple modifications of the perceptron and subgradient descent optimization of the structured SVM objective simply by truncating c coordinate-wise to be nonnegative at every learning iteration. |
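
The nonnegativity fix described here is a projection step: after each (sub)gradient update of the soft-constraint penalty vector c, clamp it coordinate-wise at zero. Everything else in this sketch (the stand-in subgradient, the step size, the toy vector) is illustrative.

```python
import numpy as np

c = np.array([0.5, -0.2, 1.0])            # soft-constraint penalty vector
for step in range(3):
    grad = np.array([0.3, -0.1, 0.4])     # stand-in for the true subgradient
    c = c - 0.5 * grad                    # subgradient descent step
    c = np.maximum(c, 0.0)                # truncate coordinate-wise at zero
```

The clamp is the whole modification: the learning loop is otherwise the ordinary structured perceptron or structured SVM subgradient update.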

Soft Constraints in Dual Decomposition | A similar analysis holds for the structured SVM ap- |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- soft constraints (25)
- CRF (19)
- ground truth (10)

Conclusion | Results:

| | # Pos/Neg | Lexicon | SVM |
|---|---|---|---|
| Hownet | 627/1,038 | 0.737 | 0.756 |
| Hownet+NW | 743/1,150 | 0.770 | 0.779 |
| Hownet+T100 | 679/1,172 | 0.761 | 0.774 |
| cptHownet | 138/125 | 0.738 | 0.758 |
| cptHownet+NW | 254/237 | 0.774 | 0.782 |
| cptHownet+T100 | 190/159 | 0.764 | 0.775 |

Experiment | The second model is an SVM model in which opinion words are used as features, and 5-fold cross validation is conducted. |

Experiment | This is not necessary for the SVM model. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- POS tags (8)
- word segmentation (8)
- sentiment analysis (7)

Event Causality Extraction Method | An event causality candidate is given a causality score, which is the SVM score (distance from the hyperplane) normalized to [0,1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score. |
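
The normalization described here squashes an SVM margin (signed distance from the hyperplane) into [0,1] with the logistic sigmoid. In the sketch below, taking the maximum over a candidate's multiple sentence-level scores is an assumption for illustration, not necessarily the paper's aggregation rule.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def causality_score(svm_scores):
    # One candidate may have several SVM scores (one per source sentence);
    # keep the most confident one after squashing into [0, 1].
    return max(sigmoid(s) for s in svm_scores)

score = causality_score([-0.3, 1.2, 0.4])  # candidate seen in 3 sentences
```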

Experiments | (2011): CEAuns is an unsupervised method that uses CEA to rank event causality candidates, and CEAsup is a supervised method using SVM and the CEA features, whose ranking is based on the SVM scores. |

Experiments | The baselines are as follows: CSuns is an unsupervised method that uses CS for ranking, and CSsup is a supervised method using SVM with CS as the only feature, whose ranking is based on the SVM scores. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- semantic relations (25)
- phrase pairs (11)
- Randomly sample (4)

Related work | In particular, starting from EDUs, at each step of the tree-building, a binary SVM classifier is first applied to determine which pair of adjacent discourse constituents should be merged to form a larger span, and another multi-class SVM classifier is then applied to assign the type of discourse relation that holds between the chosen pair. |
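
The greedy HILDA-style loop described above can be sketched as follows. The two stand-in scorers play the roles of the paper's two SVMs: `merge_score` for the binary "should these adjacent spans merge" classifier and `relation_of` for the multi-class relation classifier. All names and the toy scorers are illustrative.

```python
def build_tree(edus, merge_score, relation_of):
    spans = [(e,) for e in edus]          # start from elementary discourse units
    while len(spans) > 1:
        # Binary classifier: pick the adjacent pair most likely to merge.
        i = max(range(len(spans) - 1),
                key=lambda k: merge_score(spans[k], spans[k + 1]))
        # Multi-class classifier: label the relation holding between them.
        rel = relation_of(spans[i], spans[i + 1])
        spans[i:i + 2] = [(rel, spans[i], spans[i + 1])]
    return spans[0]

# Toy scorers standing in for the trained SVM classifiers.
tree = build_tree(["e1", "e2", "e3"],
                  merge_score=lambda a, b: len(str(a) + str(b)),
                  relation_of=lambda a, b: "Elaboration")
```

The sketch also makes the criticized weakness concrete: each merge is chosen greedily with no way to revisit earlier decisions, which is exactly the local-optimum risk the excerpt describes.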

Related work | Also, the employment of SVM classifiers allows the incorporation of rich features for better data representation (Feng and Hirst, 2012). |

Related work | However, HILDA’s approach also has obvious weakness: the greedy algorithm may lead to poor performance due to local optima, and more importantly, the SVM classifiers are not well-suited for solving structural problems due to the difficulty of taking context into account. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- discourse parser (21)
- EDUs (17)
- time complexity (15)

Experiments | the target-independent (SVM-indep) and target-dependent features and uses SVM as the classifier. |

Experiments | SVM-conn: The words, punctuations, emoticons, and #hashtags included in the converted dependency tree are used as the features for SVM . |

Experiments | AdaRNN-comb: We combine the root vectors obtained by AdaRNN-w/E with the uni/bi-gram features, and they are fed into an SVM classifier. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- Recursive (14)
- Recursive Neural (11)
- dependency tree (8)

Machine Learning with Edit-Turn-Pairs | Results:

| | Baseline | R. Forest | SVM |
|---|---|---|---|
| Accuracy | .799 ±.031 | .866 ±.026† | .858 ±.027† |
| F1 mac. | NaN | .789 ±.032 | .763 ±.033 |
| Precision mac. | | | |

Machine Learning with Edit-Turn-Pairs | A reduction of the feature set as judged by a χ² ranker improved the results for both the Random Forest and the SVM , so we limited our feature set to the 100 best features. |

Machine Learning with Edit-Turn-Pairs | In a 10-fold cross-validation experiment, we tested a Random Forest classifier (Breiman, 2001) and an SVM (Platt, 1998) with polynomial kernel. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- manually annotated (4)
- turkers (4)
- Cosine similarity (3)

Related Work | The TRE systems use techniques such as: Rules (Regulars, Patterns and Propositions) (Miller et al., 1998), Kernel method (Zhang et al., 2006b; Zelenko et al., 2003), Belief network (Roth and Yih, 2002), Linear programming (Roth and Yih, 2007), Maximum entropy (Kambhatla, 2004) or SVM (GuoDong et al., 2005). |

Related Work | (2005) introduced a feature based method, which utilized lexicon information around entities and was evaluated on Winnow and SVM classifiers. |

Related Work | For each type of these relations, an SVM was trained and tested independently. |

SVM is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- entity mention (21)
- relation extraction (21)
- soft constraint (19)