Abstract | The created thesaurus is then used to expand feature vectors to train a binary classifier.
Experiments | Because this is a binary classification task (i.e. |
Experiments | We simply train a binary classifier using unigrams and bigrams as features from the labeled reviews in the source domains and apply the trained classifier on the target domain. |
Experiments | After selecting salient features, the SCL algorithm is used to train a binary classifier.
Feature Expansion | Using the extended vectors d’ to represent reviews, we train a binary classifier from the source domain labeled reviews to predict positive and negative sentiment in reviews. |
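A minimal sketch of this expansion step, assuming a toy thesaurus (the entries and the 0.5 down-weight for introduced terms are illustrative assumptions, not the paper's actual relatedness scores):

```python
from collections import Counter

def expand_features(review_tokens, thesaurus, weight=0.5):
    """Expand a bag-of-words vector d into d' by adding lexical
    elements related (via the thesaurus) to the observed features,
    down-weighting the introduced terms."""
    d = Counter(review_tokens)   # original feature vector d
    expanded = Counter(d)        # copy: extended vector d'
    for token in d:
        for related in thesaurus.get(token, []):  # hypothetical entries
            expanded[related] += weight * d[token]
    return expanded

# Toy example: "outstanding"/"superb" listed as related to "excellent".
thesaurus = {"excellent": ["outstanding", "superb"]}
d_prime = expand_features(["excellent", "movie"], thesaurus)
```

A classifier trained on such extended vectors can then fire on "outstanding" at test time even if only "excellent" appeared in the source-domain training data.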
Introduction | Following previous work, we define cross-domain sentiment classification as the problem of learning a binary classifier (i.e. |
Introduction | thesaurus to expand feature vectors in a binary classifier at train and test times by introducing related lexical elements from the thesaurus. |
Introduction | (However, the method is agnostic to the properties of the classifier and can be used to expand feature vectors for any binary classifier.)
Discussion and future work | While past research has used logistic regression as a binary classifier (Newman et al., 2003), our experiments show that the best-performing classifiers allow for highly nonlinear class boundaries; SVM and RF models achieve between 62.5% and 91.7% accuracy across age groups — a significant improvement over the baselines of LR and NB, as well as over previous results. |
Related Work | Further, the use of binary classification schemes in previous work does not account for partial truths often encountered in real-life scenarios. |
Results | The SVM is a parametric binary classifier that provides highly nonlinear decision boundaries given particular kernels. |
Results | 5.1 Binary classification across all data |
Results | 5.2 Binary classification by age group |
Experiments | 4.1 Binary classification |
Experiments | 3 shows the performance of gSLDA+ with different burn-in steps for binary classification.
Logistic Supervised Topic Models | We consider binary classification with a training set D = {(w_d, y_d)}_{d=1}^D, where the response variable Y takes values from the output space Y = {0, 1}.
Logistic Supervised Topic Models | loss (Rosasco et al., 2004) in the task of fully observed binary classification.
MT System Selection | 4.1 Dialect ID Binary Classification |
MT System Selection | We run the sentence through the Dialect ID binary classifier and we use the predicted class label (DA or MSA) as a feature in our system. |
MT System Selection | It improves over our best baseline single MT system by 1.3% BLEU and over the Dialect ID Binary Classification system selection baseline by 0.8% BLEU. |
Experiments and Results | (2006) experiment), 2. binary classification with the split at each birth year from 1975-1988 and 3. |
Experiments and Results | In contrast to Schler et al.’s experiment, our division does not introduce a gap between age groups, we do binary classification, and we use significantly less data.
Experiments and Results | • Perform binary classification between blogs BEFORE X and IN/AFTER X
Introduction | Therefore, we experimented with binary classification into age groups using all birth dates from 1975 through 1988, thus including students from generation Y who were in college during the emergence of social media technologies. |
Introduction | We find five years where binary classification is significantly more accurate than other years: 1977, 1979, and 1982-1984. |
Related Work | We also use a supervised machine learning approach, but classification by gender is naturally a binary classification task, while our work requires determining a natural dividing point. |
Abstract | For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. |
Conclusion | In this paper we presented an approach to align terms identified by a monolingual term extractor in bilingual comparable corpora using a binary classifier.
Feature extraction | To align or map source and target terms we use an SVM binary classifier (Joachims, 2002) with a linear kernel and the tradeoff between training error and margin parameter c = 10.
Method | We then treat term alignment as a binary classification task, i.e. |
Method | For classification purposes we use an SVM binary classifier.
Related Work | However, it naturally lends itself to being viewed as a classification task, assuming a symmetric approach, since the different information sources mentioned above can be treated as features and each source-target language potential term pairing can be treated as an instance to be fed to a binary classifier which decides whether to align them or not. |
Domain Adaptation | Next, we train a binary classification model, θ, using those feature vectors.
Domain Adaptation | Any binary classification algorithm can be used to learn θ.
Domain Adaptation | Finally, we classify h using the trained binary classifier θ.
Related Work | Linear predictors are then learnt to predict the occurrence of those pivots, and SVD is used to construct a lower dimensional representation in which a binary classifier is trained. |
Related Work | The created thesaurus is used to expand feature vectors during train and test stages in a binary classifier.
Background | Then the thresholded output of this binary classifier is used as training data for learning a multi-class classifier for the 7 relation types (Bunescu and Mooney, 2005b). |
Cluster Feature Selection | When the binary classifier’s prediction probability is greater than 0.5, we take the prediction with the highest probability of the multi-class classifier as the final class label. |
Feature Based Relation Extraction | Specifically, we first train a binary classifier to distinguish between relation instances and non-relation instances. |
Feature Based Relation Extraction | Then rather than using the thresholded output of this binary classifier as training data, we use only the annotated relation instances to train a multi-class classifier for the 7 relation types. |
Feature Based Relation Extraction | Given a test instance x, we first apply the binary classifier to it for relation detection; if it is detected as a relation instance, we then apply the multi-class relation classifier to classify it.
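The detect-then-classify pipeline described above can be sketched as follows (the detector, the relation classifier, and the 0.5 threshold below are hypothetical stand-ins for the two trained models):

```python
def extract_relation(instance, detect, classify_relation, threshold=0.5):
    """Two-stage relation extraction: a binary detector first decides
    whether the instance expresses any relation at all; only detected
    instances reach the multi-class (e.g. 7-way) relation classifier."""
    if detect(instance) > threshold:        # stage 1: relation detection
        return classify_relation(instance)  # stage 2: relation typing
    return "NONE"

# Toy stand-ins for the trained models (hypothetical):
detect = lambda x: 0.9 if "works for" in x else 0.1
classify_relation = lambda x: "EMPLOYMENT"

label = extract_relation("Smith works for ACME", detect, classify_relation)
```

Keeping the two stages separate means the multi-class model never sees non-relation noise at training or test time.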
Abstract | Our experiments test multiple text representations on two binary classification tasks, change of price and polarity. |
Experiments | Both tasks are treated as binary classification problems. |
Experiments | suggested as one of the best methods to summarize into a single value the confusion matrix of a binary classification task (Jurman and Furlanello, 2010; Baldi et al., 2000).
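The Matthews correlation coefficient itself is straightforward to compute from the four cells of the binary confusion matrix; a minimal sketch:

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient: a single-value summary of a
    binary confusion matrix, ranging from -1 (total disagreement)
    through 0 (chance) to +1 (perfect prediction)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays at chance level (0) for a classifier that predicts one class regardless of input, which is why it is attractive for imbalanced binary tasks.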
Introduction | Our experiments test several document representations for two binary classification tasks, change of price and polarity. |
Related Work | Our two binary classification tasks for news, price change and polarity, are analogous to their activity and direction. |
Building a Discourse Parser | • S: A binary classifier, for structure (existence of a connecting node between the two input sub-trees).
Building a Discourse Parser | Because the original SVM algorithms build binary classifiers, multi-label classification requires some adaptation.
Building a Discourse Parser | A possible approach is to reduce the multi-class classification problem to a set of binary classifiers, each trained either on a single class (“one vs. all”) or by pair (“one vs. one”).
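The "one vs. all" side of this reduction can be sketched as follows (the class labels and per-class scoring functions are hypothetical):

```python
def one_vs_all_predict(x, binary_scorers):
    """Reduce multi-class prediction to binary classifiers: each scorer
    was trained as 'its class vs. all others'; predict the class whose
    binary scorer is most confident on x."""
    return max(binary_scorers, key=lambda label: binary_scorers[label](x))

# Hypothetical per-class scoring functions (stand-ins for trained SVMs):
scorers = {
    "Nucleus":   lambda x: 0.2,
    "Satellite": lambda x: 0.7,
}
pred = one_vs_all_predict("some sub-tree pair", scorers)
```

The "one vs. one" variant instead trains a classifier per pair of classes and takes a majority vote over the pairwise decisions.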
Evaluation | Binary classifier S is trained on 52,683 instances (split approximately 1/3, 2/3 between positive and negative examples), extracted from 350 documents, and tested on 8,558 instances extracted from 50 documents.
Error Classification | To solve this problem, we train five binary classifiers, one for each error type, using a one-versus-all scheme.
Error Classification | So in the binary classification problem for identifying error e_i, we create one training instance from each essay in the training set, labeling the instance as positive if the essay has e_i as one of its labels, and negative otherwise.
Error Classification | After creating training instances for error e_i, we train a binary classifier, b_i, for identifying which test essays contain error e_i.
Evaluation | Let tp_i be the number of test essays correctly labeled as positive by error e_i’s binary classifier b_i; p_i be the total number of test essays labeled as positive by b_i; and g_i be the total number of test essays that belong to e_i according to the gold standard.
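The per-error precision, recall, and F-score implied by these three counts can be computed directly (the example counts below are invented):

```python
def per_class_prf(tp_i, p_i, g_i):
    """Precision, recall and F-score for one error type's binary
    classifier, from the counts defined in the text: tp_i correct
    positives, p_i predicted positives, g_i gold positives."""
    precision = tp_i / p_i if p_i else 0.0
    recall = tp_i / g_i if g_i else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

p, r, f = per_class_prf(tp_i=8, p_i=10, g_i=16)
```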
Experiments | Table 6 lists the number of utterance-response pairs used to train eight binary classifiers for individual emotional categories, which form a one-versus-the-rest classifier for the prediction task.
Predicting Addressee’s Emotion | Although a response could elicit multiple emotions in the addressee, in this paper we focus on predicting the most salient emotion elicited in the addressee and cast the prediction as a single-label multi-class classification problem. We then construct a one-versus-the-rest classifier by combining eight binary classifiers, each of which predicts whether the response elicits each emotional category.
Predicting Addressee’s Emotion | We use the online passive-aggressive algorithm to train the eight binary classifiers.
Predicting Addressee’s Emotion | Since the rule-based approach annotates utterances with emotions only when they contain emotional expressions, we independently train for each emotional category a binary classifier that estimates the addresser’s emotion from her/his utterance and apply it to the unlabeled utterances. |
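Combining the per-emotion binary classifiers into a one-versus-the-rest predictor amounts to taking the emotion whose classifier scores the response highest; a sketch (the feature names and scorers are hypothetical stand-ins for the trained passive-aggressive models):

```python
def most_salient_emotion(response_features, classifiers):
    """One-versus-the-rest combination: score the response with each
    emotion's binary classifier and return the highest-scoring one."""
    scores = {emo: clf(response_features) for emo, clf in classifiers.items()}
    return max(scores, key=scores.get)

# Hypothetical per-emotion scorers over a sparse feature dict:
classifiers = {
    "joy":   lambda f: f.get("thank", 0),
    "anger": lambda f: f.get("hate", 0),
}
emotion = most_salient_emotion({"thank": 2}, classifiers)
```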
Experiments | The former train a latent PCFG support vector machine for binary classification (LSVM). |
Experiments | The latter report results for two binary classifiers : RERANK uses the reranking features of Charniak and Johnson (2005), and TSG uses |
Experiments | Indeed, the methods in Post (2011) are simple binary classifiers, and it is not clear that these models would be properly calibrated for any other task, such as integration in a decoder.
Beam-Width Prediction | Given a global maximum beam-width b, we train b different binary classifiers, each using separate mapping functions Φ_k, where the target value y produced by Φ_k is 1 if R_{i,j} > k and 0 otherwise.
Beam-Width Prediction | During decoding, we assign the beam-width for the chart cell spanning w_{i+1}...w_j given models θ_0, θ_1, ..., θ_{b-1} by finding the lowest value k such that the binary classifier θ_k classifies R_{i,j} ≤ k. If no such k exists, the beam-width is set to the maximum value b.
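The assignment rule ("lowest k whose classifier fires, else the global maximum") can be sketched as follows (the classifier list is a hypothetical stand-in for the trained models):

```python
def assign_beam_width(cell, classifiers, b):
    """Pick the beam-width for a chart cell: return the lowest k whose
    binary classifier predicts the cell's rank is <= k; fall back to
    the global maximum beam-width b if no classifier fires."""
    for k in range(b):
        if classifiers[k](cell):  # classifier k says R_{i,j} <= k
            return k
    return b

# Hypothetical classifiers for b = 4: only k >= 2 predict "<= k".
clfs = [lambda c: False, lambda c: False, lambda c: True, lambda c: True]
width = assign_beam_width("cell(2,5)", clfs, b=4)
```

Scanning from k = 0 upward guarantees the tightest (cheapest) beam consistent with the classifiers' predictions.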
Introduction | More generally, instead of a binary classification decision, we can also use this method to predict the desired cell population directly and get cell closure for free when the classifier predicts a beam-width of zero. |
Open/Closed Cell Classification | We first look at the binary classification of chart cells as either open or closed to full constituents, and predict this value from the input sentence alone. |
Error Detection with a Maximum Entropy Model | As mentioned before, we consider error detection as a binary classification task. |
Introduction | Sometimes step 2) is not necessary if only one effective feature is used (Ueffing and Ney, 2007); and sometimes steps 2) and 3) can be merged into a single step if we directly output predictions from binary classifiers instead of making a thresholding decision.
Introduction | We integrate two sets of linguistic features into a maximum entropy (MaxEnt) model and develop a MaxEnt-based binary classifier to predict the category (correct or incorrect) for each word in a generated target sentence. |
Related Work | • We treat error detection as a complete binary classification problem.
Baseline Approaches | More specifically, both baselines recast the cause identification problem as a set of 14 binary classification problems, one for predicting each shaper. |
Baseline Approaches | In the binary classification problem for predicting shaper s_i, we create one training instance from each document in the training set, labeling the instance as positive if the document has s_i as one of its labels, and negative otherwise.
Baseline Approaches | After creating training instances, we train a binary classifier, C_i, for predicting s_i, employing as features the top 50 unigrams that are selected according to information gain computed over the training data (see Yang and Pedersen (1997)).
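Information gain for a binary unigram feature over a binary label can be computed from document counts; a minimal sketch of the standard formulation (not the paper's exact code):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class distribution given counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(n11, n10, n01, n00):
    """IG of a binary term feature for a binary class.
    n11: positive docs containing the term, n10: negative docs with it,
    n01: positive docs without it,     n00: negative docs without it."""
    n = n11 + n10 + n01 + n00
    h_class = entropy([n11 + n01, n10 + n00])       # H(C)
    n_t, n_not = n11 + n10, n01 + n00
    h_cond = (n_t / n * entropy([n11, n10]) +       # H(C | term)
              n_not / n * entropy([n01, n00]))      # H(C | no term)
    return h_class - h_cond

# A term occurring only in positive documents is maximally informative:
ig = information_gain(n11=5, n10=0, n01=0, n00=5)
```

Ranking unigrams by this score and keeping the top 50 reproduces the feature-selection step described above.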
Introduction | Second, the fact that this is a 14-class classification problem makes it more challenging than a binary classification problem. |
Experiments and Results | Although previous work has been done on PDTB (Pitler et al., 2009) and (Lin et al., 2009), we cannot make a direct comparison with them because of various experimental conditions, such as different classification strategies (multi-class classification, multiple binary classification), different data preparation (feature extraction and selection), different benchmark data collections (different sections for training and test, different levels of discourse relations), different classifiers with various parameters (MaxEnt, Naïve Bayes, SVM, etc.) and
Implementation Details of Multitask Learning Method | Specifically, we adopt multiple binary classification to build the model for the main task.
Implementation Details of Multitask Learning Method | That is, for each discourse relation, we build a binary classifier.
Experiment 3: Sense Similarity | (2007) considered sense grouping as a binary classification task whereby for each word every possible pairing of senses has to be classified |
Experiment 3: Sense Similarity | We constructed a simple threshold-based classifier to perform the same binary classification.
Experiment 3: Sense Similarity | For a binary classification task, we can directly calculate precision, recall and F-score by constructing a contingency table. |
Experimental Setup | We treat relation extraction as a multi-class classification problem and use SVM-light-TK to train the binary classifiers.
Experimental Setup | To estimate the importance weights, we train a binary classifier that distinguishes between source and target domain instances. |
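A common way to turn such a domain classifier's probabilities into importance weights is the density-ratio estimate p_T(x)/p_S(x) ≈ P(target|x)/P(source|x), corrected for sample sizes; a minimal sketch of this standard technique (not necessarily the paper's exact recipe):

```python
def importance_weight(p_target_given_x, n_source, n_target):
    """Estimate the instance weight p_T(x)/p_S(x) from a binary
    domain classifier's probability that x comes from the target
    domain, correcting for the source/target sample sizes."""
    p_t = p_target_given_x
    return (p_t / (1.0 - p_t)) * (n_source / n_target)

# A source instance the domain classifier finds 'target-like'
# (probability 0.8, equal-size domains) gets up-weighted by ~4x:
w = importance_weight(0.8, n_source=100, n_target=100)
```

Source instances that look target-like thus count more when training the relation extractor for the target domain.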
Related Work | We will use a binary classifier trained on RE instance representations. |
Introduction | We model entity identification as a sequence tagging problem and relation extraction as binary classification . |
Model | We treat the relation extraction problem as a combination of two binary classification problems: opinion-arg classification, which decides whether a pair consisting of an opinion candidate 0 and an argument candidate a forms a relation; and opinion-implicit-arg classification, which decides whether an opinion candidate 0 is linked to an implicit argument, i.e. |
Results | By using binary classifiers to predict relations, CRF+RE produces high precision on opinion and target extraction but also results in very low recall. |
Experiments | We report both OVR performance and the performance of three One-versus-One binary classifiers, trained to distinguish between each pair of classes.
Experiments | We also observe that each of the three One-versus-One binary classifications performs significantly better than chance, suggesting that Employee, Customer, and Turker are in fact three different classes.
Experiments | For simplicity, we focus on truthful (Customer) versus deceptive (Turker) binary classification rather than a multi-class classification.
Experiments | For each domain in YAGO, we have a binary classification task: whether the instance has the relation corresponding to the domain. |
Experiments | No-transfer classifier (NT) We only use the few labeled instances of the target relation type together with the negative relation instances to train a binary classifier.
Experiments | Alternate no-transfer classifier (NT-U) We use the union of the k source-domain labeled data sets D_s and the small set of target-domain labeled data D_t to train a binary classifier.