Related Work | (2007) demonstrates that the general feature space they devise achieves a rate of error reduction ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty. |
Results and Discussion | (2007), we adopt a single evaluation measure, macro-averaged recall, for all of our classification tasks.
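Macro-averaged recall weights every class equally regardless of how many instances it has, which makes it informative on skewed label distributions. A minimal sketch of the measure (the label names are illustrative, not from the paper):

```python
from collections import defaultdict

def macro_averaged_recall(gold, predicted):
    """Average of per-class recall: every class counts equally,
    no matter how many gold instances it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Example: "neg" is rare, yet its recall counts as much as "pos".
gold = ["pos", "pos", "pos", "neg"]
pred = ["pos", "pos", "neg", "neg"]
# recall(pos) = 2/3, recall(neg) = 1/1 -> macro = (2/3 + 1) / 2
```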
Results and Discussion | (2007) conducts 11 classification tasks including six 2-way classifications, two 3-way classifications, one 6-way classification, one 8-way classification, and one 14-way classification.
Results and Discussion | In our experiments, we replicate these 11 classification tasks using the proposed six different feature sets. |
Approach | We formulate the sentence-level sentiment classification task as a sequence labeling problem. |
Conclusion | Our experiments show that our model achieves better accuracy than existing supervised and semi-supervised models for the sentence-level sentiment classification task.
Experiments | For the three-way classification task on the MD dataset, we also implemented the following baselines: (4) VOTEFLIP: a rule-based algorithm that leverages the positive, negative and neutral cues along with the effect of negation to determine the sentence sentiment (Choi and Cardie, 2009). |
Experiments | We first report results on a binary (positive or negative) sentence-level sentiment classification task.
Experiments | We also analyzed the model’s performance on a three-way sentiment classification task.
Introduction | We evaluate our approach on the sentence-level sentiment classification task using two standard product review datasets. |
Abstract | We show that this constraint is effective on the sentiment classification task (Fang et al., 2002), resulting in scores similar to the ones obtained by the structural correspondence methods (Blitzer et al., 2007) without the need to engineer auxiliary tasks. |
Constraints on Inter-Domain Variability | At least some of these clusters, when induced by maximizing the likelihood L(θ, α) with sufficiently large α, will be useful for the classification task on the source domain.
Discussion and Conclusions | Our approach results in competitive domain-adaptation performance on the sentiment classification task, rivalling that of the state-of-the-art SCL method (Blitzer et al., 2007).
Empirical Evaluation | In this section we empirically evaluate our approach on the sentiment classification task.
Empirical Evaluation | On the sentiment classification task, two steps need to be performed in order to construct them: (1) a set of words correlated with the sentiment label is selected, and then (2) prediction of each such word is regarded as a distinct auxiliary problem.
The Latent Variable Model | In this paper we consider classification tasks, namely prediction of sentiment polarity of a user review (Pang et al., 2002), and model the joint distribution of the binary sentiment label y ∈ {0, 1} and the multiset of text features x, xi ∈ X.
The Latent Variable Model | Consequently, the latent representation induced in this way is likely to be inappropriate for the classification task in question. |
Abstract | We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art.
Corpora | The Europarl corpus v7 (Koehn, 2005) was used during initial development and testing of our approach, as well as to learn the representations used for the Cross-Lingual Document Classification task described in §5.2.
Experiments | First, we replicate the cross-lingual document classification task of Klementiev et al. |
Experiments | multi-label classification task using the TED corpus, both for training and evaluating. |
Experiments | Each document in the classification task is represented by the average of the d-dimensional representations of all its sentences. |
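The document representation described here is a simple component-wise mean of sentence vectors; a minimal pure-Python sketch (the dimensionality and values are illustrative):

```python
def document_vector(sentence_vectors):
    """Represent a document as the component-wise mean of its
    d-dimensional sentence representations."""
    d = len(sentence_vectors[0])
    doc = [0.0] * d
    for vec in sentence_vectors:
        for i, x in enumerate(vec):
            doc[i] += x
    return [x / len(sentence_vectors) for x in doc]

# Two 2-dimensional sentence vectors -> one document vector.
sents = [[1.0, 2.0], [3.0, 4.0]]
# document_vector(sents) -> [2.0, 3.0]
```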
Abstract | Our experiments test multiple text representations on two binary classification tasks, change of price and polarity.
Experiments | suggested as one of the best methods to summarize into a single value the confusion matrix of a binary classification task (Jurman and Furlanello, 2010; Baldi et al., 2000).
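The measure referenced here, the Matthews correlation coefficient, can be computed directly from the four cells of a binary confusion matrix; a minimal sketch:

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four cells of a
    binary confusion matrix; by convention 0.0 when a marginal
    count is zero and the score is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# A perfect classifier scores +1, a fully inverted one -1,
# and chance-level performance sits near 0.
```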
Experiments | Models are constructed using linear kernel support vector machines for both classification tasks.
Introduction | Our experiments test several document representations for two binary classification tasks, change of price and polarity.
Motivation | Bag-of-Words (BOW) document representation is difficult to surpass for many document classification tasks , but cannot capture the degree of semantic similarity among these sentences. |
Related Work | Our work addresses classification tasks that have potential relevance to an influential financial model (Rydberg and Shephard, 2003). |
Related Work | Our two binary classification tasks for news, price change and polarity, are analogous to their activity and direction. |
Experiment | Our experiments consist of an opinion retrieval task and a sentiment classification task.
Experiment | Since MOAT is a classification task, we use a threshold parameter to draw a boundary between opinionated and non-opinionated sentences.
Experiment | 4.3 Classification task — SVM |
Related Work | (2002) presents empirical results indicating that using term presence over term frequency is more effective in a data-driven sentiment classification task.
Baseline Approaches | The SVM learning algorithm as implemented in the LIBSVM software package (Chang and Lin, 2001) is used for classifier training, owing to its robust performance on many text classification tasks.
Conclusions | We recast it as a multi-class, multi-label text classification task , and presented a bootstrapping algorithm for improving the prediction of minority classes in the presence of a small training set. |
Dataset | Unlike newswire articles, at which many topic-based text classification tasks are targeted, the ASRS reports are informally written using various domain-specific abbreviations and acronyms, tend to contain poor grammar, and have capitalization information removed, as illustrated in the following sentence taken from one of the reports. |
Introduction | The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
Introduction | Hence, cause identification can be naturally recast as a text classification task: given an incident report, determine which of a set of 14 shapers contributed to the occurrence of the incident described in the report.
Related Work | Since we recast cause identification as a text classification task and proposed a bootstrapping approach that targets at improving minority class prediction, the work most related to ours involves one or both of these topics. |
Experiments | Similar to the opinion classification task, comment type classification is a multi-class classification with three classes: video, product and uninform.
Experiments | Table 1: Summary of YouTube comments data used in the sentiment, type and full classification tasks.
Experiments | We cast this problem as a single multi-class classification task with seven classes: the Cartesian product between {product, video} type labels and {positive, neutral, negative} sentiment labels plus the uninformative category (spam and off-topic).
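The seven-class label set described above can be reconstructed as a Cartesian product plus one catch-all class; the exact label strings below are illustrative, not necessarily the paper's names:

```python
from itertools import product

# Hypothetical reconstruction of the seven-class scheme:
# every {product, video} x {positive, neutral, negative}
# combination, plus a single "uninformative" catch-all class.
types = ["product", "video"]
sentiments = ["positive", "neutral", "negative"]
labels = ["-".join(pair) for pair in product(types, sentiments)]
labels.append("uninformative")
# 2 x 3 combinations + 1 catch-all = 7 classes
```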
Representations and models | Our structures are specifically adapted to the noisy user-generated texts and encode important aspects of the comments, e.g., words from the sentiment lexicons, product concepts and negation words, specifically targeting the sentiment and comment type classification tasks.
YouTube comments corpus | The resulting annotator agreement α value (Krippendorff, 2004; Artstein and Poesio, 2008) scores are 60.6 (AUTO), 72.1 (TABLETS) for the sentiment task and 64.1 (AUTO), 79.3 (TABLETS) for the type classification task.
Cross-Language Structural Correspondence Learning | We refer to this classification task as the target task. |
Experiments | For each of the nine target-language-category-combinations a text classification task is created by taking the training set of the product category in S and the test set of the same product category in T.
Introduction | Stated precisely: We are given a text classification task γ in a target language T for which no labeled documents are available.
Introduction | γ may be a spam filtering task, a topic categorization task, or a sentiment classification task.
Introduction | In this connection we compile extensive corpora in the languages English, German, French, and Japanese, and for different sentiment classification tasks.
Introduction | State-of-the-art approaches cast this problem as a classification task and train classifiers using supervised learning (Section 2). |
Introduction | Our approach obviates the need for expensive annotation efforts, and allows us to rapidly bootstrap training data for new classification tasks.
Introduction | We validate our approach by advancing the state-of-the-art on the most well-studied user classification task: predicting user gender (Section 5).
Learning Class Attributes | Table 1: Example instances used for extraction of class attributes for the gender classification task |
Learning Class Attributes | For the gender classification task, we manually filtered the entire set of attributes to select around 1000 attributes that were judged to be discriminative (two thirds of which are female).
Conclusions and Future Work | Experimental results showed that the emotional lexicons generated by our algorithm are of high quality and can assist the emotion classification task.
Experiments | Since there is no metric explicitly measuring the quality of an emotion lexicon, we demonstrate the performance of our algorithm in two ways: (1) we perform a case study for the lexicon generated by our algorithm, and (2) we compare the results of solving the emotion classification task using our lexicon against different methods, and demonstrate the advantage of our lexicon over other lexicons and other emotion classification systems.
Experiments | We compare the performance between a popular emotion lexicon, WordNet-Affect (Strapparava and Valitutti, 2004), and our approach for the emotion classification task.
Experiments | In particular, we are able to obtain an overall F1-score of 10.52% for the disgust classification task, which is difficult to work out using pre-
Introduction | We frame content selection as a simple classification task: given a set of time-series data, decide for each template whether it should be included in a summary or not.
Methodology | The LP method transforms the ML task into one single-label multi-class classification task, where the possible set of values for the transformed class is the powerset of labels present in the original dataset.
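A minimal sketch of the LP (Label Powerset) transformation; for practicality it enumerates only the label combinations actually observed in the data rather than the full powerset, a common implementation shortcut:

```python
def label_powerset_transform(label_sets):
    """Map each instance's set of labels to a single class id,
    turning one multi-label problem into one multi-class problem."""
    classes = {}      # frozenset of labels -> class id
    transformed = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        transformed.append(classes[key])
    return transformed, classes

# Three distinct label combinations -> three transformed classes.
y, classes = label_powerset_transform(
    [{"a"}, {"a", "b"}, {"a"}, {"b"}]
)
```

A standard multi-class learner can then be trained on `y`, and its predictions mapped back to label sets through `classes`.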
Related Work | Collective content selection (Barzilay and Lapata, 2004) is similar to our proposed method in that it is a classification task that predicts the templates from the same instance simultaneously. |
Related Work | Problem transformation approaches (Tsoumakas and Katakis, 2007) transform the ML classification task into one or more simple classification tasks.
Task A: Polarity Classification | N-gram features are widely used in a variety of classification tasks; therefore, we also use them in our polarity classification task.
Task B: Valence Prediction | To conduct our valence prediction study, we used the same human annotators from the polarity classification task for each one of the English, Spanish, Russian and Farsi languages. |
Task B: Valence Prediction | We used the same features for the regression task as we have used in the classification task.
Task B: Valence Prediction | The lessons learned from this study are: (1) valence prediction is a much harder task than polarity classification, both for human annotation and for the machine learning algorithms; (2) the obtained results showed that despite its difficulty this is still a plausible problem; (3) similarly to the polarity classification task, valence prediction with LIWC is improved when shorter contexts (the metaphor/source/target information source) are considered.
Prediction Experiments | We hypothesize that this might be because SVM, as a strong large-margin learner, is a more natural approach in a binary classification setting, but might not be the best choice in a four-way or multiclass classification task.
Prediction Experiments | Figure 2 and Figure 3 show the experimental results for both the time and region classification tasks.
Prediction Experiments | Secondly, in the two tasks, we observe that the accuracy of the binary region classification task is much higher than that of the four-way task; thus, while the latter benefits significantly from the joint learning scheme of the SME model, the former might not have an equivalent gain.
The Sparse Mixed-Effects Model | The weight update of M(R) and M(Q) is bound by the averaged accuracy of the two classification tasks in the training data, which is similar to the notion of minimizing empirical risk (Bahl et al., 1988).
Abstract | Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes.
Experience Detection | Having converted the problem of experience detection for sentences to a classification task, we focus on the extent to which various linguistic features contribute to the performance of the binary classifier for sentences.
Introduction | the problem as a classification task using various linguistic features including tense, mood, aspect, modality, experiencer, and verb classes. |
Lexicon Construction | The other one is based on Support Vector Machine (Chang and Lin, 2001) which is the state-of-the-art algorithm for many classification tasks.
A Distributional Model for Argument Classification | In the argument classification task, the similarity between two argument heads h1 and h2 observed in FrameNet can be computed over the vectors of h1 and
Empirical Analysis | Table 3: Accuracy on Arg classification tasks w.r.t. different clustering policies
Empirical Analysis | We measured the performance on the argument classification tasks of different models obtained by combining different choices of o with Eq.
Introduction | More recently, the state-of-the-art frame-based semantic role labeling system discussed in (Johansson and Nugues, 2008b) reports a 19% drop in accuracy for the argument classification task when a different test domain is targeted (i.e.
Problem Definition | We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of nonnegative integer-valued features w = (w1, .
Problem Definition | We evaluate on two text classification tasks: topic classification and sentiment detection.
Problem Definition | However, LR underperforms in classification tasks (in terms of F1, Tables 4-6). |
Abstract | On a three-way classification task between vowels, nasals, and non-nasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages. |
Analysis | To further compare our model to the EM baseline, we show confusion matrices for the three-way classification task in Figure 3. |
Conclusion | We further experimented on a three-way classification task involving nasal characters, achieving nearly 90% accuracy. |
Experimental Evaluation | Following are the three classification tasks associated with this dataset. |
Experimental Evaluation | For the SRAA dataset we infer 8 topics on the training dataset and label these 8 topics for all three classification tasks.
Experimental Evaluation | However, ClassifyLDA performs better than TS-LDA for the three classification tasks of the SRAA dataset.
Pairwise Markov Random Fields and Loopy Belief Propagation | We formulate the task of learning sense- and word-level connotation lexicon as a graph-based classification task (Sen et al., 2008). |
Pairwise Markov Random Fields and Loopy Belief Propagation | In this classification task, we denote by 3?
Pairwise Markov Random Fields and Loopy Belief Propagation | Problem Definition Having introduced our graph-based classification task and objective formulation, we define our problem more formally. |
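As a rough illustration of graph-based classification of this kind, the sketch below clamps a few seed nodes and iteratively propagates polarity scores to their neighbors. This is a generic label-propagation simplification under assumed node and seed names, not the authors' pairwise-MRF/loopy-BP formulation:

```python
def propagate(graph, seeds, iterations=10):
    """Score propagation over an undirected graph: each unlabeled
    node moves to the mean of its neighbors' scores, while seed
    nodes stay clamped to their given polarity."""
    scores = {n: seeds.get(n, 0.0) for n in graph}
    for _ in range(iterations):
        new = {}
        for node, nbrs in graph.items():
            if node in seeds:
                new[node] = seeds[node]  # clamp seed nodes
            else:
                new[node] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = new
    return scores

# Toy graph with hypothetical words: "fine" sits between a
# positive seed ("good") and a negative seed ("bad").
g = {"good": ["fine"], "fine": ["good", "bad"], "bad": ["fine"]}
scores = propagate(g, {"good": 1.0, "bad": -1.0})
# "fine" settles at the mean of its clamped neighbors: 0.0
```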
Discussion | Given that structural learning can help in topical classification tasks (Tsochantaridis et al., 2005; Dekel et al., 2004), the lack of success on genres is surprising. |
Experiments | We used character n-grams because they are very easy to extract, language-independent (no need to rely on parsing or even stemming), and they are known to have the best performance in genre classification tasks (Kanaris and Stamatatos, 2009; Sharoff et al., 2010). |
Structural SVMs | But many implementations are not publicly available, and their scalability to real-life text classification tasks is unknown. |