Abstract | Previous work in traditional text classification and its variants, such as sentiment analysis, has achieved successful results by using the bag-of-words representation: that is, by treating text as a collection of words with no interdependencies and training a classifier on a large feature set of the word unigrams that appear in the corpus. |
Abstract | Defining features in this manner allows us both to explore the bag-of-words representation and to use groups of n-grams as features, which we believed would be a better fit for this problem. |
Abstract | In experiments based on the bag-of-words model, we consider only an absolute frequency threshold, whereas in later experiments we also take into account the relative frequency ratio threshold. |
Background | Our baseline approach to both problems is to represent each contribution with a bag-of-words model and to use machine learning for classification. |
Background | We build a contextual feature space, described in section 4.2, to enhance our baseline bag-of-words model. |
Background | This is a distinction that a bag-of-words model would have difficulty with. |
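
The bag-of-words representation referred to in the excerpts above can be illustrated with a minimal sketch. It assumes a simple whitespace tokenizer and an absolute frequency threshold of 2, applied to total corpus counts. The function names, the toy corpus, and the threshold value are all hypothetical and do not come from the original work. The sketch also shows why word order is a distinction such a model would have difficulty with: two sentences with opposite meanings receive the same vector.

```python
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def build_vocabulary(corpus, min_count=2):
    # Keep only unigrams whose total corpus count reaches the
    # absolute frequency threshold (min_count is an assumed value).
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    return sorted(w for w, c in counts.items() if c >= min_count)


def bag_of_words(text, vocabulary):
    # Map a text to a vector of unigram counts; word order is discarded.
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


# Toy corpus, invented for illustration only.
corpus = [
    "the dog bit the man",
    "the man bit the dog",
    "a quiet evening",
]

vocab = build_vocabulary(corpus, min_count=2)
vec_a = bag_of_words(corpus[0], vocab)
vec_b = bag_of_words(corpus[1], vocab)

# The two sentences differ in meaning but share one representation.
print(vocab)
print(vec_a == vec_b)
```

A classifier trained on these vectors cannot separate the first two sentences, which is the kind of limitation that motivates adding n-gram or contextual features.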