Abstract | In a straightforward bag-of-words experimental setup we add etymological ancestors of the words in the documents, and investigate the performance of a model built on English data when applied to Italian test data (and vice versa).
Abstract | The results show not only a statistically significant but also a large improvement, a jump of almost 40 points in F1-score, over the raw (vanilla bag-of-words) representation.
Cross Language Text Categorization | The most frequently and successfully used document representation is the bag-of-words (BoW).
Cross Language Text Categorization | As is commonly done in text categorization (Sebastiani, 2005), the documents in our data are represented as bag-of-words, and classification is done using support vector machines (SVMs).
Cross Language Text Categorization | The bag-of-words representation for each document is expanded with the corresponding etymological features. |
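A minimal sketch of this setup, assuming a hypothetical ETYMOLOGY lexicon that maps surface words to shared ancestor forms; the lexicon entries, toy documents, and labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical etymological lexicon: surface word -> ancestor form(s)
ETYMOLOGY = {"night": ["*nokwt-"], "notte": ["*nokwt-"]}

def expand_with_etymology(doc):
    """Append the etymological ancestors of each word to the document."""
    tokens = doc.lower().split()
    ancestors = [a for t in tokens for a in ETYMOLOGY.get(t, [])]
    return " ".join(tokens + ancestors)

# Toy English training data and Italian test data
train_docs = ["the night was cold", "the market fell sharply"]
train_labels = ["weather", "finance"]
test_docs = ["la notte era fredda"]

vectorizer = CountVectorizer(token_pattern=r"\S+")  # keep ancestor tokens intact
X_train = vectorizer.fit_transform(expand_with_etymology(d) for d in train_docs)
clf = LinearSVC().fit(X_train, train_labels)
prediction = clf.predict(vectorizer.transform(expand_with_etymology(d) for d in test_docs))
```

Because the English and Italian surface forms map onto the same ancestor token, the training and test documents share features even though their vocabularies do not overlap.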
Discussion | Feature filtering is commonly done in machine learning when the data has many features, and in text categorization when using the bag-of-words representation in particular. |
Discussion | The difference in results on the two dictionary versions was significant: increases of 4 and 5 points respectively in micro-averaged F1-score in the bag-of-words setting for English training/Italian testing and Italian training/English testing, and increases of 2 and 6 points in the LSA setting.
Introduction | We start with the basic setup, representing the documents as bag-of-words, where we train a model on the English training data and use this model to categorize documents from the Italian test data (and vice versa).
Introduction | We then add the etymological roots of the words in the data to the bag-of-words, and notice a large increase in performance of 21 points in terms of F1-score.
Introduction | We then use the bag-of-words representation of the training data to build a semantic space using LSA, and use the generated word vectors to represent the training and test data. |
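A rough sketch of this LSA step over toy documents, with scikit-learn's TruncatedSVD standing in for the LSA decomposition:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

train_docs = ["stocks rise on strong earnings",
              "the team wins the final match",
              "central bank raises interest rates"]
test_docs = ["markets rally after the earnings report"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)           # bag-of-words counts
lsa = TruncatedSVD(n_components=2).fit(X_train)          # semantic space built from training data only
Z_train = lsa.transform(X_train)                         # training documents in LSA space
Z_test = lsa.transform(vectorizer.transform(test_docs))  # test documents in the same space
```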
Conclusions | Following the word-alignment paradigm, we find that the rich lexical semantic information improves the models consistently in the unstructured bag-of-words setting and also in the framework of learning latent structures. |
Experiments | For the unstructured, bag-of-words setting, we tested logistic regression (LR) and boosted decision trees (BDT). |
Introduction | Due to the variety of word choices and inherent ambiguities in natural languages, bag-of-words approaches with simple surface-form word matching tend to produce brittle results with poor prediction accuracy (Bilotti et al., 2007). |
Learning QA Matching Models | In this section, we investigate the effectiveness of various learning models for matching questions and sentences, including the bag-of-words setting |
Learning QA Matching Models | 5.1 Bag-of-Words Model |
Learning QA Matching Models | The bag-of-words model treats each question and sentence as an unstructured bag of words. |
Problem Definition | For instance, if we assume a naive complete bipartite matching, then effectively it reduces to the simple bag-of-words model. |
Related Work | Observing the limitations of the bag-of-words models, Wang et al. |
Experiments | Experiments evaluate the FWD and SemTree feature spaces compared to two baselines: bag-of-words (BOW) and supervised latent Dirichlet allocation (sLDA) (Blei and McAuliffe, 2007). |
Introduction | Our main contribution is a novel tree representation based on semantic frame parses that performs significantly better than enriched bag-of-words vectors. |
Introduction | On the polarity task, the semantic frame features encoded as trees perform significantly better across years and sectors than bag-of-words vectors (BOW), and outperform BOW vectors enhanced with semantic frame features, and a supervised topic modeling approach. |
Methods | Table 1 lists 24 types of features, including semantic Frame attributes, bag-of-Words, and scores for words in the Dictionary of Affect in Language by part of speech (pDAL).
Methods | Bag-of-Words features include term frequency and tfidf of unigrams, bigrams, and trigrams. |
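A minimal sketch of these bag-of-words features, assuming scikit-learn vectorizers over toy documents:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["profits rose sharply this quarter",
        "profits fell sharply last quarter"]

# Term-frequency counts over unigrams, bigrams, and trigrams
tf = CountVectorizer(ngram_range=(1, 3)).fit_transform(docs)
# tf-idf weights over the same n-gram range
tfidf = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(docs)
```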
Motivation | Bag-of-Words (BOW) document representation is difficult to surpass for many document classification tasks, but cannot capture the degree of semantic similarity among these sentences. |
Related Work | NLP has recently been applied to financial text for market analysis, primarily using bag-of-words (BOW) document representation. |
Related Work | Table 1: FWD features (Frame, bag-of-Words, part-of-speech DAL score) and their value types.
Conclusion | In this paper we apply recursive neural networks to political ideology detection, a problem where previous work relies heavily on bag-of-words models and hand-designed lexica. |
Related Work | In general, work in this category tends to combine traditional surface lexical modeling (e.g., bag-of-words) with hand-designed syntactic features or lexicons.
Related Work | Most previous work on ideology detection ignores the syntactic structure of the language in use in favor of familiar bag-of-words representations for |
Related Work | E.g., Gerrish and Blei (2011) predict the voting patterns of Congress members based on bag-of-words representations of bills and inferred political leanings of those members. |
Where Compositionality Helps Detect Ideological Bias | Experimental Results Table 1 shows the RNN models outperforming the bag-of-words baselines as well as the word2vec baseline on both datasets. |
Where Compositionality Helps Detect Ideological Bias | We obtain better results on Convote than on IBC with both bag-of-words and RNN models. |
Conclusion and Future Work | These documents are converted to a bag-of-words input and fed into neural networks. |
Introduction | The levels inferred from the neural network correspond to distinct levels of concepts, where high-level representations are obtained from the low-level bag-of-words input.
Topic Similarity Model with Neural Network | The most relevant N documents df and de are retrieved and converted to high-dimensional bag-of-words inputs f and e for the representation learning.
Topic Similarity Model with Neural Network | Assuming that the input is an n-of-V binary vector X representing the bag-of-words (V is the vocabulary size), an auto-encoder consists of an encoding process g(X) and a decoding process h(g(X)). The objective of the auto-encoder is to minimize the reconstruction error L(h(g(X)), X).
Topic Similarity Model with Neural Network | In our task, for each sentence, we treat the retrieved N relevant documents as a single large document and convert it to a bag-of-words vector X, as shown in Figure 2.
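A minimal NumPy sketch of the encode/decode/reconstruction-error computation described above, with random, untrained weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 1000, 100                                   # vocabulary and hidden size (illustrative)
X = (rng.random(V) < 0.01).astype(float)           # n-of-V binary bag-of-words vector

# Random, untrained parameters for illustration
W_enc, b_enc = rng.normal(scale=0.01, size=(H, V)), np.zeros(H)
W_dec, b_dec = rng.normal(scale=0.01, size=(V, H)), np.zeros(V)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

g = sigmoid(W_enc @ X + b_enc)                     # encoding g(X)
h = sigmoid(W_dec @ g + b_dec)                     # decoding h(g(X))
# Reconstruction error L(h(g(X)), X), here instantiated as cross-entropy
loss = -np.sum(X * np.log(h) + (1 - X) * np.log(1 - h))
```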
Abstract | Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus. |
Abstract | Defining features in this manner allows us to both explore the bag-of-words representation as well as use groups of n-grams as features, which we believed would be a better fit for this problem. |
Abstract | In experiments based on the bag-of-words model, we only consider an absolute frequency threshold, whereas in later experiments, we also take into account the relative frequency ratio threshold. |
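A small sketch of the two kinds of thresholds mentioned above; the helper name, default values, and add-one smoothing are illustrative, not taken from the original work:

```python
from collections import Counter

def filter_features(pos_docs, neg_docs, min_count=5, min_ratio=2.0):
    """Keep tokens passing an absolute frequency threshold (min_count) and a
    relative frequency ratio threshold (min_ratio) between the two classes."""
    pos = Counter(t for d in pos_docs for t in d.split())
    neg = Counter(t for d in neg_docs for t in d.split())
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    kept = []
    for tok, count in (pos + neg).items():
        if count < min_count:                       # absolute frequency threshold
            continue
        rel_pos = (pos[tok] + 1) / (total_pos + 1)  # add-one smoothing (illustrative)
        rel_neg = (neg[tok] + 1) / (total_neg + 1)
        if max(rel_pos / rel_neg, rel_neg / rel_pos) >= min_ratio:  # ratio threshold
            kept.append(tok)
    return kept

kept = filter_features(["good great good"] * 3, ["bad awful bad"] * 3, min_count=3)
```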
Abstract | These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. |
Introduction | For tasks like text classification, sentiment analysis, and text-driven forecasting, this is an open question, as cheap “bag-of-words” models often perform well. |
Introduction | We embrace the conventional bag-of-words representation of text, instead bringing linguistic bias to bear on regularization. |
Introduction | Our experiments demonstrate that structured regularizers can squeeze higher performance out of conventional bag-of-words models on seven out of eight text categorization tasks tested, in six cases with more compact models than the best-performing unstructured-regularized model.
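As a sketch of what a structured regularizer over bag-of-words weights can look like, here is a group-lasso style penalty; the grouping of features (for example, by sentence or topic) is illustrative:

```python
import numpy as np

def group_lasso_penalty(weights, groups, lam=1.0):
    """Structured (group lasso) penalty: the sum over groups of the L2 norm
    of that group's weights, scaled by lam."""
    return lam * sum(np.linalg.norm(weights[list(g)]) for g in groups)

w = np.array([0.5, -0.2, 0.0, 0.7, 0.1])     # bag-of-words feature weights
groups = [[0, 1], [2, 3, 4]]                 # e.g., features grouped by sentence or topic
penalty = group_lasso_penalty(w, groups)     # added to the classifier's training loss
```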
Related and Future Work | Overall, our results demonstrate that linguistic structure in the data can be used to improve bag-of-words models, through structured regularization. |
Related and Future Work | Our experimental focus has been on a controlled comparison between regularizers for a fixed model family (the simplest available, linear with bag-of-words features). |
Abstract | We rely on the tree kernel technology to automatically extract and learn features with better generalization power than bag-of-words.
Experiments | We conjecture that sentiment prediction for AUTO category is largely driven by one-shot phrases and statements where it is hard to improve upon the bag-of-words and sentiment lexicon features. |
Experiments | The bag-of-words model seems to be affected by the data sparsity problem which becomes a crucial issue when only a small training set is available. |
Introduction | The comment contains a product name xoom and some negative expressions, thus, a bag-of-words model would derive a negative polarity for this product. |
Introduction | Clearly, the bag-of-words lacks the structural information linking the sentiment with the target product. |
Representations and models | Such classifiers are traditionally based on bag-of-words and more advanced features. |
Background | Our baseline approach to both problems is to use a bag-of-words model of the contribution, and use machine learning for classification. |
Background | We build a contextual feature space, described in section 4.2, to enhance our baseline bag-of-words model. |
Background | This is a distinction that a bag-of-words model would have difficulty with. |
Never send a human to do a machine’s job. | Our first formulation of the prediction task uses a standard bag-of-words model.
Never send a human to do a machine’s job. | If there were no information in the textual content of a quote to determine whether it were memorable, then an SVM employing bag-of-words features should perform no better than chance. |
Never send a human to do a machine’s job. | Even a relatively small number of distinctiveness features, on their own, improve significantly over the much larger bag-of-words model. |
Abstract | Representing reports according to their topic distributions is more compact than bag-of-words representation and can be processed faster than raw text in subsequent automated processes. |
Background | 2.1 Bag-of-Words (BOW) Representation |
Background | One way of doing this is the bag-of-words (BoW) representation, where each document becomes a vector of its words/tokens.
Conclusion | Firstly, the bag-of-words representation is replaced with topic vectors, which provide good dimensionality reduction while still achieving comparable classification performance.
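A minimal sketch of replacing bag-of-words vectors with topic vectors, using scikit-learn's LDA; the report texts and the number of topics are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = ["engine failure during taxi",
           "smoke in cabin during descent",
           "runway incursion during takeoff"]

bow = CountVectorizer().fit_transform(reports)          # high-dimensional BoW vectors
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(bow)                  # compact topic-distribution vectors
```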
Approach | This is a distributed bag-of-words approach as sentence ordering is not taken into account by the model. |
Approach | The use of a nonlinearity enables the model to learn interesting interactions between words in a document, which the bag-of-words approach of ADD is not capable of learning. |
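A rough sketch of the contrast, using random toy embeddings: the additive (ADD-style) composition is a plain sum of word vectors, while a nonlinear variant composes adjacent vectors through tanh so that local interactions can be captured; this is an illustration under those assumptions, not the exact model:

```python
import numpy as np

rng = np.random.default_rng(0)
sentence = "the cat sat on the mat".split()
embeddings = {w: rng.normal(size=50) for w in set(sentence)}   # random toy embeddings

# ADD: distributed bag-of-words composition, a plain sum of word vectors
add_repr = np.sum([embeddings[w] for w in sentence], axis=0)

# A nonlinear variant: compose adjacent word vectors through tanh so that
# local interactions between words can be captured (illustration only)
bi_repr = np.sum([np.tanh(embeddings[a] + embeddings[b])
                  for a, b in zip(sentence[:-1], sentence[1:])], axis=0)
```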
Related Work | (2013) proposed a bag-of-words autoencoder model, where the bag-of-words representation in one language is used to train the embeddings in another. |
Related Work | Hermann and Blunsom (2014) propose a large-margin learner for multilingual word representations, similar to the basic additive model proposed here, which, like the approaches above, relies on a bag-of-words model for sentence representations. |
Abstract | Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models.
Bayesian MT Decipherment via Hash Sampling | Firstly, we would like to include as many features as possible to represent the source/target words in our framework besides simple bag-of-words context similarity (for example, left-context, right-context, and other general-purpose features based on topic models, etc.). |
Introduction | Secondly, we introduce a new feature-based representation for sampling translation candidates that allows one to incorporate any amount of additional features (beyond simple bag-of-words) as side-information during decipherment training.
Approach | We use the document’s binary bag-of-words vector vj, and compute the document’s vector space representation through the matrix-vector product of the embedding matrix with vj.
Results and Discussion | Table (fragment): results by feature set and number of training examples (.5K, 5K, 20K), including a row for added bag-of-words features.
Results and Discussion | Additional features: Across all embeddings, appending the document’s binary bag-of-words representation increases classification accuracy. |
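A short sketch of this representation and the augmentation described above, with a random stand-in for the embedding matrix and invented word indices:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5000, 100
E = rng.normal(size=(dim, vocab_size))        # word-embedding matrix (random stand-in)

v_j = np.zeros(vocab_size)
v_j[[12, 87, 4031]] = 1.0                     # binary bag-of-words vector of document j

doc_repr = E @ v_j                            # dense representation via matrix-vector product
augmented = np.concatenate([doc_repr, v_j])   # append the raw binary BoW features
```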
Abstract | Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. |
Experiments | Similarly, if no Markov properties are used (bag-of-words), MTR reduces to w-LDA.
Related Work and Motivation | Standard topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters words together that appear in a similar global context. |
Identifying Key Medical Relations | The similarity of two sentences is defined as the bag-of-words similarity of the dependency paths connecting arguments. |
Relation Extraction with Manifold Models | (7) Bag-of-words features modeling the dependency path.
Relation Extraction with Manifold Models | (8) Bag-of-words features modeling the whole sentence.
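A minimal sketch of such a bag-of-words similarity between two dependency paths, computed as cosine similarity over token counts; the example paths are invented:

```python
from collections import Counter
import math

def bow_similarity(path_a, path_b):
    """Cosine similarity between the bag-of-words (count) vectors of two
    dependency paths, each given as a list of tokens."""
    a, b = Counter(path_a), Counter(path_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

sim = bow_similarity(["treated", "with", "drug"], ["treated", "by", "drug"])
```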
Abstract | By performing the probability integral transform, our approach moves beyond the standard count-based bag-of-words models in NLP, and improves previous work on text regression by incorporating the correlation among local features in the form of a semiparametric Gaussian copula.
Copula Models for Text Regression | By doing this, we are essentially performing the probability integral transform, an important statistical technique that moves beyond the count-based bag-of-words feature space to the space of marginal cumulative distribution functions.
Copula Models for Text Regression | This is of crucial importance to modeling text data: instead of using the classic bag-of-words representation that uses raw counts, we are now working with uniform marginal CDFs, which helps cope with the overfitting caused by noise and data sparsity.
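A small sketch of the probability integral transform on raw bag-of-words counts, using per-feature empirical CDFs computed from ranks; the count matrix is a toy example and the full copula model is not reproduced here:

```python
import numpy as np
from scipy.stats import rankdata

# Raw bag-of-words counts: rows are documents, columns are features (toy data)
counts = np.array([[3, 0, 1],
                   [0, 2, 5],
                   [1, 1, 0],
                   [4, 0, 2]], dtype=float)

# Probability integral transform: replace each raw count with its empirical
# marginal CDF value, computed column by column
cdf_features = np.apply_along_axis(lambda col: rankdata(col) / len(col), 0, counts)
```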
Query Classification | The baseline system uses only the words in the queries as features (the bag-of-words representation), treating the query classification problem as a typical text categorization problem. |
Query Classification | Here, bow indicates the use of bag-of-words features; WN refers to word clusters of size N; and PN refers to phrase clusters of size N. All the clusters are soft clusters created with the web corpus using 3-word context windows. |
Query Classification | The bag-of-words features alone have dismal performance. |
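As a sketch of how bag-of-words query features can be combined with soft cluster features, assuming a hypothetical WORD_CLUSTERS map from words to (cluster id, membership weight) pairs; the entries below are invented:

```python
from collections import defaultdict

# Hypothetical soft-clustering map: word -> [(cluster id, membership weight), ...]
WORD_CLUSTERS = {"hotel": [(12, 0.9)], "motel": [(12, 0.8)], "flight": [(7, 0.95)]}

def query_features(query):
    feats = defaultdict(float)
    for w in query.lower().split():
        feats["bow=" + w] += 1.0                       # plain bag-of-words feature
        for cid, weight in WORD_CLUSTERS.get(w, []):
            feats["cluster=%d" % cid] += weight        # soft word-cluster feature
    return dict(feats)

print(query_features("cheap hotel flight"))
```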
Introduction | bag-of-words or indivisible n-gram). |
Related Work | One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013). |
Related Work | (2013) also use bag-of-words but learn BLEU-sensitive phrase embeddings.