Index of papers in Proc. ACL 2013 that mention
  • bag-of-words
Nastase, Vivi and Strapparava, Carlo
Abstract
In a straightforward bag-of-words experimental setup we add etymological ancestors of the words in the documents, and investigate the performance of a model built on English data when applied to Italian test data (and vice versa).
Abstract
The results show an improvement that is not only statistically significant but large: a jump of almost 40 points in F1-score over the raw (vanilla) bag-of-words representation.
Cross Language Text Categorization
The most frequently, and most successfully, used document representation is the bag-of-words (BoW).
Cross Language Text Categorization
As is commonly done in text categorization (Sebastiani, 2005), the documents in our data are represented as bag-of-words, and classification is done using support vector machines (SVMs).
Cross Language Text Categorization
The bag-of-words representation for each document is expanded with the corresponding etymological features.
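A minimal sketch of this pipeline, assuming scikit-learn; the `etym_ancestors` lookup and the toy documents are hypothetical stand-ins for the paper's etymological dictionary and corpus:

```python
# Bag-of-words + etymological expansion + linear SVM (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical etymology lookup: word -> etymological ancestors.
etym_ancestors = {"theater": ["lat:theatrum"], "teatro": ["lat:theatrum"]}

def bow_plus_etymology(doc):
    # Tokenize, then append each token's etymological ancestors as extra features.
    tokens = doc.lower().split()
    return tokens + [a for t in tokens for a in etym_ancestors.get(t, [])]

train_docs = ["the theater was full", "we watched the game"]  # toy English training set
train_labels = ["culture", "sport"]

vec = CountVectorizer(analyzer=bow_plus_etymology)
clf = LinearSVC().fit(vec.fit_transform(train_docs), train_labels)

# The shared Latin ancestor lets the English-trained model score Italian text.
print(clf.predict(vec.transform(["il teatro era pieno"])))
```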
Discussion
Feature filtering is commonly done in machine learning when the data has many features, and in text categorization when using the bag-of-words representation in particular.
Discussion
The difference in results on the two dictionary versions was significant: a 4 and 5 point increase, respectively, in micro-averaged F1-score in the bag-of-words setting for English training/Italian testing and Italian training/English testing, and a 2 and 6 point increase in the LSA setting.
Introduction
We start with the basic setup, representing the documents as bag-of-words, where we train a model on the English training data and use this model to categorize documents from the Italian test data (and vice versa).
Introduction
We then add the etymological roots of the words in the data to the bag-of-words representation, and observe a large increase in performance, 21 points in terms of F1-score.
Introduction
We then use the bag-of-words representation of the training data to build a semantic space using LSA, and use the generated word vectors to represent the training and test data.
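A sketch of building such an LSA space from bag-of-words counts via truncated SVD, again assuming scikit-learn; the corpus and dimensionality are illustrative:

```python
# LSA semantic space from a bag-of-words count matrix (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["stocks fell sharply", "markets rallied today", "the team won the match"]
X = CountVectorizer().fit_transform(docs)  # documents x vocabulary counts

lsa = TruncatedSVD(n_components=2)         # real setups use a few hundred dimensions
doc_vectors = lsa.fit_transform(X)         # dense low-dimensional document vectors
word_vectors = lsa.components_.T           # one LSA vector per vocabulary word
```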
bag-of-words is mentioned in 12 sentences in this paper.
Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej
Conclusions
Following the word-alignment paradigm, we find that the rich lexical semantic information improves the models consistently in the unstructured bag-of-words setting and also in the framework of learning latent structures.
Experiments
For the unstructured, bag-of-words setting, we tested logistic regression (LR) and boosted decision trees (BDT).
Introduction
Due to the variety of word choices and inherent ambiguities in natural languages, bag-of-words approaches with simple surface-form word matching tend to produce brittle results with poor prediction accuracy (Bilotti et al., 2007).
Learning QA Matching Models
In this section, we investigate the effectiveness of various learning models for matching questions and sentences, including the bag-of-words setting.
Learning QA Matching Models
5.1 Bag-of-Words Model
Learning QA Matching Models
The bag-of-words model treats each question and sentence as an unstructured bag of words.
Problem Definition
For instance, if we assume a naive complete bipartite matching, then effectively it reduces to the simple bag-of-words model.
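A sketch of that reduction: under a complete bipartite matching, question-sentence relatedness collapses to weighted bag-of-words overlap. The idf weights below are made up for illustration:

```python
# Bag-of-words QA matching as weighted word overlap (illustrative sketch).
from collections import Counter

idf = {"capital": 2.3, "france": 2.9, "paris": 3.1}  # hypothetical weights

def bow_match_score(question, sentence):
    q, s = Counter(question.lower().split()), Counter(sentence.lower().split())
    # Sum the weight of every word shared by the question and the candidate.
    return sum(min(q[w], s[w]) * idf.get(w, 1.0) for w in q if w in s)

print(bow_match_score("what is the capital of france",
                      "paris is the capital and largest city of france"))
```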
Related Work
Observing the limitations of the bag-of-words models, Wang et al.
bag-of-words is mentioned in 10 sentences in this paper.
Xie, Boyi and Passonneau, Rebecca J. and Wu, Leon and Creamer, Germán G.
Experiments
Experiments evaluate the FWD and SemTree feature spaces compared to two baselines: bag-of-words (BOW) and supervised latent Dirichlet allocation (sLDA) (Blei and McAuliffe, 2007).
Introduction
Our main contribution is a novel tree representation based on semantic frame parses that performs significantly better than enriched bag-of-words vectors.
Introduction
On the polarity task, the semantic frame features encoded as trees perform significantly better across years and sectors than bag-of-words vectors (BOW), and outperform BOW vectors enhanced with semantic frame features, and a supervised topic modeling approach.
Methods
Table 1 lists 24 types of features, including semantic Frame attributes, bag-of-Words, and scores for words in the Dictionary of Affect in Language by part of speech (pDAL).
Methods
Bag-of-Words features include term frequency and tf-idf of unigrams, bigrams, and trigrams.
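A sketch of extracting those two feature groups with scikit-learn; the documents are illustrative:

```python
# Term-frequency and tf-idf features over unigrams, bigrams, and trigrams.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["shares of the company rose", "the company reported a loss"]

tf = CountVectorizer(ngram_range=(1, 3)).fit_transform(docs)     # raw term frequencies
tfidf = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(docs)  # tf-idf weights
```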
Motivation
Bag-of-Words (BOW) document representation is difficult to surpass for many document classification tasks, but cannot capture the degree of semantic similarity among these sentences.
Related Work
NLP has recently been applied to financial text for market analysis, primarily using bag-of-words (BOW) document representation.
Related Work
Table 1: FWD features (Frame, bag-of-Words, part-of-speech DAL score) and their value types.
bag-of-words is mentioned in 8 sentences in this paper.
Sarioglu, Efsun and Yadav, Kabir and Choi, Hyeong-Ah
Abstract
Representing reports according to their topic distributions is more compact than the bag-of-words representation and can be processed faster than raw text in subsequent automated processes.
Background
2.1 Bag-of-Words (BOW) Representation
Background
One way of doing this is the bag-of-words (BoW) representation, where each document becomes a vector of its words/tokens.
Conclusion
Firstly, the bag-of-words representation is replaced with topic vectors, which provide good dimensionality reduction while maintaining comparable classification performance.
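A sketch of that replacement, assuming scikit-learn's LDA implementation; the reports and topic count are illustrative:

```python
# Replace sparse bag-of-words vectors with compact topic distributions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = ["patient reports head pain", "no acute injury on the scan",
           "follow up scan shows healing"]
X = CountVectorizer().fit_transform(reports)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(X)  # shape (n_reports, 2); each row sums to 1
# Each report is now a 2-dimensional topic mixture instead of a sparse BoW vector.
```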
bag-of-words is mentioned in 4 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Abstract
Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context.
Experiments
Similarly, if no Markov properties are used (bag-of-words), MTR reduces to w-LDA.
Related Work and Motivation
Standard topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters words together that appear in a similar global context.
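A quick demonstration of that property: the two utterances below differ only in word order, yet receive identical bag-of-words count vectors.

```python
# Bag-of-words discards word order: reordered utterances become indistinguishable.
from sklearn.feature_extraction.text import CountVectorizer

X = CountVectorizer().fit_transform(["show flights from boston to denver",
                                     "show flights from denver to boston"])
print((X[0] - X[1]).nnz == 0)  # True: the two count vectors are identical
```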
bag-of-words is mentioned in 3 sentences in this paper.
Labutov, Igor and Lipson, Hod
Approach
We use the document’s binary bag-of-words vector v_j, and compute the document’s vector space representation as the matrix-vector product of the embedding matrix with v_j.
Results and Discussion
[Table: results by number of training examples (.5K, 5K, 20K), with and without added bag-of-words features]
Results and Discussion
Additional features: Across all embeddings, appending the document’s binary bag-of-words representation increases classification accuracy.
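A sketch of both representations, assuming NumPy; the embedding matrix, its dimensions, and the word indices are hypothetical:

```python
# Document embedding via a matrix-vector product, plus appended binary BoW features.
import numpy as np

vocab_size, embed_dim = 1000, 50
Phi = np.random.randn(vocab_size, embed_dim)  # hypothetical word-embedding matrix

v_j = np.zeros(vocab_size)
v_j[[3, 17, 256]] = 1.0                       # binary bag-of-words vector of document j

doc_embedding = Phi.T @ v_j                   # sum of the embeddings of the document's words
features = np.concatenate([doc_embedding, v_j])  # embedding + binary BoW features
```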
bag-of-words is mentioned in 3 sentences in this paper.
Ravi, Sujith
Abstract
Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models.
Bayesian MT Decipherment via Hash Sampling
Firstly, we would like to include as many features as possible to represent the source/target words in our framework besides simple bag-of-words context similarity (for example, left-context, right-context, and other general-purpose features based on topic models).
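A sketch of collecting such left- and right-context features from a monolingual corpus; the function and the toy corpus are hypothetical:

```python
# Per-word context features beyond plain bag-of-words co-occurrence counts.
from collections import defaultdict

def context_features(sentences):
    feats = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            if i > 0:
                feats[w]["L=" + toks[i - 1]] += 1  # left-context feature
            if i + 1 < len(toks):
                feats[w]["R=" + toks[i + 1]] += 1  # right-context feature
    return feats

feats = context_features(["the cat sat", "the dog sat"])
print(dict(feats["sat"]))  # {'L=cat': 1, 'L=dog': 1}
```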
Introduction
Secondly, we introduce a new feature-based representation for sampling translation candidates that allows one to incorporate any amount of additional features (beyond simple bag-of-words) as side-information during decipherment training.
bag-of-words is mentioned in 3 sentences in this paper.