Abstract | In a straightforward bag-of-words experimental setup we add etymological ancestors of the words in the documents, and investigate the performance of a model built on English data when applied to Italian test data (and vice versa). |
Abstract | The results show not only a statistically significant but also a large improvement, a jump of almost 40 points in F1-score, over the raw (vanilla bag-of-words) representation. |
Cross Language Text Categorization | The most frequently, and most successfully, used document representation is the bag-of-words (BoW). |
Cross Language Text Categorization | As is commonly done in text categorization (Sebastiani, 2005), the documents in our data are represented as bag-of-words, and classification is done using support vector machines (SVMs). |
Cross Language Text Categorization | The bag-of-words representation for each document is expanded with the corresponding etymological features. |
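The two lines above describe a concrete pipeline: bag-of-words features, expansion with etymological ancestors, and SVM classification. A minimal sketch of that idea follows; the `etymology` lookup, the toy documents, and the labels are illustrative assumptions, not the paper's actual resources.

```python
# Sketch only: BoW features expanded with (hypothetical) etymological ancestors,
# classified with a linear SVM. The etymology table is a toy stand-in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

etymology = {"forest": ["forestis"], "foresta": ["forestis"]}  # shared Latin root

def expand(doc):
    # Append each token's etymological ancestors so that cognates across
    # languages end up sharing features in the BoW space.
    tokens = doc.lower().split()
    return " ".join(tokens + [a for t in tokens for a in etymology.get(t, [])])

train_docs = ["the forest near the lake", "stock markets fell sharply"]
train_labels = ["nature", "finance"]
test_docs = ["la foresta vicino al lago"]  # Italian test document

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform([expand(d) for d in train_docs])
clf = LinearSVC().fit(X_train, train_labels)
X_test = vectorizer.transform([expand(d) for d in test_docs])
print(clf.predict(X_test))  # the shared root "forestis" bridges the two languages
```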
Discussion | Feature filtering is commonly done in machine learning when the data has many features, and in text categorization when using the bag-of-words representation in particular. |
Discussion | The difference in results on the two dictionary versions was significant: a 4 and 5 point increase, respectively, in micro-averaged F1-score in the bag-of-words setting for English training→Italian testing and Italian training→English testing, and a 2 and 6 point increase in the LSA setting. |
Introduction | We start with the basic setup, representing the documents as bag-of-words, where we train a model on the English training data, and use this model to categorize documents from the Italian test data (and vice versa). |
Introduction | We then add the etymological roots of the words in the data to the bag-of-words representation, and observe a large increase in performance, 21 points in terms of F1-score. |
Introduction | We then use the bag-of-words representation of the training data to build a semantic space using LSA, and use the generated word vectors to represent the training and test data. |
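A hedged sketch of that LSA step: truncated SVD over the bag-of-words counts builds the semantic space, and both training and test documents are projected into it. The corpus and the dimensionality below are toy choices, not the paper's configuration.

```python
# Sketch only: LSA semantic space built from BoW counts via truncated SVD.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

train_docs = ["markets rise on strong earnings", "the forest and the lake",
              "banks cut interest rates"]
test_docs = ["rivers and forests"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)            # BoW count matrix
lsa = TruncatedSVD(n_components=2).fit(X_train)           # latent semantic space
Z_train = lsa.transform(X_train)                          # dense LSA vectors (train)
Z_test = lsa.transform(vectorizer.transform(test_docs))   # project test documents
print(Z_train.shape, Z_test.shape)
```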
Conclusions | Following the word-alignment paradigm, we find that the rich lexical semantic information improves the models consistently in the unstructured bag-of-words setting and also in the framework of learning latent structures. |
Experiments | For the unstructured, bag-of-words setting, we tested logistic regression (LR) and boosted decision trees (BDT). |
Introduction | Due to the variety of word choices and inherent ambiguities in natural languages, bag-of-words approaches with simple surface-form word matching tend to produce brittle results with poor prediction accuracy (Bilotti et al., 2007). |
Learning QA Matching Models | In this section, we investigate the effectiveness of various learning models for matching questions and sentences, including the bag-of-words setting. |
Learning QA Matching Models | 5.1 Bag-of-Words Model |
Learning QA Matching Models | The bag-of-words model treats each question and sentence as an unstructured bag of words. |
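As an illustration of what such a model boils down to, the sketch below scores question/sentence pairs by cosine similarity of their term-count vectors, ignoring word order entirely; the question and candidate sentences are made-up examples, not the paper's data.

```python
# Sketch only: bag-of-words question/sentence matching via cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "who wrote the opera carmen"
candidates = ["carmen is an opera written by georges bizet",
              "the opera house opened in 1875"]

vectorizer = CountVectorizer().fit([question] + candidates)
q_vec = vectorizer.transform([question])
c_vecs = vectorizer.transform(candidates)
scores = cosine_similarity(q_vec, c_vecs)[0]   # one score per candidate sentence
best = scores.argmax()
print(scores, candidates[best])
```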
Problem Definition | For instance, if we assume a naive complete bipartite matching, then it effectively reduces to the simple bag-of-words model. |
Related Work | Observing the limitations of the bag-of-words models, Wang et al. |
Experiments | Experiments evaluate the FWD and SemTree feature spaces compared to two baselines: bag-of-words (BOW) and supervised latent Dirichlet allocation (sLDA) (Blei and McAuliffe, 2007). |
Introduction | Our main contribution is a novel tree representation based on semantic frame parses that performs significantly better than enriched bag-of-words vectors. |
Introduction | On the polarity task, the semantic frame features encoded as trees perform significantly better across years and sectors than bag-of-words vectors (BOW), and also outperform BOW vectors enhanced with semantic frame features as well as a supervised topic modeling approach. |
Methods | Table 1 lists 24 types of features, including semantic Frame attributes, bag-of-Words, and scores for words in the Dictionary of Affect in Language by part of speech (pDAL). |
Methods | Bag-of-Words features include term frequency and tf-idf of unigrams, bigrams, and trigrams. |
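A small sketch of those features, assuming standard scikit-learn vectorizers; the two example documents are placeholders.

```python
# Sketch only: term-frequency and tf-idf features over 1- to 3-grams.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["profits rose sharply this quarter", "profits fell sharply last quarter"]

tf = CountVectorizer(ngram_range=(1, 3)).fit_transform(docs)     # raw term frequency
tfidf = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(docs)  # tf-idf weighting
print(tf.shape, tfidf.shape)  # same unigram/bigram/trigram vocabulary, different weights
```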
Motivation | Bag-of-Words (BOW) document representation is difficult to surpass for many document classification tasks, but cannot capture the degree of semantic similarity among these sentences. |
Related Work | NLP has recently been applied to financial text for market analysis, primarily using bag-of-words (BOW) document representation. |
Related Work | Table 1: FWD features (Frame, bag-of-Words, part-of-speech DAL score) and their value types. |
Abstract | Representing reports according to their topic distributions is more compact than bag-of-words representation and can be processed faster than raw text in subsequent automated processes. |
Background | 2.1 Bag-of-Words (BOW) Representation |
Background | One way of doing this is the bag-of-words (BoW) representation, where each document becomes a vector of its words/tokens. |
Conclusion | Firstly, the bag-of-words representation is replaced with topic vectors, which provide good dimensionality reduction while retaining comparable classification performance. |
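A minimal sketch of that replacement, with a generic LDA topic model standing in for whatever topic model the report pipeline actually uses; the report snippets and the number of topics are illustrative assumptions.

```python
# Sketch only: compact topic-vector document representation replacing raw BoW.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = ["engine failure during climb", "smoke detected in the cabin",
           "runway incursion while taxiing", "hydraulic leak after landing"]

X = CountVectorizer().fit_transform(reports)              # sparse BoW counts
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
topic_vectors = lda.transform(X)                          # one 3-dim topic mixture per report
print(topic_vectors.shape)                                # (4, 3): far smaller than the vocabulary
```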
Abstract | Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. |
Experiments | Similarly, if no Markov properties are used (bag-of-words), MTR reduces to w-LDA. |
Related Work and Motivation | Standard topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters words together that appear in a similar global context. |
Approach | We use the document's binary bag-of-words vector vj, and compute the document's vector space representation through the matrix-vector product Dvj. |
Results and Discussion | Table fragment: per-feature-set results ("+ Bag-of-words features" row) across training set sizes of .5K, 5K, and 20K examples. |
Results and Discussion | Additional features: Across all embeddings, appending the document’s binary bag-of-words representation increases classification accuracy. |
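The two statements above (the matrix-vector product Dvj and appending the binary bag-of-words vector) can be sketched as follows; the matrix D, the vocabulary size, and the embedding dimension are random placeholders rather than the paper's actual embeddings.

```python
# Sketch only: embed a document via D @ v_j, optionally appending the binary BoW vector.
import numpy as np

vocab_size, embed_dim = 1000, 50
D = np.random.randn(embed_dim, vocab_size)    # columns act as word embeddings
v_j = np.zeros(vocab_size)
v_j[[3, 17, 256]] = 1.0                       # binary BoW: which vocabulary items occur

doc_repr = D @ v_j                            # matrix-vector product D v_j
augmented = np.concatenate([doc_repr, v_j])   # append the binary BoW as extra features
print(doc_repr.shape, augmented.shape)        # (50,) and (1050,)
```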
Abstract | Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models. |
Bayesian MT Decipherment via Hash Sampling | Firstly, we would like to include as many features as possible to represent the source/target words in our framework besides simple bag-of-words context similarity (for example, left-context, right-context, and other general-purpose features based on topic models, etc.). |
Introduction | Secondly, we introduce a new feature-based representation for sampling translation candidates that allows one to incorporate any amount of additional features (beyond simple bag-of-words) as side-information during decipherment training. |