Index of papers in Proc. ACL that mention
  • feature space
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
To reduce the dimensionality of the feature space, and create dense representations for words, we perform SVD on F. We use the left singular vectors corresponding to the k largest singular values to compute a rank-k approximation of F. We perform truncated SVD using SVDLIBC.
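A minimal sketch of the rank-k truncated SVD step described above. The paper names SVDLIBC; the scipy call, matrix sizes and variable names below are stand-ins for illustration only.

  # Rank-k approximation of a word-by-feature co-occurrence matrix F via truncated SVD.
  import numpy as np
  from scipy.sparse import csr_matrix
  from scipy.sparse.linalg import svds

  F = csr_matrix(np.random.rand(1000, 5000))  # hypothetical co-occurrence matrix
  k = 300                                     # number of latent dimensions to keep
  U, s, Vt = svds(F, k=k)                     # singular vectors for the k largest singular values
  F_k = U @ np.diag(s) @ Vt                   # rank-k approximation of F
  word_vectors = U * s                        # dense k-dimensional word representations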
Distribution Prediction
Each row in F is considered as representing a word in a lower k (≪ n_c) dimensional feature space corresponding to a particular domain.
Distribution Prediction
Distribution prediction in this lower-dimensional feature space is preferable to prediction over the original feature space because there are reductions in overfitting, feature sparseness, and learning time.
Domain Adaptation
increased the training time due to the larger feature space.
Experiments and Results
Therefore, when the overlap between the vocabularies used in the source and the target domains is small, fired cannot reduce the mismatch between the feature spaces.
Experiments and Results
All methods are evaluated under the same settings, including train/test split, feature spaces, pivots, and classification algorithms so that any differences in performance can be directly attributed to their domain adaptability.
Introduction
latent feature spaces separately for the source and the target domains using Singular Value Decomposition (SVD).
Introduction
Second, we learn a mapping from the source domain latent feature space to the target domain latent feature space using Partial Least Square Regression (PLSR).
Introduction
The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
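A hedged sketch of the two-step idea in these snippets: given SVD-derived latent vectors for the same words in a source and a target domain, fit a PLSR mapping from one latent space to the other. scikit-learn's PLSRegression and all sizes below are illustrative assumptions, not the paper's setup.

  import numpy as np
  from sklearn.cross_decomposition import PLSRegression

  X_src = np.random.rand(5000, 300)     # hypothetical source-domain latent vectors for shared words
  Y_tgt = np.random.rand(5000, 300)     # target-domain latent vectors for the same words

  pls = PLSRegression(n_components=50)  # number of PLS components is a free parameter
  pls.fit(X_src, Y_tgt)

  x_new = np.random.rand(1, 300)        # a word represented in the source domain only
  y_pred = pls.predict(x_new)           # its predicted target-domain representation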
Experiments and Results
Because the dimensionality of the source and target domain feature spaces is equal to k, the complexity of the least square regression problem increases with k. Therefore, larger k values result in overfitting to the training data and classification accuracy is reduced on the target test data.
feature space is mentioned in 11 sentences in this paper.
Lassalle, Emmanuel and Denis, Pascal
Abstract
In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs.
Abstract
Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators.
Introduction
It is worth noting that, from a machine learning point of view, this is related to feature extraction in that both approaches in effect recast the pairwise classification problem in higher dimensional feature spaces.
Introduction
We will see that this is also equivalent to selecting a single large adequate feature space by using the data.
Modeling pairs
Given a document, the number of mentions is fixed and each pair of mentions follows a certain distribution (that we partly observe in a feature space).
Modeling pairs
2.2 Feature spaces 2.2.1 Definitions
Modeling pairs
that casts pairs into a feature space F through which we observe them.
feature space is mentioned in 41 sentences in this paper.
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Abstract
In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data can be in different feature spaces and where no correspondence between data instances in these spaces is provided.
Image Clustering with Annotated Auxiliary Data
Let F be an image feature space, and V = {v_i} be the image data set.
Introduction
Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space.
Introduction
A commonality among these methods is that they all require the training data and test data to be in the same feature space.
Introduction
However, in practice, we often face the problem where the labeled data are scarce in their own feature space, whereas there may be a large amount of labeled heterogeneous data in another feature space.
Related Works
This example is related to our image clustering problem because they both rely on data from different feature spaces.
Related Works
Most learning algorithms for dealing with cross-language heterogeneous data require a translator to convert the data to the same feature space.
Related Works
For those data that are in different feature spaces where no translator is available, Davis and Domingos (2008) proposed a Markov-logic-based transfer learning algorithm, which is called deep transfer, for transferring knowledge between biological domains and Web domains.
feature space is mentioned in 15 sentences in this paper.
Sun, Jun and Zhang, Min and Tan, Chew Lim
Bilingual Tree Kernels
In order to compute the dot product of the feature vectors in the exponentially high-dimensional feature space, we introduce the tree kernel functions as follows:
Bilingual Tree Kernels
As a result, we propose the dependent Bilingual Tree kernel (dBTK) to jointly evaluate the similarity across subtree pairs by enlarging the feature space to the Cartesian product of the two substructure sets.
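One standard way to realize a feature space indexed by the Cartesian product of two substructure sets is a product kernel over subtree pairs; whether dBTK is defined exactly this way cannot be read off this snippet, so the following is only the textbook construction:

$$K_{pair}\big((s_1, t_1), (s_2, t_2)\big) = K_S(s_1, s_2) \cdot K_T(t_1, t_2),$$

where $K_S$ and $K_T$ are tree kernels over source-side and target-side substructures; their product is itself a valid kernel, and its implicit features are all pairs of a source-side feature and a target-side feature.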
Bilingual Tree Kernels
Here we verify the correctness of the kernel by directly constructing the feature space for the inner product.
Introduction
Both kernels can be utilized within different feature spaces using various representations of the substructures.
Substructure Spaces for BTKs
Given feature spaces defined in the last two sections, we propose a 2-phase subtree alignment model as follows:
Substructure Spaces for BTKs
Feature Space | P | R | F  (header of a results table reporting precision, recall and F-score for each feature space)
feature space is mentioned in 13 sentences in this paper.
Li, Jianguo and Brew, Chris
Abstract
In this work, we develop and evaluate a wide range of feature spaces for deriving Levin-style verb classifications (Levin, 1993).
Abstract
We perform the classification experiments using Bayesian Multinomial Regression (an efficient log-linear modeling framework which we found to outperform SVMs for this task) with the proposed feature spaces.
Experiment Setup 4.1 Corpus
Since one of our primary goals is to identify a general feature space that is not specific to any class distinctions, it is of great importance to understand how the classification accuracy is affected when attempting to classify more verbs into a larger number of classes.
Integration of Syntactic and Lexical Information
ACO features integrate at least some degree of syntactic information into the feature space.
Related Work
They define a general feature space that is supposed to be applicable to all Levin classes.
Related Work
(2007) demonstrates that the general feature space they devise achieves a rate of error reduction ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty.
Related Work
However, they also show that their general feature space does not generally improve the classification accuracy over subcategorization frames (see table 1).
Results and Discussion
Another feature set that combines syntactic and lexical information, ACO, which keeps function words in the feature space to preserve syntactic information, outperforms the conventional CO on the majority of tasks.
feature space is mentioned in 8 sentences in this paper.
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans
Conclusion
In addition, we showed that our system is a flexible and modular framework that is able to learn from data with different quality (perfect vs noisy markable detection) and domain; and is able to deliver good results for shallow information spaces and competitive results for rich feature spaces.
Introduction
Typical systems use a rich feature space based on lexical, syntactic and semantic knowledge.
Introduction
We view association information as an example of a shallow feature space which contrasts with the rich feature space that is generally used in CoRe.
Introduction
The feature spaces are the shallow and rich feature spaces.
Related Work
These researchers show that a “deterministic” system (essentially a rule-based system) that uses a rich feature space including lexical, syntactic and semantic features can improve CoRe performance.
Results and Discussion
To summarize, the advantages of our self-training approach are: (i) We cover cases that do not occur in the unlabeled corpus (better recall effect); and (ii) we use the leveraging effect of a rich feature space including distance, person, number, gender etc.
feature space is mentioned in 8 sentences in this paper.
Mayfield, Elijah and Penstein Rosé, Carolyn
Background
We build a contextual feature space, described in section 4.2, to enhance our baseline bag-of-words model.
Background
4.2 Contextual Feature Space Additions
Background
• Baseline: This model uses a bag-of-words feature space as input to an SVM classifier.
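A minimal sketch of the kind of bag-of-words baseline this snippet describes (an illustration, not the authors' pipeline); the toy documents and labels are made up.

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC

  docs = ["example transcript line one", "another example line"]  # hypothetical data
  labels = [0, 1]

  model = make_pipeline(CountVectorizer(), LinearSVC())  # unigram counts fed to a linear SVM
  model.fit(docs, labels)
  print(model.predict(["one more example line"]))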
feature space is mentioned in 6 sentences in this paper.
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Conclusions and Future Work
We have introduced RM, a novel online margin-based algorithm designed for optimizing high-dimensional feature spaces, which introduces constraints into a large-margin optimizer that bound the spread of the projection of the data while maximizing the margin.
Conclusions and Future Work
Experimentation in statistical MT yielded significant improvements over several other state-of-the-art optimizers, especially in a high-dimensional feature space (up to 2 BLEU and 4.3 TER on average).
Introduction
However, as the dimension of the feature space increases, generalization becomes increasingly difficult.
Introduction
This criterion performs well in practice at finding a linear separator in high-dimensional feature spaces (Tsochantaridis et al., 2004; Crammer et al., 2006).
Introduction
Chinese-English translation experiments show that our algorithm, RM, significantly outperforms strong state-of-the-art optimizers, in both a basic feature setting and high-dimensional (sparse) feature space (§4).
Learning in SMT
Online large-margin algorithms, such as MIRA, have also gained prominence in SMT, thanks to their ability to learn models in high-dimensional feature spaces (Watanabe et al., 2007; Chiang et al., 2009).
feature space is mentioned in 6 sentences in this paper.
Tomasoni, Mattia and Huang, Minlie
Discussion and Future Directions
The Quality assessing component itself could be built as a module that can be adjusted to the kind of Social Media in use; the creation of customized Quality feature spaces would make it possible to handle different sources of UGC (forums, collaborative authoring websites such as Wikipedia, blogs etc.).
Discussion and Future Directions
A great obstacle is the lack of systematically available high quality training examples: a tentative solution could be to make use of clustering algorithms in the feature space; high and low quality clusters could then be labeled by comparison with examples of virtuous behavior (such as Wikipedia’s Featured Articles).
Experiments
To demonstrate it, we conducted a set of experiments on the original unfiltered dataset to establish whether the feature space Ψ was powerful enough to capture the quality of answers; our specific objective was to estimate the
Related Work
(2008) which inspired us in the design of the Quality feature space presented in Section 2.1.
The summarization framework
feature space to capture the following syntactic, behavioral and statistical properties:
The summarization framework
The features mentioned above determined a space Ψ; an answer a, in such feature space, assumed the vectorial form:
feature space is mentioned in 6 sentences in this paper.
He, Yulan and Lin, Chenghua and Alani, Harith
Abstract
We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset.
Introduction
We study the polarity-bearing topics extracted by the JST model and show that by augmenting the original feature space with polarity-bearing topics, the performance of in-domain supervised classifiers learned from augmented feature representation improves substantially, reaching the state-of-the-art results of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset.
Joint Sentiment-Topic (JST) Model
In this paper, we have studied polarity-bearing topics generated from the JST model and shown that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance on both the movie review data and the multi-domain sentiment dataset.
Joint Sentiment-Topic (JST) Model
First, polarity-bearing topics generated by the JST model were simply added into the original feature space of documents; it is worth investigating attaching different weights to each topic
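A minimal sketch of the feature augmentation these snippets describe: per-document topic features are appended to the original bag-of-words representation. The shapes and names are hypothetical; this is not the JST implementation.

  import numpy as np

  bow = np.random.rand(2000, 30000)        # hypothetical document-term matrix
  topic_feats = np.random.rand(2000, 100)  # hypothetical per-document polarity-bearing topic features

  X_augmented = np.hstack([bow, topic_feats])  # classifiers are then trained on words + topics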
Related Work
proposed a kernel-mapping function which maps both source and target domain data to a high-dimensional feature space so that data points from the same domain are twice as similar as those from different domains.
feature space is mentioned in 5 sentences in this paper.
Wang, William Yang and Mayfield, Elijah and Naidu, Suresh and Dittmar, Jeremiah
Prediction Experiments
In terms of the size of vocabulary W for both the SME and SVM learner, we select three values to represent dense, medium or sparse feature spaces: W1 = 2^9, W2 = 2^12, and the full vocabulary size of W3 = 2^13.8.
Prediction Experiments
For example, with a medium density feature space of 2^12, SVM obtained an accuracy of 35.8%, but SME achieved an accuracy of 40.9%, which is a 14.2% relative improvement (p < 0.001) over SVM.
Prediction Experiments
When the feature space becomes sparser, the SME obtains an increased relative improvement (p < 0.001) of 16.1%, using the full vocabulary size.
feature space is mentioned in 5 sentences in this paper.
Chen, Yanping and Zheng, Qinghua and Zhang, Wei
Conclusion
The size of the employed lexicon determines the dimension of the feature space.
Discussion
Combining two head nouns may increase the feature space
Discussion
Such a large feature space makes the occurrence of features close to a random distribution, leading to worse data sparseness.
Feature Construction
Because the number of lexicon entries determines the dimension of the feature space, the performance of the Omni-word feature is influenced by the lexicon being employed.
Related Work
(2010) proposed a model handling the high-dimensional feature space.
feature space is mentioned in 5 sentences in this paper.
Xie, Boyi and Passonneau, Rebecca J. and Wu, Leon and Creamer, Germán G.
Experiments
Experiments evaluate the FWD and SemTree feature spaces compared to two baselines: bag-of-words (BOW) and supervised latent Dirichlet allocation (sLDA) (Blei and McAuliffe, 2007).
Experiments
SVM-light with tree kernels (Joachims, 2006; Moschitti, 2006) is used for both the FWD and SemTree feature spaces.
Methods
4.2 SemTree Feature Space and Kernels
Methods
We propose SemTree as another feature space to encode semantic information in trees.
Related Work
We explore a rich feature space that relies on frame semantic parsing.
feature space is mentioned in 5 sentences in this paper.
Mayfield, Elijah and Adamson, David and Penstein Rosé, Carolyn
Cue Discovery for Content Selection
Our feature space X = {x_1, x_2, ...}
Cue Discovery for Content Selection
We search only top candidates for efficiency, following the fixed-width search methodology for feature selection in very high-dimensionality feature spaces (Gutlein et al., 2009).
Experimental Results
We use a binary unigram feature space, and we perform 7-fold cross-validation.
Prediction
One challenge of this approach is our underlying unigram feature space: tree-based algorithms are generally poor classifiers for the high-dimensionality, low-information features in a lexical feature space (Han et al., 2001).
Prediction
We exhaustively sweep this feature space, and report the most successful stump rules for each annotation task.
feature space is mentioned in 5 sentences in this paper.
Wang, William Yang and Hua, Zhenhao
Copula Models for Text Regression
By doing this, we are essentially performing the probability integral transform, an important statistical technique that moves beyond the count-based bag-of-words feature space to the space of marginal cumulative density functions.
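A hedged sketch of the probability integral transform mentioned above: each raw feature value is replaced by its empirical CDF value, so all features live on a common [0, 1] scale. The rank-based estimator below is one common choice, not necessarily the paper's.

  import numpy as np
  from scipy.stats import rankdata

  def empirical_cdf_transform(column):
      # map each value to its ECDF estimate in (0, 1); ties receive average ranks
      return rankdata(column) / (len(column) + 1.0)

  raw = np.array([3.0, 10.0, 1.0, 7.0])  # hypothetical counts for one feature
  print(empirical_cdf_transform(raw))    # [0.4 0.8 0.2 0.6]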
Discussions
By applying the Probability Integral Transform to raw features in the copula model, we essentially avoid comparing apples and oranges in the feature space, which is a common problem in bag-of-features models in NLP.
Experiments
model over the squared-loss linear regression model are increasing when working with larger feature spaces.
Related Work
For example, when bag-of-word unigrams are present in the feature space, it is easier if one does not explicitly model the stochastic dependencies among the words, even though doing so might hurt the predictive power, while the variance from the correlations among the random variables is not explained.
feature space is mentioned in 4 sentences in this paper.
Zhao, Qiuye and Marcus, Mitch
Abstract
The scores of tag predictions are usually computed in a high-dimensional feature space.
Abstract
Consider a character-based feature function φ(c, t, c) that maps a character-tag pair to a high-dimensional feature space, with respect to an input character sequence c. For a possible word w over c of length l, …
Abstract
In Section 3.4, we describe a way of mapping words to a character-based feature space.
feature space is mentioned in 4 sentences in this paper.
Lampos, Vasileios and Preoţiuc-Pietro, Daniel and Cohn, Trevor
Introduction
In addition, more advanced regularisation functions enable multitask learning schemes that can exploit shared structure in the feature space.
Methods
Group LASSO exploits a predefined group structure on the feature space and tries to achieve sparsity at the group level, i.e.
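For reference, the standard group LASSO penalty over a predefined partition $\mathcal{G}$ of the feature space (the paper's exact formulation and weighting may differ):

$$\min_{w} \; \mathcal{L}(w) + \lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\, \lVert w_g \rVert_2,$$

where $w_g$ is the weight subvector of group $g$ and $d_g$ its size; the unsquared $\ell_2$ norm drives entire groups of weights to zero together, which is the group-level sparsity the snippet refers to.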
Methods
In this optimisation process, we aim to enforce sparsity in the feature space but in a structured manner.
feature space is mentioned in 3 sentences in this paper.
Li, Jiwei and Ritter, Alan and Hovy, Eduard
Conclusion and Future Work
Another direction involves incorporating a richer feature space for better inference performance, such as multimedia sources (i.e.
Experiments
We evaluate the settings described in Section 4.2, i.e., the GLOBAL setting, where the user-level attribute is predicted directly from the joint feature space, and the LOCAL setting, where the user-level prediction is made based on tweet-level predictions, along with the different inference approaches described in Section 4.4, i.e.
Experiments
This can be explained by the fact that LOCAL(U) sets the user-level label to 1 as soon as one posting is identified as attribute related, while GLOBAL tends to be more meticulous by considering the conjunctive feature space from all postings.
feature space is mentioned in 3 sentences in this paper.
Kim, Seokhwan and Banchs, Rafael E. and Li, Haizhou
Wikipedia-based Composite Kernel for Dialog Topic Tracking
Since our hypothesis is that the more similar the dialog histories of the two inputs are, the more similar aspects of topic transitions occur for them, we propose a subsequence kernel (Lodhi et al., 2002) to map the data into a new feature space defined based on the similarity of each pair of history sequences as follows:
Wikipedia-based Composite Kernel for Dialog Topic Tracking
The other kernel incorporates more varied types of domain knowledge obtained from Wikipedia into the feature space.
Wikipedia-based Composite Kernel for Dialog Topic Tracking
Since this constructed tree structure represents semantic, discourse, and structural information extracted from the similar Wikipedia paragraphs to each given instance, we can explore these more enriched features to build the topic tracking model using a subset tree kernel (Collins and Duffy, 2002) which computes the similarity between each pair of trees in the feature space as follows:
feature space is mentioned in 3 sentences in this paper.
duVerle, David and Prendinger, Helmut
Abstract
Our method is based on recent advances in the field of statistical machine learning (multivariate capabilities of Support Vector Machines) and a rich feature space.
Building a Discourse Parser
This makes SVM well-fitted to treat classification problems involving relatively large feature spaces such as ours (≈ 10^5 features).
Evaluation
The feature space dimension is 136,987.
feature space is mentioned in 3 sentences in this paper.
Huang, Jian and Taylor, Sarah M. and Smith, Jonathan L. and Fotiadis, Konstantinos A. and Giles, C. Lee
Methods 2.1 Document Level and Profile Based CDC
Kernelization (Schölkopf and Smola, 2002) is a machine learning technique to transform patterns in the data space to a high-dimensional feature space so that the structure of the data can be more easily and adequately discovered.
Methods 2.1 Document Level and Profile Based CDC
Using the kernel trick, the squared distance between Φ(r_j) and Φ(w_i) in the feature space H can be computed as:
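For reference, the standard kernel-trick expansion of such a squared distance is

$$\lVert \Phi(r_j) - \Phi(w_i) \rVert^2 = K(r_j, r_j) - 2\,K(r_j, w_i) + K(w_i, w_i),$$

so distances in the feature space H are computable from kernel evaluations alone, without constructing $\Phi$ explicitly.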
Methods 2.1 Document Level and Profile Based CDC
We measure the kernelized XBI (KXBI) in the feature space as,
feature space is mentioned in 3 sentences in this paper.
Prettenhofer, Peter and Stein, Benno
Cross-Language Structural Correspondence Learning
MASK(x, p_l) is a function that returns a copy of x where the components associated with the two words in p_l are set to zero, which is equivalent to removing these words from the feature space.
Cross-Language Structural Correspondence Learning
Since (θ^T v)^T = v^T θ, it follows that this view of CL-SCL corresponds to the induction of a new feature space given by Equation 2.
Cross-Language Text Classification
I.e., documents from the training set and the test set map on two non-overlapping regions of the feature space.
feature space is mentioned in 3 sentences in this paper.
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Introduction
In NLP applications, the dimension of the feature space tends to be very large; it can easily become several million, so the application of the L1 penalty to all features significantly slows down the weight updating process.
Log-Linear Models
Since the dimension of the feature space can be very large, it can significantly slow down the weight update process.
Log-Linear Models
Another merit is that it allows us to perform the application of L1 penalty in a lazy fashion, so that we do not need to update the weights of the features that are not used in the current sample, which leads to much faster training when the dimension of the feature space is large.
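A hedged sketch of lazy L1 application in SGD, in the spirit of this snippet (a simplification, not the paper's cumulative-penalty algorithm): the penalty owed by a weight is settled only when its feature actually fires, so the millions of inactive dimensions are never touched.

  from collections import defaultdict

  w = defaultdict(float)        # sparse weight vector
  last_step = defaultdict(int)  # step at which each weight last settled its L1 penalty
  eta, lam = 0.1, 1e-4          # learning rate and L1 strength (hypothetical values)

  def soft_threshold(value, amount):
      # shrink `value` toward zero by `amount`, never crossing zero
      if value > 0.0:
          return max(0.0, value - amount)
      return min(0.0, value + amount)

  def update(active_features, gradients, step):
      # only the features present in the current sample are visited
      for f, g in zip(active_features, gradients):
          owed = lam * eta * (step - last_step[f])  # penalty accumulated while f was idle
          w[f] = soft_threshold(w[f], owed)
          last_step[f] = step
          w[f] -= eta * g                           # usual gradient step for this sample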
feature space is mentioned in 3 sentences in this paper.
Arnold, Andrew and Nallapati, Ramesh and Cohen, William W.
Abstract
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets.
Introduction
In particular, we develop a novel prior for named entity recognition that exploits the hierarchical feature space often found in natural language domains (§1.2) and allows for the transfer of information from labeled datasets in other domains (§1.3).
Introduction
Representing feature spaces with this kind of tree, besides often coinciding with the explicit language used by common natural language toolkits (Cohen, 2004), has the added benefit of allowing a model to easily back-off, or smooth, to decreasing levels of specificity.
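A minimal sketch of the back-off idea in this snippet, assuming features are named as dot-separated paths in a tree (the feature string below is made up):

  # Read a feature name as a path in a hierarchy and enumerate its back-off levels.
  def backoff_levels(feature):
      parts = feature.split(".")
      return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

  print(backoff_levels("token.lowercase.prefix3.pre"))
  # ['token.lowercase.prefix3.pre', 'token.lowercase.prefix3', 'token.lowercase', 'token']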
feature space is mentioned in 3 sentences in this paper.
Croce, Danilo and Giannone, Cristina and Annesi, Paolo and Basili, Roberto
Abstract
The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects.
Introduction
The model adopts a simple feature space by relying on a limited set of grammatical properties, thus reducing its learning capacity.
Introduction
As we will see, the accuracy reachable through a restricted feature space is still quite close to the state of the art, but interestingly the performance drops in out-of-domain tests are avoided.
feature space is mentioned in 3 sentences in this paper.
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Because considering such features would increase the size of the feature space, we suspected that including these features would also benefit from algorithmic means of selecting n-grams that are indicative of particular lects, and even from binning these relevant n-grams into sets to be used as features.
Abstract
Although this approach to partitioning is simple and worthy of improvement, it effectively reduced the dimensionality of the feature space.
Abstract
Therefore, as we explored the feature space, small bins of different n-gram lengths were merged.
feature space is mentioned in 3 sentences in this paper.