Distribution Prediction | To reduce the dimensionality of the feature space and create dense representations for words, we perform SVD on F. We use the left singular vectors corresponding to the k largest singular values to compute a rank-k approximation of F. We perform truncated SVD using SVDLIBC. |
Distribution Prediction | Each row in F is considered as representing a word in a lower, k-dimensional (k ≪ n) feature space corresponding to a particular domain. |
Distribution Prediction | Distribution prediction in this lower-dimensional feature space is preferable to prediction over the original feature space because it reduces overfitting, feature sparseness, and learning time. |
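The truncated-SVD step described in the excerpts above can be sketched as follows. This is a minimal illustration with synthetic data: the matrix F, the value of k, and the toy sizes are assumptions, not taken from the paper, and a sparse solver (such as the SVDLIBC mentioned above) would be used in practice for large sparse matrices.

```python
import numpy as np

def truncated_svd_embeddings(F, k):
    """Project the rows of F onto the top-k left singular directions,
    giving a dense k-dimensional representation per word (row)."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    # Each row of U[:, :k] * s[:k] is a word vector in the k-dim space.
    return U[:, :k] * s[:k]

# Synthetic word-by-context co-occurrence counts: 100 words, 500 contexts.
F = np.random.default_rng(0).poisson(0.3, size=(100, 500)).astype(float)
W = truncated_svd_embeddings(F, k=10)
print(W.shape)  # (100, 10)
```

With λ-scaled left singular vectors as rows, each word now lives in a dense 10-dimensional space instead of the original sparse 500-dimensional one.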
Domain Adaptation | increased the training time due to the larger feature space. |
Experiments and Results | Therefore, when the overlap between the vocabularies used in the source and the target domains is small, fired cannot reduce the mismatch between the feature spaces. |
Experiments and Results | All methods are evaluated under the same settings, including train/test split, feature spaces, pivots, and classification algorithms, so that any differences in performance can be directly attributed to their domain adaptability. |
Introduction | latent feature spaces separately for the source and the target domains using Singular Value Decomposition (SVD). |
Introduction | Second, we learn a mapping from the source domain latent feature space to the target domain latent feature space using Partial Least Square Regression (PLSR). |
Introduction | The SVD smoothing in the first step reduces both the data sparseness in the distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step. |
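The two-step pipeline described above (per-domain SVD, then a regression from the source latent space to the target latent space) can be sketched as below. All data here is synthetic, and ordinary least squares is used as a simple stand-in for the PLSR named in the excerpt; PLSR would instead regress through a small number of latent components, which regularises the mapping.

```python
import numpy as np

rng = np.random.default_rng(1)

def latent(F, k):
    """Step 1: rank-k SVD embedding of the rows of F."""
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k]

# Synthetic co-occurrence counts for the same vocabulary in two domains.
F_src = rng.poisson(0.5, size=(80, 300)).astype(float)
F_tgt = rng.poisson(0.5, size=(80, 300)).astype(float)

X = latent(F_src, k=10)   # source-domain latent word vectors
Y = latent(F_tgt, k=10)   # target-domain latent word vectors

# Step 2: learn M minimising ||X M - Y||^2 (plain least squares here,
# where the paper uses PLSR).
M, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predict the target-domain representation of each source word.
Y_hat = X @ M
print(Y_hat.shape)  # (80, 10)
```

Because both latent spaces have the same small dimensionality k, the regression problem stays compact, which is exactly the efficiency argument made in the excerpt.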
O \ | Because the dimensionality of the source and target domain feature spaces is equal to k, the complexity of the least square regression problem increases with k. Therefore, larger k values result in overfitting to the training data, and classification accuracy is reduced on the target test data. |
Conclusion | The size of the employed lexicon determines the dimension of the feature space. |
Discussion | Combining two head nouns may enlarge the feature space. |
Discussion | Such a large feature space makes the occurrence of features close to a random distribution, exacerbating data sparseness. |
Feature Construction | Because the number of lexicon entries determines the dimension of the feature space, the performance of the Omni-word feature is influenced by the lexicon being employed. |
Related Work | (2010) proposed a model for handling the high-dimensional feature space. |
Copula Models for Text Regression | By doing this, we are essentially performing the probability integral transform, an important statistical technique that moves beyond the count-based bag-of-words feature space to the space of marginal cumulative distribution functions. |
Discussions | By applying the Probability Integral Transform to raw features in the copula model, we essentially avoid comparing apples and oranges in the feature space, which is a common problem in bag-of-features models in NLP. |
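The probability integral transform mentioned in these excerpts can be sketched with an empirical CDF: each raw feature value is replaced by its CDF value, so features on very different scales all land on [0, 1]. The data below is synthetic; ties are broken by stable sort order here for simplicity, whereas averaging tied ranks is also common.

```python
import numpy as np

def empirical_cdf_transform(x):
    """Map each value to its empirical CDF value (rank / n)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)   # 1-based ranks
    return ranks / len(x)

# Raw counts on an arbitrary scale become comparable values in (0, 1].
counts = np.array([0, 3, 1, 7, 3])
print(empirical_cdf_transform(counts))
```

After the transform, a heavy-tailed count feature and a small bounded feature are compared on the same marginal CDF scale, which is the "apples and oranges" point made above.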
Experiments | model over the squared-loss linear regression model increase when working with larger feature spaces. |
Related Work | For example, when bag-of-word unigrams are present in the feature space, it is easier not to model the stochastic dependencies among the words explicitly, even though doing so might hurt predictive power, since the variance from the correlations among the random variables is left unexplained. |
Wikipedia-based Composite Kernel for Dialog Topic Tracking | Since our hypothesis is that the more similar the dialog histories of two inputs are, the more similar the aspects of topic transitions that occur for them, we propose a subsequence kernel (Lodhi et al., 2002) to map the data into a new feature space defined based on the similarity of each pair of history sequences as follows: |
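A gap-weighted subsequence kernel in the spirit of Lodhi et al. (2002) can be illustrated as below. This is a deliberately brute-force version that makes the feature map explicit (all common length-n subsequences, with a decay factor `lam` penalising gaps); the dynamic-programming formulation of the original paper is far more efficient. The strings here are stand-ins for dialog-history sequences.

```python
from itertools import combinations

def subsequence_kernel(s, t, n, lam):
    """K(s, t) = sum over common length-n subsequences of
    lam**span(s-occurrence) * lam**span(t-occurrence)."""
    def weighted_subseqs(u):
        feats = {}
        for idx in combinations(range(len(u)), n):
            key = tuple(u[i] for i in idx)
            span = idx[-1] - idx[0] + 1      # length including gaps
            feats[key] = feats.get(key, 0.0) + lam ** span
        return feats
    fs, ft = weighted_subseqs(s), weighted_subseqs(t)
    return sum(w * ft.get(k, 0.0) for k, w in fs.items())

# With lam = 1 the kernel simply counts matching subsequence pairs.
print(subsequence_kernel("cat", "cat", 2, 1.0))  # 3.0
print(subsequence_kernel("cat", "car", 2, 1.0))  # 1.0
```

With `lam < 1`, subsequences spread over long gaps contribute less, so two histories score highest when they share topic transitions occurring close together.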
Wikipedia-based Composite Kernel for Dialog Topic Tracking | The other kernel incorporates a wider variety of domain knowledge obtained from Wikipedia into the feature space. |
Wikipedia-based Composite Kernel for Dialog Topic Tracking | Since this constructed tree structure represents semantic, discourse, and structural information extracted from the Wikipedia paragraphs most similar to each given instance, we can exploit these enriched features to build the topic tracking model using a subset tree kernel (Collins and Duffy, 2002), which computes the similarity between each pair of trees in the feature space as follows: |
Conclusion and Future Work | Another direction involves incorporating a richer feature space for better inference performance, such as multimedia sources (i.e. |
Experiments | We evaluate the settings described in Section 4.2, i.e., the GLOBAL setting, where the user-level attribute is predicted directly from the joint feature space, and the LOCAL setting, where the user-level prediction is made based on tweet-level predictions, along with the different inference approaches described in Section 4.4, i.e. |
Experiments | This can be explained by the fact that LOCAL(U) assigns the attribute label as soon as a single posting is identified as attribute-related, while GLOBAL tends to be more meticulous by considering the conjunctive feature space from all postings. |