Index of papers in Proc. ACL that mention
  • labeled data
Manshadi, Mehdi and Li, Xiao
A grammar for semantic tagging
For these reasons, we evaluate our grammar model on the task of automatic tagging of queries for which we have labeled data available.
A grammar for semantic tagging
The model, however, extends the lexicon by including words discovered from labeled data (if available).
A grammar for semantic tagging
It is true that the word “vs” plays a critical role in this query, indicating that the user’s intention is to compare the two brands; but, as mentioned above, in our labeled data such words have been left unlabeled.
Discriminative re-ranking
In particular, when there is no or a very small amount of labeled data, a parser could still work by using unsupervised learning approaches to learn the rules, or by simply using a set of hand-built rules (as we did above for the task of semantic tagging).
Discriminative re-ranking
When there is enough labeled data, then a discriminative model can be trained on the labeled data to learn contextual information and to further enhance the tagging performance.
Introduction
Preparing labeled data, however, is very expensive.
Introduction
Therefore in cases where there is no or a small amount of labeled data available, these models do a poor job.
Introduction
As seen later, in the case where there is not a large amount of labeled data available, the parser part is the dominant part of the module and performs reasonably well.
Summary
This is a big advantage of the parser model, because in practice providing labeled data is very expensive, but very often the lexicons can be easily extracted from the structured data on the web (for example, extracting movie titles from IMDb or book titles from Amazon).
labeled data is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Abstract
Chinese) using labeled data in the source language (e.g.
Abstract
Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
Abstract
Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
Introduction
Therefore, it is desirable to use the English labeled data to improve sentiment classification of documents in other languages.
Introduction
One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g.
Introduction
First, the vocabulary covered by the translated labeled data is limited, hence many sentiment-indicative words cannot be learned from the translated labeled data.
labeled data is mentioned in 58 sentences in this paper.
Topics mentioned in this paper:
Li, Fangtao and Pan, Sinno Jialin and Jin, Ou and Yang, Qiang and Zhu, Xiaoyan
Abstract
In this paper, we propose a domain adaptation framework for sentiment and topic lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain.
Introduction
In this paper, we focus on the co-extraction task of sentiment and topic lexicons in a target domain where we do not have any labeled data, but have plenty of labeled data in a source domain.
Introduction
utilize useful labeled data from the source domain as well as exploit the relationships between the topic and sentiment words to propagate information for lexicon construction in the target domain.
Introduction
labeled data be available in the target domain.
labeled data is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Abstract
We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains.
Experiments
Figure 3: Effect of source domain labeled data.
Experiments
To investigate the impact of the quantity of source domain labeled data on our method, we vary the amount of data from zero to 800 reviews, with equal amounts of positive and negative labeled data.
Experiments
Note that source domain labeled data is used both to create the sentiment sensitive thesaurus and to train the sentiment classifier.
Introduction
Supervised learning algorithms that require labeled data have been successfully used to build sentiment classifiers for a specific domain (Pang et al., 2002).
Introduction
positive or negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains.
Introduction
In particular, no labeled data is provided for the target domain.
labeled data is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Li, Shoushan and Huang, Chu-Ren and Zhou, Guodong and Lee, Sophia Yat Mei
Introduction
It is well-known that sentiment classification is very domain-specific (Blitzer et al., 2007), so it is critical to eliminate its dependence on large-scale labeled data for its wide applications.
Related Work
Supervised methods consider sentiment classification as a standard classification problem in which labeled data in a domain are used to train a domain-specific classifier.
Unsupervised Mining of Personal and Impersonal Views
The co-training algorithm is a specific semi-supervised learning approach which starts with a set of labeled data and increases the amount of labeled data using the unlabeled data by bootstrapping (Blum and Mitchell, 1998).
Unsupervised Mining of Personal and Impersonal Views
Input: The labeled data L containing personal sentence set S_personal and impersonal sentence set
Unsupervised Mining of Personal and Impersonal Views
S_U-personal and S_U-impersonal. Output: New labeled data L. Procedure:
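The co-training loop sketched in the excerpts above can be illustrated with the following minimal Python sketch. It is not the authors' implementation: the two scikit-learn-style feature views (standing in for the personal and impersonal views), count features, and all names are assumptions.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, k=5):
        # Two view-specific classifiers label the unlabeled pool for each
        # other; the most confident predictions migrate to the labeled set
        # (Blum and Mitchell, 1998).
        X1_l, X2_l, y_l = list(X1_l), list(X2_l), list(y_l)
        pool = list(range(len(X1_u)))  # indices of unlabeled instances
        for _ in range(rounds):
            c1 = MultinomialNB().fit(np.array(X1_l), y_l)
            c2 = MultinomialNB().fit(np.array(X2_l), y_l)
            for clf, X_u in ((c1, X1_u), (c2, X2_u)):
                if not pool:
                    break
                probs = clf.predict_proba(np.array([X_u[i] for i in pool]))
                top = np.argsort(-probs.max(axis=1))[:k]  # most confident
                chosen = [pool[t] for t in top]
                for j in chosen:  # bootstrap: grow the labeled data
                    X1_l.append(X1_u[j]); X2_l.append(X2_u[j])
                    y_l.append(int(clf.predict(np.array([X_u[j]]))[0]))
                pool = [i for i in pool if i not in chosen]
        return (MultinomialNB().fit(np.array(X1_l), y_l),
                MultinomialNB().fit(np.array(X2_l), y_l))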
labeled data is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Persing, Isaac and Ng, Vincent
Abstract
To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data.
Abstract
Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
Baseline Approaches
Since our ultimate goal is to evaluate the effectiveness of our bootstrapping algorithm, the baseline approaches only make use of small amounts of labeled data for acquiring classifiers.
Baseline Approaches
To ensure a fair comparison with the first baseline, we do not employ additional labeled data for parameter tuning; rather, we reserve 25% of the available training data for tuning, and use the remaining 75% for classifier training.
Introduction
The difficulty of a text classification task depends on various factors, but typically, the task can be difficult if (1) the amount of labeled data available for learning the task is small; (2) it involves multiple classes; (3) it involves multi-label categorization, where more than one label can be assigned to each document; (4) the class distributions are skewed, with some categories significantly outnumbering the others; and (5) the documents belong to the same domain (e.g., movie review classification).
Introduction
Such methods, however, are unlikely to perform equally well for our cause identification task given our small labeled set, as the minority class prediction problem is complicated by the scarcity of labeled data.
Introduction
More specifically, given the scarcity of labeled data, many words that are potentially correlated with a shaper (especially a minority shaper) may not appear in the training set, and the lack of such useful indicators could hamper the acquisition of an accurate classifier via supervised learning techniques.
Our Bootstrapping Algorithm
One of the potential weaknesses of the two baselines described in the previous section is that the classifiers are trained on only a small amount of labeled data.
Our Bootstrapping Algorithm
The situation is somewhat aggravated by the fact that we are adopting a one-versus-all scheme for generating training instances for a particular shaper, which, together with the small amount of labeled data, implies that only a couple of positive instances may be available for training the classifier for a minority class.
Our Bootstrapping Algorithm
The reason we impose the “at least three” requirement is precision: we want to ensure, with a reasonable level of confidence, that the unlabeled documents chosen to augment P should indeed be labeled with the shaper under consideration, as incorrectly labeled documents would contaminate the labeled data, thus accelerating the deterioration of the quality of the automatically labeled data in subsequent bootstrapping iterations and adversely affecting the accuracy of the classifier trained on it (Pierce and Cardie, 2001).
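The precision-oriented “at least three” filter described in this excerpt can be sketched as follows; the classifier interface and all names are hypothetical, not the paper's code.

    def augment_positive_set(P, unlabeled, classifiers, min_votes=3):
        # Auto-label an unlabeled document with the shaper only if at least
        # `min_votes` independently trained classifiers agree, so that
        # incorrectly labeled documents do not contaminate the labeled data.
        moved = [doc for doc in unlabeled
                 if sum(clf.predict(doc) == 1 for clf in classifiers) >= min_votes]
        P.extend(moved)
        remaining = [doc for doc in unlabeled if doc not in moved]
        return P, remaining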
labeled data is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
where v ∈ {1,2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
A Joint Model with Unlabeled Parallel Text
By further considering the weight to ascribe to the unlabeled data vs. the labeled data (and the weight for the L2-norm regularization), we get the following regularized joint log likelihood to be maximized:
A Joint Model with Unlabeled Parallel Text
where the first term on the right-hand side is the log likelihood of the labeled data from both D1 and D2; the second is the log likelihood of the unlabeled parallel data U, multiplied by λ1 ≥ 0, a constant that controls the contribution of the unlabeled data; and λ2 ≥ 0 is a regularization constant that penalizes model complexity or large feature weights.
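Reassembling the symbols from these excerpts, the regularized joint objective plausibly takes the form below. This is a hedged reconstruction from the surrounding prose, not a verbatim copy of the paper's equation:

    \ell(\theta) = \sum_{v \in \{1,2\}} \sum_{(x,y) \in D_v} \log p_v(y \mid x; \theta_v)
                 + \lambda_1 \sum_{(x_1,x_2) \in U} \log \sum_{y} p_1(y \mid x_1; \theta_1)\, p_2(y \mid x_2; \theta_2)
                 - \lambda_2 \lVert \theta \rVert_2^2

The first term is the log likelihood of the labeled data D1 and D2, the second is the unlabeled parallel data term weighted by λ1 ≥ 0 (the sentiment label y is treated as a latent variable shared by the two sides of each parallel pair), and λ2 ≥ 0 controls the L2-norm regularization.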
Abstract
We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
Introduction
Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
Introduction
The proposed maximum entropy-based EM approach jointly learns two monolingual sentiment classifiers by treating the sentiment labels in the unlabeled parallel text as unobserved latent variables, and maximizes the regularized joint likelihood of the language-specific labeled data together with the inferred sentiment labels of the parallel text.
labeled data is mentioned in 37 sentences in this paper.
Topics mentioned in this paper:
Huang, Hongzhao and Cao, Yunbo and Huang, Xiaojiang and Ji, Heng and Lin, Chin-Yew
Abstract
In addition, it is challenging to generate sufficient high quality labeled data for supervised models with low cost.
Abstract
Compared to the state-of-the-art supervised model trained from 100% labeled data, our proposed approach achieves comparable performance with 31% labeled data and obtains 5% absolute F1 gain with 50% labeled data.
Conclusions
By studying three novel fine-grained relations, detecting semantically-related information with semantic meta paths, and exploiting the data manifolds in both unlabeled and labeled data for collective inference, our work can dramatically save annotation cost and achieve better performance, thus shedding light on the challenging wikification task for tweets.
Experiments
In comparison with the supervised baseline proposed by Meij et al. (2012), our model SSRega1 relying on local compatibility already achieves comparable performance with 50% of labeled data.
Experiments
We can easily see that our proposed approach using 50% labeled data achieves performance similar to that of the state-of-the-art supervised model with 100% labeled data.
Experiments
6.4 Effect of Labeled Data Size
Introduction
Sufficient labeled data is crucial for supervised models.
Introduction
In order to address these unique challenges for wikification for the short tweets, we employ graph-based semi-supervised learning algorithms (Zhu et al., 2003; Smola and Kondor, 2003; Blum et al., 2004; Zhou et al., 2004; Talukdar and Crammer, 2009) for collective inference by exploiting the manifold (cluster) structure in both unlabeled and labeled data.
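A minimal sketch of graph-based label propagation in the spirit of the cited algorithms (harmonic-style updates; the dense-matrix interface is an assumption made for brevity):

    import numpy as np

    def propagate(W, Y, labeled_mask, iters=100):
        # W: (n, n) symmetric similarity matrix over labeled and unlabeled
        # nodes; Y: (n, c) label scores, with unlabeled rows initialized to
        # zero. Labeled nodes are clamped; unlabeled nodes repeatedly take
        # the weighted average of their neighbors (cf. Zhu et al., 2003).
        P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
        F = Y.astype(float).copy()
        for _ in range(iters):
            F = P @ F                          # diffuse scores along edges
            F[labeled_mask] = Y[labeled_mask]  # clamp the labeled data
        return F.argmax(axis=1)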
labeled data is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Minh Luan and Tsang, Ivor W. and Chai, Kian Ming A. and Chieu, Hai Leong
Abstract
We address two challenges: negative transfer, when knowledge in source domains is used without considering the differences in relation distributions; and lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanced relation distributions.
Introduction
However, most supervised learning algorithms require adequate labeled data for every relation type to be extracted.
Introduction
Instead, it can be more cost-effective to adapt an existing relation extraction system to the new domain using a small set of labeled data .
Introduction
Together with imbalanced relation distributions inherent in the domain, this can cause some rarer relations to constitute only a very small proportion of the labeled data set.
Problem Statement
The target domain has a few labeled data D_l = {(x_i, y_i)}.
Problem Statement
For the s-th source domain, we have an adequate labeled data set D_s.
Related Work
However, purely supervised relation extraction methods assume the availability of sufficient labeled data, which may be costly to obtain for new domains.
Related Work
We address this by augmenting a small labeled data set with other information in the domain adaptation setting.
Related Work
To create labeled data , the texts are dependency-parsed, and the domain-independent patterns on the parses form the basis for extractions.
Robust Domain Adaptation
By augmenting with unlabeled data D_u, we aim to alleviate the effect of imbalanced relation distribution, which causes a lack of labeled samples for rarer classes in a small set of labeled data.
labeled data is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Dasgupta, Sajib and Ng, Vincent
Evaluation
Owing to the randomness involved in the choice of labeled data, all baseline results are averaged over ten independent runs for each fold.
Evaluation
We implemented Kamvar et al.’s (2003) semi-supervised spectral clustering algorithm, which incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints.
Evaluation
We employ as our second baseline a transductive SVM trained using 100 points randomly sampled from the training folds as labeled data and the remaining 1900 points as unlabeled data.
Introduction
Experimental results on five sentiment classification datasets demonstrate that our system can generate high-quality labeled data from unambiguous reviews, which, together with a small number of manually labeled reviews selected by the active learner, can be used to effectively classify ambiguous reviews in a discriminative fashion.
Our Approach
However, in the absence of labeled data, it is not easy to assess feature relevance.
Our Approach
Even if labeled data were present, the ambiguous points might be better handled by a discriminative learning system than a clustering algorithm, as discriminative learners are more sophisticated, and can handle an ambiguous feature space more effectively.
Our Approach
In self-training, we iteratively train a classifier on the data labeled so far, use it to classify the unlabeled instances, and augment the labeled data with the most confidently labeled instances.
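A minimal self-training loop matching this description; the scikit-learn classifier and the fixed confidence threshold are assumptions, not the paper's exact setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_l, y_l, X_u, threshold=0.9, rounds=10):
        X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
        clf = LogisticRegression(max_iter=1000).fit(np.array(X_l), y_l)
        for _ in range(rounds):
            if not X_u:
                break
            probs = clf.predict_proba(np.array(X_u))
            keep = probs.max(axis=1) >= threshold  # most confident only
            if not keep.any():
                break
            for x, p in zip(np.array(X_u)[keep], probs[keep]):
                X_l.append(x)                      # augment the labeled data
                y_l.append(int(p.argmax()))
            X_u = [x for x, k in zip(X_u, keep) if not k]
            clf = LogisticRegression(max_iter=1000).fit(np.array(X_l), y_l)
        return clf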
labeled data is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Thint, Marcus and Huang, Zhiheng
Abstract
With a new representation of graph-based SSL on QA datasets using only a handful of features, and under limited amounts of labeled data, we show improvement in generalization performance over state-of-the-art QA models.
Experiments
In the first part, we randomly selected subsets of the labeled training dataset X_L' ⊂ X_L with different sample sizes, {1% × n_L, 5% × n_L, 10% × n_L, 25% × n_L, 50% × n_L, 100% × n_L}, where n_L represents the sample size of X_L. At each random selection, the rest of the labeled dataset is hypothetically used as unlabeled data to verify the performance of our SSL using different sizes of labeled data.
Experiments
Note from Table 2 that, when the number of labeled data is small (n_L' < 10% × n_L), graph-based SSL, gSum SSL, has a better performance compared to SVM.
Experiments
Especially in Hybrid graph-Summary SSL, Hybrid gSum SSL, when the number of labeled data is small (n_L' < 25% × n_L), the performance improvement is better than the rest.
Graph Summarization
The labeled data points, i.e., X_L, are appended to each of these selected X_S datasets, X_S = {x_1, …, x_{m−l}} ∪ X_L.
Graph Summarization
The local density constraints become crucial for inference where summarized labeled data are used instead of the overall dataset.
Graph Summarization
As a result, q summary datasets X_S, each with n_b labeled data points, are combined to form a representative sample of X, X̂ = {X_S}_{s=1}^{q}, reducing the number of data points from n to a much smaller p = q × n_b ≪ n. So the new summary of X can be represented as X̂ = {X_i}_{i=1}^{p}.
Introduction
One of the challenges we face is that we have a very limited amount of labeled data, i.e., correctly labeled (true/false entailment) sentences.
Introduction
We consider situations where there are much more unlabeled data, X_U, than labeled data, X_L, i.e., n_L ≪ n_U.
Introduction
— application of a graph-summarization method to enable learning from very large unlabeled and rather small labeled data, which would not have been feasible for most sophisticated learning tools in section 4.
labeled data is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Datasets
(2006), we use sections 2–21 from the Wall Street Journal (WSJ) as the source domain labeled data.
Distribution Prediction
Our distribution prediction learning method is unsupervised in the sense that it does not require manually labeled data for a particular task from any of the domains.
Domain Adaptation
The main reason that a model trained only on the source domain labeled data performs poorly in the target domain is the feature mismatch — few features in target domain test instances appear in source domain training instances.
Experiments and Results
For each domain, the accuracy obtained by a classifier trained using labeled data from that domain
Experiments and Results
This upper baseline represents the classification accuracy we could hope to obtain if we were to have labeled data for the target domain.
Introduction
Our proposed cross-domain word distribution prediction method is unsupervised in the sense that it does not require any labeled data in either of the two steps.
Unlike our distribution prediction method, which is unsupervised, SST requires labeled data for the source domain to learn a feature mapping between a source and a target domain in the form of a thesaurus.
Related Work
(2006) append the source domain labeled data with predicted pivots (i.e.
Related Work
The unsupervised DA setting that we consider does not assume the availability of labeled data for the target domain.
Related Work
However, if a small amount of labeled data is available for the target domain, it can be used to further improve the performance of DA tasks (Xiao et al., 2013; Daume III, 2007).
labeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Lucas, Michael and Downey, Doug
Abstract
SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available.
Introduction
Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al., 2006).
Introduction
Then, for a given target class and labeled data set, we utilize the statistics to improve a classifier.
Introduction
The marginal statistics are used as a constraint to improve the class-conditional probability estimates P(w|+) and P(w|−) for the positive and negative classes, which are often noisy when estimated over sparse labeled data sets.
Problem Definition
In particular, SFE uses the equality P(+|w) = P(+,w)/P(w) and estimates the right-hand side using P(w) computed over all the unlabeled data, rather than using only labeled data as in standard MNB.
Problem Definition
Further, it can be shown that as P(w) of a word w in the unlabeled data becomes larger than that in the labeled data, SFE’s estimate of the ratio P(w|+)/P(w|−) approaches one.
Problem Definition
Depending on the labeled data, such an estimate can be arbitrarily inaccurate.
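Under the reconstruction of the SFE equality above, the estimate can be sketched as follows. The count-based interface is an assumption, and the final clipping reflects the excerpt's point that sparse labeled counts can make the raw ratio arbitrarily inaccurate:

    def sfe_posterior(n_pos_w, n_labeled, n_w_unlabeled, n_unlabeled):
        # P(+|w) = P(+, w) / P(w): the joint is estimated from the small
        # labeled set, the marginal P(w) from the large unlabeled corpus.
        p_joint = n_pos_w / max(n_labeled, 1)
        p_w = max(n_w_unlabeled, 1) / max(n_unlabeled, 1)
        # If P(w) in the unlabeled data far exceeds the labeled-data
        # estimate, the ratio degenerates; clip to a valid probability.
        return min(p_joint / p_w, 1.0)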
labeled data is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Abstract
With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
Ambiguity-aware Ensemble Training
not sufficiently covered in manually labeled data .
Ambiguity-aware Ensemble Training
Since D′ contains many more instances than D (1.7M vs. 40K for English, and 4M vs. 16K for Chinese), it is likely that the unlabeled data may overwhelm the labeled data during SGD training.
Ambiguity-aware Ensemble Training
1: Input: Labeled data D = {(x_i, d_i)}_{i=1}^{N}, and unlabeled data D′ = {(u_j, v_j)}_{j=1}^{M}; Parameters: I, N1, M1, b
2: Output: w
3: Initialization: w^(0) = 0, k = 0
4: for i = 1 to I do {iterations}
5:     Randomly select N1 instances from D and M1 instances from D′ to compose a new dataset D_i, and shuffle it.
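The corpus-weighting step of this loop can be sketched as follows: sampling a capped number of labeled and auto-parsed instances per iteration keeps the much larger D′ from swamping D. A minimal illustration, with names assumed:

    import random

    def iteration_dataset(D, D_prime, N1, M1, rng=random):
        # One SGD iteration's data: N1 labeled and M1 auto-parsed
        # instances, drawn at random and shuffled together.
        batch = (rng.sample(D, min(N1, len(D)))
                 + rng.sample(D_prime, min(M1, len(D_prime))))
        rng.shuffle(batch)
        return batch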
Conclusions
The training objective is to maximize the mixed likelihood of both the labeled data and the auto-parsed unlabeled data with ambiguous labelings.
Introduction
Such sentences can provide more discriminative instances for training which may be unavailable in labeled data .
Introduction
Evaluation on labeled data shows that the oracle accuracy of the parse forest is much higher than that of the 1-best outputs of single parsers (see Table 3).
Introduction
Finally, using a conditional random field (CRF) based probabilistic parser, we train a better model by maximizing mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
When (re-)training the models, two phrase tables are learned for each SMT model: one from the labeled data L and the other one from the pseudo-labeled data U+ (which we call the main and auxiliary phrase tables, respectively).
Experiments
We subsampled 5,000 sentences as the labeled data L and 20,000 sentences as U for the pool of untranslated sentences (while hiding the English part).
Introduction
However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective.
Introduction
In self-training each MT system is retrained using human labeled data plus its own noisy translation output on the unlabeled data.
Introduction
In co-training each MT system is retrained using human labeled data plus noisy translation output from the other MT systems in the ensemble.
Sentence Selection: Single Language Pair
The more frequent a phrase is in the labeled data, the more unimportant it is, since probably we have observed most of its translations.
Sentence Selection: Single Language Pair
In the labeled data L, phrases are the ones which are extracted by the SMT models; but what are the candidate phrases in the unlabeled data U?
Sentence Selection: Single Language Pair
two multinomials, one for labeled data and the other one for unlabeled data.
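The frequency-based notion of phrase importance in these excerpts might be sketched as follows; whitespace n-grams stand in for real phrase-table entries, and the scoring function is only illustrative:

    from collections import Counter

    def ngrams(sentence, n_max=3):
        toks = sentence.split()
        return [" ".join(toks[i:i + n]) for n in range(1, n_max + 1)
                for i in range(len(toks) - n + 1)]

    def selection_scores(labeled_sents, unlabeled_sents):
        # Phrases frequent in the labeled data are "unimportant" (their
        # translations are probably already known), so a sentence scores
        # highly when its phrases are rare or unseen in the labeled data.
        freq = Counter(p for s in labeled_sents for p in ngrams(s))
        scores = []
        for s in unlabeled_sents:
            phrases = ngrams(s)
            score = sum(1.0 / (1 + freq[p]) for p in phrases) / max(len(phrases), 1)
            scores.append(score)
        return scores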
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Pershina, Maria and Min, Bonan and Xu, Wei and Grishman, Ralph
Abstract
However, in some cases a small amount of human labeled data is available.
Available at http://nlp.stanford.edu/software/mimlre.shtml.
Thus, our approach outperforms the state-of-the-art model for relation extraction using much less labeled data than was used by Zhang et al. (2012) to outperform
Introduction
In this paper, we present the first effective approach, Guided DS (distant supervision), to incorporate labeled data into distant supervision for extracting relations from sentences.
Introduction
(2012), we generalize the labeled data through feature selection and model this additional information directly in the latent variable approaches.
Introduction
While prior work employed tens of thousands of human-labeled examples (Zhang et al., 2012) and only got a 6.5% increase in F-score over a logistic regression baseline, our approach uses much less labeled data (about 1/8) but achieves a much larger performance improvement over stronger baselines.
The Challenge
Simply taking the union of the hand-labeled data and the corpus labeled by distant supervision is not effective, since hand-labeled data will be swamped by a larger amount of distantly labeled data.
The Challenge
An effective approach must recognize that the hand-labeled data is more reliable than the automatically labeled data and so must take precedence in cases of conflict.
The Challenge
Instead we propose to perform feature selection to generalize human-labeled data into training guidelines, and integrate them into the latent variable model.
Training
Upsampling the labeled data did not improve the performance either.
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Takamatsu, Shingo and Sato, Issei and Nakagawa, Hiroshi
Abstract
This noisy labeled data causes poor extraction performance.
Experiments
We compared the following methods: logistic regression with the labeled data cleaned by the proposed method (PROP), logistic regression with the standard DS labeled data (LR), and MultiR, proposed in (Hoffmann et al., 2011), as a state-of-the-art multi-instance learning system. For logistic regression, when more than one relation is assigned to a sentence, we simply copied the feature vector and created a training example for each relation.
Introduction
Supervised approaches are limited in scalability because labeled data is expensive to produce.
Introduction
A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009).
Introduction
However, the DS assumption can fail, which results in noisy labeled data and this causes poor extraction performance.
Knowledge-based Distant Supervision
DS uses a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs.
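A toy version of this heuristic alignment step, which also makes the noise problem visible; the substring matching and the knowledge-base format are deliberate simplifications, not the paper's method:

    def distant_label(sentences, kb):
        # kb: iterable of (entity1, entity2, relation) triples. Any sentence
        # containing both entities is heuristically labeled with the
        # relation -- the DS assumption that can fail and produce noise.
        labeled = []
        for sent in sentences:
            for e1, e2, rel in kb:
                if e1 in sent and e2 in sent:
                    labeled.append((sent, e1, e2, rel))
        return labeled

For example, kb = [("Obama", "Hawaii", "born_in")] would label every sentence mentioning both strings, including sentences that express no birthplace relation at all.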
Related Work
Previous works (Hoffmann et al., 2010; Yao et al., 2010) have pointed out that the DS assumption generates noisy labeled data, but did not directly address the problem.
Wrong Label Reduction
labeled data generated by DS: LD
Wrong Label Reduction
For relation extraction, we train a classifier for entity pairs using the resultant labeled data .
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan
Empirical Evaluation
For every pair, the semi-supervised methods use labeled data from the source domain and unlabeled data from both domains.
Empirical Evaluation
We compare them with two supervised methods: a supervised model (Base) which is trained on the source domain data only, and another supervised model (In-domain) which is learned on the labeled data from the target domain.
Introduction
In addition to the labeled data from the source domain, they also exploit small amounts of labeled data and/or unlabeled data from the target domain to estimate a more predictive model for the target domain.
Introduction
We use generative latent variable models (LVMs) learned on all the available data: unlabeled data for both domains and on the labeled data for the source domain.
Related Work
Second, their expectation constraints are estimated from labeled data, whereas we are trying to match expectations computed on unlabeled data for two domains.
Related Work
This approach bears some similarity to the adaptation methods standard for the setting where labelled data is available for both domains (Chelba and Acero, 2004; Daume and Marcu, 2006).
The Latent Variable Model
Among the versions which do not exploit labeled data from the target domain.
The Latent Variable Model
The parameters of this model θ = (v, w) can be estimated by maximizing the joint likelihood L(θ) of labeled data for the source domain {x^(l), y^(l)}, l ∈ S_L
The Latent Variable Model
However, given that, first, the amount of unlabeled data |S_U ∪ T_U| normally vastly exceeds the amount of labeled data |S_L| and, second, the number of features for each example |x^(l)| is usually large, the label y will have only a minor effect on the mapping from the initial features x to the latent representation z (i.e.
labeled data is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Abstract
Using utterances from five domains, our approach shows up to 4.5% improvement on domain and dialog act performance over a cascaded approach, in which each semantic component is learned sequentially, and over a supervised joint learning model (which requires fully labeled data).
Data and Approach Overview
Our algorithm assigns domain/dialog-act/slot labels to each topic at each layer in the hierarchy using labeled data (explained in §4).
Experiments
Here, we not only want to demonstrate the performance of each component of MCM but also their performance under a limited amount of labeled data.
Experiments
When the number of labeled data is small (n_L^i ≤ 25% × n_L), our WebPrior-MCM has a better performance on domain and act predictions compared to the two baselines.
Experiments
Adding labeled data improves the performance of all models; however, supervised models benefit more compared to MCM models.
Introduction
The contributions of this paper are as follows: (i) construction of a novel Bayesian framework for semantic parsing of natural language (NL) utterances in a unifying framework in §4, (ii) representation of seed labeled data and information from web queries as informative prior to design a novel utterance understanding model in §3 & §4, (iii) comparison of our results to supervised sequential and joint learning methods on NL utterances in §5.
Introduction
We conclude that our generative model achieves noticeable improvement compared to discriminative models when labeled data is scarce.
labeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans
Abstract
We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.
Introduction
Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data .
Introduction
Self-training approaches usually include the use of some manually labeled data .
Introduction
In contrast, our self-trained system is not trained on any manually labeled data and is therefore a completely unsupervised system.
Related Work
not with approaches that make some limited use of labeled data.
Results and Discussion
Thus, this comparison of ACE-2/Ontonotes results is evidence that in a realistic scenario using association information in an unsupervised self-trained system is almost as good as a system trained on manually labeled data .
System Architecture
Automatically Labeled Data
labeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Abstract
Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity.
Experiments
First a supervised learning algorithm is used to build a CRF model based on the labeled data .
Introduction
Thus, each latent semantic class corresponds to one of the semantic tags found in labeled data .
Markov Topic Regression - MTR
We assume a fixed number of K topics corresponding to the semantic tags of the labeled data.
Markov Topic Regression - MTR
K latent topics to the K semantic tags of our labeled data .
Markov Topic Regression - MTR
labeled data, based on the log-linear model in Eq.
Semi-Supervised Semantic Labeling
(5) is the loss on the labeled data and ℓ2 regularization on parameters, λ^(n), from the nth iteration, same as standard CRF.
Semi-Supervised Semantic Labeling
The labeled rows m_l of the vocabulary matrix, m = {m_l, m_u}, contain only {0,1} values, indicating the word’s observed semantic tags in the labeled data.
labeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Lan, Man and Xu, Yu and Niu, Zhengyu
Abstract
To overcome the shortage of labeled data for implicit discourse relation recognition, previous works attempted to automatically generate training data by removing explicit discourse connectives from sentences and then built models on these synthetic implicit examples.
Implementation Details of Multitask Learning Method
connectives and relations in PDTB and generate synthetic labeled data by removing the connectives.
Implementation Details of Multitask Learning Method
BLLIP North American News Text (Complete) is used as unlabeled data source to generate synthetic labeled data .
Implementation Details of Multitask Learning Method
In comparison with the synthetic labeled data generated from the explicit relations in PDTB, the synthetic labeled data from BLLIP contains more noise.
Multitask Learning for Discourse Relation Prediction
Following these two principles, we create the auxiliary tasks by generating automatically labeled data as follows.
Multitask Learning for Discourse Relation Prediction
Previous work (Marcu and Echihabi, 2002) and (Sporleder and Lascarides, 2008) adopted predefined pattern-based approach to generate synthetic labeled data , where each predefined pattern has one discourse relation label.
Multitask Learning for Discourse Relation Prediction
In contrast, we adopt an automatic approach to generate synthetic labeled data , where each discourse connective between two texts serves as their relation label.
labeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Wang, Chang and Fan, James
Background
Recently, “distant supervision” has emerged to be a popular choice for training relation extractors without using manually labeled data (Mintz et al., 2009; Jiang, 2009; Chan and Roth, 2010; Wang et al., 2011; Riedel et al., 2010; Ji et al., 2011; Hoffmann et al., 2011; Surdeanu et al., 2012; Takamatsu et al., 2012; Min et al., 2013).
Experiments
(1) Manifold Unlabeled: We combined the labeled data and unlabeled set 1 in training.
Experiments
(2) Manifold Predicted Labels: We combined labeled data and unlabeled set 2 in training.
Experiments
labeled data and the data from unlabeled set 2 was used as labeled data (with weights).
Identifying Key Medical Relations
Our current strategy is to integrate all associated types, and rely on the relation detector trained with the labeled data to decide how to weight different types based upon the context.
Introduction
When we build a naive model to detect relations, the model tends to overfit for the labeled data .
Relation Extraction with Manifold Models
Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient.
labeled data is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Introduction
In the past years, several proposed supervised joint models (Ng and Low, 2004; Zhang and Clark, 2008; Jiang et al., 2009; Zhang and Clark, 2010) achieved reasonably accurate results, but the outstanding problem among these models is that they rely heavily on a large amount of labeled data , i.e., segmented texts with POS tags.
Introduction
However, the production of such labeled data is extremely time-consuming and expensive (Jiao et al., 2006; Jiang et al., 2009).
Introduction
Motivated by the works in (Subramanya et al., 2010; Das and Smith, 2011), for structured problems, graph-based label propagation can be employed to infer valuable syntactic information (n-gram-level label distributions) from labeled data to unlabeled data.
Method
It is especially helpful for the graph to make connections with trigrams that may not have been seen in labeled data but have similar label information.
Method
The first term in Equation (5) is the same as Equation (2), which is the traditional CRFs leam-ing objective function on the labeled data .
Method
To satisfy the characteristic of the semi-supervised learning problem, the training set, i.e., the labeled data, is formed from a relatively small amount of annotated text sampled from CTB-7.
labeled data is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang
Conclusion and Future Work
One obvious direction is to use the whole Penn Treebank as labeled data and use some other unannotated data source as unlabeled data for semi-supervised training.
Efficient Optimization Strategy
• Step 2: based on the learned parameter weights from the labeled data, update θ and Y_j on each unlabeled sentence alternately:
Introduction
However, a key drawback of supervised training algorithms is their dependence on labeled data , which is usually very difficult to obtain.
Introduction
This loss function has the advantage that the entire training objective on both the labeled and unlabeled data now becomes convex, since it consists of a convex structured large margin loss on labeled data and a convex least squares loss on unlabeled data.
Introduction
In particular, we investigate a semi-supervised approach for structured large margin training, where the objective is a combination of two convex functions, the structured large margin loss on labeled data and the least squares loss on unlabeled data.
Semi-supervised Convex Training for Structured SVM
for structured large margin training, whose objective is a combination of two convex terms: the supervised structured large margin loss on labeled data and the cheap least squares loss on unlabeled data.
Semi-supervised Convex Training for Structured SVM
By combining the convex structured SVM loss on labeled data (shown in Equation (5)) and the convex least squares loss on unlabeled data (shown in Equation (8)), we obtain a semi-supervised structured large margin loss
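Schematically, the combined objective described here has the form below; the structured hinge term is the standard one, while the least-squares term on unlabeled data is only named, since its exact form (the paper's Equation (8)) is not reproduced in these excerpts:

    J(w) = \sum_{(x_i, y_i) \in L} \max_{y} \big[ \Delta(y_i, y) + w^\top \phi(x_i, y) - w^\top \phi(x_i, y_i) \big]_+
         \; + \; \mu \, \mathcal{L}_{\mathrm{ls}}(w; U) \; + \; \frac{\lambda}{2} \lVert w \rVert^2

Both loss terms are convex in w, so the whole semi-supervised training objective remains convex, which is the advantage stressed in the Introduction excerpts.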
labeled data is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Pang, Bo
Abstract
Labeled data is not readily available for these tasks, so we focus on the unsupervised setting.
Experiments
Note that we do not consider this performance to be the upper bound of supervised approaches; clearly, supervised approaches could benefit from additional labeled data.
Experiments
However, labeled data is relatively expensive to obtain for this task.
Introduction
There is no existing labeled data for the tasks of interest, and we would like the methods we develop to be easily applied in multiple domains.
Introduction
Motivated by this, we propose a generative model for solving these tasks jointly without labeled data .
Models
To identify refinements without labeled data , we propose a generative model of reviews (or more generally documents) with latent variables.
labeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Yang, Bishan and Cardie, Claire
Abstract
The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited.
Approach
PR makes the assumption that the labeled data we have is not enough for learning good model parameters, but we have a set of constraints on the posterior distribution of the labels.
Experiments
For the MD dataset, we also used the dvd domain as additional labeled data for developing the constraints.
Experiments
We found that the PR model is able to correct many CRF errors caused by the lack of labeled data .
Experiments
However, with limited labeled data , the CRF learner can only associate very weak sentiment signals to these features.
Related Work
Compared to the existing work on semi-supervised learning for sentence-level sentiment classification (Täckström and McDonald, 2011a; Täckström and McDonald, 2011b; Qu et al., 2012), our work does not rely on a large amount of coarse-grained (document-level) labeled data; instead, distant supervision mainly comes from linguistically-motivated constraints.
labeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Experiments
In chunking, there is a clear trend toward larger increases in performance as words become rarer in the labeled data set, from a 0.02 improvement on words of frequency 2, to an improvement of 0.21 on OOV words.
Experiments
To measure the sample complexity of the supervised CRF, we use the same experimental setup as in the chunking experiment on WSJ text, but we vary the amount of labeled data available to the CRF.
Experiments
Thus smoothing is optimizing performance for the case where unlabeled data is plentiful and labeled data is scarce, as we would hope.
Related Work
Several researchers have previously studied methods for using unlabeled data for tagging and chunking, either alone or as a supplement to labeled data .
Related Work
Our technique lets the HMM find parameters that maximize cross-entropy, and then uses labeled data to learn the best mapping from the HMM categories to the POS categories.
Related Work
Our technique uses unlabeled training data from the target domain, and is thus applicable more generally, including in web processing, where the domain and vocabulary are highly variable, and it is extremely difficult to obtain labeled data that is representative of the test distribution.
labeled data is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Khapra, Mitesh M. and Joshi, Salil and Chatterjee, Arindam and Bhattacharyya, Pushpak
Bilingual Bootstrapping
Algorithm 1 Bilingual Bootstrapping
LD1 := Seed Labeled Data from L1
LD2 := Seed Labeled Data from L2
UD1 := Unlabeled Data from L1
UD2 := Unlabeled Data from L2
Bilingual Bootstrapping
These projected models are then applied to the untagged data of L1 and L2 and the instances which get labeled with a high confidence are added to the labeled data of the respective languages.
Bilingual Bootstrapping
Algorithm 2 Monolingual Bootstrapping
LD1 := Seed Labeled Data from L1
LD2 := Seed Labeled Data from L2
UD1 := Unlabeled Data from L1
UD2 := Unlabeled Data from L2
Experimental Setup
In each iteration only those words for which P(assigned_sense|word) > 0.6 get moved to the labeled data .
Experimental Setup
Hence, we used a fixed threshold of 0.6 so that in each iteration only those words get moved to the labeled data for which the assigned sense is clearly a majority sense (P > 0.6).
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sun, Weiwei and Uszkoreit, Hans
Capturing Paradigmatic Relations via Word Clustering
Previous research shows that character-based segmentation models trained on labeled data are reasonably accurate (Sun, 2010).
Capturing Paradigmatic Relations via Word Clustering
Table 5 summarizes the accuracies of the systems when trained on smaller portions of the labeled data .
Capturing Paradigmatic Relations via Word Clustering
In other words, the word cluster features can significantly reduce the amount of labeled data required by the learning algorithm.
State-of-the-Art
In this paper, we use CTB 6.0 as the labeled data for the study.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Sun, Weiwei and Wan, Xiaojun
Conclusion
We employ stacking models to incorporate features derived from heterogeneous analysis and apply them to convert heterogeneous labeled data for retraining.
Data-driven Annotation Conversion
It is possible to acquire high quality labeled data for a specific annotation standard by exploring existing heterogeneous corpora, since the annotations are normally highly compatible.
Data-driven Annotation Conversion
Moreover, the exploitation of additional (pseudo) labeled data aims to reduce the estimation error and enhances an NLP system in a different way from stacking.
Data-driven Annotation Conversion
acquire a new labeled data set D̂_ctb = D_ppd→ctb
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jiang, Qixia and Sun, Maosong
Introduction
This is implemented by maximizing the empirical accuracy on the prior knowledge (labeled data) and the entropy of hash functions (estimated over labeled and unlabeled data).
Semi-Supervised SimHash
Let X_L = {(x_1, c_1), …, (x_u, c_u)} be the labeled data, c ∈ {1…C}, x ∈ R^M, and X_U = {x_{u+1}, …, x_N} the unlabeled data.
Semi-Supervised SimHash
Given the labeled data X_L, we construct two sets, attraction set Θ_a and repulsion set Θ_r.
Semi-Supervised SimHash
Furthermore, we also hope to maximize the empirical accuracy on the labeled data Θ_a and Θ_r and
The direction is determined by concatenating w L times.
This is implemented by maximizing the empirical accuracy on labeled data together with the entropy of hash functions.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Jeon, Je Hun and Liu, Yang
Abstract
In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most-confident sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Co-training strategy for prosodic event detection
Given a set L of labeled data and a set U of unlabeled data, the algorithm first creates a smaller pool U’ containing u unlabeled data.
Conclusions
In our experiment, we used some labeled data as development set to estimate some parameters.
Experiments and results
Among labeled data, 102 utterances of the f1a and m1b speakers are used for testing, 20 utterances randomly chosen from f2b, f3b, m2b, m3b, and m4b are used as a development set to optimize parameters such as λ and the confidence level threshold, 5 utterances are used as the initial training set L, and the rest of the data is used as unlabeled set U, which has 1027 unlabeled utterances (we removed the human labels for co-training experiments).
Experiments and results
We can see that the performance of co-training for these three tasks is slightly worse than supervised learning using all the labeled data, but is significantly better than the original performance using 3% of hand-labeled data.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Cai, Peng and Gao, Wei and Zhou, Aoying and Wong, Kam-Fai
Evaluation
All ranking models above were trained only on source domain training data and the labeled data of target domain was just used for testing.
Instance Weighting Scheme Review
(Jiang and Zhai, 2007) used a small amount of labeled data from the target domain to weight source instances.
Introduction
To alleviate the lack of training data in the target domain, many researchers have proposed to transfer ranking knowledge from the source domain, with plenty of labeled data, to the target domain, where little or no labeled data is available, which is known as ranking model adaptation (Chen et al., 2008a; Chen et al., 2010; Chen et al., 2008b; Geng et al., 2009; Gao et al., 2009).
Related Work
In (Geng et al., 2009; Chen et al., 2008b), the parameters of the ranking model trained on the source domain were adjusted with the small set of labeled data in the target domain.
Related Work
(Chen et al., 2008a) weighted source instances by using a small amount of labeled data in the target domain.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lin, Dekang and Wu, Xiaoyun
Discussion and Related Work
Ando and Zhang (2005) defined an objective function that combines the original problem on the labeled data with a set of auxiliary problems on unlabeled data.
Discussion and Related Work
One is to leverage a large amount of unlabeled data to train an adequate classifier with a small amount of labeled data.
Introduction
While the labeled data is generally very costly to obtain, there is a vast amount of unlabeled textual data freely available on the web.
Introduction
Under this approach, even if a word is not found in the training data, it may still fire cluster-based features as long as it shares cluster assignments with some words in the labeled data .
Introduction
Since the clusters are obtained without any labeled data , they may not correspond directly to concepts that are useful for decision making in the problem domain.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Experiments
Due to these reasons, there is a lack of sufficient, high-quality labeled data for emotion research.
Experiments
Since in real world applications people are primarily concerned with how well the algorithm will work for new TV shows or movies that may not be included in the training data, we defined a test fold for each TV show or movie in our labeled data set.
Experiments
Each test fold corresponded to a training fold containing all the labeled data from all the other TV shows and movies.
Introduction
An active learner uses a small set of labeled data to iteratively select the most informative instances from a large pool of unlabeled data for human annotators to label (Settles, 2010).
Related Work
In Active Learning (Settles, 2010) a small set of labeled data is used to find documents that should be annotated from a large pool of unlabeled documents.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Experiments of Parsing
Recent studies on parsing indicate that the use of unlabeled data by self-training can help parsing on the WSJ data, even when labeled data is relatively large (McClosky et al., 2006a; Reichart and Rappoport, 2007).
Experiments of Parsing
Table 7 shows the performance of self-trained generative parser and updated reranker on the test set, with CTB and CDTfs as labeled data .
Experiments of Parsing
All the works in Table 8 used CTB articles 1-270 as labeled data .
Introduction
It is important to acquire additional labeled data for the target grammar parsing through exploitation of existing source treebanks since there is often a shortage of labeled data .
Introduction
When coupled with self-training technique, a reranking parser with CTB and converted CDT as labeled data achieves 85.2% f-score on CTB test set, an absolute 1.0% improvement (6% error reduction) over the previous best result for Chinese parsing.
labeled data is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Krishnamurthy, Jayant and Mitchell, Tom
Abstract
ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data .
ConceptResolver
λ is re-selected on each iteration because the initial labeled data set is extremely small, so the initial validation set is not necessarily representative of the actual data.
ConceptResolver
1. Initialize labeled data L with 10 positive and 50 negative examples (pairs of senses)
Prior Work
These approaches use large amounts of labeled data , which can be difficult to create.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
The data set consists of labelled data for both the source (Wall Street Journal portion of the Penn Treebank) and target (web) domains.
Experiments
Participants are not allowed to use web-domain labelled data for training.
Experiments
In addition to labelled data , a large amount of unlabelled data on the web domain is also provided.
Introduction
The problem we face here can be considered as a special case of domain adaptation, where we have access to labelled data on the source domain (PTB) and unlabelled data on the target domain (web data).
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Jing
A multitask transfer learning solution
We now present a multitask transfer learning solution to the weakly-supervised relation extraction problem, which makes use of the labeled data from the auxiliary relation types.
A multitask transfer learning solution
It is general for any transfer learning problem with auxiliary labeled data from similar tasks.
Introduction
However, supervised learning heavily relies on a sufficient amount of labeled data for training, which is not always available in practice due to the labor-intensive nature of human annotation.
Introduction
Inspired by recent work on transfer learning and domain adaptation, in this paper, we study how we can leverage labeled data of some old relation types to help the extraction of a new relation type in a weakly-supervised setting, where only a few seed instances of the new relation type are available.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Abstract
The proposed approach trains a character-based and word-based model on labeled data, respectively, as the initial models.
Introduction
The proposed approach begins by training a character-based and word-based model on labeled data respectively, and then both models are regularized from each view by their segmentation agreements, i.e., the identical outputs, of unlabeled data.
Semi-supervised Learning via Co-regularizing Both Models
This study proposes a co-regularized CWS model based on character-based and word-based models, built on a small amount of segmented sentences (labeled data) and a large amount of raw sentences (unlabeled data).
Semi-supervised Learning via Co-regularizing Both Models
The model induction process is described in Algorithm 1: given labeled dataset D_l and unlabeled dataset D_u, the first two steps are training a CRFs (character-based) and a Perceptrons (word-based) model on the labeled data D_l, respectively.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Druck, Gregory and Mann, Gideon and McCallum, Andrew
Generalized Expectation Criteria
In general, the objective function could also include the likelihood of available labeled data, but throughout this paper we assume we have no parsed sentences.
Generalized Expectation Criteria
If there are constraint functions G for all model feature functions F, and the target expectations G̃ are estimated from labeled data, then the globally optimal parameter setting under the GE objective function is equivalent to the maximum likelihood solution.
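A generic GE objective consistent with this description penalizes the distance between target expectations g̃_k (here estimated from labeled data) and model expectations; the squared-distance score used below is an assumption, since the excerpts do not fix the score function:

    O(\theta) = -\sum_{k} \Big( \tilde{g}_k - \mathbb{E}_{\tilde{p}(x)} \, \mathbb{E}_{p_\theta(y \mid x)} \big[ g_k(x, y) \big] \Big)^2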
Linguistic Prior Knowledge
For some experiments that follow we use “oracle” constraints that are estimated from labeled data .
Related Work
(2006) both use modified forms of self-training to bootstrap parsers from limited labeled data .
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Prettenhofer, Peter and Stein, Benno
Cross-Language Structural Correspondence Learning
The confidence, however, can only be determined for wS since the setting gives us access to labeled data from S only.
Related Work
In the basic domain adaptation setting we are given labeled data from the source domain and unlabeled data from the target domain, and the goal is to train a classifier for the target domain.
Related Work
Beyond this setting one can further distinguish whether a small amount of labeled data from the target domain is available (Daume, 2007; Finkel and Manning, 2009) or not (Blitzer et al., 2006; Jiang and Zhai, 2007).
Related Work
(2007) apply structural learning to image classification in settings where little labeled data is given.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael and Jurafsky, Dan
Abstract
Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template).
Previous Work
Weakly supervised approaches remove some of the need for fully labeled data.
Previous Work
Shinyama and Sekine (2006) describe an approach to template learning without labeled data.
Standard Evaluation
Our precision is as good as (and our F1 score near) two algorithms that require knowledge of the templates and/or labeled data.
labeled data is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Usually the extraction performance depends heavily on the quality and quantity of the labeled data; however, the manual annotation of a large-scale corpus is labor-intensive and time-consuming.
Abstract
During iterations, a batch of unlabeled instances is chosen in terms of their informativeness to the current classifier, labeled by an oracle, and in turn added into the labeled data to retrain the classifier.
Abstract
Input:
- L, labeled data set
- U, unlabeled data set
- n, batch size
Output:
- SVM, classifier
Repeat:
1.
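A minimal Python sketch of this loop follows; it is an illustration under assumptions, not the authors' system: query_oracle is a hypothetical stand-in for the human annotator, informativeness is reduced to SVM margin (uncertainty sampling), and a binary task is assumed.

    # Hedged sketch of batch active learning with an SVM (binary case).
    # query_oracle is hypothetical; it returns labels for the queried batch.
    import numpy as np
    from sklearn.svm import LinearSVC

    def active_learn(X_l, y_l, X_u, query_oracle, n, iterations=10):
        clf = LinearSVC().fit(X_l, y_l)
        for _ in range(iterations):
            if len(X_u) == 0:
                break
            margin = np.abs(clf.decision_function(X_u))  # small margin = informative
            batch = np.argsort(margin)[:n]               # choose a batch of size n
            y_new = query_oracle(X_u[batch])             # labeled by an oracle
            X_l = np.vstack([X_l, X_u[batch]])           # added into the labeled data
            y_l = np.concatenate([y_l, y_new])
            X_u = np.delete(X_u, batch, axis=0)
            clf = LinearSVC().fit(X_l, y_l)              # retrain the classifier
        return clf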
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and McDonald, Ryan
Abstract
Our model achieves high accuracy, without any explicitly labeled data except the user-provided opinion ratings.
Introduction
When labeled data exists, this problem can be solved effectively using a wide variety of methods available for text classification and information extraction (Manning and Schutze, 1999).
Introduction
However, labeled data is often hard to come by, especially when one considers all possible domains of products and services.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Experiments
Self-training Segmenters (STS): two variant models were defined by the approach reported in (Subramanya et al., 2010) that uses the supervised CRFs model’s decodings, incorporating empirical and constraint information, for unlabeled examples as additional labeled data to retrain a CRFs model.
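The self-training recipe in this snippet (decode unlabeled data, treat the decodings as additional labeled data, retrain) can be sketched as follows; a generic probabilistic classifier stands in for the sequence CRFs, and the confidence cutoff is an assumption:

    # Hedged sketch of self-training; not the cited system.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_l, y_l, X_u, threshold=0.9):
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        proba = clf.predict_proba(X_u)
        keep = proba.max(axis=1) >= threshold       # keep confident decodings only
        y_pseudo = clf.classes_[proba[keep].argmax(axis=1)]
        X_aug = np.vstack([X_l, X_u[keep]])         # decodings as additional labeled data
        y_aug = np.concatenate([y_l, y_pseudo])
        return LogisticRegression(max_iter=1000).fit(X_aug, y_aug)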
Introduction
They leverage such mappings to either constitute a Chinese word dictionary for maximum-matching segmentation (Xu et al., 2004), or form labeled data for training a sequence labeling model (Paul et al., 2011).
Methodology
Our learning problem belongs to semi-supervised learning (SSL), as the training is done on treebank labeled data (XL, YL) = {(x1, y1), ..., (xl, yl)} and bilingual unlabeled data XU = {x1, ..., xu}, where xi = {x1, ..., xm} is an input word sequence and yi = {y1, ..., ym}, y ∈ T, is its corresponding label sequence.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Longkai and Li, Li and He, Zhengyan and Wang, Houfeng and Sun, Ni
Experiment
We use the benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff2 as the labeled data.
Our method
We randomly reuse some characters labeled ’N’ from labeled data until ratio η is reached.
Our method
In summary, our algorithm tackles the problem by duplicating labeled data in the source domain.
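The duplication step itself is simple; a hedged Python sketch follows (the ratio η and the ’N’ label come from the quoted description, the data layout is an assumption):

    # Hedged sketch: duplicate randomly chosen 'N'-labeled items from the
    # source-domain labeled data until their fraction reaches eta.
    import random

    def duplicate_to_ratio(examples, eta):
        # examples: list of (character, label) pairs; eta: target 'N' fraction < 1
        pool = [ex for ex in examples if ex[1] == 'N']
        if not pool:
            return list(examples)
        out = list(examples)
        while sum(1 for ex in out if ex[1] == 'N') / len(out) < eta:
            out.append(random.choice(pool))  # reuse a random 'N'-labeled character
        return out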
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sogaard, Anders
Experiments
Letting an antagonistic adversary corrupt our labeled data (somewhat surprisingly, maybe) leads to better cross-domain performance.
Introduction
The problem with out-of-vocabulary effects can be illustrated using a small labeled data set: {x1 = ⟨1, ⟨0,1,0⟩⟩, x2 = ⟨1, ⟨0,1,1⟩⟩, x3 = ⟨0, ⟨0,0,0⟩⟩, x4 = ⟨1, ⟨0,0,…⟩⟩}. Say we train our model on x1–x3 and evaluate it on the fourth data point.
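To make the effect concrete, here is a hedged worked example in Python. The fourth data point is truncated in the source, so its completion ⟨0,0,1⟩ with label 1 is purely hypothetical, chosen to show a test point whose only active feature carries whatever weight training happened to assign it.

    # Hedged illustration of the out-of-vocabulary effect on a toy data set.
    from sklearn.linear_model import Perceptron

    X_train = [[0, 1, 0], [0, 1, 1], [0, 0, 0]]  # features of x1..x3
    y_train = [1, 1, 0]                          # labels of x1..x3
    clf = Perceptron().fit(X_train, y_train)
    # Hypothetical completion of the truncated fourth point:
    print(clf.coef_, clf.predict([[0, 0, 1]]))
    # If training put all separating weight on feature 2, this test point is
    # misclassified: its only active feature was never decisive at training time.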
Robust perceptron learning
Globerson and Roweis (2006) let an adversary corrupt labeled data during training to learn better models of test data with missing features.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Scheible, Christian and Schütze, Hinrich
Distant Supervision
To summarize, the results of our experiments using distant supervision show that a sentiment relevance classifier can be trained successfully by labeling data with a few simple feature rules, with
Methods
Supervised optimization is impossible as we do not have any labeled data.
Related Work
In general, it is not possible to know what the underlying concepts of a statistical classification are if no detailed annotation guidelines exist and no direct evaluation of manually labeled data is performed.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Clustering for Cross Lingual Sentiment Analysis
Algorithm 1 Projection based on sense
Input: Polarity labeled data in source language (S) and data in target language (T) to be labeled
Output: Classified documents
1: Sense mark the polarity labeled data from S
2: Project the sense marked corpora from S to T using a Multidict
3: Model the sentiment classifier using the data obtained in step-2
4: Sense mark the unlabelled data from T
5: Test the sentiment classifier on data obtained in step-4 using model obtained in step-3
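Algorithm 1 is a straight pipeline, which a short Python sketch makes explicit; sense_mark, project_via_multidict, and train_classifier are hypothetical stand-ins for the paper's components:

    # Hedged sketch of Algorithm 1's control flow; the three callables are
    # hypothetical stand-ins (WSD sense marker, Multidict projection, learner).
    def cross_lingual_sentiment(source_labeled, target_unlabeled,
                                sense_mark, project_via_multidict,
                                train_classifier):
        marked = [(sense_mark(doc), y) for doc, y in source_labeled]        # step 1
        projected = [(project_via_multidict(doc), y) for doc, y in marked]  # step 2
        classify = train_classifier(projected)                              # step 3
        marked_target = [sense_mark(doc) for doc in target_unlabeled]       # step 4
        return [classify(doc) for doc in marked_target]                     # step 5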
Introduction
Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009).
Related Work
In situations where labeled data is not present in a language, approaches based on cross-lingual sentiment analysis are used.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Plank, Barbara and Moschitti, Alessandro
Abstract
The clear drawback of supervised methods is the need of training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
Introduction
However, the clear drawback of supervised methods is the need of training data, which can slow down the delivery of commercial applications in new domains: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to.
Related Work
These correspondences are then integrated as new features in the labeled data of the source domain.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Goldwasser, Dan and Roth, Dan
Experimental Settings
The dataset was collected for the purpose of constructing semantic parsers from ambiguous supervision and consists of both “noisy” and gold labeled data.
Experimental Settings
The gold labeled data consists of pairs (x, y).
Semantic Interpretation Model
Moreover, since learning this layer is a byproduct of the learning process (as it does not use any labeled data), forcing the connection between the decisions is the mechanism that drives learning this model.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mukherjee, Arjun and Liu, Bing
Experiments
Since ME-LDA used manually labeled training data for Max-Ent, we again randomly sampled 1000 terms from our corpus appearing at least 20 times and labeled them as aspect terms or sentiment terms, so this labeled data clearly has less noise than our automatically labeled data.
Proposed Seeded Models
Note that unlike traditional Max-Ent training, we do not need manually labeled data for training (see Section 4 for details).
Related Work
We adopt this method as well but with no use of manually labeled data in training.
labeled data is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: