Experiments and Results | This is contrary to our original expectation that the explicit data, which has been manually annotated for discourse connective disambiguation, should outperform BLLIP, which contains a lot of noise. |
Experiments and Results | This finding indicates that, under multitask learning, it may not be worthwhile to use a manually annotated corpus to generate auxiliary data. |
Implementation Details of Multitask Learning Method | This is because the former data is manually annotated as to whether a word serves as a discourse connective, while the latter does not manually disambiguate the two types of ambiguity, i.e., whether a word serves as a discourse connective, and the type of discourse relation if it does. |
Introduction | To overcome the shortage of manually annotated training data, Marcu and Echihabi (2002) proposed a pattern-based approach to automatically generate training data from raw corpora. |
Introduction | Later, with the release of manually annotated corpora, such as the Penn Discourse Treebank 2.0 (PDTB) (Prasad et al., 2008), recent studies performed implicit discourse relation recognition on natural (i.e., genuine) implicit discourse data (Pitler et al., 2009; Lin et al., 2009; Wang et al., 2010) with the use of linguistically informed features and machine learning algorithms. |
Introduction | 2002) and (Sporleder and Lascarides, 2008), and (2) manually annotated explicit data with the removal of explicit discourse connectives. |
Multitask Learning for Discourse Relation Prediction | Among them, it is manually annotated as an Expansion relation 2,938 times. |
Experiment | Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP). |
Experiment | Manual Annotation and Agreement Study |
Experiment | To assess the performance of our subjectivity analysis systems, the Korean sentence chunks were manually annotated by two native speakers of Korean with Subjective and Objective labels (Table 1). |
Multilanguage-Comparability 3.1 Motivation | Evaluating with intensity is not easy for the latter approach: if test corpora with intensity annotations already exist for both languages, the intensity scores must be normalized to a comparable scale (which remains uncertain unless every pair is checked manually); otherwise, every pair of multilingual texts needs to be manually annotated with its relative order of intensity. |
Related Work | (2008) and Boiy and Moens (2009) have created manually annotated gold standards in target languages and studied various feature selection and learning techniques in machine learning approaches to analyze sentiments in multilingual web documents. |
Data | We excluded a small subset of 419 threads that was used for previous manual annotation efforts, part of which was also used to train the DA and GDP taggers (Section 5) that generate features for our system. |
Motivation | Another limitation of (Prabhakaran and Rambow, 2013) is that we used manual annotations for many of our features such as dialog acts and overt displays of power. |
Motivation | Relying on manual annotations for features limited our analysis to a small subset of the Enron corpus, which has only 18 instances of hierarchical power. |
Motivation | Like (Prabhakaran and Rambow, 2013), we use features to capture the dialog structure, but we use automatic taggers to generate them and assume no manual annotation at all at training or test time. |
Structural Analysis | DIAPR: In (Prabhakaran and Rambow, 2013), we used dialog features derived from manual annotations — dialog acts (DA) and overt displays of power (ODP) — to model the structure of interactions within the message content. |
Structural Analysis | In this work, we obtain DA and GDP tags on the entire corpus using automatic taggers trained on those manual annotations. |
Bilingual NER Results | If a model only gives good performance with well-tuned hyper-parameters, then we must have manually annotated data for tuning, which would significantly reduce the applicability and portability of this method to other language pairs and tasks. |
Introduction | Our method does not require any manual annotation of word alignments or named entities over the bilingual training data. |
Joint Alignment and NER Decoding | On the other hand, the CRF-based NER models are trained on manually annotated data, and admit richer sequence and lexical features. |
Related Work | One drawback of this approach is that training such a feature-rich model requires manually annotated bilingual NER data, which can be prohibitively expensive to generate. |
Related Work | The model demonstrates performance improvements in both parsing and alignment, but shares the common limitations of other supervised work in that it requires manually annotated bilingual joint parsing and word alignment data. |
Introduction | (2002) and employ machine learning algorithms to build classifiers from tweets with manually annotated sentiment polarity. |
Introduction | We learn the sentiment-specific word embedding from tweets, leveraging massive tweets with emoticons as distant-supervised corpora without any manual annotations. |
Introduction | • We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations; |
Related Work | The sentiment classifier is built from tweets with manually annotated sentiment polarity. |
Related Work | The reason is that RAE and NBSVM learn the representation of tweets from the small-scale manually annotated training set, which cannot fully capture the comprehensive linguistic phenomena of words. |
Active Learning for Sequence Labeling | If the confidence C(y_j) exceeds a certain confidence threshold t, y_j is assumed to be the correct label for this token and assigned to it. Otherwise, manual annotation of this token is required. |
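A minimal sketch of this selection step, assuming per-token label confidences from some sequence labeler; the names select_tokens, predicted_labels, confidences and the threshold value are illustrative assumptions, not taken from the original system:

    def select_tokens(predicted_labels, confidences, threshold=0.99):
        # Split token positions into auto-labeled and to-be-annotated sets:
        # tokens whose label confidence C(y_j) exceeds the threshold t keep
        # the model's label; the rest are routed to manual annotation.
        auto_labeled, needs_annotation = [], []
        for j, (label, conf) in enumerate(zip(predicted_labels, confidences)):
            if conf > threshold:
                auto_labeled.append((j, label))   # assume the model's label is correct
            else:
                needs_annotation.append(j)        # request manual annotation for this token
        return auto_labeled, needs_annotation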
Experiments and Results | So, using SeSAL, the complete corpus can be labeled with only a small fraction of it actually being manually annotated (MUC7: about 18%, PENNBIOIE: about 13%). |
Introduction | In most annotation campaigns, the language material chosen for manual annotation is selected randomly from some reference corpus. |
Introduction | In the AL paradigm, only examples of high training utility are selected for manual annotation in an iterative manner. |
Summary and Discussion | Our experiments in the context of the NER scenario provide evidence for the hypothesis that the proposed approach to semi-supervised AL (SeSAL) for sequence labeling indeed strongly reduces the number of tokens to be manually annotated — in terms of numbers, about 60% compared to its fully supervised counterpart (FuSAL), and over 80% compared to a totally passive learning scheme based on random selection. |
Asymmetries in IS | In order to find out whether IS categories are unevenly distributed within German sentences, we examine a corpus of German radio news bulletins that has been manually annotated for IS (496 annotated sentences in total) using the scheme of Riester (2008b). |
Discussion | (2007) present work on predicting the dative alternation in English using 14 features relating to information status which were manually annotated in their corpus. |
Discussion | In our work, we manually annotate a small corpus in order to learn generalisations. |
Discussion | From these we learn features that approximate the generalisations, enabling us to apply them to large amounts of unseen data without further manual annotation. |
Syntactic IS Asymmetries | The problem, of course, is that we do not possess any reliable system of automatically assigning IS labels to unknown text and manual annotations are costly and time-consuming. |
Abstract | All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank — discourse connectives and their senses. |
Conclusion | It has characterised each genre in terms of features manually annotated in the Penn Discourse TreeBank, and used this to show that genre should be made a factor in automated sense labelling of discourse relations that are not explicitly marked. |
The Penn Discourse TreeBank | Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008). |
The Penn Discourse TreeBank | These have been manually annotated using the three-level sense hierarchy described in detail in (Miltsakaki et al., 2008). |
Building a Discourse Parser | Both S and L classifiers are trained using manually annotated documents taken from the RST-DT corpus. |
Building a Discourse Parser | In training mode, classification instances are built by parsing manually annotated trees from the RST-DT corpus paired with lexicalized syntax trees (LS Trees) for each sentence (see Sect. |
Conclusions and Future Work | In this paper, we have shown that it is possible to build an accurate automatic text-level discourse parser based on supervised machine-learning algorithms, using a feature-driven approach and a manually annotated corpus. |
Evaluation | A measure of our full system’s performance is realized by comparing structure and labeling of the RST tree produced by our algorithm to that obtained through manual annotation (our gold standard). |
Abstract | An extensive empirical evaluation on our manually annotated YouTube comments corpus shows a high classification accuracy and highlights the benefits of structural models in a cross-domain setting. |
Experiments | This is an important advantage of our structural approach, since we cannot realistically expect to obtain manual annotations for 10k+ comments for each (of many thousands) product domains present on YouTube. |
Introduction | The second contribution of the paper is the creation and annotation (by an expert coder) of a comment corpus containing 35k manually labeled comments for two product YouTube domains: tablets and automobiles. It is the first manually annotated corpus that enables researchers to use supervised methods on YouTube for comment classification and opinion analysis. |
YouTube comments corpus | For each video, we extracted all available comments (limited to a maximum of 1k comments per video) and manually annotated each comment with its type and polarity. |
Related Work | Nearly all semantic class taggers are trained using supervised learning with manually annotated data. |
Related Work | Each annotator then labeled an additional 35 documents, which gave us a test set containing 100 manually annotated message board posts. |
Related Work | With just a fifth of the training set, the system has about 1,600 message board posts to use for training, which yields an F score (roughly 61%) similar to the supervised baseline that used 100 manually annotated posts via 10-fold cross-validation. |
Relational Similarity Experiments | We further experimented with the SemEval’07 task 4 dataset (Girju et al., 2007), where each example consists of a sentence, a target semantic relation, two nominals to be judged on whether they are in that relation, manually annotated WordNet senses, and the Web query used to obtain the sentence: |
Relational Similarity Experiments | The SemEval competition defines four types of systems, depending on whether the manually annotated WordNet senses and the Google query are used: A (WordNet=no, Query=no), B (WordNet=yes, Query=no), C (WordNet=no, Query=yes), and D (WordNet=yes, Query=yes). |
Relational Similarity Experiments | We experimented with types A and C only since we believe that having the manually annotated WordNet sense keys is an unrealistic assumption for a real-world application. |
Experimental Setup | This dataset consists of 3964 instances of 17 potential English idioms which were manually annotated as literal or nonliteral. |
Experiments | The reason is that although this system is claimed to be unsupervised, and it performs better than all the participating systems (including the supervised systems) in the SemEval-2007 shared task, it still needs to incorporate a lot of prior knowledge, specifically information about co-occurrences between different word senses, which was obtained from a number of resources (SSI+LKB) including: (i) SemCor (manually annotated); (ii) LDC-DSO (partly manually annotated); (iii) collocation dictionaries which are then disambiguated semi-automatically. |
Experiments | Even though the system is not "trained", it needs a lot of information which is largely dependent on manually annotated data, so it does not fit neatly into the categories Type II or Type III either. |
Introduction | One major factor that makes WSD difficult is a relative lack of manually annotated corpora, which hampers the performance of supervised systems. |
Conclusions and Future Work | That is, it can be applied where there are no manual annotations available for training. |
Evaluation | Instead, the supervised approach would need a large amount of manual annotation for every predicate to be processed. |
Introduction | However, current automatic systems require large amounts of manually annotated training data for each predicate. |
Introduction | The effort required for this manual annotation explains the absence of generally applicable tools. |
Conclusion | It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages. |
Evaluation | For English-French, the English CoNLL-ST dataset was used as a source and the model was evaluated on the manually annotated dataset from van der Plas et al. |
Evaluation | Note that in the case of French we evaluate against the output of a supervised system, since manual annotation is not available for this dataset. |
Setup | Several approaches have been proposed to obtain an SRL model for a new language with little or no manual annotation. |
Abstract | We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. |
Conclusion | To test this system, we manually annotated a corpus of corresponding and non-corresponding edit-turn-pairs. |
Conclusion | With regard to future work, an extension of the manually annotated corpus is the most important issue. |
Corpus | To assess the reliability of these annotations, one of the coauthors manually annotated a random subset of 100 edit-turn-pairs contained in ETP-gold as corresponding or non-corresponding. |
Abstract | We evaluate our method on a manually annotated data set, and show that our method outperforms the baseline that handles these two tasks separately, boosting the F1 from 80.2% to 83.6% for NER, and the Accuracy from 79.4% to 82.6% for NEN, respectively. |
Conclusions and Future work | We evaluate our method on a manually annotated data set. |
Experiments | We manually annotate a data set to evaluate our method. |
Related Work | is trained on a manually annotated data set, which achieves an F1 of 81.48% on the test data set; Chiticariu et al. |
Introduction | Manually annotated corpora, such as the Chinese Treebank (CTB) (Xue et al., 2005), usually have words as the basic syntactic elements. |
Introduction | We manually annotate the structures of 37,382 words, which cover the entire CTB5. |
Introduction | Luo (2003) exploited this advantage by adding flat word structures without manual annotation to CTB trees, and building a generative character-based parser. |
Related Work | In addition, instead of using flat structures, we manually annotate hierarchical tree structures of Chinese words for converting word-based constituent trees into character-based constituent trees. |
Evaluation | All the recorded dialogs, with a total length of 21 hours, were manually transcribed; these transcribed dialogs, comprising 19,651 utterances, were then manually annotated with the following nine topic categories: Opening, Closing, Itinerary, Accommodation, Attraction, Food, Transportation, Shopping, and Other. |
Evaluation | For the linear kernel baseline, we used the following features: n-gram words, previous system actions, and current user acts which were manually annotated. |
Evaluation | All the evaluations were done with fivefold cross-validation against the manual annotations, using two different metrics: one is the accuracy of the predicted topic label for every turn, and the other is precision/recall/F-measure for each topic transition event that occurred either in the answer or in the predicted result. |
Experiment | We manually annotated each video with several sentences that describe what occurs in that video. |
Experiment | To evaluate our results, we manually annotated the correctness of each such pair. |
Experiment | The scores are thresholded to decide hits, which, together with the manual annotations, can generate TP, TN, FP, and FN counts. |
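A minimal sketch of this evaluation step, assuming binary gold labels from the manual annotations and real-valued system scores; the function name and the threshold value are illustrative assumptions rather than the original implementation:

    def confusion_counts(scores, gold_labels, threshold=0.5):
        # Threshold the scores into hits, then compare against manual annotations
        # to accumulate TP, TN, FP, and FN counts.
        tp = tn = fp = fn = 0
        for score, gold in zip(scores, gold_labels):
            hit = score >= threshold          # thresholded decision
            if hit and gold:
                tp += 1
            elif hit and not gold:
                fp += 1
            elif not hit and gold:
                fn += 1
            else:
                tn += 1
        return tp, tn, fp, fn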
Introduction | Our model is also the first to directly learn relational patterns as part of the process of training an end-to-end taxonomic induction system, rather than using patterns that were hand-selected or learned via pairwise classifiers on manually annotated co-occurrence patterns. |
Related Work | Both of these systems use a process that starts by finding basic level terms (leaves of the final taxonomy tree, typically) and then using relational patterns (hand-selected ones in the case of Kozareva and Hovy (2010), and ones learned separately by a pairwise classifier on manually annotated co-occurrence patterns for Navigli and Velardi (2010), Navigli et al. |
Related Work | Our model also automatically learns relational patterns as a part of the taxonomic training phase, instead of relying on handpicked rules or pairwise classifiers on manually annotated co-occurrence patterns, and it is the first end-to-end (i.e., non-incremental) system to include heterogeneous relational information via sibling (e.g., coordination) patterns. |
Feature Construction | Head Noun: The head noun (or head mention) of an entity mention is manually annotated. |
Feature Construction | Third, the entity mentions are manually annotated. |
Related Work | A disadvantage of the TRE systems is that a manually annotated corpus is required, which is time-consuming and costly in human labor. |
Abstract | Furthermore, we introduce a manually annotated dataset for target-dependent Twitter sentiment analysis. |
Experiments | After obtaining the tweets, we manually annotate the sentiment labels (negative, neutral, positive) for these targets. |
Introduction | In addition, we introduce a manually annotated dataset, and conduct extensive experiments on it. |
Event Causality Extraction Method | We acquired 43,697 excitation templates by Hashimoto et al.’s method and the manual annotation of excitation template candidates. We applied the excitation filter to all 272,025,401 event causality candidates from the web and 132,528,706 remained. |
Experiments | Note that some event causality candidates were not given excitation values for their templates, since some templates were acquired by manual annotation without Hashimoto et al.’s method. |
Introduction | To make event causality self-contained, we wrote guidelines for manually annotating training/development/test data. |
Experiments | To investigate the effects of using an in-domain language model, we created a corpus composed of the manual annotations of all the documents in the Old Bailey proceedings, excluding those used in our test set. |
Results and Analysis | We manually annotated the ink blotches (shown in red), and made them unobserved in the model. |
Results and Analysis | We performed error analysis on our development set by randomly choosing 100 word errors from the WER alignment and manually annotating them with relevant features. |
Experiment setup | Data As described earlier, the Stanford Sentiment Treebank (Socher et al., 2013) has manually annotated, real-valued sentiment values for all phrases in parse trees. |
Experiment setup | The phrases at all tree nodes were manually annotated with one of 25 sentiment values that uniformly span between the positive and negative poles. |
Introduction | The recently available Stanford Sentiment Treebank (Socher et al., 2013) renders manually annotated, real-valued sentiment scores for all phrases in parse trees. |
Towards A Universal Treebank | The first is traditional manual annotation, as previously used by Helmreich et al. |
Towards A Universal Treebank | 2.2 Manual Annotation |
Towards A Universal Treebank | Such a reduction may ultimately be necessary also in the case of dependency relations, but since most of our data sets were created through manual annotation, we could afford to retain a fine-grained analysis, knowing that it is always possible to map from finer to coarser distinctions, but not vice versa. |
FrameNet — Wiktionary Alignment | teas) independently on a manually annotated gold standard. |
Related Work | Tonelli and Pighin (2009) use these features to train an SVM classifier to identify valid alignments and report an F1-score of 0.66 on a manually annotated gold standard. |
Resource FNWKde | We compare FNWKde to two German frame-semantic resources, the manually annotated SALSA corpus (Burchardt et al., 2006) and a resource from Pado and Lapata (2005), henceforth P&L05. |
Abstract | Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. |
Introduction | These algorithms require no manual annotation and can be applied to large, noisy databases of relational triples. |
Overview of the Approach | The final result is a scalable learning algorithm that requires no manual annotation of questions. |
Factors Affecting System Performance | • A balanced corpus of 800 manually annotated sentences extracted from 83 newspaper texts |
Factors Affecting System Performance | 200 sentences from this corpus (100 positive and 100 negative) were also randomly selected from the corpus for an inter-annotator agreement study and were manually annotated by two independent annotators. |
Lexicon-Based Approach | In order to assign the membership score to each word, we did 58 system runs on unique, non-intersecting seed lists drawn from the manually annotated lists of positive and negative adjectives from (Hatzivassiloglou and McKeown, 1997). |
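A minimal sketch of how such unique, non-intersecting seed lists could be drawn for repeated runs; the helper name make_disjoint_seed_lists and the seeds-per-list size are illustrative assumptions, not the original experimental setup:

    import random

    def make_disjoint_seed_lists(positive, negative, n_runs=58, seeds_per_list=7):
        # Partition manually annotated positive and negative adjectives into
        # non-intersecting seed lists, one (positive, negative) pair per system run.
        pos, neg = positive[:], negative[:]
        random.shuffle(pos)
        random.shuffle(neg)
        runs = []
        for i in range(n_runs):
            p = pos[i * seeds_per_list:(i + 1) * seeds_per_list]
            n = neg[i * seeds_per_list:(i + 1) * seeds_per_list]
            if len(p) < seeds_per_list or len(n) < seeds_per_list:
                break  # not enough labeled adjectives left for another disjoint list
            runs.append((p, n))
        return runs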
Introduction | However, the performance of these methods highly relies on manually annotated training data. |
Introduction | However, these methods need to manually annotate a lot of training data in each domain. |
Introduction | The sentiment and topic words are manually annotated. |
Data and task | The approach uses a small amount of manually annotated article-pairs to train a document-level CRF model for parallel sentence extraction. |
Data and task | Of these, we manually annotated 91 English-Bulgarian and 79 English-Korean sentence pairs with source and target named entities as well as word-alignment links among named entities in the two languages. |
Data and task | At test time we use the local+global Wiki-based tagger to define the English entities and we don’t use the manually annotated alignments. |
Evaluation | The experiments were performed on the manually annotated Korean test dataset. |
Introduction | Several datasets that provide manual annotations of semantic relationships are available from MUC (Grishman and Sundheim, 1996) and ACE (Doddington et al., 2004) projects, but these datasets contain labeled training examples in only a few major languages, including English, Chinese, and Arabic. |
Introduction | Because manual annotation of semantic relations for such resource-poor languages is very expensive, we instead consider weakly supervised learning techniques (Riloff and Jones, 1999; Agichtein and Gravano, 2000; Zhang, 2004; Chen et al., 2006) to learn the relation extractors without significant annotation efforts. |
Experimental Setup | We also manually annotated the relations expressed in the text, identifying 94 of the Candidate Relations as valid. |
Experimental Setup | Evaluation Metrics We use our manual annotations to evaluate the type-level accuracy of relation extraction. |
Experimental Setup | The first, Manual Text, is a variant of our model which directly uses the links derived from manual annotations of preconditions in text. |
Experiments | In this section, we evaluate our method on a manually annotated data set and show that our system |
Introduction | 12,245 tweets are manually annotated as the test data set. |
Related Work | A data set is manually annotated and a linear CRF model is trained, which achieves an F-score of 81.48% on their test data set; Downey et al. |
Conclusion | The result is an accurate and efficient wide-coverage CCG parser that can be easily adapted for NLP applications in new domains without manually annotating data. |
Data | For supertagger evaluation, one thousand sentences were manually annotated with CCG lexical categories and POS tags. |
Data | For parser evaluation, three hundred of these sentences were manually annotated with DepBank grammatical relations (King et al., 2003) in the style of Briscoe and Carroll (2006). |
Introduction | Obtaining training data in this setting does not require expensive manual annotation as many articles are published together with captioned images. |
Related Work | The image parser is trained on a corpus manually annotated with graphs representing image structure. |
Related Work | Instead of relying on manual annotation or background ontological information we exploit a multimodal database of news articles, images, and their captions. |
Abstract | First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. |
Introduction | sentations to automatic extraction in the absence of manual annotations. |
Introduction | Note, however, that the models based on automatic topological field annotations outperform even the grammatical role-based models using manual annotation (at marginal significance, p < 0.1). |
Introduction | Some of the teams have used the manually annotated WN labels provided with the dataset, and some have not. |
Related Work | In this paper we do not use any manually annotated resources apart from the classification training set. |
Related Work | This manually annotated dataset includes a representative rather than exhaustive list of 7 important nominal relationships. |