Index of papers in Proc. ACL that mention
  • human annotators
Kozareva, Zornitsa
Conclusion
From the two tasks, the valence prediction problem was more challenging both for the human annotators and the automated system.
Metaphors
To conduct our study, we use human annotators to collect metaphor-rich texts (Shutova and Teufel, 2010) and tag each metaphor with its corresponding polarity (Positive/Negative) and valence [-3, +3] scores.
Task A: Polarity Classification
In our study, the source and target domains are provided by the human annotators who agree on these definitions; however, the source and target can also be automatically generated by an interpretation system or a concept mapper.
Task B: Valence Prediction
Evaluation Measures: To evaluate the quality of the valence prediction model, we compare the actual valence score of the metaphor given by human annotators, denoted y, against those valence scores predicted by the regression model, denoted ŷ.
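A minimal sketch of how human-assigned and predicted valence scores might be compared, assuming both are available as numeric arrays; the excerpt does not name the exact error measure, so mean squared error and Pearson correlation are shown here only as generic, illustrative choices.

    # Hypothetical comparison of human valence scores (y) against
    # regression predictions (y_hat); the values below are toy examples.
    import numpy as np

    y = np.array([-2.0, 1.0, 3.0, -1.0, 0.0])       # human annotator scores in [-3, +3]
    y_hat = np.array([-1.5, 0.5, 2.5, -0.5, 0.5])   # model predictions

    mse = float(np.mean((y - y_hat) ** 2))
    pearson_r = float(np.corrcoef(y, y_hat)[0, 1])
    print(f"MSE = {mse:.3f}, Pearson r = {pearson_r:.3f}")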
Task B: Valence Prediction
To conduct our valence prediction study, we used the same human annotators from the polarity classification task for each one of the English, Spanish, Russian and Farsi languages.
Task B: Valence Prediction
This means that the LIWC-based valence regression model's predictions better approximate those of the human annotators.
human annotators is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Tomanek, Katrin and Hahn, Udo
Abstract
We propose a semi-supervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled.
Active Learning for Sequence Labeling
loop until stopping criterion is met: 1. learn model M from L; 2. for all p_i ∈ P: u_{p_i} ← U_M(p_i); 3. select the B examples p_i ∈ P with highest utility u_{p_i}; 4. query the human annotator for labels of all B examples; 5. move the newly labeled examples from P to L.
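The excerpt above is the paper's pool-based active learning loop; the sketch below restates it in Python under assumed interfaces (learn_model, utility, query_annotator, and stop are placeholders, not functions from the paper).

    # Generic pool-based active learning loop (sketch, not the authors' code).
    # L: labeled set, P: unlabeled pool (a list), B: batch size per iteration.
    def active_learning(L, P, B, learn_model, utility, query_annotator, stop):
        while not stop(L):
            model = learn_model(L)                               # 1. learn model M from L
            scored = [(utility(model, p), p) for p in P]         # 2. compute utility u_p for each p in P
            scored.sort(key=lambda t: t[0], reverse=True)
            batch = [p for _, p in scored[:B]]                   # 3. select B examples with highest utility
            labeled = [(p, query_annotator(p)) for p in batch]   # 4. query the human annotator
            L.extend(labeled)                                    # 5. move newly labeled examples from P to L
            for p in batch:
                P.remove(p)
        return learn_model(L)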
Active Learning for Sequence Labeling
Sequences of consecutive tokens x_j for which the confidence C(x_j) ≤ t are presented to the human annotator instead of single, isolated tokens.
Introduction
the need for human annotators to supply large amounts of “golden” annotation data on which ML systems can be trained.
Introduction
To further exploit this observation for annotation purposes, we here propose an approach to AL where human annotators are required to label only uncertain subsequences within the selected sentences, while the remaining subsequences are labeled automatically based on the model available from the previous AL iteration round.
Related Work
In contrast, our SeSAL approach, which also applies bootstrapping, aims to avoid deteriorating data quality by explicitly pointing human annotators to classification-critical regions.
Summary and Discussion
We here hypothesize that human annotators work much more efficiently when pointed to the regions of immediate interest instead of making them skim in a self-paced way through larger passages of (probably) semantically irrelevant but syntactically complex utterances —a tiring and error-prone task.
human annotators is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ji, Heng and Grishman, Ralph
Experimental Results and Analysis
In addition, we also measured the performance of two human annotators who prepared the ACE 2005 training data on 28 newswire texts (a subset of the blind test set).
Experimental Results and Analysis
The improved trigger labeling is better than one human annotator and only 4.7% worse than another.
Experimental Results and Analysis
This matches the situation of human annotation as well: we may decide whether a mention is involved in some particular event or not by reading and analyzing the target sentence itself; but in order to decide the argument’s role we may need to frequently refer to wider discourse in order to infer and confirm our decision.
human annotators is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Richman, Alexander E. and Schone, Patrick
Evaluation and Results
We had three human annotated test sets, Spanish, French and Ukrainian, consisting of newswire.
Evaluation and Results
When human annotated sets were not available, we held out more than 100,000 words of text generated by our wiki-mining process to use as a test set.
Evaluation and Results
The first consists of 25,000 words of human annotated newswire derived from the ACE 2007 test set, manually modified to conform to our extended MUC-style standards.
human annotators is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Joshi, Aditya and Mishra, Abhijit and Senthamilselvan, Nivvedan and Bhattacharyya, Pushpak
Abstract
The effort required for a human annotator to detect sentiment is not uniform for all texts, irrespective of his/her expertise.
Abstract
As for training data, since any direct judgment of complexity by a human annotator is fraught with subjectivity, we rely on cognitive evidence from eye-tracking.
Abstract
We also study the correlation between a human annotator’s perception of complexity and a machine’s confidence in polarity determination.
Discussion
Our proposed metric measures complexity of sentiment annotation, as perceived by human annotators.
Introduction
The effort required by a human annotator to detect sentiment is not uniform for all texts.
human annotators is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Tomasoni, Mattia and Huang, Minlie
Conclusions
Evaluation results on human annotated data showed that our summarized answers constitute a solid complement to best answers voted by the cQA users.
Experiments
We calculated ROUGE-1 and ROUGE-2 scores against human annotation on the filtered version of the dataset presented in Section 3.1.
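A minimal sketch of ROUGE-N recall (clipped n-gram overlap against a human reference summary), shown only to make the evaluation concrete; the authors would have used the standard ROUGE toolkit, not this code.

    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def rouge_n_recall(candidate, reference, n):
        """Clipped n-gram overlap divided by the number of reference n-grams."""
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, cand[g]) for g, count in ref.items())
        total = sum(ref.values())
        return overlap / total if total else 0.0

    # Toy usage with whitespace tokenization (illustrative only):
    print(rouge_n_recall("the cat sat on the mat".split(),
                         "the cat was on the mat".split(), 1))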
The summarization framework
Our decision to proceed in an unsupervised direction came from the consideration that any use of external human annotation would have made it impracticable to build an actual system on a larger scale.
The summarization framework
A second approach that made use of human annotation to learn a vector of weights V = (v1, v2, v3, v4) that linearly combined the scores was investigated.
The summarization framework
In order to learn the weight vector V that would combine the above scores, we asked three human annotators to generate question-biased extractive summaries based on all answers available for a certain question.
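The excerpts above describe learning a weight vector V = (v1, v2, v3, v4) that linearly combines four sentence scores against human-generated extractive summaries; the exact fitting procedure is not given here, so the sketch below uses ordinary least squares as one plausible choice (the score matrix and targets are hypothetical).

    import numpy as np

    # Rows: candidate sentences; columns: the four scores to be combined (toy values).
    S = np.array([[0.2, 0.7, 0.1, 0.5],
                  [0.9, 0.3, 0.4, 0.2],
                  [0.4, 0.4, 0.8, 0.6]])
    # 1.0 if a human annotator included the sentence in the extractive summary, else 0.0.
    y = np.array([0.0, 1.0, 1.0])

    # Least-squares fit of V so that S @ V approximates the human judgments.
    V, *_ = np.linalg.lstsq(S, y, rcond=None)
    combined = S @ V   # linearly combined score for each sentence
    print(V, combined)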
human annotators is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Experiment
Human annotated data contains 0.3M synonym pairs from WordNet dictionary.
Paraphrasing for Web Search
Additionally, human annotated data can also be used as high-quality paraphrases.
Paraphrasing for Web Search
Q_i is the ith query and D^label ⊆ D is a subset of documents, in which the relevance between Q_i and each document is labeled by human annotators.
Paraphrasing for Web Search
The relevance rating labeled by human annotators can be represented by five levels: “Perfect”, “Excellent”, “Good”, “Fair”, and “Bad”.
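The excerpts do not say how these five-level relevance labels are used downstream; the sketch below shows one standard possibility, mapping the labels to numeric gains and computing DCG over a ranked list. The label-to-gain mapping is an assumption for illustration, not taken from the paper.

    import math

    # Assumed mapping of the five-level ratings to numeric gains.
    GAIN = {"Perfect": 4, "Excellent": 3, "Good": 2, "Fair": 1, "Bad": 0}

    def dcg(labels_in_ranked_order):
        """Discounted cumulative gain over human relevance labels."""
        return sum((2 ** GAIN[lab] - 1) / math.log2(rank + 2)
                   for rank, lab in enumerate(labels_in_ranked_order))

    print(dcg(["Perfect", "Good", "Bad", "Excellent"]))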
human annotators is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhu, Xiaodan and Guo, Hongyu and Mohammad, Saif and Kiritchenko, Svetlana
Conclusions
This paper provides a comprehensive and quantitative study of the behavior of negators through a unified view of fitting human annotation.
Experimental results
When the depths are within 4, the RNTN performs very well and the (human annotated) prior sentiment of arguments used in PSTN does not bring additional improvement over RNTN.
Introduction
We then extend the models to be dependent on the negators and demonstrate that such a simple extension can significantly improve the performance of fitting to the human annotated data.
Semantics-enriched modeling
As we have discussed above, we will use the human annotated sentiment for the arguments, same as in the models discussed in Section 3.
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Elsner, Micha and Charniak, Eugene
Motivation
It partitions a chat transcript into distinct conversations, and its output is highly correlated with human annotations.
We discard the 50 most frequent words entirely.
This places it within the bounds of our human annotations (see table 1), toward the more general end of the spectrum.
We discard the 50 most frequent words entirely.
The range of human variation is quite wide, and there are annotators who are closer to baselines than to any other human annotator.
We discard the 50 most frequent words entirely.
As explained earlier, this is because some human annotations are much more specific than others.
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hingmire, Swapnil and Chakraborti, Sutanu
Experimental Evaluation
While labeling a topic, we show its 30 most probable words to the human annotator.
Related Work
Also a human annotator may discard or mislabel a polysemous word, which may affect the performance of a text classifier.
Related Work
In active learning, particular unlabeled documents or features are selected and queried to an oracle (e.g., a human annotator).
Topic Sprinkling in LDA
We then ask a human annotator to assign one or more class labels to the topics based on their most probable words.
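A minimal sketch of preparing the annotation step described above: given a topic-word probability matrix from LDA, list each topic's 30 most probable words for the human annotator. The matrix and vocabulary here are toy stand-ins, not the authors' data or code.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = [f"word{i}" for i in range(500)]              # toy vocabulary
    topic_word = rng.random((5, len(vocab)))              # toy topic-word weights (5 topics)
    topic_word /= topic_word.sum(axis=1, keepdims=True)   # normalize rows to probabilities

    def top_words(topic_row, vocab, n=30):
        idx = np.argsort(topic_row)[::-1][:n]
        return [vocab[i] for i in idx]

    for t, row in enumerate(topic_word):
        # A human annotator would assign one or more class labels to this topic.
        print(f"Topic {t}:", ", ".join(top_words(row, vocab)))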
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Dong and Liu, Yang
Corpus Creation
Often human annotators have different interpretations about the same sentence, and a speaker's opinion/attitude is sometimes ambiguous.
Experiments
We use human annotated dialogue acts (DA) as the extraction units.
Experiments
The system-generated summaries are compared to human annotated extractive and abstractive summaries.
Experiments
We also examined the system output and human annotation and found some reasons for the system errors:
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Qazvinian, Vahed and Radev, Dragomir R.
Data Annotation
Previously (Lin and Hovy, 2002) had shown that information overlap judgment is a difficult task for human annotators.
Data Annotation
For each n-gram, w, in a given headline, we check whether w is part of any nugget in either human annotation.
Data Annotation
Table 2 shows the unigram, bigram, and trigram-based average H between the two human annotators (Human1, Human2).
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sassano, Manabu and Kurohashi, Sadao
Experimental Evaluation and Discussion
In our experiments, human annotators do not give labels.
Experimental Evaluation and Discussion
Figure 9 shows the same comparison in terms of required queries to human annotators.
Experimental Evaluation and Discussion
Number of queries to human annotators
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ceylan, Hakan and Kim, Yookyung
Conclusions and Future Work
Human annotations on 5000 automatically annotated queries showed that our data generation method is highly accurate, achieving 84.3% accuracy on average for Category-1 queries, and 93.7% accuracy for Category-1 and Category-2 queries combined.
Introduction
Furthermore, creating such a data set is expensive as it requires an extensive amount of work by human annotators.
Language Identification
We tested all the systems in this section on a test set of 3500 human annotated queries, which is formed by taking 350 Category-1 queries from each language.
Language Identification
This query is labelled as Category-2 by the human annotator.
human annotators is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc
Data Selection and Corpus Creation
Table 2: Agreement of human annotator with gold standard
Data Selection and Corpus Creation
In order to test the reliability of these user assigned templates as quality flaw markers, we carried out an annotation study in which a human annotator was asked to perform the binary flaw detection task manually.
Data Selection and Corpus Creation
Table 2 lists the chance-corrected agreement (Cohen's κ) along with the F1 performance of the human annotations against the gold standard corpus.
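A small sketch of how agreement figures like these can be computed from paired label sequences, assuming binary flaw/no-flaw labels; scikit-learn's cohen_kappa_score and f1_score are used purely for illustration, not because the paper names them, and the label lists are invented.

    from sklearn.metrics import cohen_kappa_score, f1_score

    gold = [1, 0, 1, 1, 0, 0, 1, 0]    # gold-standard flaw labels (toy)
    human = [1, 0, 0, 1, 0, 1, 1, 0]   # one human annotator's labels (toy)

    print("Cohen's kappa:", cohen_kappa_score(gold, human))
    print("F1 vs. gold:  ", f1_score(gold, human))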
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nagata, Ryo and Whittaker, Edward and Sheinman, Vera
UK and XP stand for unknown and X phrase, respectively.
Number of tokens: (1) If the number of tokens in a sentence was different in the human annotation and the system output, the sentence was excluded from the calculation.
UK and XP stand for unknown and X phrase, respectively.
This discrepancy sometimes occurred because the tokenization of the system differed from that of the human annotators.
UK and XP stand for unknown and X phrase, respectively.
In the technique, transformation rules are obtained by comparing the output of a POS tagger and the human annotation so that the differences between the two are reduced.
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ramteke, Ankit and Malu, Akshat and Bhattacharyya, Pushpak and Nath, J. Saketha
Building domain ontology
Some additional features are added by the human annotator to increase the coverage of the ontology.
Building domain ontology
The abstract concept of storage is contributed by the human annotator through his/her world knowledge.
Building domain ontology
Step 2: The features thus obtained are arranged in the form of a hierarchy by a human annotator.
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lo, Chi-kiu and Wu, Dekai
Abstract
We study the cost/benefit tradeoff of using human annotators from different language backgrounds for the proposed evaluation metric, and compare whether providing the original source text helps.
Abstract
The correlation coefficient of the SRL based evaluation metric driven by bilingual human annotators (0.351) is slightly better than that driven by monolingual human annotators (0.315); however, using bilinguals in the evaluation process is more costly than using monolinguals.
Abstract
The correlation coefficient of the SRL based evaluation metric driven by bilingual human annotators who also see the source input sentences is 0.315, which is the same as that driven by monolingual human annotators.
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Dwyer, Kenneth and Kondrak, Grzegorz
Abstract
It is often desirable to reduce the quantity of training data — and hence human annotation — that is needed to train an L2P classifier for a new language.
Active learning
Words for which the prediction confidence is above a certain threshold are immediately added to the lexicon, while the remaining words must be verified (and corrected, if necessary) by a human annotator.
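A minimal sketch of the routing step described above, under assumed interfaces: predictions whose confidence exceeds a threshold go straight into the lexicon, while the rest are sent to a human annotator for verification. predict_with_confidence and ask_annotator are placeholders, and the 0.9 threshold is an arbitrary example.

    # Sketch of confidence-based routing for letter-to-phoneme (L2P) active learning.
    def route_words(words, predict_with_confidence, ask_annotator, lexicon, threshold=0.9):
        to_verify = []
        for word in words:
            phonemes, confidence = predict_with_confidence(word)
            if confidence >= threshold:
                lexicon[word] = phonemes          # trusted automatic labeling
            else:
                to_verify.append(word)            # needs human verification/correction
        for word in to_verify:
            lexicon[word] = ask_annotator(word)   # annotator labels the entire word
        return lexicon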
Active learning
In the L2P domain, we assume that a human annotator specifies the phonemes for an entire word, and that the active learner cannot query individual letters.
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Active learning (AL) has been proven effective to reduce human annotation efforts in NLP.
Abstract
machine translation, which make use of multilingual corpora to decrease human annotation efforts by selecting highly informative sentences for a newly added language in multilingual parallel corpora.
Abstract
For future work, on one hand, we plan to combine uncertainty sampling with diversity and informativeness measures; on the other hand, we intend to combine BAL with semi-supervised learning to further reduce human annotation efforts.
human annotators is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: