Index of papers in Proc. ACL 2014 that mention
  • gold-standard
Xu, Wenduan and Clark, Stephen and Zhang, Yue
Abstract
A challenge arises from the fact that the oracle needs to keep track of exponentially many gold-standard derivations, which is solved by integrating a packed parse forest with the beam-search decoder.
Introduction
and Curran, 2007) is to model derivations directly, restricting the gold-standard to be the normal-form derivations (Eisner, 1996) from CCGBank (Hockenmaier and Steedman, 2007).
Introduction
Clark and Curran (2006) show how the dependency model from Clark and Curran (2007) extends naturally to the partial-training case, and also how to obtain dependency data cheaply from gold-standard lexical category sequences alone.
Introduction
A challenge arises from the potentially exponential number of derivations leading to a gold-standard dependency structure, which the oracle needs to keep track of.
Shift-Reduce with Beam-Search
We refer to the shift-reduce model of Zhang and Clark (2011) as the normal-form model, where the oracle for each sentence specifies a unique sequence of gold-standard actions which produces the corresponding normal-form derivation.
Shift-Reduce with Beam-Search
In the next section, we describe a dependency oracle which considers all sequences of actions producing a gold-standard dependency structure to be correct.
The Dependency Model
However, the difference compared to the normal-form model is that we do not assume a single gold-standard sequence of actions.
The Dependency Model
Similar to Goldberg and Nivre (2012), we define an oracle which determines, for a gold-standard dependency structure, G, what the valid transition sequences are (i.e.
The Dependency Model
The dependency model requires all the conjunctive and disjunctive nodes of Q that are part of the derivations leading to a gold-standard dependency structure G. We refer to such derivations as correct derivations and the packed forest containing all these derivations as the oracle forest, denoted as Q0, which is a subset of Q.
gold-standard is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Kawahara, Daisuke and Peterson, Daniel W. and Palmer, Martha
Abstract
The effectiveness of our approach is verified through quantitative evaluations based on polysemy-aware gold-standard data.
Experiments and Evaluations
Where there is no frequency information available for class distribution, such as the gold-standard data described in Section 4.3, we use a uniform distribution across the verb’s classes.
Experiments and Evaluations
Table 1: An excerpt of the gold-standard verb classes for several verbs from Korhonen et al.
Experiments and Evaluations
We evaluate the single-class output for each verb based on the predominant gold-standard classes, which are defined for each verb in the test set of Korhonen et al.
Related Work
They evaluated their results with a gold-standard test set, where a single class is assigned to a verb.
Related Work
They considered multiple classes only in the gold-standard data used for their evaluations.
Related Work
We also evaluate our induced verb classes on this gold-standard data, which was created on the basis of Levin’s classes (Levin, 1993).
gold-standard is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Li, Jiwei and Ott, Myle and Cardie, Claire and Hovy, Eduard
Conclusion and Discussion
In this work, we have developed a multi-domain large-scale dataset containing gold-standard deceptive opinion spam.
Conclusion and Discussion
However, it is still very difficult to estimate the practical impact of such methods, as it is very challenging to obtain gold-standard data in the real world.
Dataset Construction
In this section, we report our efforts to gather gold-standard opinion spam datasets.
Dataset Construction
Due to the difficulty in obtaining gold-standard data in the literature, there is no doubt that our data set is not perfect.
Introduction
Existing approaches for spam detection are usually focused on developing supervised learning-based algorithms to help users identify deceptive opinion spam, which are highly dependent upon high-quality gold-standard labeled data (Jindal and Liu, 2008; Jindal et al., 2010; Lim et al., 2010; Wang et al., 2011; Wu et al., 2010).
Introduction
Despite the advantages of soliciting deceptive gold-standard material from Turkers (it is easy, large-scale, and affordable), it is unclear whether Turkers are representative of the general population that generates fake reviews; in other words, Ott et al.’s data set may correspond to only one type of online deceptive opinion spam — fake reviews generated by people who have never visited or experienced the entities.
Introduction
One contribution of the work presented here is the creation of the cross-domain (i.e., Hotel, Restaurant and Doctor) gold-standard dataset.
Related Work
created a gold-standard collection by employing Turkers to write fake reviews, and followup research was based on their data (Ott et al., 2012; Ott et al., 2013; Li et al., 2013b; Feng and Hirst, 2013).
gold-standard is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Riezler, Stefan and Simianer, Patrick and Haas, Carolin
Response-based Online Learning
Such “un-reachable” gold-standard translations need to be replaced by “surrogate” gold-standard translations that are close to the human-generated translations and still lie within the reach of the SMT system.
Response-based Online Learning
Applied to SMT, this means that we predict translations and use positive response from acting in the world to create “surrogate” gold-standard translations.
Response-based Online Learning
We need to ensure that gold-standard translations lead to positive task-based feedback; that is, they can
gold-standard is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Algorithm 3.1 The Model
It is worth noting that this can only happen if the gold-standard has a segment ending at the current token.
Algorithm 3.1 The Model
y’ is the prefix of the gold-standard and z is the top assignment.
Related Work
In addition, (Singh et al., 2013) used gold-standard mention boundaries.
gold-standard is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Ambiguity-aware Ensemble Training
In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
Ambiguity-aware Ensemble Training
Here, “ambiguous labelings” mean that an unlabeled sentence may have multiple parse trees as gold-standard reference, represented by a parse forest (see Figure 1).
Introduction
Different from traditional self/co/tri-training, which only uses 1-best parse trees on unlabeled data, our approach adopts ambiguous labelings, represented by a parse forest, as the gold-standard for unlabeled sentences.
gold-standard is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Packard, Woodley and Bender, Emily M. and Read, Jonathon and Oepen, Stephan and Dridan, Rebecca
Conclusion and Outlook
In future work, we will seek to better understand the division of labor between the systems involved through contrastive error analysis and possibly another oracle experiment, constructing gold-standard MRSs for part of the data.
Introduction
(2012), who report results for each subproblem using gold-standard inputs; in this setup, scope resolution showed by far the lowest performance levels.
Related Work
The ranking approach showed a modest advantage over the heuristics (with F1 equal to 77.9 and 76.7, respectively, when resolving the scope of gold-standard cues in evaluation data).
gold-standard is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Experiment
We built three parsing systems: Pipeline-Gold system is our baseline parser (described in Section 2) taking gold-standard POS tags as input; Pipeline system is our baseline parser taking as input POS tags automatically assigned by Stanford POS Tagger 3; and JointParsing system is our joint POS tagging and transition-based parsing system described in subsection 3.1.
Experiment
We can see that the parsing F1 decreased by about 8.5 percentage points when using automatically assigned POS tags instead of gold-standard ones, which shows that the pipeline approach is greatly affected by the quality of its preliminary POS tagging step.
Joint POS Tagging and Parsing with Nonlocal Features
In our experiment (described in Section 4.2), parsing accuracy would decrease by 8.5% in F1 in Chinese parsing when using automatically generated POS tags instead of gold-standard ones.
gold-standard is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: