Index of papers in Proc. ACL 2009 that mention
  • gold-standard
Ge, Ruifang and Mooney, Raymond
Ensuring Meaning Composition
Note that unlike SCISSOR (Ge and Mooney, 2005), training our method does not require gold-standard SAPTs.
Experimental Evaluation
For GEOQUERY, an MR was correct if it retrieved the same answer as the gold-standard query, thereby reflecting the quality of the final result returned to the user.
Experimental Evaluation
Listed together with their PARSEVAL F-measures these are: gold-standard parses from the treebank (GoldSyn, 100%), a parser trained on WSJ plus a small number of in-domain training sentences required to achieve good performance, 20 for CLANG (Syn20, 88.21%) and 40 for GEOQUERY (Syn40, 91.46%), and a parser trained on no in-domain data (Syn0, 82.15% for CLANG and 76.44% for GEOQUERY).
Experimental Evaluation
Note that some of these approaches require additional human supervision, knowledge, or engineered features that are unavailable to the other systems; namely, SCISSOR requires gold-standard SAPTs, Z&C requires hand-built template grammar rules, LU requires a reranking model using specially designed global features, and our approach requires an existing syntactic parser.
gold-standard is mentioned in 7 sentences in this paper.
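The PARSEVAL F-measures quoted above compare a parser's labeled brackets against the gold-standard treebank brackets. The following is only a minimal sketch of that idea, not the scorer used in the paper: the function name `parseval_f1` and the `(label, start, end)` bracket encoding are assumptions made for illustration, and standard scoring tools apply further conventions (for example around punctuation) that are omitted here.

```python
from collections import Counter


def parseval_f1(gold_brackets, test_brackets):
    """Labeled-bracket F-measure between one gold and one predicted parse.

    Each bracket is a hypothetical (label, start, end) tuple. Brackets are
    counted as multisets so that duplicated spans are matched at most once.
    """
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)
    matched = sum((gold & test).values())  # multiset intersection
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): one of four brackets carries the wrong label.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(parseval_f1(gold, gold))  # identical parses score 1.0, i.e. the GoldSyn case
print(parseval_f1(gold, test))
```

Scoring the gold-standard parses against themselves yields 100%, which is why GoldSyn is listed at that value.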
Clark, Stephen and Curran, James R.
Evaluation
The first row shows the results on only those sentences which the conversion process can convert successfully (as measured by converting gold-standard CCGbank derivations and comparing with PTB trees; although, to be clear, the scores are for the CCG parser on those sentences).
Evaluation
The second row shows the scores on those sentences for which the conversion process was somewhat lossy, but when the gold-standard CCGbank derivations are converted, the oracle F-measure is greater than 95%.
The CCG to PTB Conversion
shows that converting gold-standard CCG derivations into the GRs in DepBank resulted in an F-score of only 85%; hence the upper bound on the performance of the CCG parser, using this evaluation scheme, was only 85%.
The CCG to PTB Conversion
The schemas were developed by manual inspection using section 00 of CCGbank and the PTB as a development set, following the oracle methodology of Clark and Curran (2007), in which gold-standard derivations from CCGbank are converted to the new representation and compared with the gold standard for that representation.
gold-standard is mentioned in 4 sentences in this paper.
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Error Analysis
Problems with relative clause attachment to genitives are not limited to automatic parses — errors in gold-standard treebank parses cause similar problems when Treebank parses disagree with Propbank annotator intuitions.
Error Analysis
Figure 8: CCGbank gold-standard parse of a relative clause attachment.
This is easily read off of the CCG PARG relationships.
For gold-standard parses, we remove functional tag and trace information from the Penn Treebank parses before we extract features over them, so as to simulate the conditions of an automatic parse.
gold-standard is mentioned in 3 sentences in this paper.
Dou, Qing and Bergsma, Shane and Jiampojamarn, Sittichai and Kondrak, Grzegorz
Lexical stress and L2P conversion
5) ORACLESTRESS: The same input/output as LETTERSTRESS, except it uses the gold-standard stress on letters (Section 4.1).
Stress Prediction Experiments
2) ORACLESYL splits the input word into syllables according to the CELEX gold-standard, before applying SVM ranking.
Stress Prediction Experiments
The output pattern is evaluated directly against the gold-standard, without pattern-to-vowel mapping.
gold-standard is mentioned in 3 sentences in this paper.
Jiang, Wenbin and Huang, Liang and Liu, Qun
Experiments
Input Type                    Parsing F1 %
gold-standard segmentation    82.35
baseline segmentation         80.28
adapted segmentation          81.07
Experiments
Note that if we input the gold-standard segmented test set into the parser, the F-measures under the two definitions are the same.
Experiments
The parsing F-measure corresponding to the gold-standard segmentation, 82.35, represents the “oracle” accuracy (i.e., upper bound) of parsing on top of automatic word segmentation.
gold-standard is mentioned in 3 sentences in this paper.
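Because the gold-standard segmentation score acts as an upper bound, one way to read the three F1 values is as the share of the baseline-to-oracle gap that the adapted segmentation recovers. The sketch below is illustrative arithmetic on the figures quoted above; the helper name `gap_closed` is an assumption, and this ratio is not a metric reported in the paper.

```python
def gap_closed(baseline, system, oracle):
    """Fraction of the baseline-to-oracle gap recovered by a system.

    Hypothetical helper: 0.0 means no gain over the baseline,
    1.0 means the oracle (upper-bound) score was reached.
    """
    if oracle == baseline:
        raise ValueError("oracle and baseline scores are identical")
    return (system - baseline) / (oracle - baseline)


# Parsing F1 values quoted above: baseline, adapted and gold-standard segmentation.
share = gap_closed(baseline=80.28, system=81.07, oracle=82.35)
print(f"{share:.1%} of the gap to the oracle is closed")
```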
Reisinger, Joseph and Pasca, Marius
Experimental Setup 4.1 Data Analysis
where rank(c) is the rank (from 1 up to 10) of a concept c in C(w), and PathToGold is the length of the minimum path along IsA edges in the conceptual hierarchies between the concept c, on one hand, and any of the gold-standard concepts manually identified for the attribute w, on the other hand.
Experimental Setup 4.1 Data Analysis
The length PathToGold is 0 if the returned concept is the same as the gold-standard concept.
Experimental Setup 4.1 Data Analysis
Conversely, a gold-standard attribute receives no credit (that is, DRR is 0) if no path is found in the hierarchies between the top 10 concepts of C and any of the gold-standard concepts, or if C is empty.
gold-standard is mentioned in 3 sentences in this paper.
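The quoted sentences describe the ingredients of the DRR score (rank(c), PathToGold, and the zero-credit cases) but not how they are combined. The sketch below assumes the combination DRR = max over c of 1 / (rank(c) × (1 + PathToGold(c))), and it also assumes that IsA edges may be traversed in either direction; both are assumptions for illustration, as are the function names and the toy hierarchy.

```python
from collections import deque


def path_to_gold(concept, gold, adjacency):
    """Length of the shortest IsA path from concept to any gold concept.

    Returns 0 if the concept is itself a gold concept, None if no path exists.
    """
    if concept in gold:
        return 0
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:  # breadth-first search, so the first hit is the shortest path
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue
            if neighbour in gold:
                return dist + 1
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None


def drr(ranked_concepts, gold_concepts, isa_edges, top_k=10):
    """Assumed form: max over the top_k concepts of 1 / (rank * (1 + PathToGold))."""
    adjacency = {}
    for child, parent in isa_edges:  # assumption: edges are treated as undirected
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    gold = set(gold_concepts)
    best = 0.0  # stays 0 if the list is empty or no path is found
    for rank, concept in enumerate(ranked_concepts[:top_k], start=1):
        dist = path_to_gold(concept, gold, adjacency)
        if dist is not None:
            best = max(best, 1.0 / (rank * (1 + dist)))
    return best


# Toy hierarchy (invented): dog IsA mammal, cat IsA mammal, mammal IsA animal.
edges = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]
score = drr(["plant", "dog", "mammal"], ["mammal"], edges)
print(score)
```

In the toy call, "plant" has no path to the gold concept and contributes nothing, "dog" at rank 2 is one edge away, and "mammal" at rank 3 is an exact match; the score is the best of these candidates.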