Ensuring Meaning Composition | Note that unlike SCISSOR (Ge and Mooney, 2005), training our method does not require gold-standard SAPTs. |
Experimental Evaluation | For GEOQUERY, an MR was correct if it retrieved the same answer as the gold-standard query, thereby reflecting the quality of the final result returned to the user. |
Experimental Evaluation | Listed together with their PARSEVAL F-measures, these are: gold-standard parses from the treebank (GoldSyn, 100%); a parser trained on WSJ plus the small number of in-domain training sentences required to achieve good performance, 20 for CLANG (Syn20, 88.21%) and 40 for GEOQUERY (Syn40, 91.46%); and a parser trained on no in-domain data (Syn0, 82.15% for CLANG and 76.44% for GEOQUERY). |
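The PARSEVAL F-measures quoted for the syntactic parsers compare labeled constituent brackets between a parse and the treebank tree. A minimal sketch of that comparison, assuming each tree has already been reduced to `(label, start, end)` spans; `parseval_f1` is a hypothetical helper, and conventions of the standard evalb scorer (punctuation handling, ignored labels) are not reproduced.

```python
from collections import Counter


# Hypothetical helper: labeled-bracket F-measure in the style of PARSEVAL.
# Each tree is assumed to be pre-reduced to (label, start, end) spans.
def parseval_f1(gold_brackets, test_brackets):
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)
    # Multiset intersection: each test bracket can match at most one gold bracket.
    matched = sum((gold & test).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(test.values())
    recall = matched / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(parseval_f1(gold, test))  # 3 of 4 brackets match on each side -> 0.75
```

For a corpus-level figure such as 88.21%, one would normally pool matched, gold, and test bracket counts over all sentences before computing precision and recall, rather than averaging per-sentence F-measures.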
Experimental Evaluation | Note that some of these approaches require additional human supervision, knowledge, or engineered features that are unavailable to the other systems; namely, SCISSOR requires gold-standard SAPTs, Z&C requires hand-built template grammar rules, LU requires a reranking model using specially designed global features, and our approach requires an existing syntactic parser. |
Evaluation | The first row shows the results on only those sentences that the conversion process can convert successfully (as measured by converting the gold-standard CCGbank derivations and comparing them with the PTB trees; to be clear, however, the scores are for the CCG parser on those sentences). |
Evaluation | The second row shows the scores on those sentences for which the conversion process is somewhat lossy but the oracle F-measure, obtained by converting the gold-standard CCGbank derivations, is still greater than 95%. |
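The two rows can be made concrete with a sketch that selects sentences by their oracle conversion score and then scores the CCG parser only on each subset. It assumes the per-sentence oracle F-measures have already been computed from the converted gold CCGbank derivations, reads "converted successfully" as an oracle F-measure of exactly 100%, and pools bracket counts within each subset; `subset_f1`, `oracle_rows`, and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: per-sentence parser bracket counts (matched, gold_total, test_total).
Counts = Tuple[int, int, int]


def subset_f1(parser_counts: Dict[str, Counts], ids: List[str]) -> float:
    """Micro-averaged F-measure of the CCG parser's converted output on a subset."""
    matched = sum(parser_counts[i][0] for i in ids)
    gold = sum(parser_counts[i][1] for i in ids)
    test = sum(parser_counts[i][2] for i in ids)
    if matched == 0:
        return 0.0
    precision, recall = matched / test, matched / gold
    return 2 * precision * recall / (precision + recall)


def oracle_rows(oracle_f: Dict[str, float], parser_counts: Dict[str, Counts],
                threshold: float = 0.95) -> Dict[str, float]:
    """oracle_f maps sentence id -> F-measure of converted gold vs. the PTB tree."""
    # Assumption: "converted successfully" means a perfect oracle score.
    lossless = [i for i, f in oracle_f.items() if f == 1.0]
    # Lossy conversion, but the oracle still clears the threshold.
    near = [i for i, f in oracle_f.items() if threshold < f < 1.0]
    # The oracle only selects sentences; the reported scores are the parser's.
    return {
        "lossless": subset_f1(parser_counts, lossless),
        "near_lossless": subset_f1(parser_counts, near),
    }


oracle_f = {"s1": 1.0, "s2": 0.97, "s3": 0.80}
parser_counts = {"s1": (8, 10, 10), "s2": (9, 10, 10), "s3": (1, 10, 10)}
print(oracle_rows(oracle_f, parser_counts))
```

Sentence `s3` falls in neither row because its oracle score is at or below the threshold.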
The CCG to PTB Conversion | shows that converting gold-standard CCG derivations into the GRs in DepBank resulted in an F-score of only 85%; hence the upper bound on the performance of the CCG parser, using this evaluation scheme, was only 85%. |
The CCG to PTB Conversion | The schemas were developed by manual inspection using Section 00 of CCGbank and the PTB as a development set, following the oracle methodology of Clark and Curran (2007), in which gold-standard derivations from CCGbank are converted to the new representation and compared with the gold standard for that representation. |
Error Analysis | Problems with relative clause attachment to genitives are not limited to automatic parses — errors in gold-standard treebank parses cause similar problems when Treebank parses disagree with Propbank annotator intuitions. |
Error Analysis | Figure 8: CCGbank gold-standard parse of a relative clause attachment. |
This is easily read off of the CCG PARG relationships. | For gold-standard parses, we remove function tags and trace information from the Penn Treebank parses before extracting features from them, so as to simulate the conditions of an automatic parse. |
Lexical stress and L2P conversion | 5) ORACLESTRESS: The same input/output as LETTERSTRESS, except it uses the gold-standard stress on letters (Section 4.1). |
Stress Prediction Experiments | 2) ORACLESYL splits the input word into syllables according to the CELEX gold standard before applying SVM ranking. |
Stress Prediction Experiments | The output pattern is evaluated directly against the gold standard, without pattern-to-vowel mapping. |
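Evaluating the output pattern directly against the gold standard amounts to a word-level exact-match accuracy over stress patterns. A minimal sketch, assuming a hypothetical encoding with one digit per syllable (1 = primary stress, 2 = secondary stress, 0 = unstressed); the function name is also hypothetical.

```python
def pattern_accuracy(predicted, gold):
    """Fraction of words whose whole predicted stress pattern equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must list the same words")
    if not gold:
        return 0.0
    # A word counts only if its entire pattern matches; nothing is mapped
    # back onto vowels or phonemes before the comparison.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical patterns for three words; two of the three match exactly.
print(pattern_accuracy(["10", "010", "100"], ["10", "100", "100"]))
```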
Experiments | Input type and parsing F1 (%): gold-standard segmentation, 82.35; baseline segmentation, 80.28; adapted segmentation, 81.07. |
Experiments | Note that if the gold-standard segmented test set is fed to the parser, the F-measures under the two definitions are identical. |
Experiments | The parsing F-measure corresponding to the gold-standard segmentation, 82.35, represents the “oracle” accuracy (i.e., the upper bound) of parsing on top of automatic word segmentation. |
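Treating the gold-standard segmentation score as the upper bound, the reported F1 values give a rough measure of how much of the segmentation-induced loss the adapted segmentation recovers. This is only arithmetic on the figures above, not a number stated in the original analysis:

```latex
\frac{F_{\text{adapted}} - F_{\text{baseline}}}{F_{\text{gold}} - F_{\text{baseline}}}
  = \frac{81.07 - 80.28}{82.35 - 80.28}
  = \frac{0.79}{2.07}
  \approx 0.38
```

In other words, segmentation errors cost the baseline 2.07 F1 points relative to the oracle, and the adapted segmentation recovers roughly 38% of that gap.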
Experimental Setup 4.1 Data Analysis | where rank(c) is the rank (from 1 up to 10) of a concept c in C(a), and PathToGold is the length of the minimum path along IsA edges in the conceptual hierarchies between the concept c, on the one hand, and any of the gold-standard concepts manually identified for the attribute a, on the other hand. |
Experimental Setup 4.1 Data Analysis | The length PathToGold is 0 if the returned concept is the same as the gold-standard concept. |
Experimental Setup 4.1 Data Analysis | Conversely, a gold-standard attribute receives no credit (that is, DRR is 0) if no path is found in the hierarchies between the top 10 concepts of C and any of the gold-standard concepts, or if C is empty. |
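PathToGold is a shortest-path query, so it can be sketched as a breadth-first search over the IsA hierarchy. The formula that combines rank(c) and PathToGold into DRR is not part of this excerpt, so the sketch assumes one: the best value of 1 / (rank(c) * (1 + PathToGold)) over the top 10 concepts. The rest follows the definitions above: ranks run from 1 to 10, PathToGold is 0 when the returned concept is itself gold, and the score is 0 when no path exists or the concept list is empty. Treating IsA edges as traversable in both directions, the toy hierarchy, and all function names are likewise assumptions.

```python
from collections import deque


def build_isa_graph(isa_edges):
    """Undirected adjacency over (child, parent) IsA pairs.
    Assumption: a path may go both up and down the hierarchy."""
    graph = {}
    for child, parent in isa_edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_to_gold(concept, gold_concepts, graph):
    """Breadth-first search: fewest IsA edges from `concept` to any gold
    concept; 0 if it is itself gold, None if no gold concept is reachable."""
    gold = set(gold_concepts)
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in gold:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def drr(ranked_concepts, gold_concepts, isa_edges):
    """ASSUMED combination: max over the top 10 of 1 / (rank * (1 + PathToGold)).
    Returns 0.0 if the list is empty or no concept reaches a gold concept."""
    graph = build_isa_graph(isa_edges)
    best = 0.0
    for rank, concept in enumerate(ranked_concepts[:10], start=1):
        dist = path_to_gold(concept, gold_concepts, graph)
        if dist is not None:
            best = max(best, 1.0 / (rank * (1 + dist)))
    return best


# Toy hierarchy: "capital" is one IsA edge away from the gold concept "city".
edges = [("capital", "city"), ("city", "location"), ("river", "body_of_water")]
print(drr(["river", "capital", "city"], {"city"}, edges))
```

Here "river" contributes nothing (no path), "capital" at rank 2 contributes 1/4, and "city" at rank 3 contributes 1/3, so the maximum is 1/3.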