Abstract | Unlike in conventional reranking for syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. |
Abstract | successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees. |
Experimental Evaluation | It is calculated by comparing the system’s MR output to the gold-standard MR. |
Introduction | Standard reranking requires gold-standard interpretations (e.g. |
Introduction | However, grounded language learning does not provide gold-standard interpretations for the training examples. |
Introduction | Instead of using gold-standard annotations to determine the correct interpretations, we simply prefer interpretations of navigation instructions that, when executed in the world, actually reach the intended destination. |
Modified Reranking Algorithm | Instead, our modified model replaces the gold-standard reference parse with the “pseudo-gold” parse tree |
Modified Reranking Algorithm | To circumvent the need for gold-standard reference parses, we select a pseudo-gold parse from the candidates produced by the GEN function. |
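The pseudo-gold selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_pseudo_gold` and the `executes_to_goal` callback are hypothetical, and candidates are assumed to arrive ordered by the base model's score.

```python
def select_pseudo_gold(candidates, executes_to_goal):
    """Pick a pseudo-gold parse from the GEN candidates.

    Sketch: return the highest-ranked candidate whose execution in
    the world reaches the intended destination; None if none does.
    `candidates` is assumed sorted by the base model's score.
    """
    for parse in candidates:
        if executes_to_goal(parse):
            return parse
    return None  # no candidate reached the goal
```

In a grounded navigation setting, `executes_to_goal` would run the candidate interpretation in a simulator and check whether the agent ends at the intended destination.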
Modified Reranking Algorithm | In a similar vein, when reranking semantic parses, Ge and Mooney (2006) chose as a reference parse the one which was most similar to the gold-standard semantic annotation. |
Applying Class Attributes | Our first technique provides a simple way to use our identified self-distinguishing attributes in conjunction with a classifier trained on gold-standard data. |
Applying Class Attributes | (3) BootStacked: Gold Standard and Bootstrapped Combination Although we show that an accurate classifier can be trained using auto-annotated Bootstrapped data alone, we also test whether we can combine this data with any gold-standard training examples to achieve even better performance. |
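One simple way to realize the BootStacked combination is feature stacking: the bootstrapped classifier's prediction becomes one extra feature on each gold-standard training example. This is a hedged sketch under that assumption; the helper name `bootstacked_example` and the `boot_pred` feature key are illustrative, not from the paper.

```python
def bootstacked_example(features, boot_label):
    """Augment a gold-standard training example's feature dict with
    the bootstrapped system's predicted label, so a stacked model
    trained on gold data can learn when to trust that prediction."""
    return dict(features, boot_pred=boot_label)
```

The stacked classifier is then trained as usual on the augmented gold-standard examples.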
Conclusion | We presented three effective techniques for leveraging this knowledge within the framework of supervised user characterization: rule-based postprocessing, a learning-by-bootstrapping approach, and a stacking approach that integrates the predictions of the bootstrapped system into a system trained on annotated gold-standard training data.
Conclusion | While our technique has advanced the state of the art on this important task, our approach may prove even more useful on other tasks where training on thousands of gold-standard examples is not even an option.
Introduction | Our bootstrapped system, trained purely from automatically-annotated Twitter data, significantly reduces error over a state-of-the-art system trained on thousands of gold-standard training examples. |
Learning Class Attributes | In our gold-standard gender data (Section 5), however, every user has a homepage [by dataset construction]; we might therefore incorrectly classify every user as Male. |
Results | A standard classifier trained on 100 gold-standard training examples improves over this baseline, to 72.0%, while one with 2282 training examples achieves 84.0%. |
Twitter Gender Prediction | We can therefore benchmark our approach against state-of-the-art supervised systems trained with plentiful gold-standard data, giving us an idea of how well our Bootstrapped system might compare to theoretically top-performing systems on other tasks, domains, and social media platforms where such gold-standard training data is not available. |
Related work | In order to prepare a gold-standard data set, we obtained 1,041 sentences by randomly sampling about 1% of the sentences containing numbers (Arabic digits and/or Chinese numerical characters) in a Japanese Web corpus (100 million pages) (Shinzato et al., 2012). |
Related work | recall using the gold-standard data set”. |
Related work | We built a gold-standard data set for numerical common sense. |
Discussion | But the actual gold-standard annotation is: [arg1 buyers that weren’t disclosed].
Evaluation | For every argument position in the gold standard, the scorer expects a single predicted constituent to fill it.
Evaluation | The function above relates the set of tokens that form a predicted constituent, Predicted, and the set of tokens that are part of an annotated constituent in the gold standard, True.
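A token-overlap function of the kind described can be sketched as below. This is an illustrative stand-in, assuming a Jaccard-style overlap between token-position sets; the actual scorer's formula may differ.

```python
def token_overlap(predicted, true):
    """Overlap between a predicted constituent and a gold constituent,
    each given as a set of token positions. Returns |P ∩ T| / |P ∪ T|
    (a sketch; the evaluated scorer may use a different measure)."""
    predicted, true = set(predicted), set(true)
    if not predicted and not true:
        return 1.0  # two empty constituents match trivially
    return len(predicted & true) / len(predicted | true)
```

A score of 1.0 means the predicted constituent exactly matches the gold constituent's token span.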
Evaluation | For each missing argument, the gold-standard includes the whole coreference chain of the filler. |
Introduction | The following example includes the gold-standard annotations for a traditional SRL process: |
Experiments | “Express intent to deescalate military engagement”), we elect to measure model quality as lexical scale parity: whether all the predicate paths within one automatically learned frame tend to have similar gold-standard scale scores. |
Experiments | (This measures cluster cohesiveness against a one-dimensional continuous scale, instead of measuring cluster cohesiveness against a gold-standard clustering as in VI, Rand index, or purity.) |
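One way to measure cohesiveness against a one-dimensional continuous scale is the mean within-frame variance of the gold scale scores. The sketch below assumes that reading; the function name `scale_cohesion` is hypothetical and the paper's exact statistic may differ.

```python
from statistics import pvariance

def scale_cohesion(frames, gold_scale):
    """Mean within-frame population variance of gold scale scores
    (lower = more cohesive). `frames` is a list of frames, each a
    list of predicate paths; `gold_scale` maps a path to its
    gold-standard scale score. Singleton frames are skipped, since
    variance is undefined as a cohesion signal for one member."""
    variances = [pvariance([gold_scale[p] for p in frame])
                 for frame in frames if len(frame) > 1]
    return sum(variances) / len(variances) if variances else 0.0
```

A frame whose paths all share the same gold scale score contributes zero variance, i.e. perfect lexical scale parity.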
Experiments | We assign each path w a gold-standard scale g(w) by resolving through its matching pattern’s CAMEO code.
Selectional branching | Among all transition sequences generated by M_{r-1}, training instances from only T_1 and T_g are used to train M_r, where T_1 is the one-best sequence and T_g is the sequence giving the most accurate parse output compared to the gold-standard tree.
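The selection step described above can be sketched as picking the model's one-best sequence and the sequence most accurate against the gold tree. The function name and both scoring callbacks below are illustrative assumptions.

```python
def select_training_sequences(sequences, model_score, gold_accuracy):
    """From all transition sequences generated by the previous model,
    keep only: the one-best sequence under the model's score, and the
    sequence whose resulting parse is most accurate compared to the
    gold-standard tree. These two provide the next model's training
    instances (a sketch of the selection, not the full algorithm)."""
    t_best = max(sequences, key=model_score)  # one-best sequence
    t_gold = max(sequences, key=gold_accuracy)  # closest to gold
    return t_best, t_gold
```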
Transition-based dependency parsing | This decision is made by consulting gold-standard trees during training and a classifier during decoding.
Transition-based dependency parsing | Table 3 shows a transition sequence generated by our parsing algorithm using gold-standard decisions. |
Introduction | Our rich set of features significantly improves the performance of the QSD model, even though we give up the gold-standard dependency features (Sect. |
Related work | To find the gain that can be obtained with gold-standard parses, we used MAll’s system with their hand-annotated and the equivalent automatically generated features.
Task definition | For example, if G3 in Figure 1 is a gold-standard DAG and G1 is a candidate DAG, TC-based metrics count 2 > 3 as another match, even though it is entailed from 2 > 1 and 1 > 3.
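The entailment in the example above comes from the transitive closure of the DAG's ordering edges. A minimal sketch of that closure, assuming edges are given as ordered pairs (a, b) meaning a > b:

```python
def transitive_closure(edges):
    """Transitive closure of a set of ordering edges, as used by
    TC-based metrics: 2 > 1 and 1 > 3 together entail 2 > 3.
    Naive fixed-point iteration; fine for small DAGs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

Counting matches over the closure rather than the raw edge set is exactly what lets TC-based metrics credit the entailed pair 2 > 3.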
Experiments | The gold-standard edits are with → to and ε → the.
Experiments | Given a set of gold-standard edits, the original (ungrammatical) input text, and the corrected system output text, the M² scorer searches for the system edits that have the largest overlap with the gold-standard edits.
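Once system edits are extracted, scoring against the gold edits reduces to precision/recall over edit sets. The sketch below is a simplified stand-in: the real M² scorer searches an edit lattice for the system edit set with maximal overlap, whereas here each edit is just a hashable (span, replacement) pair.

```python
def edit_prf(system_edits, gold_edits):
    """Precision, recall, and F1 over edits, each edit a hashable
    (span, replacement) tuple. A simplified sketch of M2-style
    scoring, without the scorer's lattice search for the
    maximally-overlapping system edit set."""
    sys_set, gold_set = set(system_edits), set(gold_edits)
    tp = len(sys_set & gold_set)
    p = tp / len(sys_set) if sys_set else 1.0
    r = tp / len(gold_set) if gold_set else 1.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```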
Experiments | The HOO 2011 shared task provides two sets of gold-standard edits: the original gold-standard edits produced by the annotator, and the official gold-standard edits.