Experiments and Analysis | We adopt unlabeled attachment score ( UAS ) as the primary evaluation metric. |
Experiments and Analysis | The UAS on CDT-test is 84.45%. |
Experiments and Analysis | Table 4: Parsing accuracy ( UAS ) comparison on CTB5-test with gold-standard POS tags. |
Experiments | We measured the parser quality by the unlabeled attachment score ( UAS ), i.e., the percentage of tokens (excluding all punctuation tokens) with the correct HEAD. |
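The definition above is easy to make concrete: UAS is the fraction of non-punctuation tokens whose predicted head index equals the gold head. A minimal sketch (function and variable names are ours, not from the paper):

```python
def uas(gold_heads, pred_heads, is_punct):
    """Unlabeled attachment score: fraction of non-punctuation tokens
    whose predicted head index matches the gold head index."""
    assert len(gold_heads) == len(pred_heads) == len(is_punct)
    scored = [(g, p) for g, p, punct in zip(gold_heads, pred_heads, is_punct)
              if not punct]
    correct = sum(1 for g, p in scored if g == p)
    return correct / len(scored)

# Toy sentence of 5 tokens; head index 0 denotes the root, and the
# final token is punctuation, so only 4 tokens are scored.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 5, 5, 3]
punct = [False, False, False, False, True]
print(uas(gold, pred, punct))  # 3 of 4 scored heads correct -> 0.75
```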
Experiments | Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the size of the K-best list for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores. |
Evaluation | The unlabeled attachment score ( UAS ) evaluates the quality of unlabeled |
Evaluation | In order to establish the statistical significance of results between two parsing experiments in terms of F1 and UAS , we used a one-tailed t-test for two independent samples. |
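A significance test of this kind can be sketched with stdlib-only Python: compute Welch's two-sample t statistic from per-item scores, then approximate the one-tailed p-value with the normal CDF (acceptable only for large samples; for small samples the Student-t CDF should be used instead). The helper name and toy numbers are ours:

```python
import math
from statistics import mean, variance

def one_tailed_t_test(sample_a, sample_b):
    """Welch's t statistic for H1: mean(sample_b) > mean(sample_a),
    with a normal-CDF approximation to the one-tailed p-value."""
    na, nb = len(sample_a), len(sample_b)
    se = math.sqrt(variance(sample_a) / na + variance(sample_b) / nb)
    t = (mean(sample_a) - mean(sample_b)) / se
    # Normal approximation; small samples need the t distribution instead.
    p = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return t, p

# Toy per-item scores for two systems; system B is uniformly higher.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0]
t, p = one_tailed_t_test(a, b)
print(round(t, 2), round(p, 3))  # -1.0 0.159
```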
Evaluation | Parser | F1 | LA | UAS | F1(MWE) |
Experimental Evaluation | The “unadjusted” ( UA ) score does not correct for errors made by the extractor. |
Experimental Evaluation | Precision: UA 29.73 (443/1490), AD 35.24 (443/1257) |
Experimental Evaluation | “UA” and “AD” refer to the unadjusted and adjusted scores respectively |
Results and Discussion | Table 2 gives the unadjusted ( UA ) and adjusted (AD) precision for logical deduction. |
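The UA/AD distinction amounts to a change of denominator: the adjusted score removes candidates attributable to extractor mistakes before computing precision. A small sketch using the counts from Table 2's precision row (1490 extracted, 443 correct; 1490 - 1257 = 233 extractor errors is derived from those figures, and the variable names are ours):

```python
def precision(correct, total):
    """Precision as a percentage."""
    return 100.0 * correct / total

correct = 443           # correctly deduced items
extracted = 1490        # all candidates produced
extractor_errors = 233  # candidates that are extractor mistakes, not system errors

ua = precision(correct, extracted)                      # unadjusted score
ad = precision(correct, extracted - extractor_errors)   # adjusted score
print(round(ua, 2), round(ad, 2))  # 29.73 35.24
```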
Evaluations | Unlabeled Attachment Score ( UAS ) The fraction of events whose head events were correctly predicted. |
Evaluations | Tree Edit Distance In addition to the UAS and LAS the tree edit distance score has been recently introduced for evaluating dependency structures (Tsarfaty et al., 2011). |
Evaluations | UAS / LAS / UTEDS / LTEDS: LinearSeq 0.830 / 0.581 / 0.689 / 0.549; ClassifySeq 0.830 / 0.581 / 0.689 / 0.549; MST 0.837 / 0.614* / 0.710 / 0.571; SRP 0.830 / 0.647* / 0.712 / 0.596* |
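Ordered tree edit distance itself can be sketched with the classic forest recursion over rightmost roots (exponential without the Zhang–Shasha dynamic program, but correct and fine for small illustrative trees). The normalization into a [0, 1] score below is one simple choice of ours, not necessarily the exact TEDS formula of Tsarfaty et al. (2011):

```python
from functools import lru_cache

# A tree is a pair (label, children); a forest is a tuple of trees.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def ted(f1, f2):
    """Ordered-forest edit distance with unit insert/delete/relabel costs."""
    if not f1 and not f2:
        return 0
    if not f1:                      # insert the rightmost node of f2
        _, children = f2[-1]
        return ted((), f2[:-1] + children) + 1
    if not f2:                      # delete the rightmost node of f1
        _, children = f1[-1]
        return ted(f1[:-1] + children, ()) + 1
    (lv, cv), (lw, cw) = f1[-1], f2[-1]
    return min(
        ted(f1[:-1] + cv, f2) + 1,            # delete rightmost node of f1
        ted(f1, f2[:-1] + cw) + 1,            # insert rightmost node of f2
        ted(cv, cw) + ted(f1[:-1], f2[:-1])   # match the two rightmost trees
            + (0 if lv == lw else 1),
    )

def ted_score(t1, t2):
    """Edit distance normalized to a [0, 1] similarity (one simple choice)."""
    return 1.0 - ted((t1,), (t2,)) / max(size((t1,)), size((t2,)))

a = ("root", (("a", ()), ("b", ())))
b = ("root", (("a", ()), ("c", ())))
print(ted((a,), (b,)), round(ted_score(a, b), 3))  # 1 0.667
```

One relabel ("b" to "c") separates the two toy trees, so the distance is 1 over trees of size 3.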
Experiments | Therefore, we could not use the constituent parser for ASR rescoring since utterances can be very long, although the shorter up-training text data was not a problem. We evaluate both unlabeled ( UAS ) and labeled ( LAS ) dependency accuracy. |
Experiments | Figure 3 shows improvements to parser accuracy through up-training for different amounts of (randomly selected) data, where the last column indicates the constituent parser score (91.4% UAS ). |
Experiments | vs. 86.2% UAS ), up-training can cut the difference by 44% to 88.5%, and improvements saturate around 40m words (about 2m sentences). |