Index of papers in Proc. ACL that mention
  • UAS
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Experimental Setup
Evaluation Measures Following standard practice, we use Unlabeled Attachment Score ( UAS ) as the evaluation metric in all our experiments.
Experimental Setup
We report UAS excluding punctuation on CoNLL datasets, following Martins et al.
Experimental Setup
For the CATiB dataset, we report UAS including punctuation in order to be consistent with the published results in the 2013 SPMRL shared task (Seddah et al., 2013).
Results
Moreover, our model also outperforms the 88.80% average UAS reported in Martins et al.
Results
With these features, our model achieves an average UAS of 89.28%.
Results
UAS POS Acc.
UAS is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Candito, Marie and Constant, Matthieu
Experiments
Evaluation metrics: we evaluate our parsing systems by using the standard metrics for dependency parsing: Labeled Attachment Score (LAS) and Unlabeled Attachment Score ( UAS ), computed using all tokens including punctuation.
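As a point of reference for the attachment-score definitions quoted throughout this index, here is a minimal sketch of how UAS and LAS can be computed from gold and predicted dependency trees. The Token fields, the attachment_scores helper, and the exclude_punct option are illustrative assumptions, not code from any of the indexed papers.

from dataclasses import dataclass

@dataclass
class Token:
    form: str       # surface form
    head: int       # index of the head token (0 = artificial root)
    deprel: str     # dependency label
    is_punct: bool  # whether the token is punctuation

def attachment_scores(gold, pred, exclude_punct=False):
    """UAS = fraction of counted tokens whose predicted head matches the gold head;
    LAS additionally requires the dependency label to match."""
    assert len(gold) == len(pred)
    total = uas_hits = las_hits = 0
    for g, p in zip(gold, pred):
        if exclude_punct and g.is_punct:
            continue  # some papers exclude punctuation tokens from the score
        total += 1
        if g.head == p.head:
            uas_hits += 1
            if g.deprel == p.deprel:
                las_hits += 1
    return uas_hits / total, las_hits / total

# Hypothetical three-token sentence with one punctuation attachment error.
gold = [Token("He", 2, "nsubj", False), Token("slept", 0, "root", False), Token(".", 2, "punct", True)]
pred = [Token("He", 2, "nsubj", False), Token("slept", 0, "root", False), Token(".", 1, "punct", True)]
print(attachment_scores(gold, pred))                      # including punctuation: (0.667, 0.667)
print(attachment_scores(gold, pred, exclude_punct=True))  # excluding punctuation: (1.0, 1.0)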
Experiments
In the “labeled representation” evaluation, the UAS provides a measure of syntactic attachments for sequences of words, independently of the (regular) MWE status of subsequences.
Experiments
The UAS for labeled representation will be maximal, whereas for the flat representation, the last two tokens will count as incorrect for UAS .
UAS is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Liu, Ting and Che, Wanxiang
Experiments and Analysis
We adopt unlabeled attachment score ( UAS ) as the primary evaluation metric.
Experiments and Analysis
The UAS on CDT-test is 84.45%.
Experiments and Analysis
Table 4: Parsing accuracy ( UAS ) comparison on CTB5-test with gold-standard POS tags.
UAS is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Zhang, Min and Li, Haizhou
Experiments
We measured the parser quality by the unlabeled attachment score ( UAS ), i.e., the percentage of tokens (excluding all punctuation tokens) with the correct HEAD.
Experiments
Figure 4 shows the UAS curves on the development set, where K is the beam size for Intersect and the K-best size for Rescoring, the x-axis represents K, and the y-axis represents the UAS scores.
Experiments
UAS
UAS is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Li, Zhenghua and Zhang, Min and Chen, Wenliang
Experiments and Analysis
We measure parsing performance using the standard unlabeled attachment score ( UAS ), excluding punctuation marks.
Experiments and Analysis
Table 4: UAS comparison on English test data.
Experiments and Analysis
UAS Li et al.
UAS is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Abstract
We also obtain the best published UAS results on 5 languages.
Experimental Setup
As the evaluation measure, we use unlabeled attachment scores ( UAS ) excluding punctuation.
Results
Our model also achieves the best UAS on 5 languages.
Results
Figure 1 shows the average UAS on CoNLL test datasets after each training epoch.
Results
Figure 1: Average UAS on CoNLL testsets after different epochs.
UAS is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
We measured the performance of the parsers using the following metrics: unlabeled attachment score ( UAS ), labeled attachment score (LAS) and complete match (CM), which were defined by Hall et al.
Experiments
Type | Systems | UAS | CM
Yamada and Matsumoto (2003) | 90.3 | 38.7
McDonald et al.
Experiments
Figure axis label: UAS Score (%)
UAS is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Ma, Xuezhe and Xia, Fei
Data and Tools
Table 3: UAS for two versions of our approach, together with baseline and oracle systems on Google Universal Treebanks version 1.0.
Data and Tools
Table 4: UAS for two versions of our approach, together with baseline and oracle systems on Google Universal Treebanks version 2.0.
Experiments
Parsing accuracy is measured with unlabeled attachment score ( UAS ): the percentage of words with the correct head.
Experiments
Moreover, our approach considerably bridges the gap to fully supervised dependency parsers, whose average UAS is 84.67%.
Experiments
Table 5 illustrates the UAS of our approach trained on different amounts of parallel data, together with the results of the projected transfer parser re-implemented by us (PTPT).
UAS is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lee, John and Naradowsky, Jason and Smith, David A.
Experimental Results
Case | 84.1 | 85.6 | 74.3 | 76.5
Degree | 97.9 | 98.0 | 90.1 | 90.1
UAS | 67.4 | 68.7 | — | —
Experimental Results
all | all | non-null | non-null
POS | 94.9 | 95.7 | 94.9 | 95.7
Person | 98.7 | 99.0 | 92.2 | 94.6
Number | 97.4 | 97.9 | 96.5 | 97.1
Tense | 96.8 | 97.2 | 84.1 | 86.8
Mood | 97.9 | 98.3 | 91.4 | 93.2
Voice | 97.8 | 98.0 | 91.3 | 92.4
Gender | 95.4 | 96.1 | 90.7 | 91.9
Case | 95.9 | 96.3 | 92.0 | 92.6
Degree | 99.8 | 99.9 | 33.3 | 55.6
UAS | 68.0 | 70.5 | — | —
Experimental Results
Tense | 98.9 | 99.3 | 97.2 | 97.3
Mood | 98.7 | 99.2 | 95.8 | 97.3
Case | 96.7 | 97.0 | 94.5 | 94.9
Degree | 97.9 | 98.1 | 87.5 | 88.6
UAS | 78.2 | 78.8 | — | —
Experimental Setup
all | all | non-null | non-null
POS | 94.4 | 94.5 | 94.4 | 94.5
Person | 99.4 | 99.5 | 97.1 | 97.6
Number | 95.3 | 95.9 | 93.7 | 94.5
Tense | 98.0 | 98.2 | 93.2 | 93.9
Mood | 98.1 | 98.3 | 93.8 | 94.4
Voice | 98.5 | 98.6 | 95.3 | 95.7
Gender | 93.1 | 93.9 | 87.7 | 89.1
Case | 89.3 | 90.0 | 79.9 | 81.2
Degree | 99.9 | 99.9 | 86.4 | 90.8
UAS | 61.0 | 61.9 | — | —
UAS is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick
Evaluation
The unlabeled attachment score [ UAS ] evaluates the quality of unlabeled
Evaluation
In order to establish the statistical significance of results between two parsing experiments in terms of F1 and UAS , we used a unidirectional t-test for two independent samples.
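A minimal sketch of such a one-tailed significance test, assuming per-fold (or per-split) scores from the two parsing runs; the sample values are invented, and scipy.stats.ttest_ind with alternative="greater" (SciPy 1.6+) stands in for whatever implementation the authors used.

from scipy import stats

# Hypothetical per-fold UAS for two parser configurations.
uas_system_a = [88.9, 89.4, 89.1, 88.7, 89.3]
uas_system_b = [88.1, 88.6, 88.4, 88.0, 88.5]

# Unidirectional (one-tailed) t-test for two independent samples:
# H1 = system A scores higher than system B.
t_stat, p_value = stats.ttest_ind(uas_system_a, uas_system_b, alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")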
Evaluation
Parser | F1 | LA | UAS | F1(MWE)
UAS is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Bollegala, Danushka and Weir, David and Carroll, John
Domain Adaptation
Specifically, we measure the similarity sim(u^(S), w^(S)) between the source domain distributions of u and w, and select the top r similar neighbours u for each word w as additional features for w.
Domain Adaptation
The value of a neighbour u selected as a distributional feature is set to its similarity score sim(u^(S), w^(S)).
Domain Adaptation
At test time, for each word w that appears in a target domain test sentence, we measure the similarity sim(M u^(S), w^(T)) and select the most similar r words u in the source domain labeled sentences as the distributional features for w, with their values set to sim(M u^(S), w^(T)).
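The neighbour-feature scheme described above can be sketched roughly as follows; the cosine similarity over co-occurrence count vectors and the feature-naming convention are illustrative assumptions, not details taken from the paper.

import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two sparse co-occurrence count vectors."""
    dot = sum(v * c2[k] for k, v in c1.items() if k in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def neighbour_features(word, source_vectors, r=3):
    """Return the r most similar source-domain words as features,
    each weighted by its similarity score to the given word."""
    vec = source_vectors.get(word)
    if vec is None:
        return {}
    sims = [(other, cosine(vec, other_vec))
            for other, other_vec in source_vectors.items() if other != word]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return {"NEIGHBOUR=" + other: score for other, score in sims[:r]}

# Toy source-domain distributions (invented counts).
vectors = {
    "excellent": Counter({"movie": 5, "book": 3}),
    "great": Counter({"movie": 4, "book": 2, "price": 1}),
    "terrible": Counter({"movie": 2, "refund": 4}),
}
print(neighbour_features("excellent", vectors, r=2))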
UAS is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhu, Jingbo and Xiao, Tong and Yang, Nan
Conclusion and related work
PTB uas | PTB compl | CTB uas | CTB compl |
91.77 | 45.29 | 84.54 | 33.75 | 221
92.29 | 46.28 | 85.11 | 34.62 | 124
92.50 | 46.82 | 85.62 | 37.11 | 71
92.74 | 48.12 | 86.00 | 35.87 | 39
Conclusion and related work
‘uas’ and ‘compl’ denote unlabeled attachment score and complete match rate, respectively (all excluding punctuation).
Conclusion and related work
Systems | s | uas | compl
Experiments
In particular, we achieve 86.33% uas on CTB, which is a 1.54% uas improvement over the greedy baseline parser.
UAS is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Raghavan, Sindhu and Mooney, Raymond and Ku, Hyeonseo
Experimental Evaluation
The “unadjusted” ( UA ) score does not correct for errors made by the extractor.
Experimental Evaluation
UA | AD
Precision | 29.73 (443/1490) | 35.24 (443/1257)
Experimental Evaluation
“UA” and “AD” refer to the unadjusted and adjusted scores respectively
Results and Discussion
Table 2 gives the unadjusted ( UA ) and adjusted (AD) precision for logical deduction.
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Valitutti, Alessandro and Toivonen, Hannu and Doucet, Antoine and Toivanen, Jukka M.
Evaluation
For the analysis of the results, we then measured the effectiveness of the constraints using two derived variables: the Collective Funniness (CF) of a message is its mean funniness, while its Upper Agreement ( UA (t)) is the fraction of funniness scores greater than or equal to a given threshold t.
Evaluation
To rank the generated messages, we take the product of Collective Funniness and Upper Agreement UA (3) and call it the overall Humor Effectiveness (HE).
Evaluation
The Upper Agreement UA (4) increases from 0.18 to 0.36 and to 0.43, respectively.
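A small worked sketch of these derived measures, assuming each message is rated by several judges on a numeric funniness scale (the ratings below are invented):

def collective_funniness(scores):
    """CF: mean funniness score of a message."""
    return sum(scores) / len(scores)

def upper_agreement(scores, t):
    """UA(t): fraction of funniness scores greater than or equal to threshold t."""
    return sum(1 for s in scores if s >= t) / len(scores)

def humor_effectiveness(scores, t=3):
    """HE: the product CF * UA(t); the paper ranks messages by CF * UA(3)."""
    return collective_funniness(scores) * upper_agreement(scores, t)

ratings = [4, 3, 5, 2, 4, 3, 5, 1]      # hypothetical judgments for one message
print(collective_funniness(ratings))     # 3.375
print(upper_agreement(ratings, 4))       # 0.5
print(humor_effectiveness(ratings))      # 3.375 * 0.75 = 2.53125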
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Choi, Jinho D. and McCallum, Andrew
Experiments
UAS
Experiments
UAS : unlabeled attachment score, LAS: labeled attachment score.
Experiments
Approach | UAS | LAS | Time
Zhang and Clark (2008) | 92.1
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yi and Wang, Rui
Experiment Results & Error Analyses
Table 3 shows the agreement between the HPSG backbone and CoNLL dependency in unlabeled attachment score ( UAS ).
Experiment Results & Error Analyses
UAS is reported on all complete test sets, as well as on fully parsed subsets (suffixed with “-p”).
Experiment Results & Error Analyses
Most notable is that the dependency backbone achieved over 80% UAS on BROWN, which is close to the performance of state-of-the-art statistical dependency parsing systems trained on WSJ (see Table 5 and Table 4).
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sun, Ang and Grishman, Ralph and Sekine, Satoshi
Cluster Feature Selection
Use All Prefixes (UA): UA produces a cluster feature at every available bit length with the hope that the underlying supervised system can learn proper weights of different cluster features during training.
Cluster Feature Selection
For example, if the full bit representation of “Apple” is “000”, UA would produce three cluster features: prefix1=0, prefix2=00 and prefix3=000.
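A minimal sketch of the 'Use All Prefixes' feature generation quoted above; the feature-name format is an illustrative choice, but the behaviour matches the paper's 'Apple'/'000' example.

def all_prefix_features(bit_string):
    """One cluster feature per available prefix length of a word's cluster bit string."""
    return {"prefix%d" % i: bit_string[:i] for i in range(1, len(bit_string) + 1)}

print(all_prefix_features("000"))
# {'prefix1': '0', 'prefix2': '00', 'prefix3': '000'}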
Experiments
UA 71.19 +0.49 1.5
Experiments
Table 6 shows that all the 4 proposed methods improved baseline performance, with UA as the fastest and ES as the slowest.
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Collins, Michael
Conclusion
Table 3: UAS for modified versions of our parsers on validation data.
Parsing experiments
measured with unlabeled attachment score ( UAS ): the percentage of words with the correct head.
Parsing experiments
Pass = %dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations.
Parsing experiments
Table 2: UAS of Models 1 and 2 on test data, with relevant results from related work.
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Evaluation Results
The quality of the parser is measured by the parsing accuracy or the unlabeled attachment score ( UAS ), i.e., the percentage of tokens with correct head.
Evaluation Results
Two types of scores are reported for comparison: “UAS without p” is the UAS score without all punctuation tokens and “UAS with p” is the one with all punctuation tokens.
Evaluation Results
Table 5 shows the results achieved by other researchers and ours ( UAS with p), which indicates that our parser outperforms all of the others.
UAS is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Rastrow, Ariya and Dredze, Mark and Khudanpur, Sanjeev
Experiments
Therefore, we could not use the constituent parser for ASR rescoring since utterances can be very long, although the shorter up-training text data was not a problem. We evaluate both unlabeled ( UAS ) and labeled dependency accuracy (LAS).
Experiments
Figure 3 shows improvements to parser accuracy through up-training for different amount of (randomly selected) data, where the last column indicates constituent parser score (91.4% UAS ).
Experiments
vs. 86.2% UAS ), up-training can cut the difference by 44% to 88.5%, and improvements saturate around 40m words (about 2m sentences).
UAS is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sartorio, Francesco and Satta, Giorgio and Nivre, Joakim
Experimental Assessment
parser | iter | UAS | LAS | UEM
arc-standard | 23 | 90.02 | 87.69 | 38.33
arc-eager | 12 | 90.18 | 87.83 | 40.02
this work | 30 | 91.33 | 89.16 | 42.38
arc-standard + easy-first | 21 | 90.49 | 88.22 | 39.61
arc-standard + spine | 27 | 90.44 | 88.23 | 40.27
Experimental Assessment
Table 2: Accuracy on test set, excluding punctuation, for unlabeled attachment score ( UAS ), labeled attachment score (LAS), unlabeled exact match (UEM).
Experimental Assessment
Considering UAS , our parser provides an improvement of 1.15 over the arc-eager parser and an improvement of 1.31 over the arc-standard parser, that is an error reduction of ~12% and ~13%, respectively.
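As a quick check of the error-reduction arithmetic, using the UAS figures from the table excerpt above (the helper function is just for illustration):

def relative_error_reduction(baseline_uas, improved_uas):
    """Fraction of the baseline's attachment errors removed by the improved parser."""
    return (improved_uas - baseline_uas) / (100.0 - baseline_uas)

print(relative_error_reduction(90.18, 91.33))  # vs. arc-eager:    ~0.117 (~12%)
print(relative_error_reduction(90.02, 91.33))  # vs. arc-standard: ~0.131 (~13%)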
UAS is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kolomiyets, Oleksandr and Bethard, Steven and Moens, Marie-Francine
Evaluations
Unlabeled Attachment Score ( UAS ): the fraction of events whose head events were correctly predicted.
Evaluations
Tree Edit Distance In addition to the UAS and LAS the tree edit distance score has been recently introduced for evaluating dependency structures (Tsarfaty et al., 2011).
Evaluations
UAS | LAS | UTEDS | LTEDS
LinearSeq | 0.830 | 0.581 | 0.689 | 0.549
ClassifySeq | 0.830 | 0.581 | 0.689 | 0.549
MST | 0.837 | 0.614* | 0.710 | 0.571
SRP | 0.830 | 0.647*† | 0.712 | 0.596*
UAS is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Dickinson, Markus
Evaluation
For development, we also report unlabeled attachment scores ( UAS ).
Evaluation
In the rest of table 1, we report the best-performing results for each of the methods, providing the number of rules below and above a particular threshold, along with corresponding UAS and LAS values.
Evaluation
The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with corresponding greater pre-
UAS is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Kazama, Jun'ichi and Torisawa, Kentaro
Experiments
We reported the parser quality by the unlabeled attachment score ( UAS ), i.e., the percentage of tokens (excluding all punctuation tokens) with correct HEADs.
Experiments
The results showed that the reordering features yielded an improvement of 0.53 and 0.58 points ( UAS ) for the first- and second-order models respectively.
Experiments
In total, we obtained an absolute improvement of 0.88 points ( UAS ) for the first-order model and 1.36 points for the second-order model by adding all the bilingual subtree features.
UAS is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: