Experiments | We also trained and evaluated on binarized versions of the ordinal GUG labels: a sentence was labeled 1 if the average judgment was at least 3.5 (i.e., would round to 4), and 0 otherwise. |
Experiments | To train our system on binarized data, we replaced the £2 -regularized linear regression model with an 62-regularized logistic regression and used Kendall’s 7' rank correlation between the predicted probabilities of the positive class and the binary gold standard labels as the grid search metric (§3.1) instead of Pearson’s 7“. |
Experiments | We also evaluated the binary system for the ordinal task by computing correlations between its estimated probabilities and the averaged human scores, and we evaluated the ordinal system for the binary task by binarizing its predictions.12 |
System Description | From an initial prediction y, it produces the final prediction: A AM re not affect Pearson’s 'r correlations or rankings, but it would affect binarized predictions. |