Experiments | Three metrics are used for evaluation: precision (P), recall (R), and balanced f-score (F), defined as F = 2PR/(P+R). |
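The balanced f-score defined above is the harmonic mean of precision and recall; a minimal sketch of the computation (function name is illustrative, not from the paper):

```python
def f_score(p, r):
    """Balanced f-score F = 2PR / (P + R), the harmonic mean of
    precision p and recall r. Returns 0.0 when both are zero."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

# The harmonic mean is dominated by the smaller of the two values,
# so a model cannot hide low recall behind high precision.
```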
Experiments | The character-based joint solver baseline (CTagctb) is competitive, achieving an f-score of 92.93. |
Experiments | The tagging model achieves an f-score of 94.03. |
Introduction | Our structure-based stacking model achieves an f-score of 94.36, which is superior to the feature-based stacking model introduced by Jiang et al. (2009). |
Introduction | Our final system achieves an f-score of 94.68, which yields a relative error reduction of 11% over the best published result (94.02). |
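The 11% figure is a relative error reduction, i.e. the fraction of the baseline's error (100 minus its f-score) that the new system removes. A sketch of the arithmetic (function name is illustrative):

```python
def relative_error_reduction(baseline_f, new_f):
    """Fraction of the baseline's error (100 - F, in percentage
    points) eliminated by the new system."""
    return (new_f - baseline_f) / (100.0 - baseline_f)

# For the figures quoted in the text:
# (94.68 - 94.02) / (100 - 94.02) = 0.66 / 5.98, roughly 0.11,
# i.e. the reported 11% relative error reduction.
```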
Experimental Setup | Following the suggestion of Scharenborg et al. (2010), we use a 20-ms tolerance window to compute the recall, precision, and F-score of the segmentations our model proposes for TIMIT's training set. |
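Under such a tolerance window, a hypothesized boundary counts as correct if it falls within 20 ms of an unmatched reference boundary. A minimal sketch of this scoring, assuming boundaries are given as times in seconds and matched greedily one-to-one (the function and matching strategy are illustrative, not the paper's exact protocol):

```python
def boundary_prf(ref, hyp, tol=0.020):
    """Score hypothesized segment boundaries against reference
    boundaries (both in seconds). A hypothesis is a hit if it lies
    within +/- tol of a reference boundary not already matched.
    Returns (precision, recall, F-score)."""
    ref, hyp = sorted(ref), sorted(hyp)
    used = [False] * len(ref)
    hits = 0
    for h in hyp:
        for i, r in enumerate(ref):
            if not used[i] and abs(h - r) <= tol:
                used[i] = True  # one-to-one: each reference matches once
                hits += 1
                break
    p = hits / len(hyp) if hyp else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Enforcing one-to-one matching matters: without it, several hypotheses clustered around a single reference boundary would all count as hits and inflate recall.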
Introduction | Our model outperforms the state-of-the-art unsupervised method and improves the relative F-score by 18.8 points (Dusan and Rabiner, 2006). |
Results | unit (%): Recall, Precision, F-score. Dusan (2006): 75.2, 66.8, 70.8. Qiao et al. |
Results | Compared to the baseline in which the number of phone boundaries per utterance was also unknown (Dusan and Rabiner, 2006), our model achieves higher recall and precision, improving the relative F-score by 18.8%. |
Experimental Setup | [Figure: precondition prediction F-score for the SVM and All-text models.] |
Introduction | Specifically, it yields an F-score of 66%, compared to the baseline's 65%. |