Experimental Setup | A word error rate (WER) of 31% on the SM dataset was observed. |
Experimental Setup | However, due to the substantial amount of speech recognition errors in our data, the POS error rate (resulting from the combined errors of the ASR system and the automated POS tagger) is expected to be higher.
Related Work | Automatic recognition of nonnative speakers’ spontaneous speech is a challenging task, as evidenced by the error rate of state-of-the-art systems.
Related Work | For instance, Chen and Zechner (2011) reported a 50.5% word error rate (WER) and Yoon and Bhat (2012) reported a 30% WER in the recognition of ESL students’ spoken responses. |
Related Work | These high recognition error rates negatively affect the subsequent stages of the speech scoring system in general, and deep syntactic analysis in particular, since it operates on a long sequence of words as its context.
Abstract | A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs. |
Introduction | Derivations generated during such a translation procedure are modeled by a linear model, and minimum error rate training (MERT) (Och, 2003) is used to tune feature weights based on a set of question-answer pairs. |
Introduction | Given a set of question-answer pairs {(Q_i, A_i)} as the development (dev) set, we use the minimum error rate training (MERT) (Och, 2003) algorithm to tune the feature weights λ in our proposed model.
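The tuning setup above can be sketched in a few lines: a linear model scores each derivation as a dot product of feature weights and feature values, and a weight is adjusted to minimize 1-best error on the dev set. This is a toy sketch under assumed feature names (`lm`, `match`) and a crude coordinate step, not the actual MERT line search of Och (2003).

```python
# Toy sketch: linear model over derivations plus error-driven weight tuning.
# Feature names and the candidate grid are illustrative assumptions.

def score(weights, features):
    """Linear model: dot product of feature weights and feature values."""
    return sum(weights[k] * v for k, v in features.items())

def one_best(weights, derivations):
    """Pick the highest-scoring derivation under the current weights."""
    return max(derivations, key=lambda d: score(weights, d["features"]))

def tune_weight(weights, key, dev_set, candidates):
    """Coordinate step: try candidate values for one weight and keep the one
    minimizing 1-best error on the dev set (a stand-in for MERT's exact
    line search)."""
    def dev_error(w):
        return sum(one_best(w, ex["derivations"])["answer"] != ex["gold"]
                   for ex in dev_set)
    best = min(candidates, key=lambda c: dev_error({**weights, key: c}))
    return {**weights, key: best}

dev_set = [{
    "gold": "A1",
    "derivations": [
        {"answer": "A1", "features": {"lm": 1.0, "match": 2.0}},
        {"answer": "A2", "features": {"lm": 3.0, "match": 0.5}},
    ],
}]
weights = {"lm": 1.0, "match": 1.0}
weights = tune_weight(weights, "match", dev_set, [0.0, 1.0, 2.0])
```

With the initial weights the wrong answer A2 outscores A1; raising the `match` weight to 2.0 drives the dev-set error to zero, which is the value the coordinate step keeps.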
Introduction | A lower disambiguation error rate results in better relation expressions. |
Conclusion | However, our model’s performance today correlates strongly with an orthogonal accuracy metric, word error rate, on unseen data.
Conclusion | This suggests that “retry rate” is a reasonable offline quality metric, to be considered in context among other metrics and traditional evaluation based on word error rate.
Introduction | In particular, we seek to measure and minimize the word error rate (WER) of a system, with a WER of zero indicating perfect transcription. |
Prediction task | We do not have retry annotations for this larger set, but we have transcriptions for the first member of each query pair, enabling us to calculate the word error rate (WER) of each query’s recognition hypothesis, and thus obtain ground truth for half of our retry definition. |
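WER as used in the snippets above is the word-level edit distance between a reference transcription and a recognition hypothesis, normalized by reference length: (substitutions + deletions + insertions) / N, so a WER of zero means a perfect transcription. A minimal sketch (the example sentences are made up for illustration):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) divided by
    the number of reference words, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Two deleted words out of six reference words give a WER of 1/3.
wer("the cat sat on the mat", "the cat sat mat")
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why error rates above 50% (as in the Chen and Zechner figure) are possible.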
Abstract | Evaluation results show a 44% reduction in the error rate relative to the best prior systems, averaging over all metrics, and up to 61% reduction in the error rate on grammaticality judgments. |
Conclusions | Our system achieved a 44% reduction in the error rate relative to both the Heilman and Smith and the Lindberg et al. systems.
Results | As seen in Table 4, our results represent a 44% reduction in the error rate relative to Heilman and Smith on the average rating over all metrics, and as high as 61% reduction in the error rate on grammaticality judgments. |
Results | Interestingly, our system again achieved a 44% reduction in the error rate when averaging over all metrics, just as it did in the Heilman and Smith comparison. |
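The "44% reduction in the error rate" figures above are relative, not absolute: the reduction is measured as a fraction of the baseline's error rate. A quick arithmetic sketch (the numbers below are illustrative, not the actual rates from the paper):

```python
def relative_error_reduction(baseline_error, system_error):
    """Fractional reduction in error rate relative to a baseline."""
    return (baseline_error - system_error) / baseline_error

# Illustrative: dropping from a 50% error rate to a 28% error rate
# is a 44% relative reduction, even though the absolute drop is 22 points.
reduction = relative_error_reduction(0.50, 0.28)
```

This distinction matters when comparing systems: a 44% relative reduction from a 10% baseline error rate is a much smaller absolute improvement than the same relative reduction from a 50% baseline.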
Abstract | We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model. |
Introduction | In this paper we propose an adaptive quality estimation that predicts sentence-level human-targeted translation error rate (HTER) (Snover et al., 2006) for a document-specific MT post-editing system. |
Static MT Quality Estimation | Static QE methods predict a quality score or translation error rate for the translated sentences or documents based on a set of features.
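Feature-based QE of this kind can be sketched as a regression from surface features of a sentence pair to a predicted HTER value. The features and weights below are illustrative assumptions for the sketch, not the feature set of any particular QE system:

```python
# Minimal sketch: sentence-level quality estimation as linear regression
# from surface features to a predicted HTER. Features and weights are
# invented for illustration.

def qe_features(source, translation):
    """A tiny, assumed feature set: bias, target length, length ratio."""
    src, tgt = source.split(), translation.split()
    return [1.0,                          # bias term
            float(len(tgt)),              # translation length
            len(tgt) / max(len(src), 1)]  # target/source length ratio

def predict_hter(weights, source, translation):
    """Linear model: weighted sum of the features."""
    return sum(w * f for w, f in zip(weights, qe_features(source, translation)))

# Hypothetical weights, e.g. learned by regression on post-edited data.
weights = [0.10, 0.01, 0.05]
predicted = predict_hter(weights, "a b c", "x y z")
```

In a real QE system the weights would be fit against HTER labels computed from human post-edits; an adaptive variant, as described above, would additionally update them as new post-edits for the document arrive.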
Experiments | Although we did not examine the accuracy of real tasks in this paper, there is an interesting report that the word error rate of language models follows a power law with respect to perplexity (Klakow and Peters, 2002). |
Experiments | Thus, we conjecture that the word error rate exhibits a tendency similar to that of perplexity with respect to the reduced vocabulary size.
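The power-law relationship cited above, WER ≈ a · PPL^b, is linear in log-log space, so its parameters can be recovered by ordinary least squares on the logs. A small sketch with synthetic data points generated exactly on a power law (the constants 0.02 and 0.3 are arbitrary, not values from Klakow and Peters):

```python
import math

def fit_power_law(ppl, wer):
    """Return (a, b) such that wer ≈ a * ppl**b, via linear regression
    of log(wer) on log(ppl)."""
    xs = [math.log(p) for p in ppl]
    ys = [math.log(w) for w in wer]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b

# Synthetic points lying exactly on wer = 0.02 * ppl**0.3.
ppl = [100, 200, 400, 800]
wer = [0.02 * p ** 0.3 for p in ppl]
a, b = fit_power_law(ppl, wer)
```

On real measurements the points would only approximately follow the line, and the fitted exponent b would summarize how quickly WER grows as perplexity increases.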
Introduction | Each of these studies experimentally discusses tradeoff relationships between the size of the reduced corpus/model and its performance measured by perplexity, word error rate, and other factors.
Comparison to BabySRL | Error rates: Initial .36; Trained .11; Initial (given 2 args) .66; Trained (given 2 args) .13; 2008 arg-arg position .65; 2008 arg-verb position 0; 2009 arg-arg position .82; 2009 arg-verb position .63
Comparison to BabySRL | The model presented in this paper does not share this restriction, so the raw error rate for this model is presented in the first two lines; the error rate once this additional restriction is imposed is given in the next two lines.
Comparison to BabySRL | The 1-1 role bias error rate (before training) of the model presented in this paper is comparable to that of Connor et al. |