SciSurf: Index of 'Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates'

Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

Goldwater, Sharon and Jurafsky, Dan and Manning, Christopher D.

Published in Proc. ACL, 2008

Article Structure

Abstract

Many factors are thought to increase the chances of misrecognizing a word in ASR, including low frequency, nearby disfluencies, short duration, and being at the start of a turn.

Introduction

In order to improve the performance of automatic speech recognition (ASR) systems on conversational speech, it is important to understand the factors that cause problems in recognizing words.

Data

For our analysis, we used the output from the SRI/ICSI/UW RT-04 CTS system (Stolcke et al., 2006) on the NIST RT-03 development set.

Analysis of individual features

3.1 Features

Analysis using a joint model

In the previous section, we investigated the effects of various individual features on ASR error rates.

Conclusion

In this paper, we introduced the individual word error rate (IWER) for measuring ASR performance on individual words, including insertions as well as deletions and substitutions.

Topics

error rates

Appears in 36 sentences as: error rate (5) Error rates (1) error rates (34)

In Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

This paper analyzes a variety of lexical, prosodic, and disfluency factors to determine which are likely to increase ASR error rates.
Page 1, “Abstract”
(3) Although our results are based on output from a system with speaker adaptation, speaker differences are a major factor influencing error rates, and the effects of features such as frequency, pitch, and intensity may vary between speakers.
Page 1, “Abstract”
Previous work on recognition of spontaneous monologues and dialogues has shown that infrequent words are more likely to be misrecognized (Fosler-Lussier and Morgan, 1999; Shinozaki and Furui, 2001) and that fast speech increases error rates (Siegler and Stern, 1995; Fosler-Lussier and Morgan, 1999; Shinozaki
Page 1, “Introduction”
Siegler and Stern (1995) and Shinozaki and Furui (2001) also found higher error rates in very slow speech.
Page 1, “Introduction”
Word length (in phones) has also been found to be a useful predictor of higher error rates (Shinozaki and Furui, 2001).
Page 1, “Introduction”
Results for speech rate were ambiguous: faster utterances had higher error rates in one corpus, but lower error rates in the other.
Page 1, “Introduction”
Hirschberg et al.’s (2004) work suggests that prosodic factors can impact error rates, but leaves open the question of which factors are important at the word level and how they influence recognition of natural conversational speech.
Page 1, “Introduction”
Adda-Decker and Lamel’s (2005) suggestion that higher rates of disfluency are a cause of worse recognition for male speakers presupposes that disfluencies raise error rates.
Page 1, “Introduction”
In the remainder of this paper, we first describe the data set used in our study and introduce a new measure of error, individual word error rate (IWER), that allows us to include insertion errors in our analysis, along with deletions and substitutions.
Page 2, “Introduction”
The standard measure of error used in ASR is word error rate (WER), computed as 100(I + D + S) / R, where I, D, and S are the number of insertions, deletions, and substitutions found by aligning the ASR hypotheses with the reference transcriptions, and R is the number of reference words.
Page 2, “Data”
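As a rough illustration of the WER formula quoted above, the following sketch computes 100(I + D + S) / R from error counts. The function name and the example counts are hypothetical and are not taken from the paper; the alignment step that produces the counts is assumed to have been done already.

```python
def word_error_rate(insertions: int, deletions: int,
                    substitutions: int, reference_words: int) -> float:
    """Return WER as a percentage: 100 * (I + D + S) / R."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return 100.0 * (insertions + deletions + substitutions) / reference_words


# Hypothetical counts: 2 insertions, 3 deletions, 5 substitutions
# against 50 reference words.
example_wer = word_error_rate(2, 3, 5, 50)
print(example_wer)
```

Because insertions count as errors but are not reference words, WER can exceed 100 when the hypothesis contains many inserted words.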
Since we wish to know what features of a reference word increase the probability of an error, we need a way to measure the errors attributable to individual words — an individual word error rate (IWER).
Page 2, “Data”


joint model

Appears in 5 sentences as: joint model (5)

In Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

According to our joint model, these effects still hold even after controlling for other features.
Page 6, “Analysis using a joint model”
Our joint model controls for the first two of these factors, suggesting that the third factor or some other explanation must account for the remaining differences between males and females.
Page 6, “Analysis using a joint model”
In the joint model, we see the same effect of pitch mean and an even stronger effect for intensity, with the predicted odds of an error dramatically higher for extreme intensity values.
Page 6, “Analysis using a joint model”
However, as with the other prosodic features, predictions of the joint model are dominated by quadratic trends, i.e., predicted error rates are lower for average values of duration and speech rate than for extreme values.
Page 6, “Analysis using a joint model”
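To make the idea of a quadratic trend concrete, here is a minimal sketch of how a squared term in the log odds yields lower predicted error near the average feature value and higher predicted error at the extremes. All coefficients and feature values are hypothetical and chosen only for illustration; they are not the paper's fitted values.

```python
import math


def predicted_error_probability(value: float, mean: float,
                                intercept: float = -2.0,
                                linear: float = 0.0,
                                quadratic: float = 0.5) -> float:
    """Logistic prediction with a quadratic term on a centered feature.

    The coefficients are illustrative assumptions, not fitted values.
    """
    centered = value - mean
    log_odds = intercept + linear * centered + quadratic * centered ** 2
    return 1.0 / (1.0 + math.exp(-log_odds))


# With a positive quadratic coefficient, the prediction is lowest at the mean
# and rises symmetrically toward both extremes (a U-shaped curve).
for value in (-2.0, 0.0, 2.0):
    print(value, predicted_error_probability(value, mean=0.0))
```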
Using IWER, we analyzed the effects of various word-level lexical and prosodic features, both individually and in a joint model.
Page 8, “Conclusion”


logistic regression

Appears in 3 sentences as: logistic regression (3)

In Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

To model data with a binary dependent variable, a logistic regression model is an appropriate choice.
Page 4, “Analysis using a joint model”
In logistic regression, we model the log odds as a linear combination of feature values.
Page 4, “Analysis using a joint model”
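The relationship between log odds and probability described above can be sketched as follows. This is a generic illustration of logistic regression, not the paper's model: the function names, weights, and feature values are hypothetical.

```python
import math
from typing import Sequence


def log_odds(weights: Sequence[float], features: Sequence[float],
             intercept: float = 0.0) -> float:
    """Linear combination of feature values: intercept + sum(w_i * x_i)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, features))


def error_probability(weights: Sequence[float], features: Sequence[float],
                      intercept: float = 0.0) -> float:
    """Convert the log odds into a probability with the logistic function."""
    z = log_odds(weights, features, intercept)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two features of a word.
print(log_odds([1.0, -2.0], [3.0, 1.0], intercept=0.5))
print(error_probability([1.0, -2.0], [3.0, 1.0], intercept=0.5))
```

A log odds of 0 corresponds to a probability of 0.5, so positive weights push the predicted probability of an error above even odds as the feature grows.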
Standard logistic regression models assume that all categorical features are fixed effects, meaning that all possible values for these features are known in advance, and each value may have an arbitrarily different effect on the outcome.
Page 4, “Analysis using a joint model”
