Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin

Article Structure

Abstract

We present the first English syllabification system to improve the accuracy of letter-to-phoneme conversion.

Introduction

Pronouncing an unfamiliar word is a task that is often accomplished by breaking the word down into smaller components.

Related Work

Automatic preprocessing of words is desirable because the productive nature of language ensures that no finite lexicon will contain all words.

Structured SVMs

A structured support vector machine (SVM) is a large-margin training method that can learn to predict structured outputs, such as tag sequences or parse trees, instead of performing binary classification (Tsochantaridis et al., 2004).

Syllabification with Structured SVMs

In this paper we apply structured SVMs to the syllabification problem.

Syllabification Experiments

In this section, we will discuss the results of our best emission feature set (five-gram features with a context window of eleven letters) on held-out unseen test sets.

L2P Performance

As we stated from the outset, one of our primary motivations for exploring orthographic syllabification is the improvements it can produce in L2P systems.

Conclusion

We have applied structured SVMs to the syllabification problem, clearly outperforming existing systems.

Topics

SVM

Appears in 16 sentences as: SVM (18)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. We formulate syllabification as a tagging problem, and learn a discriminative tagger from labeled data using a structured support vector machine (SVM) (Tsochantaridis et al., 2004).
    Page 1, “Introduction”
  2. A structured support vector machine (SVM) is a large-margin training method that can learn to predict structured outputs, such as tag sequences or parse trees, instead of performing binary classification (Tsochantaridis et al., 2004).
    Page 2, “Structured SVMs”
  3. We employ a structured SVM that predicts tag sequences, called an SVM Hidden Markov Model, or SVM-HMM.
    Page 2, “Structured SVMs”
  4. This approach can be considered an SVM because the model parameters are trained discriminatively to separate correct tag sequences from incorrect ones by as large a margin as possible.
    Page 2, “Structured SVMs”
  5. There are a number of good reasons to apply the structured SVM formalism to this problem.
    Page 2, “Structured SVMs”
  6. Unlike a traditional SVM, the structured SVM considers complete tag sequences during training, instead of breaking each sequence into a number of training instances.
    Page 3, “Structured SVMs”
  7. Training a structured SVM can be viewed as a multi-class classification problem.
    Page 3, “Structured SVMs”
  8. Given a trained weight vector w, the SVM tags new instances x_i according to:
    Page 3, “Structured SVMs”
  9. A structured SVM finds a w that satisfies Equation 1, and separates the correct taggings by as large a margin as possible.
    Page 3, “Structured SVMs”
  10. The structured SVM solves this problem with an iterative online approach:
    Page 3, “Structured SVMs”
  11. The SVM framework is less restrictive: we can include o as an emission feature, but we can also include features indicating that the preceding and following letters are m and r respectively.
    Page 4, “Syllabification with Structured SVMs”
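
The excerpts above describe syllabification as sequence tagging under one linear model: emission features pair a tag with the letters around it, transition features score adjacent tag pairs, and a complete tagging is scored by a single weight vector. A minimal Python sketch of that sequence score (not the authors' code; the B/I tags and feature templates are invented for illustration):

    # Minimal sketch of the linear sequence score behind an SVM-HMM-style tagger:
    # score(x, y) = w . Phi(x, y), where Phi collects emission and transition
    # features.  The B/I tag names (B = syllable-initial letter, I = other) and
    # the feature templates are illustrative, not taken from the paper.

    def emission_features(word, i, tag):
        """Features pairing the tag at position i with nearby letters."""
        feats = [f"{tag}|focus={word[i]}"]
        if i > 0:
            feats.append(f"{tag}|prev={word[i - 1]}")
        if i + 1 < len(word):
            feats.append(f"{tag}|next={word[i + 1]}")
        return feats

    def sequence_score(word, tags, weights):
        """w . Phi(x, y): emission weights plus tag-transition weights."""
        score = 0.0
        for i, tag in enumerate(tags):
            for f in emission_features(word, i, tag):
                score += weights.get(f, 0.0)
            if i > 0:
                score += weights.get(f"trans={tags[i - 1]}->{tag}", 0.0)
        return score

    # Decoding picks the tagging that maximizes this score (via Viterbi); training
    # pushes the correct tagging's score above every incorrect one by a margin.
    example_weights = {"B|focus=l": 0.5, "trans=I->B": 0.3}
    print(sequence_score("syllable", list("BIIBIBII"), example_weights))  # syl-la-ble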

error rate

Appears in 12 sentences as: error rate (14)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. In comparison with a state-of-the-art syllabification system, we reduce the syllabification word error rate for English by 33%.
    Page 1, “Abstract”
  2. With this approach, we reduce the error rate for English by 33%, relative to the best existing system.
    Page 1, “Introduction”
  3. Syllable break error rate (SB ER) captures the incorrect tags that cause an error in syllabification.
    Page 6, “Syllabification Experiments”
  4. Table 1 presents the word accuracy and syllable break error rate achieved by each of our tag sets on both the CELEX and NETtalk datasets.
    Page 6, “Syllabification Experiments”
  5. Overall, our best tag set lowers the error rate by one-third, relative to SbA’s performance.
    Page 6, “Syllabification Experiments”
  6. In English, perfect syllabification produces a relative error reduction of 10.6%, and our model captures over half of the possible improvement, reducing the error rate by 6.0%.
    Page 8, “L2P Performance”
  7. Although perfect syllabification reduces their L2P relative error rate by 18%, they find that their learned model actually increases the error rate.
    Page 8, “L2P Performance”
  8. For Dutch, perfect syllabification reduces the relative L2P error rate by 17.5%; we realize over 70% of the available improvement with our syllabification model, reducing the relative error rate by 12.4%.
    Page 8, “L2P Performance”
  9. In German, perfect syllabification produces only a small reduction of 3.9% in the relative error rate.
    Page 8, “L2P Performance”
  10. Experiments show that our learned model actually produces a slightly higher reduction in the relative error rate.
    Page 8, “L2P Performance”
  11. (2007) report that oracle morphological annotation produces a relative error rate reduction
    Page 8, “L2P Performance”
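
For concreteness, the two measures quoted above can be sketched in a few lines of Python. The definitions assumed below, exact-match word accuracy and break errors counted as boundary positions that differ between the gold and predicted forms, are a plausible reading rather than the paper's precise formulas:

    # Rough sketch of the two measures quoted above, under assumed definitions
    # (the paper's exact formulas may differ):
    #   word accuracy             = fraction of words syllabified exactly right
    #   syllable break error rate = boundaries that differ between gold and
    #                               predicted forms, divided by gold boundaries

    def breaks(syllabified):
        """Boundary positions of a hyphenated form, e.g. 'syl-la-ble' -> {3, 5}."""
        positions, seen = set(), 0
        for ch in syllabified:
            if ch == "-":
                positions.add(seen)
            else:
                seen += 1
        return positions

    def evaluate(gold, predicted):
        exact = sum(g == p for g, p in zip(gold, predicted))
        word_accuracy = exact / len(gold)
        diffs = sum(len(breaks(g) ^ breaks(p)) for g, p in zip(gold, predicted))
        total = sum(len(breaks(g)) for g in gold)
        sb_error_rate = diffs / total if total else 0.0
        return word_accuracy, sb_error_rate

    print(evaluate(["syl-la-ble", "cat"], ["syl-lab-le", "cat"]))  # (0.5, 1.0)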

development set

Appears in 7 sentences as: development set (7)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. We use a linear kernel, and tune the SVM’s cost parameter on a development set.
    Page 3, “Syllabification with Structured SVMs”
  2. While developing our tagging schemes and feature representation, we used a development set of 5K words held out from our CELEX training data.
    Page 3, “Syllabification with Structured SVMs”
  3. Our development set experiments suggested that numbering ONC tags increases their performance.
    Page 4, “Syllabification with Structured SVMs”
  4. After experimenting with the development set, we decided to include in our feature set a window of eleven characters around the focus character, five on either side.
    Page 4, “Syllabification with Structured SVMs”
  5. Figure 1: Word accuracy as a function of the window size around the focus character, using unigram features on the development set.
    Page 5, “Syllabification with Structured SVMs”
  6. Figure 2: Word accuracy as a function of maximum n-gram size on the development set.
    Page 5, “Syllabification with Structured SVMs”
  7. Recall that 5K training examples were held out as a development set.
    Page 6, “Syllabification Experiments”

feature set

Appears in 4 sentences as: feature set (4)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. With SVM-HMM, the crux of the task is to create a tag scheme and feature set that produce good results.
    Page 3, “Syllabification with Structured SVMs”
  2. After experimenting with the development set, we decided to include in our feature set a window of eleven characters around the focus character, five on either side.
    Page 4, “Syllabification with Structured SVMs”
  3. As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set.
    Page 5, “Syllabification with Structured SVMs”
  4. In this section, we will discuss the results of our best emission feature set (five-gram features with a context window of eleven letters) on held-out unseen test sets.
    Page 5, “Syllabification Experiments”
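
The scheme described above is concrete enough to sketch: an eleven-letter window centered on the focus letter, with every unigram through five-gram that fits inside the window emitted as a feature. The fragment below is an illustration rather than the authors' implementation; the '#' padding symbol and the feature names are assumptions:

    # Sketch of the emission features described above: an eleven-letter window
    # (five letters of context on each side of the focus letter), with every
    # unigram through five-gram inside the window emitted as a feature.  The
    # '#' padding symbol and the feature-naming convention are assumptions.

    def window_ngram_features(word, focus, half_window=5, max_n=5):
        padded = "#" * half_window + word + "#" * half_window
        center = focus + half_window                    # focus letter's index in padded
        window = padded[center - half_window:center + half_window + 1]
        feats = []
        for n in range(1, max_n + 1):
            for start in range(len(window) - n + 1):
                # Keep the n-gram's offset from the focus letter, so that e.g.
                # 'bl' inside a syllable and 'lb' straddling a break look different.
                offset = start - half_window
                feats.append(f"{n}g[{offset}]={window[start:start + n]}")
        return feats

    print(window_ngram_features("syllable", focus=3)[:5])
    # ['1g[-5]=#', '1g[-4]=#', '1g[-3]=s', '1g[-2]=y', '1g[-1]=l']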

bigram

Appears in 3 sentences as: bigram (2) bigrams (2)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. For example, the bigram bl frequently occurs within a single English syllable, while the bigram lb generally straddles two syllables.
    Page 4, “Syllabification with Structured SVMs”
  2. Thus, in addition to the single-letter features outlined above, we also include in our representation any bigrams, trigrams, four-grams, and five-grams that fit inside our context window.
    Page 5, “Syllabification with Structured SVMs”
  3. As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set.
    Page 5, “Syllabification with Structured SVMs”

n-gram

Appears in 3 sentences as: n-gram (3)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system.
    Page 2, “Related Work”
  2. In addition to these primary n-gram features, we experimented with linguistically-derived features.
    Page 5, “Syllabification with Structured SVMs”
  3. We believe that this is caused by the ability of the SVM to learn such generalizations from the n-gram features alone.
    Page 5, “Syllabification with Structured SVMs”

Viterbi

Appears in 3 sentences as: Viterbi (3)
In Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
  1. Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system.
    Page 2, “Related Work”
  2. This approach can be considered an HMM because the Viterbi algorithm is used to find the highest scoring tag sequence for a given observation sequence.
    Page 2, “Structured SVMs”
  3. The argmax in Equation 2 is conducted using the Viterbi algorithm.
    Page 3, “Structured SVMs”
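
Since Viterbi decoding is the argmax step in both Chen's syllabifier and the SVM-HMM, a compact reference version may help. The sketch below uses plain dictionaries of emission and transition scores; in the paper those scores come from the learned weight vector applied to each position's features:

    # Compact Viterbi decoder over a small tag set: finds the highest-scoring
    # tag sequence under additive per-position (emission) and tag-pair
    # (transition) scores.  Scores here are plain dictionaries; in the paper
    # they come from the learned weight vector applied to each position's features.

    def viterbi(n_positions, tags, emission, transition):
        """emission[(i, t)] and transition[(t_prev, t)] give additive scores."""
        best = {t: emission.get((0, t), 0.0) for t in tags}    # best score ending in t
        back = []                                              # backpointers per position
        for i in range(1, n_positions):
            new_best, pointers = {}, {}
            for t in tags:
                prev, score = max(
                    ((p, best[p] + transition.get((p, t), 0.0)) for p in tags),
                    key=lambda pair: pair[1],
                )
                new_best[t] = score + emission.get((i, t), 0.0)
                pointers[t] = prev
            best, back = new_best, back + [pointers]
        tag = max(best, key=best.get)                          # best final tag
        path = [tag]
        for pointers in reversed(back):                        # follow backpointers
            tag = pointers[tag]
            path.append(tag)
        return list(reversed(path))

    # Toy example: four positions, tags B/I, mild penalties on some transitions.
    print(viterbi(4, ["B", "I"],
                  {(0, "B"): 1.0, (2, "B"): 0.8},
                  {("I", "B"): -0.2, ("B", "B"): -0.5}))       # ['B', 'I', 'B', 'I']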
