Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
Kaufmann, Tobias and Pfister, Beat

Article Structure

Abstract

We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree.

Introduction

It has repeatedly been pointed out that N-grams model natural language only superficially: an Nth-order Markov chain is a very crude model of the complex dependencies between words in an utterance.
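
For reference, the approximation being criticized is the familiar N-gram factorization (a standard formulation, not a quotation from the paper), in which the probability of a word sequence W = w_1 ... w_m is computed as

P(W) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})

so that each word is conditioned only on a fixed, short window of preceding words, and longer-range syntactic dependencies are ignored.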

Language Model

2.1 The General Approach

Speech recognizers choose the word sequence W which maximizes the posterior probability P(W|O), where O is the acoustic observation.
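
In practice, this criterion is rewritten with Bayes' rule and augmented with empirically tuned weights. Combining the excerpts on this page (the language model weight λ, the word insertion penalty ip and the grammar term P_gram(W) weighted by μ), the rescoring criterion presumably has the following shape; this is a hedged reconstruction, and the paper's exact notation may differ:

\hat{W} = \arg\max_{W} \; P(O \mid W) \, P(W)^{\lambda} \, P_{\mathrm{gram}}(W)^{\mu} \, ip^{\,|W|}

where |W| denotes the number of words in the hypothesis W.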

Linguistic Resources

3.1 Particularities of the Recognizer Output

Experiments

4.1 Experimental Setup

Conclusions and Outlook

We have presented a language model based on a precise, linguistically motivated grammar, and we have successfully applied it to a difficult broad-domain task.

Topics

language model

Appears in 23 sentences as: language model (21), language modeling (1), language models (2)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree.
    Page 1, “Abstract”
  2. The language model is applied by means of an N-best rescoring step, which allows us to directly measure the performance gains relative to the baseline system without rescoring.
    Page 1, “Abstract”
  3. Other linguistically inspired language models like those of Chelba and Jelinek (2000) and Roark (2001) have been applied to continuous speech recognition.
    Page 1, “Introduction”
  4. In the first place, we want our language model to reliably distinguish between grammatical and ungrammatical phrases.
    Page 1, “Introduction”
  5. However, their grammar-based language model did not make use of a probabilistic component, and it was applied to a rather simple recognition task (dictation texts for pupils read and recorded under good acoustic conditions, no out-of-vocabulary words).
    Page 2, “Introduction”
  6. Besides proposing an improved language model, this paper presents experimental results for a much more difficult and realistic task and compares them to the performance of a state-of-the-art baseline system.
    Page 2, “Introduction”
  7. In the following Section, we will first describe our grammar-based language model.
    Page 2, “Introduction”
  8. The language model weight λ and the word insertion penalty ip lead to better performance in practice, but they have no theoretical justification.
    Page 2, “Language Model 2.1 The General Approach”
  9. Our grammar-based language model is incorporated into the above expression as an additional probability P_gram(W), weighted by a parameter μ:
    Page 2, “Language Model 2.1 The General Approach”
  10. A major problem of grammar-based approaches to language modeling is how to deal with out-of-grammar utterances.
    Page 3, “Language Model 2.1 The General Approach”
  11. In all these cases, the best hypothesis available is likely to be out-of-grammar, but the language model should nevertheless prefer it to competing hypotheses.
    Page 3, “Language Model 2.1 The General Approach”
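
Items 2, 8 and 9 above describe an N-best rescoring setup: the baseline recognizer emits its N best hypotheses together with their acoustic and N-gram scores, and each hypothesis is rescored with the grammar term added in the log domain. The following Python sketch only illustrates that setup; the Hypothesis record, the log_p_gram callable and the concrete weights are assumed interfaces, not the paper's implementation.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    words: List[str]
    log_p_acoustic: float   # log P(O | W) from the baseline recognizer
    log_p_ngram: float      # log P(W) from the baseline N-gram model

def rescore_nbest(hyps: List[Hypothesis],
                  log_p_gram: Callable[[List[str]], float],
                  lm_weight: float,     # language model weight (lambda)
                  gram_weight: float,   # weight of the grammar term (mu)
                  log_ip: float         # log of the word insertion penalty
                  ) -> Hypothesis:
    # Return the hypothesis maximizing
    # log P(O|W) + lambda*log P(W) + mu*log P_gram(W) + |W|*log ip.
    def combined_score(h: Hypothesis) -> float:
        return (h.log_p_acoustic
                + lm_weight * h.log_p_ngram
                + gram_weight * log_p_gram(h.words)
                + len(h.words) * log_ip)
    return max(hyps, key=combined_score)

Tuning lm_weight, gram_weight and log_ip on held-out data mirrors the evaluation over pairs of parameter values mentioned under "error rate" below.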

error rate

Appears in 15 sentences as: error rate (16)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. We report a significant reduction in word error rate compared to a state-of-the-art baseline system.
    Page 1, “Abstract”
  2. The influence of N on the word error rate is discussed in the results section.
    Page 3, “Language Model 2.1 The General Approach”
  3. For a given test set we could then compare the word error rate of the baseline system with that of the extended system employing the grammar-based language model.
    Page 5, “Experiments”
  4. exceptionally high baseline word error rate.
    Page 6, “Experiments”
  5. These classes are interviews (a word error rate of 36.1%), sports reports (28.4%) and press conferences (25.7%).
    Page 6, “Experiments”
  6. The baseline word error rate of the remaining 447 lattices (sentences) is 11.8%.
    Page 6, “Experiments”
  7. The first-best word error rate is 11.79%, and the 100-best oracle word error rate is 4.8%.
    Page 6, “Experiments”
  8. The word error rate was evaluated for each possible pair of parameter values.
    Page 6, “Experiments”
  9. As shown in Table 1, the grammar-based language model reduced the word error rate by 9.2% relative over the baseline system.
    Page 6, “Experiments”
  10. on the test data), the word error rate is reduced by 10.7% relative.
    Page 6, “Experiments”
  11. experiment: baseline, word error rate: 11.79%
    Page 7, “Experiments”
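
As a quick consistency check (simple arithmetic on the figures quoted above, not an additional result): with a baseline word error rate of 11.79%, a 9.2% relative reduction corresponds to roughly

11.79\% \times (1 - 0.092) \approx 10.71\%

and the 10.7% relative reduction reported for optimal parameters corresponds to roughly 11.79% x (1 - 0.107) ≈ 10.53%.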

parse tree

Appears in 10 sentences as: Parse Tree (1), parse tree (6), parse trees (3)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree.
    Page 1, “Abstract”
  2. P_gram(W) is defined as the probability of the most likely parse tree of a word sequence W: P_gram(W) = max_{T ∈ parses(W)} P(T). Determining P_gram(W) is an expensive operation, as it involves parsing.
    Page 2, “Language Model 2.1 The General Approach”
  3. 2.2 The Probability of a Parse Tree
    Page 2, “Language Model 2.1 The General Approach”
  4. The parse trees produced by our parser are binary-branching and rather deep.
    Page 2, “Language Model 2.1 The General Approach”
  5. In order to compute the probability of a parse tree, it is transformed to a flat dependency tree similar to the syntax graph representation used in the TIGER treebank (Brants et al., 2002).
    Page 2, “Language Model 2.1 The General Approach”
  6. In particular this means that it should provide a reasonable parse tree for any possible word sequence W. However, our approach is to use an accurate, linguistically motivated grammar, and it is undesirable to weaken the constraints encoded in the grammar.
    Page 3, “Language Model 2.1 The General Approach”
  7. As P(T) does not directly apply to parse trees, all possible readings have to be unpacked.
    Page 6, “Experiments”
  8. For these lattices the grammar-based language model was simply switched off in the experiment, as no parse trees were produced for efficiency reasons.
    Page 6, “Experiments”
  9. first step in this direction by estimating the probability of a parse tree.
    Page 7, “Conclusions and Outlook”
  10. However, our model only looks at the structure of a parse tree and does not take the actual words into account.
    Page 7, “Conclusions and Outlook”
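
Read together, items 2, 4 and 5 suggest the following computation: parse the word sequence, transform each reading into the flat tree representation, score it, and keep the maximum. The Python sketch below only illustrates that flow; parse, flatten and tree_log_prob are assumed interfaces, not the paper's actual components.

from typing import Callable, List, Optional

def log_p_gram(words: List[str],
               parse: Callable[[List[str]], List[object]],   # HPSG parser: word sequence -> parse trees
               flatten: Callable[[object], object],          # deep binary tree -> flat dependency tree
               tree_log_prob: Callable[[object], float]      # log P(T) of a flat tree
               ) -> Optional[float]:
    # log P_gram(W) = max over readings T of log P(T); None if out-of-grammar.
    trees = parse(words)
    if not trees:
        return None
    return max(tree_log_prob(flatten(t)) for t in trees)

A None result is the out-of-grammar case raised in items 10 and 11 under "language model"; in a rescoring setup it would have to be mapped to some penalty score rather than discarding the hypothesis outright.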

baseline system

Appears in 6 sentences as: baseline system (6)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. The language model is applied by means of an N-best rescoring step, which allows us to directly measure the performance gains relative to the baseline system without rescoring.
    Page 1, “Abstract”
  2. We report a significant reduction in word error rate compared to a state-of-the-art baseline system.
    Page 1, “Abstract”
  3. Besides proposing an improved language model, this paper presents experimental results for a much more difficult and realistic task and compares them to the performance of a state-of-the-art baseline system.
    Page 2, “Introduction”
  4. For a given test set we could then compare the word error rate of the baseline system with that of the extended system employing the grammar-based language model.
    Page 5, “Experiments”
  5. Our primary aim was to design a task which allows us to investigate the properties of our grammar-based approach and to compare its performance with that of a competitive baseline system.
    Page 5, “Experiments”
  6. As shown in Table 1, the grammar-based language model reduced the word error rate by 9.2% relative over the baseline system.
    Page 6, “Experiments”

natural language

Appears in 4 sentences as: natural language (4)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. It has repeatedly been pointed out that N-grams model natural language only superficially: an Nth-order Markov chain is a very crude model of the complex dependencies between words in an utterance.
    Page 1, “Introduction”
  2. More accurate statistical models of natural language have mainly been developed in the field of statistical parsing, e.g.
    Page 1, “Introduction”
  3. On the other hand, they are not suited to reliably decide on the grammaticality of a given phrase, as they do not accurately model the linguistic constraints inherent in natural language.
    Page 1, “Introduction”
  4. It is a well-known fact that natural language is highly ambiguous: a correct and seemingly unambiguous sentence may have an enormous number of readings.
    Page 7, “Conclusions and Outlook”

noun phrases

Appears in 3 sentences as: noun phrases (3)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. Three tags are used for different types of noun phrases: pronominal NPs, non-pronominal NPs and prenominal genitives.
    Page 2, “Language Model 2.1 The General Approach”
  2. The model for noun phrases is based on the joint probability of the head type (either noun, adjective or proper name), the presence of a determiner and the presence of pre- and postnominal modifiers.
    Page 3, “Language Model 2.1 The General Approach”
  3. sentences, subordinate clauses, relative and interrogative clauses, noun phrases, prepositional phrases, adjective phrases and expressions of date and time.
    Page 4, “Linguistic Resources”
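
Item 2 describes the noun phrase submodel as a joint probability over a handful of categorical features. A minimal, count-based sketch of such a model is given below; the feature tuple, the smoothing and the class interface are assumptions for illustration, not the paper's actual design.

from collections import Counter
from typing import Iterable, Tuple

# Each noun phrase is reduced to a feature tuple:
# (head_type, has_determiner, has_prenominal_modifier, has_postnominal_modifier)
NPFeatures = Tuple[str, bool, bool, bool]

class NounPhraseModel:
    def __init__(self, alpha: float = 0.1):
        self.counts = Counter()   # counts of observed feature tuples
        self.total = 0
        self.alpha = alpha        # add-alpha smoothing for unseen combinations

    def train(self, noun_phrases: Iterable[NPFeatures]) -> None:
        for features in noun_phrases:
            self.counts[features] += 1
            self.total += 1

    def prob(self, features: NPFeatures) -> float:
        # 3 head types (noun, adjective, proper name) x 2 x 2 x 2 = 24 combinations
        num_combinations = 3 * 2 * 2 * 2
        return (self.counts[features] + self.alpha) / (self.total + self.alpha * num_combinations)

Trained on treebank data, such a model assigns each NP configuration a probability proportional to how often it was observed, lightly smoothed for unseen combinations.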

probability distributions

Appears in 3 sentences as: probability distribution (1), probability distributions (2)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. important reason for the success of these models is the fact that they are lexicalized: the probability distributions are also conditioned on the actual words occurring in the utterance, and not only on their parts of speech.
    Page 1, “Introduction”
  2. P was modeled by means of a dedicated probability distribution for each conditioning tag.
    Page 2, “Language Model 2.1 The General Approach”
  3. The resulting probability distributions were trained on the German TIGER treebank, which consists of about 50,000 sentences of newspaper text.
    Page 3, “Language Model 2.1 The General Approach”
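
A hedged reading of items 2 and 3 taken together: the probability of a flat tree T factors over its nodes, each node being scored by the distribution dedicated to its tag, with those distributions estimated from the TIGER treebank. Schematically,

P(T) = \prod_{v \in \mathrm{nodes}(T)} P_{\mathrm{tag}(v)}\bigl(\mathrm{config}(v)\bigr)

where config(v) stands for whatever local configuration the tag-specific distribution is defined over (for the noun phrase tags, a feature tuple like the one sketched under "noun phrases"). The exact factorization is not given in these excerpts, so this should be read as an assumption about its general shape.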

treebank

Appears in 3 sentences as: treebank (3)
In Applying a Grammar-Based Language Model to a Simplified Broadcast-News Transcription Task
  1. These models have in common that they explicitly or implicitly use a context-free grammar induced from a treebank, with the exception of Chelba and Jelinek (2000).
    Page 1, “Introduction”
  2. In order to compute the probability of a parse tree, it is transformed to a flat dependency tree similar to the syntax graph representation used in the TIGER treebank (Brants et al., 2002).
    Page 2, “Language Model 2.1 The General Approach”
  3. The resulting probability distributions were trained on the German TIGER treebank, which consists of about 50,000 sentences of newspaper text.
    Page 3, “Language Model 2.1 The General Approach”
