Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
Xu Sun, Naoaki Okazaki, and Jun'ichi Tsujii

Article Structure

Abstract

The present paper describes a robust approach for abbreviating terms.

Introduction

Abbreviations represent fully expanded forms (e.g., hidden markov model) through the use of shortened forms (e.g., HMM).

Abbreviator with Nonlocal Information

2.1 A Latent Variable Abbreviator

Experiments

For Chinese abbreviation generation, we used the corpus of Sun et al.

Results and Discussion

4.1 Chinese Abbreviation Generation

Recognition as a Generation Task

We directly migrate this model to the abbreviation recognition task.

Conclusions and Future Research

We have presented the DPLVM and GI encoding by which to incorporate nonlocal information in abbreviating terms.

Topics

latent variable

Appears in 24 sentences as: Latent Variable (1) latent variable (13) latent variables (13)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. First, in order to incorporate nonlocal information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model, or alternatively, the label encoding approach with global information.
    Page 1, “Abstract”
  2. Variables x, y, and h represent observation, label, and latent variables, respectively.
    Page 2, “Introduction”
  3. discriminative probabilistic latent variable model (DPLVM) in which nonlocal information is modeled by latent variables.
    Page 2, “Introduction”
  4. 2.1 A Latent Variable Abbreviator
    Page 2, “Abbreviator with Nonlocal Information”
  5. To implicitly incorporate nonlocal information, we propose discriminative probabilistic latent variable models (DPLVMs) (Morency et al., 2007; Petrov and Klein, 2008) for abbreviating terms.
    Page 2, “Abbreviator with Nonlocal Information”
  6. The DPLVM is a natural extension of the CRF model (see Figure 2), which is a special case of the DPLVM, with only one latent variable assigned for each label.
    Page 2, “Abbreviator with Nonlocal Information”
  7. The DPLVM uses latent variables to capture additional information that may not be expressed by the observable labels.
    Page 2, “Abbreviator with Nonlocal Information”
  8. For example, using the DPLVM, a possible feature could be “the current character xi = X, the label yi = P, and the latent variable hi = LV.” The nonlocal information can be effectively modeled in the DPLVM, and the additional information at the previous position or many of the other positions in the past could be transferred via the latent variables (see Figure 2).
    Page 2, “Abbreviator with Nonlocal Information”
  9. For each sequence, we also assume a sequence of latent variables h = h1, h2, ...
    Page 3, “Abbreviator with Nonlocal Information”
  10. To ensure that the training and inference are efficient, the model is often restricted to have disjoint sets of latent variables associated with each label (Morency et al., 2007).
    Page 3, “Abbreviator with Nonlocal Information”
  11. Each hj is a member of a set Hyj of possible latent variables for the label yj.
    Page 3, “Abbreviator with Nonlocal Information”
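
Read together, excerpts 5 to 11 give the ingredients of the DPLVM: each label yj carries a latent variable hj drawn from a label-specific set Hyj, and these sets are kept disjoint so that training and inference remain efficient. A minimal sketch of the resulting model form, written in the standard DPLVM notation of Morency et al. (2007) rather than quoted from the paper, is:

\[
P(\mathbf{y} \mid \mathbf{x}, \Theta)
  = \sum_{\mathbf{h} \in H_{y_1} \times \cdots \times H_{y_m}} P(\mathbf{h} \mid \mathbf{x}, \Theta),
\qquad
P(\mathbf{h} \mid \mathbf{x}, \Theta)
  = \frac{\exp\bigl(\Theta \cdot \mathbf{f}(\mathbf{h}, \mathbf{x})\bigr)}
         {\sum_{\mathbf{h}'} \exp\bigl(\Theta \cdot \mathbf{f}(\mathbf{h}', \mathbf{x})\bigr)}
\]

Because the sets Hyj are disjoint, a latent sequence h determines its label sequence y uniquely, which is what makes the marginalization above well defined.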

Viterbi

Appears in 8 sentences as: Viterbi (8)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. The CRF+GI produced a Viterbi labeling with a low probability, which is an incorrect abbreviation.
    Page 6, “Results and Discussion”
  2. To perform a systematic analysis of the superior performance of the DPLVM compared to the CRF+GI, we collected the probability distributions (see Figure 7) of the Viterbi labelings from these models (“DPLVM vs. CRF+GI” is highlighted).
    Page 6, “Results and Discussion”
  3. A large percentage (37.9%) of the Viterbi labelings from the CRF+GI (ENG) have very small probability values (p < 0.1).
    Page 6, “Results and Discussion”
  4. For the DPLVM (ENG), there were only a few (0.5%) Viterbi labelings with small probabilities.
    Page 6, “Results and Discussion”
  5. (x-axis label of Figure 7: Probability of Viterbi labeling, from 0 to 1)
    Page 6, “Results and Discussion”
  6. Hence, the features become more sparse than in the Chinese case. Therefore, a significant number of features could have been inadequately trained, resulting in Viterbi labelings with low probabilities.
    Page 6, “Results and Discussion”
  7. When a context expression (CE) with a parenthetical expression (PE) is encountered, the recognizer generates the Viterbi labeling for the CE, which leads to the PE or NULL.
    Page 7, “Recognition as a Generation Task”
  8. Then, if the Viterbi labeling leads to the PE, we can, at the same time, use the labeling to decide the full form within the CE.
    Page 7, “Recognition as a Generation Task”
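
The excerpts above repeatedly refer to the probability of the Viterbi labeling, i.e. the single best-scoring label sequence under the trained model. For reference, the following is a minimal first-order Viterbi decoder over per-position and transition log-scores; the score matrices and the two-label toy example are hypothetical stand-ins, not the paper's actual feature model or label set.

    import numpy as np

    def viterbi(emit, trans):
        """Return the best label sequence and its log-score.

        emit:  (T, L) array of per-position label log-scores.
        trans: (L, L) array of label-transition log-scores.
        """
        T, L = emit.shape
        delta = np.empty((T, L))            # best log-score of any path ending in label l at position t
        back = np.zeros((T, L), dtype=int)  # backpointers for recovering that path
        delta[0] = emit[0]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + trans + emit[t][None, :]
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0)
        path = [int(delta[-1].argmax())]    # start from the best final label
        for t in range(T - 1, 0, -1):       # follow backpointers to the front
            path.append(int(back[t, path[-1]]))
        return path[::-1], float(delta[-1].max())

    # Toy usage: three positions, two labels (e.g. keep vs. skip a character).
    emit = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]))
    trans = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
    print(viterbi(emit, trans))  # prints the best label sequence and its log-score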

CRF

Appears in 7 sentences as: CRF (9)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. Figure 2: CRF vs. DPLVM.
    Page 2, “Introduction”
  2. The DPLVM is a natural extension of the CRF model (see Figure 2), which is a special case of the DPLVM, with only one latent variable assigned for each label.
    Page 2, “Abbreviator with Nonlocal Information”
  3. special case of the DPLVM is exactly the CRF (see Section 2.1); this case is hereinafter denoted as the CRF.
    Page 5, “Results and Discussion”
  4. The results revealed that the latent variable model significantly improved the performance over the CRF model.
    Page 5, “Results and Discussion”
  5. All of its top-1, top-2, and top-3 accuracies were consistently better than those of the CRF model.
    Page 5, “Results and Discussion”
  6. As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF.
    Page 8, “Recognition as a Generation Task”
  7. Table 7 shows that the back-off method further improved the performance of both the DPLVM and the CRF model.
    Page 8, “Recognition as a Generation Task”
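
Excerpts 2 and 3 state that the CRF is recovered as the special case of the DPLVM with exactly one latent variable per label. Under the model form sketched earlier (a restatement under assumed standard notation, not a quotation from the paper), each Hy is then a singleton, the sum over latent sequences has a single term, and the model collapses to an ordinary linear-chain CRF:

\[
P(\mathbf{y} \mid \mathbf{x}, \Theta)
  = P(h_{y_1}, \ldots, h_{y_m} \mid \mathbf{x}, \Theta)
  = \frac{\exp\bigl(\Theta \cdot \mathbf{f}(\mathbf{y}, \mathbf{x})\bigr)}{Z(\mathbf{x})}
\]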

Feature templates

Appears in 6 sentences as: Feature templates (3) feature templates (3)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. Feature templates #4 to #7 in Table 1 are used for Chinese abbreviations.
    Page 3, “Abbreviator with Nonlocal Information”
  2. Feature templates #8 to #11 are designed for English abbreviations.
    Page 4, “Abbreviator with Nonlocal Information”
  3. Since the number of letters in Chinese (more than 10K characters) is much larger than the number of letters in English (26 letters), in order to avoid a possible overfitting problem, we did not apply these feature templates to Chinese abbreviations.
    Page 4, “Abbreviator with Nonlocal Information”
  4. Feature templates are instantiated with values that occur in positive training examples.
    Page 4, “Abbreviator with Nonlocal Information”
  5. We employ the feature templates defined in Section 2.3, taking into account these 81,827 features for the Chinese abbreviation generation task, and the 50,149 features for the English abbreviation generation task.
    Page 4, “Experiments”
  6. In implementing the recognizer, we simply use the model from the abbreviation generator, with the same feature templates (31,868 features) and training method; the major difference is in the restriction (according to the PE) of the decoding stage and penalizing the probability values of the NULL labelings.
    Page 7, “Recognition as a Generation Task”
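
Excerpt 4 notes that templates are instantiated only with values that occur in positive training examples. The sketch below illustrates that instantiation step under simplified assumptions; the two template functions (current character, character bigram) are hypothetical stand-ins and do not reproduce templates #1 to #11 of Table 1.

    # Illustrative only: instantiate feature templates from values seen in gold-labeled data.
    def char_unigram(chars, i):
        return "cur_char=" + chars[i]

    def char_bigram(chars, i):
        prev = chars[i - 1] if i > 0 else "<BOS>"
        return "char_bigram=" + prev + "|" + chars[i]

    TEMPLATES = [char_unigram, char_bigram]  # simplified stand-ins for Table 1

    def instantiate_features(training_examples):
        """Collect the (template value, label) pairs observed in training.

        training_examples: iterable of (chars, labels) pairs with len(chars) == len(labels).
        """
        feature_set = set()
        for chars, labels in training_examples:
            for i, label in enumerate(labels):
                for template in TEMPLATES:
                    feature_set.add((template(chars, i), label))
        return feature_set

Only features in this set receive weights, which keeps the model from allocating parameters to value/label combinations never observed in the data.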

sequential labeling

Appears in 5 sentences as: sequence labeling (1) sequential labeling (4)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. Figure 1: English (a) and Chinese (b) abbreviation generation as a sequential labeling problem.
    Page 2, “Introduction”
  2. (2005) formalized the processes of abbreviation generation as a sequence labeling problem.
    Page 2, “Introduction”
  3. In order to formalize this task as a sequential labeling problem, we have assumed that the label of a character is determined by the local information of the character and its previous label.
    Page 2, “Introduction”
  4. Even though humans may use global or nonlocal information to abbreviate words, previous studies have not incorporated this information into a sequential labeling model.
    Page 2, “Introduction”
  5. In general, the results indicate that all of the sequential labeling models outperformed the SVM regression model with less training time. In the SVM regression approach, a large number of negative examples are explicitly generated for the training, which slowed the process.
    Page 5, “Results and Discussion”
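
As a toy illustration of the formalization in excerpts 1 to 3, labeling each character of the full form and reading off the kept characters yields the abbreviation. The label names K ("keep") and S ("skip") below are illustrative only, not the paper's actual label set.

    def apply_labels(full_form, labels):
        """Read an abbreviation off per-character labels: keep 'K' characters, skip 'S'."""
        assert len(full_form) == len(labels)
        return "".join(c.upper() for c, l in zip(full_form, labels) if l == "K")

    # "hidden markov model" -> "HMM": keep each word-initial character, skip the rest.
    full = "hidden markov model"
    labels = ["K" if i == 0 or full[i - 1] == " " else "S" for i in range(len(full))]
    print(apply_labels(full, labels))  # HMM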

data sparseness

Appears in 4 sentences as: data sparseness (4)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. The curves suggest that the data sparseness problem could be the reason for the differences in performance.
    Page 6, “Results and Discussion”
  2. For the latent variable approach, its curve demonstrates that it did not cause a severe data sparseness problem.
    Page 6, “Results and Discussion”
  3. In addition, the training data of the English task is much smaller than for the Chinese task, which could make the models more sensitive to data sparseness.
    Page 6, “Results and Discussion”
  4. The DPLVM+GIB model was robust even when the data sparseness problem was severe.
    Page 7, “Results and Discussion”

proposed model

Appears in 4 sentences as: proposed model (2) proposed models (2)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. Experiments revealed that the proposed model worked robustly, and outperformed five out of six state-of-the-art abbreviation recognizers.
    Page 1, “Abstract”
  2. Experimental results indicate that the proposed models significantly outperform previous abbreviation generation studies.
    Page 2, “Introduction”
  3. In addition, we apply the proposed models to the task of abbreviation recognition, in which a model extracts the abbreviation definitions in a given text.
    Page 2, “Introduction”
  4. Note that all of the six systems were specifically designed and optimized for this recognition task, whereas the proposed model is directly transported from the generation task.
    Page 8, “Recognition as a Generation Task”

SVM

Appears in 4 sentences as: SVM (5)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. We compared the performance of the DPLVM with the CRFs and other baseline systems, including the heuristic system (Heu), the HMM model, and the SVM model described in S08, i.e., Sun et al.
    Page 5, “Results and Discussion”
  2. The SVM method described by Sun et al.
    Page 5, “Results and Discussion”
  3. In general, the results indicate that all of the sequential labeling models outperformed the SVM regression model with less training time. In the SVM regression approach, a large number of negative examples are explicitly generated for the training, which slowed the process.
    Page 5, “Results and Discussion”
  4. The proposed method, the latent variable model with GI encoding, is 9.6% better with respect to the top-1 accuracy compared to the best system on this corpus, namely, the SVM regression method.
    Page 5, “Results and Discussion”

cross validation

Appears in 3 sentences as: cross validation (3)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. Table 5: Results of English abbreviation generation with fivefold cross validation.
    Page 7, “Results and Discussion”
  2. Concerning the training time in the cross validation, we simply chose the DPLVM for comparison.
    Page 7, “Results and Discussion”
  3. (2008), we perform 10-fold cross validation.
    Page 8, “Recognition as a Generation Task”
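
The protocol in these excerpts is standard k-fold cross validation (five folds for English generation, ten for recognition). A minimal sketch of the protocol follows; train_fn and eval_fn are placeholders standing in for the actual abbreviators and accuracy measures.

    import random

    def k_fold_indices(n_examples, k, seed=0):
        """Shuffle example indices and split them into k (nearly) equal folds."""
        idx = list(range(n_examples))
        random.Random(seed).shuffle(idx)
        return [idx[i::k] for i in range(k)]

    def cross_validate(examples, k, train_fn, eval_fn):
        """Average a score over k runs, each holding out one fold for evaluation."""
        scores = []
        for held_out in k_fold_indices(len(examples), k):
            held_out = set(held_out)
            test = [ex for i, ex in enumerate(examples) if i in held_out]
            train = [ex for i, ex in enumerate(examples) if i not in held_out]
            scores.append(eval_fn(train_fn(train), test))
        return sum(scores) / k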

overfitting

Appears in 3 sentences as: overfitting (3)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. The first term expresses the conditional log-likelihood of the training data, and the second term represents a regularizer that reduces the overfitting problem in parameter estimation.
    Page 3, “Abbreviator with Nonlocal Information”
  2. Since the number of letters in Chinese (more than 10K characters) is much larger than the number of letters in English (26 letters), in order to avoid a possible overfitting problem, we did not apply these feature templates to Chinese abbreviations.
    Page 4, “Abbreviator with Nonlocal Information”
  3. To reduce overfitting, we employed an L2 Gaussian weight prior (Chen and Rosenfeld, 1999), with the objective function L(Θ) = Σ_{i=1}^{n} log P(y_i | x_i, Θ) − ||Θ||²/σ². During training and validation, we set σ = 1 for the DPLVM generators.
    Page 4, “Experiments”
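
Taking the reconstructed objective above at face value (conditional log-likelihood minus ||Θ||²/σ²), each weight picks up a simple shrinkage term in the training gradient; with σ = 1 as reported, this is −2θk per weight:

\[
\frac{\partial L(\Theta)}{\partial \theta_k}
  = \sum_{i=1}^{n} \frac{\partial}{\partial \theta_k} \log P(\mathbf{y}_i \mid \mathbf{x}_i, \Theta)
    \;-\; \frac{2\,\theta_k}{\sigma^2}
\]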

significantly improved

Appears in 3 sentences as: significantly improved (2) significantly improves (1)
In Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information
  1. The results revealed that the latent variable model significantly improved the performance over the CRF model.
    Page 5, “Results and Discussion”
  2. Whereas the use of the latent variables still significantly improved the generation performance, using the GI encoding undermined the performance in this task.
    Page 6, “Results and Discussion”
  3. As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF.
    Page 8, “Recognition as a Generation Task”
