Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel

Article Structure

Abstract

This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models.

Introduction

Chinese word segmentation (CWS) is a critical and necessary initial step for most high-level Chinese language processing tasks, such as syntactic parsing, information extraction and machine translation, since Chinese script is written as a continuous sequence of characters without explicit word boundaries.

Segmentation Models

There are two classes of CWS models: character-based and word-based.
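To make the two views concrete, here is a minimal Python sketch (illustrative only; these excerpts do not specify the paper's exact tag set, so the common BMES position-tag scheme is assumed). A word-based model scores a word sequence directly, while a character-based model sees the same sentence as one position tag per character.

    # Two views of the same segmented sentence. The BMES tag scheme
    # (Begin / Middle / End / Single) is an assumption; character-based
    # CWS models commonly use it, but the excerpts do not name the tags.

    def to_char_tags(words):
        """Convert a word sequence into per-character position tags."""
        tags = []
        for w in words:
            if len(w) == 1:
                tags.append("S")
            else:
                tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
        return tags

    words = ["中国", "人民", "银行"]   # word-based view: a sequence of words
    print(to_char_tags(words))        # character-based view: ['B', 'E', 'B', 'E', 'B', 'E']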

Semi-supervised Learning via Co-regularizing Both Models

As mentioned earlier, the primary challenge of semi-supervised CWS centers on the unlabeled data.

Experiment

4.1 Setting

Conclusion

This paper proposed an alternative semi-supervised CWS model that co-regularizes a character- and word-based model by using their segmentation agreements on unlabeled data.

Topics

unlabeled data

Appears in 17 sentences as: unlabeled data (17)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. Similarly to multi-view learning, the “segmentation agreements” between the two different types of views are used to overcome the scarcity of label information on unlabeled data.
    Page 1, “Abstract”
  2. The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data.
    Page 1, “Abstract”
  3. Sun and Xu (2011) enhanced the segmentation results by interpolating statistics-based features derived from unlabeled data into a CRFs model.
    Page 1, “Introduction”
  4. The crux of solving the semi-supervised learning problem is the learning on unlabeled data.
    Page 1, “Introduction”
  5. Inspired by multi-view learning, which exploits redundant views of the same input data (Ganchev et al., 2008), this paper proposes a semi-supervised CWS model that co-regularizes two different views (intrinsically two different models), character-based and word-based, on unlabeled data.
    Page 1, “Introduction”
  6. The proposed approach begins by training a character-based and a word-based model on labeled data, respectively; both models are then regularized, from each view, by their segmentation agreements, i.e., their identical outputs on unlabeled data.
    Page 1, “Introduction”
  7. As mentioned earlier, the primary challenge of semi-supervised CWS centers on the unlabeled data.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  8. Obviously, the learning on unlabeled data does not come for “free”.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  9. Very often, it is necessary to discover certain useful information, e.g., label constraints on unlabeled data, that can be incorporated to guide the learner toward a desired solution.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  10. This naturally leads to a new learning objective that maximizes segmentation agreements between two models on unlabeled data.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  11. This study proposes a co-regularized CWS model based on character-based and word-based models, built on a small amount of segmented sentences (labeled data) and a large amount of raw sentences (unlabeled data).
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
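The excerpts above describe the agreements as the identical outputs of the two models. As a minimal sketch (function and variable names are mine, not the paper's), agreements can be read off as the word spans on which the two segmentations of one sentence coincide:

    # Extract "segmentation agreements": words produced identically by the
    # character-based and the word-based model on the same raw sentence.

    def word_spans(words):
        """Map a word sequence to the set of its (start, end) character spans."""
        spans, pos = set(), 0
        for w in words:
            spans.add((pos, pos + len(w)))
            pos += len(w)
        return spans

    def agreements(seg_a, seg_b):
        """Word spans on which two segmentations of one sentence agree."""
        return word_spans(seg_a) & word_spans(seg_b)

    seg_char = ["我", "喜欢", "机器", "翻译"]   # character-based output (illustrative)
    seg_word = ["我", "喜欢", "机器翻译"]       # word-based output (illustrative)
    print(agreements(seg_char, seg_word))       # {(0, 1), (1, 3)} — the agreed words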

semi-supervised

Appears in 13 sentences as: semi-supervised (13)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models.
    Page 1, “Abstract”
  2. The evaluation on the Chinese Treebank reveals that our model achieves better gains than the state-of-the-art semi-supervised models reported in the literature.
    Page 1, “Abstract”
  3. This naturally provides motivation for using easily accessible raw texts to enhance supervised CWS models in semi-supervised approaches.
    Page 1, “Introduction”
  4. In the past years, however, few semi-supervised CWS models have been proposed.
    Page 1, “Introduction”
  5. (2008) described a Bayesian semi-supervised model by considering the segmentation as the hidden variable in machine translation.
    Page 1, “Introduction”
  6. The crux of solving the semi-supervised learning problem is the learning on unlabeled data.
    Page 1, “Introduction”
  7. Inspired by multi-view learning, which exploits redundant views of the same input data (Ganchev et al., 2008), this paper proposes a semi-supervised CWS model that co-regularizes two different views (intrinsically two different models), character-based and word-based, on unlabeled data.
    Page 1, “Introduction”
  8. As mentioned earlier, the primary challenge of semi-supervised CWS centers on the unlabeled data.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  9. The “ours” line reports the performance of our semi-supervised model with the tuned parameters.
    Page 4, “Experiment”
  10. It can be observed that our semi-supervised model is able to benefit from unlabeled data and greatly improves the results over the supervised baseline.
    Page 4, “Experiment”
  11. We also compare our model with two state-of-the-art semi-supervised methods of Wang ’11 (Wang et al., 2011) and Sun ’11 (Sun and Xu, 2011).
    Page 4, “Experiment”

CRFs

Appears in 10 sentences as: CRFs (10)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. Sun and Xu (2011) enhanced the segmentation results by interpolating statistics-based features derived from unlabeled data into a CRFs model.
    Page 1, “Introduction”
  2. This section briefly reviews two supervised models in these categories, a character-based CRFs model and a word-based Perceptrons model, which are used in our approach.
    Page 2, “Segmentation Models”
  3. 2.1 Character-based CRFs Model
    Page 2, “Segmentation Models”
  4. Xue (2003) first proposed the use of the CRFs model (Lafferty et al., 2001) in character-based CWS.
    Page 2, “Segmentation Models”
  5. The aim of CRFs is to estimate the weight parameters θc that maximize the conditional likelihood of the training data:
    Page 2, “Segmentation Models”
  6. The model induction process is described in Algorithm 1: given the labeled dataset Dl and the unlabeled dataset Du, the first two steps train a CRFs (character-based) and a Perceptrons (word-based) model on the labeled data Dl, respectively.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  7. Afterwards, the agreements A are used as a set of constraints to bias the learning of CRFs (§ 3.2) and Perceptron (§ 3.3) on the unlabeled data.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  8. 3.2 CRFs with Constraints
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  9. For the character-based model, this paper follows Täckström et al. (2013) to incorporate the segmentation agreements into CRFs.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  10. The feature templates of Zhao et al. (2006) and Zhang and Clark (2007) are used in training the CRFs model and the Perceptrons model, respectively.
    Page 4, “Experiment”
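Excerpt 5 ends at a colon whose equation is not carried into this index. The standard CRFs training objective it refers to (Lafferty et al., 2001) has the form below; whether the paper includes the Gaussian regularization term is not visible in these excerpts.

    \hat{\theta}_c \;=\; \arg\max_{\theta_c} \sum_{i=1}^{n} \log p_{\theta_c}(\mathbf{y}_i \mid \mathbf{x}_i) \;-\; \frac{\lVert \theta_c \rVert^2}{2\sigma^2}

Here the (x_i, y_i) are the labeled sentences paired with their character tag sequences.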

Perceptrons

Appears in 9 sentences as: Perceptron (1) Perceptrons (8)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. This section briefly reviews two supervised models in these categories, a character-based CRFs model and a word-based Perceptrons model, which are used in our approach.
    Page 2, “Segmentation Models”
  2. 2.2 Word-based Perceptrons Model
    Page 2, “Segmentation Models”
  3. Zhang and Clark (2007) first proposed a word-based segmentation model using a discriminative Perceptrons algorithm.
    Page 2, “Segmentation Models”
  4. The parameters θw can be estimated by using the Perceptrons method (Collins, 2002) or other online learning algorithms, e.g., Passive-Aggressive (Crammer et al., 2006).
    Page 2, “Segmentation Models”
  5. The model induction process is described in Algorithm 1: given the labeled dataset Dl and the unlabeled dataset Du, the first two steps train a CRFs (character-based) and a Perceptrons (word-based) model on the labeled data Dl, respectively.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  6. Afterwards, the agreements A are used as a set of constraints to bias the learning of CRFs (§ 3.2) and Perceptron (§ 3.3) on the unlabeled data.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  7. 3.3 Perceptrons with Constraints
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  8. For the word-based model, this study incorporates segmentation agreements by a modified parameter update criterion in Perceptrons online training, as shown in Algorithm 2.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  9. The feature templates of Zhao et al. (2006) and Zhang and Clark (2007) are used in training the CRFs model and the Perceptrons model, respectively.
    Page 4, “Experiment”
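Excerpt 8 describes Algorithm 2 only at a high level. The sketch below is one plausible reading of an agreement-constrained Perceptrons update on an unlabeled sentence, not the paper's actual algorithm: the prediction is checked against the agreement spans, and the weights are nudged toward a segmentation consistent with them. Here `decode` and `phi` are hypothetical stand-ins for the word-based decoder and feature extractor, and `word_spans` is the helper from the agreements sketch above.

    # One plausible agreement-constrained update for unlabeled data: the
    # prediction is compared with the agreements instead of gold answers.
    # All names here are assumptions, not the paper's Algorithm 2.

    def constrained_update(w, sent, agree_spans, decode, phi, lr=1.0):
        pred_spans = word_spans(decode(w, sent))   # current best segmentation
        if agree_spans <= pred_spans:              # all agreements respected:
            return w                               # no update needed
        # Reward features of the agreed words (a partial reference) and
        # penalize features of the violating prediction.
        for f, v in phi(sent, agree_spans).items():
            w[f] = w.get(f, 0.0) + lr * v
        for f, v in phi(sent, pred_spans).items():
            w[f] = w.get(f, 0.0) - lr * v
        return w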

word segmentation

Appears in 5 sentences as: word segmentation (5)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models.
    Page 1, “Abstract”
  2. Chinese word segmentation (CWS) is a critical and necessary initial step for most high-level Chinese language processing tasks, such as syntactic parsing, information extraction and machine translation, since Chinese script is written as a continuous sequence of characters without explicit word boundaries.
    Page 1, “Introduction”
  3. Character-based models treat word segmentation as a sequence labeling problem, assigning labels to the characters in a sentence indicating their positions in a word.
    Page 2, “Segmentation Models”
  4. Table 2 shows the F-score results of word segmentation on CTB-5, CTB-6 and CTB-7 testing sets.
    Page 4, “Experiment”
  5. It is a supervised joint model of word segmentation, POS tagging and dependency parsing.
    Page 5, “Experiment”

labeled data

Appears in 4 sentences as: labeled data (4)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. The proposed approach trains a character-based and a word-based model on labeled data, respectively, as the initial models.
    Page 1, “Abstract”
  2. The proposed approach begins by training a character-based and a word-based model on labeled data, respectively; both models are then regularized, from each view, by their segmentation agreements, i.e., their identical outputs on unlabeled data.
    Page 1, “Introduction”
  3. This study proposes a co-regularized CWS model based on character-based and word-based models, built on a small amount of segmented sentences (labeled data) and a large amount of raw sentences (unlabeled data).
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  4. The model induction process is described in Algorithm 1: given the labeled dataset Dl and the unlabeled dataset Du, the first two steps train a CRFs (character-based) and a Perceptrons (word-based) model on the labeled data Dl, respectively.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”

scoring function

Appears in 4 sentences as: Score Function (1) scoring function (3)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models.
    Page 1, “Abstract”
  2. Moreover, in order to better combine the strengths of the two models, the proposed approach uses a joint scoring function in a log-linear combination form for the decoding in the segmentation phase.
    Page 1, “Introduction”
  3. 3.4 The Joint Score Function for Decoding
    Page 4, “Semi-supervised Learning via Co-regularizing Both Models”
  4. This paper employs a log-linear interpolation combination (Bishop, 2006) to formulate a joint scoring function based on character-based and word-based models in the decoding:
    Page 4, “Semi-supervised Learning via Co-regularizing Both Models”
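Equation 5 itself is not reproduced in these excerpts. Read together with the tuning note below (a single weight α swept over (0, 1)), a log-linear interpolation of the two models' scores would plausibly take the form

    \hat{y} \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}(x)} \; \alpha \, \log p_{\theta_c}(y \mid x) \;+\; (1 - \alpha) \, \theta_w \cdot \Phi(x, y)

with the character-based CRFs log-probability on the left and the word-based Perceptrons score on the right; treat this as a reconstruction from the surrounding prose, not the paper's exact equation.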

segmentations

Appears in 4 sentences as: segmentations (3) segmentations” (1)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. Since each of the models has its own merits, their consensuses signify high-confidence segmentations.
    Page 2, “Semi-supervised Learning via Co-regularizing Both Models”
  2. … the two segmentations shown in Figure 1 are the predictions from a character-based and a word-based model.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  3. Figure 1: The segmentations given by the character-based and word-based models, where the boxed words refer to the segmentation agreements.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”
  4. Because there are no “gold segmentations” for unlabeled sentences, the output sentence predicted by the current model is compared with the agreements instead of the “answers” in the supervised case.
    Page 3, “Semi-supervised Learning via Co-regularizing Both Models”

development sets

Appears in 3 sentences as: development sets (3)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. The development sets are mainly used to tune the values of the weight factor α in Equation 5.
    Page 4, “Experiment”
  2. We evaluated the performance (F-score) of our model on the three development sets by using different α values, where α is progressively increased in steps of 0.1 (0 < α < 1.0).
    Page 4, “Experiment”
  3. The “baseline” uses a different training configuration, so the α values in the decoding also need to be tuned on the development sets.
    Page 4, “Experiment”
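The tuning procedure described above is a simple grid search; a minimal sketch follows, with `decode_joint` and `f_score` as hypothetical stand-ins for the joint decoder of Equation 5 and the evaluation routine.

    # Sweep the interpolation weight α in steps of 0.1 over (0, 1) and
    # keep the value with the best development-set F-score.

    def tune_alpha(dev_sents, dev_gold, decode_joint, f_score):
        best_alpha, best_f = None, -1.0
        for step in range(1, 10):                       # α = 0.1, 0.2, ..., 0.9
            alpha = step / 10.0
            pred = [decode_joint(s, alpha) for s in dev_sents]
            f = f_score(pred, dev_gold)
            if f > best_f:
                best_alpha, best_f = alpha, f
        return best_alpha, best_f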

F-score

Appears in 3 sentences as: F-score (3)
In Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
  1. We evaluated the performance (F-score) of our model on the three development sets by using different α values, where α is progressively increased in steps of 0.1 (0 < α < 1.0).
    Page 4, “Experiment”
  2. Table 2 shows the F-score results of word segmentation on CTB-5, CTB-6 and CTB-7 testing sets.
    Page 4, “Experiment”
  3. Table 2: F-score (%) results of five CWS models on CTB-5, CTB-6 and CTB-7.
    Page 5, “Experiment”
