Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
Zhao, Qiuye and Marcus, Mitch

Article Structure

Abstract

We show for both English POS tagging and Chinese word segmentation that, with proper representation, a large number of deterministic constraints can be learned from training examples, and these constraints are useful in constraining probabilistic inference.

Topics

POS tagging

Appears in 31 sentences as: POS tag (2) POS tagger (2) POS tagging (23) POS taging (1) POS tags (4)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. We show for both English POS tagging and Chinese word segmentation that, with proper representation, a large number of deterministic constraints can be learned from training examples, and these constraints are useful in constraining probabilistic inference.
    Page 1
  2. "assign label t to word w" for POS tagging.
    Page 1
  3. In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation.
    Page 1
  4. For POS tagging, the learned constraints are directly used to constrain Viterbi decoding.
    Page 1
  5. For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging.
    Page 1
  6. Therefore, these character-based constraints are not directly used for determining predictions as in English POS tagging.
    Page 1
  7. 2 English POS tagging
    Page 2
  8. Part-of-speech (POS) tags are used to categorize words; for example, the POS tag VBG tags verbal gerunds, NNS tags nominal plurals, DT tags determiners, and so on.
    Page 2
  9. Following this POS representation, there are as many as 10 possible POS tags that may occur in between the–of, as estimated from the WSJ corpus of the Penn Treebank.
    Page 2
  10. To explore determinacy in the distribution of POS tags in the Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
    Page 2
  11. Table 2: The templates for generating potentially deterministic constraints of English POS tagging.
    Page 2
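
The excerpts above motivate representing words by morph features, since a POS tag encodes both a syntactic category and morphological inflection. As a rough illustration only (the actual feature inventory of Table 1 is not reproduced in these excerpts, so the specific cues below are assumptions), a minimal sketch in Python:

```python
def morph_feature(word, is_frequent):
    # Frequent words are represented by their own (lowercased) form;
    # rare words back off to coarse morphological cues. The cues here
    # (capitalization, digits, a few suffixes) are illustrative
    # assumptions, not the paper's Table 1 inventory.
    if is_frequent:
        return word.lower()
    feats = []
    if word[:1].isupper():
        feats.append("CAP")
    if any(ch.isdigit() for ch in word):
        feats.append("DIGIT")
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf):
            feats.append("SUF-" + suf)
            break
    return "|".join(feats) if feats else "OTHER"

print(morph_feature("the", True))       # -> 'the'
print(morph_feature("Tagging", False))  # -> 'CAP|SUF-ing'
```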

ILP

Appears in 20 sentences as: ILP (21)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. However, they are better applied to a word-based model, thus an integer linear programming (ILP) formulation is proposed.
    Page 1
  2. In recent work, interesting results are reported for applications of integer linear programming (ILP) such as semantic role labeling (SRL) (Roth and Yih, 2005), dependency parsing (Martins et al., 2009) and so on.
    Page 1
  3. In an ILP formulation, 'nonlocal' deterministic constraints on output structures can be naturally incorporated, such as "a verb cannot take two subject arguments" for SRL, and the projectivity constraint for dependency parsing.
    Page 1
  4. We propose an ILP formulation of the CWS problem.
    Page 1
  5. By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set.
    Page 1
  6. This reduction of problem size immediately speeds up an ILP solver by more than 100 times.
    Page 1
  7. We propose an Integer Linear Programming (ILP) formulation of word segmentation, which is naturally viewed as a word-based model for CWS.
    Page 4
  8. 3.3 ILP formulation of CWS
    Page 4
  9. This resembles the ILP formulation of the set cover problem, though the first constraint is different.
    Page 4
  10. We proposed an ILP formulation of the CWS problem in Section 3.3, where we present a word-based model.
    Page 7
  11. The two possible ILP solutions give two possible segmentations {c1, c2} and {c1c2}, thus there are two tag sequences evaluated by ILP, BB and BI.
    Page 7
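
To make the word-based ILP concrete, here is a minimal sketch using the PuLP library (the choice of solver library is ours, not the paper's): one binary variable per candidate word, an objective that sums word scores, and a coverage constraint forcing each character into exactly one chosen word, which is where it departs from plain set cover (item 9). The `score` callback is a hypothetical stand-in for the model's word score; in the paper, deterministic constraints shrink the candidate set before solving, which is what speeds up the solver.

```python
# pip install pulp
import pulp

def ilp_segment(chars, score, max_len=4):
    # One binary variable per candidate word (substring up to max_len);
    # maximize the total score of chosen words subject to each character
    # being covered by exactly one chosen word (exact cover, not set cover).
    n = len(chars)
    cands = [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]
    x = {c: pulp.LpVariable(f"w_{c[0]}_{c[1]}", cat="Binary") for c in cands}
    prob = pulp.LpProblem("cws", pulp.LpMaximize)
    prob += pulp.lpSum(score(chars[i:j]) * x[(i, j)] for i, j in cands)
    for k in range(n):  # character k belongs to exactly one chosen word
        prob += pulp.lpSum(x[(i, j)] for i, j in cands if i <= k < j) == 1
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return [chars[i:j] for i, j in sorted(cands) if x[(i, j)].value() == 1]

# toy scorer favoring longer words
print(ilp_segment("abcd", lambda w: len(w) ** 2))  # -> ['abcd']
```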

Viterbi

Appears in 18 sentences as: Viterbi (18)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. For tagging, learned constraints are directly used to constrain Viterbi decoding.
    Page 1
  2. Dynamic programming techniques based on Markov assumptions, such as Viterbi decoding, cannot handle those 'nonlocal' constraints as discussed above.
    Page 1
  3. However, it is possible to constrain Viterbi
    Page 1
  4. For POS tagging, the learned constraints are directly used to constrain Viterbi decoding.
    Page 1
  5. By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set.
    Page 1
  6. The Viterbi algorithm is widely used for tagging, and runs in O(nT²) when searching in an unconstrained space.
    Page 3
  7. The Viterbi decoder runs in O(m1T + m2T²) while searching in such a constrained space.
    Page 3
  8. In (Shen et al., 2007), lookahead features may be available for use during decoding since searching is bidirectional instead of left-to-right as in Viterbi decoding.
    Page 3
  9. In this work, deterministic constraints are decoded before the application of probabilistic models, therefore lookahead features are made available during Viterbi decoding.
    Page 3
  10. The Viterbi algorithm is also widely used for decoding.
    Page 3
  11. A well-known technique to speed up Viterbi decoding is to conduct beam search.
    Page 6
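
A minimal sketch of the constrained decoding idea: Viterbi where each position carries its own allowed tag set, a singleton wherever a deterministic constraint fired. With m1 constrained positions and m2 free ones, the inner loops shrink roughly to the O(m1T + m2T²) cost quoted above. The `score` callback is an assumed local log-score, not the paper's exact model.

```python
def constrained_viterbi(n, allowed, score):
    # allowed[i]: tag set permitted at position i (a singleton where a
    # deterministic constraint fired); score(i, prev, cur) is an assumed
    # local log-score for tag `cur` following tag `prev`.
    best = {t: score(0, None, t) for t in allowed[0]}
    back = []
    for i in range(1, n):
        nxt, ptr = {}, {}
        for t in allowed[i]:
            p, s = max(((pt, ps + score(i, pt, t)) for pt, ps in best.items()),
                       key=lambda kv: kv[1])
            nxt[t], ptr[t] = s, p
        best, back = nxt, back + [ptr]
    t = max(best, key=best.get)
    path = [t]
    for ptr in reversed(back):  # follow backpointers to recover the sequence
        t = ptr[t]
        path.append(t)
    return path[::-1]

# toy run: position 2 is deterministically fixed to 'N'
TAGS = ["N", "V"]
allowed = [TAGS, TAGS, ["N"], TAGS]
print(constrained_viterbi(4, allowed, lambda i, p, c: 0.1 if c == "N" else 0.0))
```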

word segmentation

Appears in 15 sentences as: Word Segmentation (1) Word segmentation (1) word segmentation (12) word segmentations (1)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. We show for both English POS tagging and Chinese word segmentation that, with proper representation, a large number of deterministic constraints can be learned from training examples, and these constraints are useful in constraining probabilistic inference.
    Page 1
  2. In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation.
    Page 1
  3. For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging.
    Page 1
  4. 3 Chinese Word Segmentation (CWS)
    Page 3
  5. 3.1 Word segmentation as character tagging
    Page 3
  6. 3.2 Word-based word segmentation
    Page 4
  7. We propose an Integer Linear Programming (ILP) formulation of word segmentation, which is naturally viewed as a word-based model for CWS.
    Page 4
  8. Sun (2011) uses this technique to merge different levels of predictors for word segmentation.
    Page 5
  9. We run experiments on Chinese word segmentation on the Penn Chinese Treebank 5.0.
    Page 5
  10. For the character tagging formulation of Chinese word segmentation, we discussed two tagsets IB and BMES in Section 3.1.
    Page 6
  11. 5.4 Chinese word segmentation
    Page 7
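
For the character-tagging formulation referenced in item 10: under the BMES tagset a word contributes B/M/E tags for its beginning, middle, and end characters, or S if it is a single character (the IB tagset is the coarser variant that only marks whether a character begins a word). A minimal conversion sketch:

```python
def to_bmes(words):
    # Map a segmentation (list of words) to per-character BMES tags.
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

print(to_bmes(["中国", "人"]))  # -> ['B', 'E', 'S']
```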

feature set

Appears in 14 sentences as: feature set (13) feature sets (2)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set.
    Page 1
  2. We adopt the basic feature set used in (Ratnaparkhi, 1996) and (Collins, 2002).
    Page 3
  3. As introduced in Section 2.2, we adopt a very compact feature set used in (Ratnaparkhi, 1996)¹.
    Page 6
  4. While searching in a constrained space, we can also extend this feature set with some basic lookahead features as defined in Section 2.2.
    Page 6
  5. This replicates the feature set B used in (Shen et al., 2007).
    Page 6
  6. ¹Our implementation of this feature set is basically the same as the version used in (Collins, 2002).
    Page 6
  7. Our baseline model, which uses Ratnaparkhi (1996)'s feature set and conducts a beam (=5) search in the unconstrained space, achieves a tagging accuracy of 97.16%.
    Page 7
  8. For the constrained greedy tagger using Ratnaparkhi (1996)'s feature set, with the additional use of three locally lookahead feature templates, tagging accuracy is increased from 96.80% to 97.02%.
    Page 7
  9. When no further data is used other than training data, the bidirectional tagger described in (Shen et al., 2007) achieves an accuracy of 97.33%, using a much richer feature set (E) than feature set B, the one we compare with here.
    Page 7
  10. As noted above, the addition of three feature templates already has a notable negative impact on efficiency, thus the use of feature set E would hurt tagging efficiency much more.
    Page 7
  11. Rich feature sets are also widely used in other work that pursues state-of-the-art tagging accuracy, e.g.
    Page 7

feature templates

Appears in 9 sentences as: feature templates (9)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. Moreover, when deterministic constraints have been applied to contextual words of w0, it is also possible to include some lookahead feature templates, such as:
    Page 3
  2. Character-based feature templates
    Page 3
  3. We adopt the 'non-lexical-target' feature templates in (Jiang et al., 2008a).
    Page 3
  4. As studied in previous work, word-based feature templates usually include the word itself, sub-words contained in the word, contextual characters/words and so on.
    Page 4
  5. From this view, the character-based feature templates defined in Section 3.1 are naturally used in a word-based model.
    Page 5
  6. The specific word-based feature templates are illustrated in the second row of Table 5.
    Page 5
  7. For the constrained greedy tagger using Ratnaparkhi (1996)'s feature set, with the additional use of three locally lookahead feature templates, tagging accuracy is increased from 96.80% to 97.02%.
    Page 7
  8. As noted above, the addition of three feature templates already has a notable negative impact on efficiency, thus the use of feature set E would hurt tagging efficiency much more.
    Page 7
  9. With the same feature templates as described in Section 3.1, we now compare these two decoding methods.
    Page 7
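
As a rough illustration of such templates (the exact inventories in the paper's tables are not reproduced in these excerpts, so the templates below are simplified assumptions), a sketch of context features for tagging word i, including a lookahead feature over a right-context tag that deterministic constraints have already fixed:

```python
def tag_features(words, i, prev_tag, fixed):
    # Basic left-context templates plus a 'lookahead' template over the
    # right context, usable only where deterministic constraints already
    # fixed a tag (`fixed` maps positions to determined tags). These are
    # simplified stand-ins, not the paper's actual template inventory.
    feats = [f"w0={words[i]}",
             f"t-1={prev_tag}",
             f"w-1={words[i-1] if i > 0 else '<s>'}",
             f"w+1={words[i+1] if i + 1 < len(words) else '</s>'}"]
    if i + 1 in fixed:
        feats.append(f"t+1={fixed[i+1]}")
    return feats

print(tag_features(["the", "dog", "barks"], 1, "DT", {2: "VBZ"}))
# ['w0=dog', 't-1=DT', 'w-1=the', 'w+1=barks', 't+1=VBZ']
```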

Chinese word segmentation

Appears in 9 sentences as: Chinese Word Segmentation (1) Chinese word segmentation (8)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. We show for both English POS tagging and Chinese word segmentation that, with proper representation, a large number of deterministic constraints can be learned from training examples, and these constraints are useful in constraining probabilistic inference.
    Page 1
  2. In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation.
    Page 1
  3. For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging.
    Page 1
  4. 3 Chinese Word Segmentation (CWS)
    Page 3
  5. We run experiments on Chinese word segmentation on the Penn Chinese Treebank 5.0.
    Page 5
  6. For the character tagging formulation of Chinese word segmentation, we discussed two tagsets IB and BMES in Section 3.1.
    Page 6
  7. 5.4 Chinese word segmentation
    Page 7
  8. Table 10: F-measure on Chinese word segmentation.
    Page 8
  9. These deterministic constraints are very useful in constraining probabilistic search; for example, they may be directly used for determining predictions as in English POS tagging, or used for reducing the number of variables in an ILP solution as in Chinese word segmentation.
    Page 8

bigram

Appears in 8 sentences as: bigram (8)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. We consider bigram and trigram templates for generating potentially deterministic constraints.
    Page 2
  2. A bigram constraint includes one contextual word (w−1|w1) or the corresponding morph feature; a trigram constraint includes both contextual words or their morph features.
    Page 2
  3. Table 6 excerpt:
                precision   recall   F1
      bigram    0.993       0.841    0.911
      trigram   0.996       0.608    0.755
    Page 6
  4. As shown in Table 6, deterministic constraints learned with both bigram and trigram templates are all very accurate in predicting POS tags of words in their context.
    Page 6
  5. Constraints generated by the bigram template alone can already cover 84.1% of the input words with a high precision of 0.993.
    Page 6
  6. There are 114,589 bigram deterministic constraints and 130,647 trigram constraints learned from the training data.
    Page 6
  7. We show a couple of examples of bigram deterministic constraints in Table 7.
    Page 6
  8. With respect to either tagset, we use both bigram and trigram templates to generate deterministic constraints for the corresponding tagging problem.
    Page 6
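
The learning procedure these excerpts describe can be sketched simply: collect each templated context seen in training, and keep a context as a deterministic constraint only if it always co-occurs with the same tag. The sketch below uses just two simplified bigram-style context keys (previous word, next word); the paper's templates also back off to morph features, which are omitted here.

```python
from collections import defaultdict

def learn_deterministic(training):
    # training: list of (words, tags) pairs. A context is kept as a
    # deterministic constraint only if every training occurrence of it
    # carries the same tag.
    tags_seen = defaultdict(set)
    for words, tags in training:
        for i, tag in enumerate(tags):
            left = words[i - 1] if i > 0 else "<s>"
            right = words[i + 1] if i + 1 < len(words) else "</s>"
            tags_seen[("prev", left, words[i])].add(tag)
            tags_seen[("next", right, words[i])].add(tag)
    return {ctx: ts.pop() for ctx, ts in tags_seen.items() if len(ts) == 1}

rules = learn_deterministic([(["the", "dog", "barks"], ["DT", "NN", "VBZ"])])
print(rules[("prev", "the", "dog")])  # -> 'NN'
```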

Treebank

Appears in 6 sentences as: Treebank (6)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. On the other hand, consider the annotation guideline of the English Treebank (Marcus et al., 1993) instead.
    Page 2
  2. Following this POS representation, there are as many as 10 possible POS tags that may occur in between the–of, as estimated from the WSJ corpus of the Penn Treebank.
    Page 2
  3. To explore determinacy in the distribution of POS tags in the Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
    Page 2
  4. Table 1: Morph features of frequent words and rare words as computed from the WSJ corpus of the Penn Treebank.
    Page 2
  5. We run experiments on English POS tagging on the WSJ corpus in the Penn Treebank.
    Page 5
  6. We run experiments on Chinese word segmentation on the Penn Chinese Treebank 5.0.
    Page 5

development set

Appears in 5 sentences as: development set (5)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. Following most previous work, e.g. (Collins, 2002) and (Shen et al., 2007), we divide this corpus into training set (sections 0-18), development set (sections 19-21) and the final test set (sections 22-24).
    Page 5
  2. Following (Jiang et al., 2008a), we divide this corpus into training set (chapters 1-260), development set (chapters 271-300) and the final test set (chapters 301-325).
    Page 5
  3. Experiments in this section are carried out on the development set.
    Page 5
  4. For English POS tagging, as well as the CWS problem that will be discussed in the next section, we use the development set to choose training iterations (= 5), set beam width, etc.
    Page 6
  5. on the development set, we set the beam-width of our baseline model to 5.
    Page 7

feature space

Appears in 4 sentences as: feature space (4)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. The scores of tag predictions are usually computed in a high-dimensional feature space.
    Page 3
  2. Consider a character-based feature function φ(c, t, c) that maps a character-tag pair to a high-dimensional feature space, with respect to an input character sequence c. For a possible word over c of length l, w = c1...cl.
    Page 5
  3. In Section 3.4, we describe a way of mapping words to a character-based feature space.
    Page 7
  4. Tagset BMES is used for character tagging as well as for mapping words to character-based feature space.
    Page 7
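
Sentence 2 suggests how a candidate word is mapped into the character-based feature space: tag the word's characters with BMES and sum their character-tag features. A minimal sketch under that reading, with `char_feature` as a hypothetical callback producing the character-level features:

```python
from collections import Counter

def word_features(chars, start, length, char_feature):
    # Tag the candidate word's characters with BMES and sum their
    # character-tag features; char_feature(chars, i, t) is a hypothetical
    # callback returning the character-level features at position i.
    tags = ["S"] if length == 1 else ["B"] + ["M"] * (length - 2) + ["E"]
    vec = Counter()
    for k, t in enumerate(tags):
        vec.update(char_feature(chars, start + k, t))
    return vec

# toy character feature: the character identity paired with its tag
print(word_features("中国人", 0, 2, lambda c, i, t: [f"{c[i]}/{t}"]))
# Counter({'中/B': 1, '国/E': 1})
```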

Penn Treebank

Appears in 4 sentences as: Penn Treebank (4)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. Following this POS representation, there are as many as 10 possible POS tags that may occur in between the–of, as estimated from the WSJ corpus of the Penn Treebank.
    Page 2
  2. To explore determinacy in the distribution of POS tags in the Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
    Page 2
  3. Table 1: Morph features of frequent words and rare words as computed from the WSJ corpus of the Penn Treebank.
    Page 2
  4. We run experiments on English POS tagging on the WSJ corpus in the Penn Treebank.
    Page 5

probabilistic models

Appears in 4 sentences as: probabilistic model (1) probabilistic models (3)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. In this work, deterministic constraints are decoded before the application of probabilistic models; therefore lookahead features are made available during Viterbi decoding.
    Page 3
  2. Since these deterministic constraints are applied before the decoding of probabilistic models, reliably high precision of their predictions is crucial.
    Page 6
  3. However, when tagset BMES is used, the learned constraints don't always make reliable predictions, and the overall precision is not high enough to constrain a probabilistic model.
    Page 6
  4. These constraints make more accurate predictions than probabilistic models; thus, besides improving the overall tagging speed as we expect, tagging accuracy also improves a little.
    Page 7

F-measure

Appears in 3 sentences as: F-measure (3)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. We transform tagged character sequences to word segmentations first, and then evaluate word segmentations by F-measure, as defined in Section 5.2.
    Page 7
  2. We focus on the task of word segmentation only in this work and show that a comparable F-measure is achievable in a much more efficient manner.
    Page 8
  3. The addition of these features makes a moderate improvement on the F-measure, from 0.974 to 0.975.
    Page 8
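
Segmentation F-measure here is the standard word-level metric: precision and recall over predicted versus gold word spans, combined by harmonic mean. A minimal sketch:

```python
def spans(words):
    # Convert a segmentation to a set of (start, end) word spans.
    out, i = set(), 0
    for w in words:
        out.add((i, i + len(w)))
        i += len(w)
    return out

def seg_f1(pred, gold):
    # Word-level F-measure: a word counts as correct only if both its
    # boundaries match a gold word exactly.
    p, g = spans(pred), spans(gold)
    hit = len(p & g)
    if hit == 0:
        return 0.0
    prec, rec = hit / len(p), hit / len(g)
    return 2 * prec * rec / (prec + rec)

print(seg_f1(["中国", "人"], ["中", "国人"]))  # -> 0.0
print(seg_f1(["中国", "人"], ["中国", "人"]))  # -> 1.0
```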

highest scoring

Appears in 3 sentences as: highest scoring (3)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. A character-based CWS decoder finds the highest scoring tag sequence t over the input character sequence c, i.e.
    Page 3
  2. A word-based CWS decoder finds the highest scoring segmentation sequence w that is composed by the input character sequence c, i.e.
    Page 4
  3. From this view, the highest scoring tag sequence is computed subject to structural constraints, giving us an inference alternative to Viterbi decoding.
    Page 7

linear programming

Appears in 3 sentences as: Linear Programming (1) linear programming (2)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. However, they are better applied to a word-based model, thus an integer linear programming (ILP) formulation is proposed.
    Page 1
  2. In recent work, interesting results are reported for applications of integer linear programming (ILP) such as semantic role labeling (SRL) (Roth and Yih, 2005), dependency parsing (Martins et al., 2009) and so on.
    Page 1
  3. We propose an Integer Linear Programming (ILP) formulation of word segmentation, which is naturally viewed as a word-based model for CWS.
    Page 4

perceptron

Appears in 3 sentences as: perceptron (3)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. For parameter estimation of θ, we use the averaged perceptron as described in (Collins, 2002).
    Page 5
  2. POS-/+: perceptron trained without/with POS.
    Page 8
  3. These results are compared to the core perceptron trained without POS in (Jiang et al., 2008a).
    Page 8
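
A minimal sketch of the averaged perceptron in the style of Collins (2002): update weights on the difference between gold and predicted feature vectors, and return the average of the weight vector over all updates. The `decode` and `features` callbacks are assumed stand-ins (e.g., the Viterbi decoder and feature templates discussed above).

```python
from collections import defaultdict

def averaged_perceptron(examples, decode, features, epochs=5):
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum of weights, for averaging
    steps = 0
    for _ in range(epochs):
        for x, gold in examples:
            pred = decode(x, w)
            if pred != gold:  # standard structured-perceptron update
                for f in features(x, gold):
                    w[f] += 1.0
                for f in features(x, pred):
                    w[f] -= 1.0
            for f, v in w.items():
                total[f] += v
            steps += 1
    return {f: v / steps for f, v in total.items()}

# toy usage: two labels, one unigram feature per (input, label) pair
LABELS = ["B", "I"]
feats = lambda x, y: [f"{x}:{y}"]
decode = lambda x, w: max(LABELS, key=lambda y: sum(w.get(f, 0.0) for f in feats(x, y)))
avg = averaged_perceptron([("c1", "B"), ("c2", "I")], decode, feats)
print(decode("c2", avg))  # -> 'I'
```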

segmentations

Appears in 3 sentences as: segGEN (1) segmentations (3)
In Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
  1. Since it is hard to achieve the best segmentations with tagset IB, we propose an indirect way to use these constraints in the following section, instead of applying these constraints as straightforwardly as in English POS tagging.
    Page 4
  2. $\hat{w} = \arg\max_{w \in \mathrm{segGEN}(c)} \sum_{i=1}^{k} \phi(w_i) \cdot \theta$, where the function segGEN maps character sequence c to the set of all possible segmentations of c. For example, $w = (c_1 \ldots c_{l_1}) \ldots (c_{n-l_k+1} \ldots c_n)$ represents a segmentation of k words, where the lengths of the first and last words are $l_1$ and $l_k$ respectively.
    Page 4
  3. We transform tagged character sequences to word segmentations first, and then evaluate word segmentations by F-measure, as defined in Section 5.2.
    Page 7
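
segGEN can be written down directly, though real decoders never enumerate it: the number of segmentations grows exponentially with sequence length, which is why the paper searches this space with Viterbi or ILP instead. A minimal sketch:

```python
def seg_gen(c, max_len=None):
    # Enumerate all ways to split character sequence c into words,
    # optionally capped at a maximum word length.
    if not c:
        yield []
        return
    limit = len(c) if max_len is None else min(max_len, len(c))
    for l in range(1, limit + 1):
        for rest in seg_gen(c[l:], max_len):
            yield [c[:l]] + rest

print(list(seg_gen("abc")))
# [['a', 'b', 'c'], ['a', 'bc'], ['ab', 'c'], ['abc']]
```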
