Experiment and Results | One feature of our approach is that it permits mining the data for tree patterns of arbitrary size using different types of labelling information (POS tags, dependencies, word forms and any combination thereof). |
Experiment and Results | 4.3.1 Mining on single labels (word form, POS tag or dependency) |
Experiment and Results | Mining on a single label permits (i) assessing the relative impact of each category in a given label category and (ii) identifying different sources of errors depending on the type of label considered (POS tag, dependency or word form). |
Abstract | We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese. |
Abstract | Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models. |
Abstract | In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing. |
Introduction | Furthermore, the word-level information is often augmented with the POS tags, which, along with segmentation, form the basic foundation of statistical NLP. |
Introduction | Because the tasks of word segmentation and POS tagging have strong interactions, many studies have been devoted to the task of joint word segmentation and POS tagging for languages such as Chinese (e.g. |
Introduction | This is because some of the segmentation ambiguities cannot be resolved without considering the surrounding grammatical constructions encoded in a sequence of POS tags. |
Related Works | In Chinese, Luo (2003) proposed a joint constituency parser that performs segmentation, POS tagging, and parsing within a single character-based framework. |
Dependency Parsing | Given an input sentence x = w0 w1 ... wn and its POS tag sequence t = t0 t1 ... tn, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by d = {(h, m, l) : 0 ≤ h ≤ n, 0 < m ≤ n, l ∈ L}, where (h, m, l) indicates a directed arc from the head word (also called father) wh to the modifier (also called child or dependent) wm with a dependency label l, and L is the label set. |
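The arc-set definition above can be made concrete with a small sketch (our own illustration, not from the cited paper; the sentence and labels are invented):

```python
# A dependency tree as a set of labeled arcs, following the definition
# d = {(h, m, l) : 0 <= h <= n, 0 < m <= n, l in L}.
# Index 0 is the artificial root; words are indexed 1..n.

words = ["<ROOT>", "Economic", "news", "had", "little", "effect"]
n = len(words) - 1

# (head, modifier, label) arcs: every word 1..n modifies exactly one head.
arcs = {
    (2, 1, "amod"),   # Economic <- news
    (3, 2, "nsubj"),  # news <- had
    (0, 3, "root"),   # had <- ROOT
    (5, 4, "amod"),   # little <- effect
    (3, 5, "dobj"),   # effect <- had
}

# Well-formedness checks implied by the definition.
assert all(0 <= h <= n and 0 < m <= n for h, m, _ in arcs)
assert sorted(m for _, m, _ in arcs) == list(range(1, n + 1))  # one head per word
```

The second assertion enforces that each modifier index appears exactly once, i.e. the arcs form a function from words to heads.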
Dependency Parsing with QG Features | The type of the TP is conjoined with the related words and POS tags, such that the QG-enhanced parsing models can make more elaborate decisions based on the context. |
Experiments and Analysis | CDT and CTB5/6 adopt different POS tag sets, and converting from one tag set to another is difficult (Niu et al., 2009). To overcome this problem, we use the People’s Daily corpus (PD), a large-scale corpus annotated with word segmentation and POS tags, to train a statistical POS tagger. |
Experiments and Analysis | The tagger produces a universal layer of POS tags for both the source and target treebanks. |
Experiments and Analysis | For all models used in the current work (POS tagging and parsing), we adopt the averaged perceptron to train the feature weights (Collins, 2002). |
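The averaged perceptron (Collins, 2002) referenced above can be sketched generically. This is our own illustration: `predict` and `features` are hypothetical task-specific stand-ins, and the naive running-sum averaging is chosen for clarity rather than efficiency:

```python
from collections import defaultdict

def train_averaged_perceptron(data, predict, features, epochs=5):
    """data: iterable of (input, gold_output) pairs.
    predict(x, w): returns the model's best output under weights w.
    features(x, y): returns a {feature: value} dict for candidate output y."""
    w = defaultdict(float)        # current weights
    total = defaultdict(float)    # running sum of weights, for averaging
    steps = 0
    for _ in range(epochs):
        for x, gold in data:
            guess = predict(x, w)
            if guess != gold:
                # Standard perceptron update: reward gold, penalize guess.
                for f, v in features(x, gold).items():
                    w[f] += v
                for f, v in features(x, guess).items():
                    w[f] -= v
            for f, v in w.items():
                total[f] += v
            steps += 1
    # Averaged weights reduce overfitting to late updates.
    return {f: v / steps for f, v in total.items()}
```

A toy binary task suffices to exercise it: with `features(x, y)` scaling the input features by the label y in {+1, -1} and `predict` taking the sign of the dot product, training separates the two classes.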
Conclusions and future work | Second, simply treating POS tags within a small window of the verb as pseudo-GRs produces state-of-the-art results without the need for a parsing model. |
Conclusions and future work | In fact, by integrating results from unsupervised POS tagging (Teichert and Daume III, 2009) we could render this approach fully domain- and language-independent. |
Introduction | Second, by replacing the syntactic features with an approximation based on POS tags, we achieve state-of-the-art performance without relying on error-prone unlexicalized or domain-specific lexicalized parsers. |
Methodology | The CoNLL format is a common language for comparing output from dependency parsers: each lexical item has an index, lemma, POS tag, tGR in which it is the dependent, and index to the corresponding head. |
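The field layout described above can be sketched as a tiny parser. This is our own illustration with a simplified, hypothetical column order (the actual CoNLL format has additional columns):

```python
# Parse a simplified CoNLL-style line with the fields the text mentions:
# index, lemma, POS tag, the tGR (relation) in which the token is the
# dependent, and the index of its head. Tab-separated, one token per line.

def parse_conll_line(line):
    idx, lemma, pos, tgr, head = line.rstrip("\n").split("\t")
    return {"index": int(idx), "lemma": lemma, "pos": pos,
            "tgr": tgr, "head": int(head)}

row = parse_conll_line("2\tdog\tNN\tnsubj\t3")
# row["head"] == 3 means token 2 depends on token 3 via nsubj
```

Representing each token as a dict keyed by field name keeps downstream comparison code readable regardless of column order.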
Methodology | Table 2 shows the three variations we tested: the simple tGR type, with parameterization for the POS tags of head and dependent, and with closed-class POS tags (determiners, pronouns and prepositions) lexicalized. |
Methodology | An unlexicalized parser cannot distinguish these based just on POS tags , while a lexicalized parser requires a large treebank. |
Previous work | Graphical models have been increasingly popular for a variety of tasks such as distributional semantics (Blei et al., 2003) and unsupervised POS tagging (Finkel et al., 2007), and sampling methods allow efficient estimation of full joint distributions (Neal, 1993). |
Previous work | Their study employed unsupervised POS tagging and parsing, and measures of selectional preference and argument structure as complementary features for the classifier. |
Results | Since POS tagging is more reliable and robust across domains than parsing, retraining on new domains will not suffer the effects of a mismatched parsing model (Lippincott et al., 2010). |
Results | Third, lexicalizing the closed-class POS tags introduces semantic information outside the scope of the alternation-based definition of subcategorization. |
Experiments | The dependency parser and POS tagger are trained on supervised data and up-trained on data labeled by the CKY-style bottom-up constituent parser of Huang et al. |
Experiments | We use the POS tagger to generate tags for dependency training to match the test setting. |
Incorporating Syntactic Structures | Long-span models — generative or discriminative, N-best or hill climbing — rely on auxiliary tools, such as a POS tagger or a parser, for extracting features for each hypothesis during rescoring, and during training for discriminative models. |
Incorporating Syntactic Structures | A major complexity factor is the processing of hundreds or thousands of hypotheses for each speech utterance, even during hill climbing, each of which must be POS-tagged and parsed. |
Incorporating Syntactic Structures | For integer-typed features the mapping is trivial; for string-typed features (e.g., a POS tag identity) we use a mapping of the corresponding vocabulary to integers. |
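The feature-to-integer mapping described above can be sketched as follows (our own minimal illustration; the class name and interface are invented):

```python
# Integer-typed features map to themselves; string-typed features
# (e.g. POS tag identities) go through a growing string -> id vocabulary.

class FeatureMap:
    def __init__(self):
        self.vocab = {}

    def index(self, feature):
        if isinstance(feature, int):
            return feature          # integer features: the mapping is trivial
        if feature not in self.vocab:
            self.vocab[feature] = len(self.vocab)
        return self.vocab[feature]

fm = FeatureMap()
ids = [fm.index(f) for f in ["NN", "VBZ", "NN", 7]]
# "NN" -> 0, "VBZ" -> 1, repeated "NN" -> 0 again, 7 -> 7
```

Growing the vocabulary on first sight keeps the mapping deterministic within a run while avoiding a precomputed feature inventory.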
Syntactic Language Models | where h.w and h.t denote the word identity and the POS tag of the corresponding exposed head word. |
Up-Training | We apply up-training to improve the accuracy of both our fast POS tagger and dependency parser. |
Abstract | From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. |
Introduction | Automatically assigning POS tags to words plays an important role in parsing, word sense disambiguation, as well as many other NLP applications. |
Introduction | While state-of-the-art tagging systems have achieved accuracies above 97% on English, Chinese POS tagging has proven more challenging, with accuracies of about 93-94% (Tseng et al., 2005b; Huang et al., 2007, 2009; Li et al., 2011). |
Introduction | It is generally accepted that Chinese POS tagging often requires more sophisticated language processing techniques that are capable of drawing inferences from more subtle linguistic knowledge. |
State-of-the-Art | In some cases, the methods work well without large modifications, such as German POS tagging. |
About Heterogeneous Annotations | For Chinese word segmentation and POS tagging , supervised learning has become a dominant paradigm. |
About Heterogeneous Annotations | Although several institutions to date have released their segmented and POS tagged data, acquiring sufficient quantities of high quality training examples is still a major bottleneck. |
About Heterogeneous Annotations | The statistics after the colons indicate how many times this POS tag pair appears among the 3561 words that are consistently segmented. |
Introduction | In particular, joint word segmentation and POS tagging is addressed as a two step process. |
Joint Chinese Word Segmentation and POS Tagging | Since Chinese sentences are written without delimiters between words, word segmentation and POS tagging are important initial steps for Chinese language processing. |
Joint Chinese Word Segmentation and POS Tagging | Two kinds of approaches are popular for joint word segmentation and POS tagging. |
Joint Chinese Word Segmentation and POS Tagging | In this kind of approach, the task is formulated as the classification of characters into POS tags with boundary information. |
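The character classification scheme described above can be sketched as follows (our own illustration; the B/I boundary convention and example words are assumptions, as the cited work may use a richer boundary scheme):

```python
# Each character receives a joint label combining a boundary tag
# (B = begins a word, I = inside a word) with the POS tag of the
# word the character belongs to.

def to_character_labels(segmented):
    """segmented: list of (word, pos) pairs for one sentence."""
    labels = []
    for word, pos in segmented:
        for i, ch in enumerate(word):
            boundary = "B" if i == 0 else "I"
            labels.append((ch, boundary + "-" + pos))
    return labels

# A two-word sentence whose words are tagged NN and VV:
labels = to_character_labels([("中国", "NN"), ("加入", "VV")])
# -> [('中', 'B-NN'), ('国', 'I-NN'), ('加', 'B-VV'), ('入', 'I-VV')]
```

Decoding a character label sequence back into words and tags simply splits at each B- label, which is what makes segmentation and tagging expressible as a single classification task.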
Structure-based Stacking | Table 1: Mapping between CTB and PPD POS Tags. |
Abstract | We show for both English POS tagging and Chinese word segmentation that, with a proper representation, a large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference. |
Abstract | “assign label t to word w” for POS tagging. |
Abstract | In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation. |
Learning Time Constraints | n-gram POS: the 4-gram and 3-gram of POS tags that end with the year
Previous Work | Kanhabua and Norvag (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags, collocations, and tf-idf scores. |
Timestamp Classifiers | Word Classes: include only nouns, verbs, and adjectives as labeled by a POS tagger |
Timestamp Classifiers | on POS tags and tf-idf scores. |
Timestamp Classifiers | Typed Dependency POS: Similar to Typed Dependency, this feature uses POS tags of the dependency relation’s governor. |
Experiments | Examining translation rules extracted from the training data shows that there are 72,366 types of non-terminals with respect to 33 types of POS tags . |
Head-Driven HPB Translation Model | Instead of collapsing all non-terminals in the source language into a single symbol X as in Chiang (2007), given a word sequence f_i^j from position i to position j, we first find heads and then concatenate the POS tags of these heads as f_i^j's non-terminal symbol. |
Head-Driven HPB Translation Model | We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. |
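The head-driven non-terminal labelling described above can be sketched as follows (our own illustration; the "+" separator and example tags are assumptions, as the cited work's exact concatenation convention is not given here):

```python
# Given the head positions within a source span, the span's non-terminal
# is the concatenation of the POS tags of those heads.

def span_nonterminal(head_positions, pos_tags):
    return "+".join(pos_tags[h] for h in head_positions)

# A span whose heads are the words at positions 1 and 3,
# tagged NN and VV respectively:
pos_tags = ["P", "NN", "AD", "VV"]
nt = span_nonterminal([1, 3], pos_tags)
# -> "NN+VV"
```

Compared with Chiang's single symbol X, such labels constrain which sub-phrases may substitute into a rule, at the cost of a much larger non-terminal inventory (the text reports 72,366 non-terminal types over 33 POS tags).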
Introduction | Here, each Chinese word is attached with its POS tag and Pinyin. |
Discussion and Future Work | We can then award partial scores for related words, such as those identified by WordNet or those with the same POS tags. |
Experiments | However, its use of POS tags and synonym dictionaries prevents its use at the character-level. |
Experiments | We use the Stanford Chinese word segmenter (Tseng et al., 2005) and POS tagger (Toutanova et al., 2003) for preprocessing and Cilin for synonym |
Introduction | However, many different segmentation standards exist for different purposes, such as Microsoft Research Asia (MSRA) for Named Entity Recognition (NER), Chinese Treebank (CTB) for parsing and part-of-speech (POS) tagging, and City University of Hong Kong (CITYU) and Academia Sinica (AS) for general word segmentation and POS tagging. |
A Class-based Model of Agreement | The coarse categories are the universal POS tag set described by Petrov et al. |
A Class-based Model of Agreement | For Arabic, we used the coarse POS tags plus definiteness and the so-called phi features (gender, number, and person). For example, the Arabic word for ‘the car’ would be tagged “Noun+Def+Sg+Fem”. |
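The composed class tags described above can be sketched as follows (our own illustration; the function name, argument set, and "+" joining convention are assumptions based on the quoted example):

```python
# Compose a class tag from a coarse POS tag, definiteness, and the
# phi features the text mentions (number and gender shown here).

def arabic_class_tag(coarse_pos, definite, number, gender):
    parts = [coarse_pos]
    if definite:
        parts.append("Def")
    parts.append(number)
    parts.append(gender)
    return "+".join(parts)

tag = arabic_class_tag("Noun", True, "Sg", "Fem")
# -> "Noun+Def+Sg+Fem", matching the quoted example for 'the car'
```

Folding morphological features into the class label is what lets a class-based model capture agreement (e.g. gender and number concord) that the 11 coarse POS tags alone cannot distinguish.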
Discussion of Translation Results | For comparison, +POS indicates our class-based model trained on the 11 coarse POS tags only (e.g., “Noun”). |
Experiments | We apply 1-best and k-best sequential decoding algorithms to five NLP tagging tasks: Penn TreeBank (PTB) POS tagging, CoNLL 2000 joint POS tagging and chunking, CoNLL 2003 joint POS tagging, chunking and named entity tagging, HPSG supertagging (Matsuzaki et al., 2007) and a search query named entity recognition (NER) dataset. |
Experiments | As in Kaji et al. (2010), we combine the POS tags and chunk tags to form joint tags for the CoNLL 2000 dataset, e.g., NN|B-NP. |
Experiments | Similarly, we combine the POS tags, chunk tags, and named entity tags to form joint tags for the CoNLL 2003 dataset, e.g., PRP$|I-NP|O. |
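The joint-tag construction in the two sentences above is a simple concatenation, which can be sketched directly (our own one-liner; the "|" separator comes from the quoted examples):

```python
# Form a joint tag by combining per-task tags with "|", as in
# "NN|B-NP" (POS + chunk) or "PRP$|I-NP|O" (POS + chunk + NE).

def join_tags(*tags):
    return "|".join(tags)

assert join_tags("NN", "B-NP") == "NN|B-NP"
assert join_tags("PRP$", "I-NP", "O") == "PRP$|I-NP|O"
```

Treating the joint tag as a single atomic label lets an ordinary sequential tagger decode all tasks at once, at the cost of a label set that grows with the product of the per-task tag sets.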