Abstract | We propose the first joint model for word segmentation, POS tagging, and dependency parsing for Chinese.
Abstract | Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models.
Abstract | In experiments using the Chinese Treebank (CTB), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0.6% for POS tagging and 2.4% for dependency parsing. |
Introduction | Furthermore, the word-level information is often augmented with the POS tags, which, along with segmentation, form the basic foundation of statistical NLP.
Introduction | Because the tasks of word segmentation and POS tagging have strong interactions, many studies have been devoted to the task of joint word segmentation and POS tagging for languages such as Chinese (e.g. |
Introduction | This is because some of the segmentation ambiguities cannot be resolved without considering the surrounding grammatical constructions encoded in a sequence of POS tags.
Related Works | In Chinese, Luo (2003) proposed a joint constituency parser that performs segmentation, POS tagging, and parsing within a single character-based framework.
Abstract | From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing.
Introduction | Automatically assigning POS tags to words plays an important role in parsing, word sense disambiguation, as well as many other NLP applications. |
Introduction | While state-of-the-art tagging systems have achieved accuracies above 97% on English, Chinese POS tagging has proven to be more challenging, with accuracies of about 93–94% (Tseng et al., 2005b; Huang et al., 2007, 2009; Li et al., 2011).
Introduction | It is generally accepted that Chinese POS tagging often requires more sophisticated language processing techniques that are capable of drawing inferences from more subtle linguistic knowledge. |
State-of-the-Art | In some cases, the methods work well without large modifications, such as German POS tagging.
Experiments | The dependency parser and POS tagger are trained on supervised data and up-trained on data labeled by the CKY-style bottom-up constituent parser of Huang et al.
Experiments | We use the POS tagger to generate tags for dependency training to match the test setting. |
Incorporating Syntactic Structures | Long-span models — generative or discriminative, N-best or hill climbing — rely on auxiliary tools, such as a POS tagger or a parser, for extracting features for each hypothesis during rescoring, and during training for discriminative models.
Incorporating Syntactic Structures | A major complexity factor is due to processing 100s or 1000s of hypotheses for each speech utterance, even during hill climbing, each of which must be POS tagged and parsed. |
Incorporating Syntactic Structures | For integer-typed features the mapping is trivial; for string-typed features (e.g., a POS tag identity) we use a mapping of the corresponding vocabulary to integers.
Syntactic Language Models | where h.w and h.t denote the word identity and the POS tag of the corresponding exposed head word. |
Up-Training | We apply up-training to improve the accuracy of both our fast POS tagger and dependency parser. |
Algorithm | To estimate this joint distribution, PSH samples are extracted from the training corpus using unsupervised POS taggers (Clark, 2003; Abend et al., 2010) and an unsupervised parser (Seginer, 2007). |
Algorithm | This parser is unique in its ability to induce a bracketing (unlabeled parsing) from raw text (without even using POS tags) with strong results.
Algorithm | We continue by tagging the corpus using Clark’s unsupervised POS tagger (Clark, 2003) and the unsupervised Prototype Tagger (Abend et al., 2010).
Conclusion | The algorithm applies a state-of-the-art unsupervised parser and POS tagger to collect statistics from a large raw text corpus.
Core-Adjunct in Previous Work | In addition, supervised models utilize supervised parsers and POS taggers, while the current state-of-the-art in unsupervised parsing and POS tagging is considerably worse than their supervised counterparts. |
Core-Adjunct in Previous Work | First, all works use manual or supervised syntactic annotations, usually including a POS tagger.
Experimental Setup | This scenario decouples the accuracy of the algorithm from the quality of the unsupervised POS tagging.
Experimental Setup | Finally, we experiment on a scenario where even argument identification on the test set is not provided, but performed by the algorithm of Abend et al. (2009), which uses neither syntactic nor SRL annotation but does utilize a supervised POS tagger.
Introduction | However, no work has tackled this task in a fully unsupervised manner. Unsupervised models reduce reliance on the costly and error-prone manual multilayer annotation (POS tagging, parsing, core-adjunct tagging) commonly used for this task.
Abstract | Overall, we can say that the improvements are small and not significant using automatic POS tags, contrary to previously published results using gold POS tags (Agirre et al., 2011). |
Experimental Framework | We modified the system in order to add semantic features, combining them with wordforms and POS tags, on the parent and child nodes of each arc.
Introduction | using MaltParser on gold POS tags.
Introduction | In this work, we will investigate the effect of semantic information using predicted POS tags . |
Related work | (2011) successfully introduced WordNet classes in a dependency parser, obtaining improvements on the full PTB using gold POS tags, trying different combinations of semantic classes.
Results | For all the tests, we used a perceptron POS-tagger (Collins, 2002), trained on WSJ sections 2–21, to assign POS tags automatically to both the training (using 10-way jackknifing) and test data, obtaining a POS tagging accuracy of 97.32% on the test data.
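The jackknifing procedure described above can be sketched as follows. This is a minimal illustration only: `train_tagger` and `tag` are hypothetical stand-ins for training and applying the perceptron tagger, and the fold-splitting scheme is an assumption (the paper does not specify how sections are partitioned).

```python
# Minimal sketch of k-way jackknifing: each fold of the training data is
# tagged by a model trained on the remaining k-1 folds, so the training
# set receives automatic tags of the same quality as at test time.
# train_tagger and tag are hypothetical stand-ins for a real tagger.

def jackknife(sentences, train_tagger, tag, k=10):
    folds = [sentences[i::k] for i in range(k)]
    tagged = []
    for i, held_out in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train_tagger(train)
        tagged.extend(tag(model, s) for s in held_out)
    return tagged
```

The point of the scheme is that the parser is never trained on gold tags it will not see at test time.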
Results | Overall, we see that the small improvements do not confirm the previous results on Penn2Malt, MaltParser and gold POS tags.
Results | One of the obstacles for automatic parsers is the presence of incorrect POS tags due to automatic tagging.
Abstract | t_i is the POS tag of w_i.
Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract | While ideally we would want to use the word information in decoding as well, much of the syntax of a sentence is determined by the POS tags, and a relatively high level of accuracy can be achieved by learning, for example, a supervised parser from POS tag sequences.
Conclusions and future work | Second, simply treating POS tags within a small window of the verb as pseudo-GRs produces state-of-the-art results without the need for a parsing model. |
Conclusions and future work | In fact, by integrating results from unsupervised POS tagging (Teichert and Daume III, 2009) we could render this approach fully domain- and language-independent. |
Introduction | Second, by replacing the syntactic features with an approximation based on POS tags, we achieve state-of-the-art performance without relying on error-prone unlexicalized or domain-specific lexicalized parsers.
Methodology | The CoNLL format is a common language for comparing output from dependency parsers: each lexical item has an index, lemma, POS tag, tGR in which it is the dependent, and index to the corresponding head.
Methodology | Table 2 shows the three variations we tested: the simple tGR type, with parameterization for the POS tags of head and dependent, and with closed-class POS tags (determiners, pronouns and prepositions) lexicalized. |
Methodology | An unlexicalized parser cannot distinguish these based just on POS tags, while a lexicalized parser requires a large treebank.
Previous work | Graphical models have been increasingly popular for a variety of tasks such as distributional semantics (Blei et al., 2003) and unsupervised POS tagging (Finkel et al., 2007), and sampling methods allow efficient estimation of full joint distributions (Neal, 1993). |
Previous work | Their study employed unsupervised POS tagging and parsing, and measures of selectional preference and argument structure as complementary features for the classifier. |
Results | Since POS tagging is more reliable and robust across domains than parsing, retraining on new domains will not suffer the effects of a mismatched parsing model (Lippincott et al., 2010). |
Results | Third, lexicalizing the closed-class POS tags introduces semantic information outside the scope of the alternation-based definition of subcategorization. |
Abstract | In this paper, we address the problem of web-domain POS tagging using a two-phase approach. |
Abstract | The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger.
Introduction | However, state-of-the-art POS taggers in the literature (Collins, 2002; Shen et al., 2007) are mainly optimized on the Penn Treebank (PTB), and when shifted to web data, tagging accuracies drop significantly (Petrov and McDonald, 2012).
Introduction | We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger.
Introduction | We choose the easy-first tagging approach since it has been demonstrated to give higher accuracies than the standard left-to-right POS tagger (Shen et al., 2007; Ma et al., 2013). |
Learning from Web Text | This may partly be due to the fact that unlike computer vision tasks, the input structure of POS tagging or other sequential labelling tasks is relatively simple, and a single nonlinear layer is enough to model the interactions within the input (Wang and Manning, 2013). |
Neural Network for POS Disambiguation | The main challenge to designing the neural network structure is: on the one hand, we hope that the model can take advantage of information provided by the learned WRRBM, which reflects general properties of web texts, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model’s discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996).
Neural Network for POS Disambiguation | Under the output layer, the network consists of two modules: the web-feature module, which incorporates knowledge from the pre-trained WRRBM, and the sparse-feature module, which makes use of other POS tagging features. |
Neural Network for POS Disambiguation | For POS tagging, we found that a simple linear layer yields satisfactory accuracies.
Dependency Parsing | Given an input sentence x = w0 w1 ... wn and its POS tag sequence t = t0 t1 ... tn, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by d = {(h, m, l) : 0 ≤ h ≤ n, 0 < m ≤ n, l ∈ L}, where (h, m, l) indicates a directed arc from the head word (also called father) wh to the modifier (also called child or dependent) wm with a dependency label l, and L is the label set.
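The arc-set definition above can be made concrete with a small well-formedness check. This is a sketch for illustration only; actual parsers enforce these constraints during decoding rather than checking them after the fact.

```python
def is_valid_dependency_tree(arcs, n):
    """Check that arcs {(h, m, l)} form a tree over words w_1..w_n
    rooted at the dummy root w_0: heads in 0..n, modifiers in 1..n,
    one head per modifier, and no cycles."""
    heads = {}
    for h, m, l in arcs:
        if not (0 <= h <= n and 0 < m <= n) or m in heads:
            return False          # index out of range, or m has two heads
        heads[m] = h
    if len(heads) != n:
        return False              # every word needs exactly one head
    for m in heads:               # every word must reach the root
        seen = set()
        node = m
        while node != 0:
            if node in seen:
                return False      # cycle detected
            seen.add(node)
            node = heads[node]
    return True
```

The dummy root w0 and the single-head constraint are what make d a tree rather than an arbitrary directed graph.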
Dependency Parsing with QG Features | The type of the TP is conjoined with the related words and POS tags, such that the QG-enhanced parsing models can make more elaborate decisions based on the context.
Experiments and Analysis | CDT and CTB5/6 adopt different POS tag sets, and converting from one tag set to another is difficult (Niu et al., 2009). To overcome this problem, we use the People’s Daily corpus (PD), a large-scale corpus annotated with word segmentation and POS tags, to train a statistical POS tagger.
Experiments and Analysis | The tagger produces a universal layer of POS tags for both the source and target treebanks. |
Experiments and Analysis | For all models used in the current work (POS tagging and parsing), we adopt the averaged perceptron to train the feature weights (Collins, 2002).
About Heterogeneous Annotations | For Chinese word segmentation and POS tagging, supervised learning has become a dominant paradigm.
About Heterogeneous Annotations | Although several institutions to date have released their segmented and POS tagged data, acquiring sufficient quantities of high quality training examples is still a major bottleneck. |
About Heterogeneous Annotations | The statistics after the colons indicate how many times each POS tag pair appears among the 3,561 words that are consistently segmented.
Introduction | In particular, joint word segmentation and POS tagging is addressed as a two step process. |
Joint Chinese Word Segmentation and POS Tagging | words, word segmentation and POS tagging are important initial steps for Chinese language processing. |
Joint Chinese Word Segmentation and POS Tagging | Two kinds of approaches are popular for joint word segmentation and POS tagging.
Joint Chinese Word Segmentation and POS Tagging | In this kind of approach, the task is formulated as the classification of characters into POS tags with boundary information. |
Structure-based Stacking | Table 1: Mapping between CTB and PPD POS Tags.
Distribution Prediction | As we go on to show in Section 6, this enables us to use the same distribution prediction method for both POS tagging and sentiment classification. |
Domain Adaptation | We consider two DA tasks: (a) cross-domain POS tagging (Section 4.1), and (b) cross-domain sentiment classification (Section 4.2). |
Domain Adaptation | 4.1 Cross-Domain POS Tagging |
Domain Adaptation | manually POS tagged) sentence, we select its neighbours 7N) in the source domain as additional features.
Introduction | • Using the learnt distribution prediction model, we propose a method to learn a cross-domain POS tagger.
Related Work | words that appear in both the source and target domains) to adapt a POS tagger to a target domain. |
Related Work | Choi and Palmer (2012) propose a cross-domain POS tagging method by training two separate models: a generalised model and a domain-specific model. |
Related Work | Adding latent states to the smoothing model further improves the POS tagging accuracy (Huang and Yates, 2012). |
Experiment and Results | One feature of our approach is that it permits mining the data for tree patterns of arbitrary size using different types of labelling information (POS tags, dependencies, word forms and any combination thereof).
Experiment and Results | 4.3.1 Mining on single labels (word form, POS tag or dependency) |
Experiment and Results | Mining on a single label permits (i) assessing the relative impact of each category in a given label category and (ii) identifying different sources of errors depending on the type of label considered (POS tag, dependency or word form).
Introduction | Such a comparison brings up another crucial question: “Do existing POS taggers and chunkers
Introduction | Nevertheless, a great number of researchers have used existing POS taggers and chunkers to analyze the writing of learners of English. |
Introduction | For instance, error detection methods normally use a POS tagger and/or a chunker in the error detection process. |
Method | Considering this, we determined a basic rule as follows: “Use the Penn Treebank tag set and preserve the original texts as much as possible.” To handle such errors, we made several modifications and added two new POS tags (CE and UK) and another two for chunking (XP and PH), which are described below. |
Method | Note that each POS tag is hyphenated. |
UK and XP stand for unknown and X phrase, respectively. | 5.1 POS Tagging |
UK and XP stand for unknown and X phrase, respectively. | HMM-based and CRF-based POS taggers were tested on the shallow-parsed corpus. |
UK and XP stand for unknown and X phrase, respectively. | Both use the Penn Treebank POS tag set. |
Approach Overview | The focus of this work is on building POS taggers for foreign languages, assuming that we have an English POS tagger and some parallel text between the two languages. |
Approach Overview | The POS distributions over the foreign trigram types are used as features to learn a better unsupervised POS tagger (§5). |
Experiments and Results | We extracted only the words and their POS tags from the treebanks.
Experiments and Results | (2011) provide a mapping A from the fine-grained language-specific POS tags in the foreign treebank to the universal POS tags.
Graph Construction | Graph construction for structured prediction problems such as POS tagging is nontrivial: on the one hand, using individual words as the vertices throws away the context |
Graph Construction | They considered a semi-supervised POS tagging scenario and showed that one can use a graph over trigram types, and edge weights based on distributional similarity, to improve a supervised conditional random field tagger. |
Introduction | Unfortunately, the best completely unsupervised English POS tagger (that does not make use of a tagging dictionary) reaches only 76.1% accuracy (Christodoulopoulos et al., 2010), making its practical usability questionable at best. |
Introduction | Our final average POS tagging accuracy of 83.4% compares very favorably to the average accuracy of Berg-Kirkpatrick et al.’s monolingual unsupervised state-of-the-art model (73.0%), and considerably bridges the gap to fully supervised POS tagging performance (96.6%). |
POS Induction | After running label propagation (LP), we compute tag probabilities for foreign word types x by marginalizing the POS tag distributions of foreign trigrams u_i = x⁻ x x⁺ over the left and right contexts.
POS Induction | This tag vector is constructed for every word in the foreign vocabulary and will be used to provide features for the unsupervised foreign language POS tagger.
POS Induction | For English POS tagging, Berg-Kirkpatrick et al.
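The marginalization over trigram contexts described above can be sketched as follows. This is an illustration only: `trigram_tag_dists` is a hypothetical mapping from trigram types (left context, word, right context) to tag distributions, and the uniform averaging over trigram types is an assumption; the paper's exact weighting may differ.

```python
from collections import defaultdict

def word_type_tag_probs(trigram_tag_dists):
    """Marginalize the POS tag distributions of trigram types over
    their left and right contexts, yielding one tag distribution per
    word type x. Input maps (left, x, right) -> {tag: prob}."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for (left, x, right), dist in trigram_tag_dists.items():
        counts[x] += 1
        for tag, p in dist.items():
            totals[x][tag] += p
    # average uniformly over the trigram types containing x
    return {x: {t: p / counts[x] for t, p in dist.items()}
            for x, dist in totals.items()}
```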
Mention Extraction System | These are a combination of w_i itself, its POS tag, and its integer offset from the last word (lw) in the mention.
Mention Extraction System | These features are meant to capture the word and POS tag sequences in mentions. |
Mention Extraction System | Contextual We extract the word C-1,-1 immediately before mi, the word C+1,+1 immediately after mi, and their associated POS tags P.
Relation Extraction System | POS features If there is a single word between the two mentions, we extract its POS tag.
Relation Extraction System | Given the hw of m, Pi,j refers to the sequence of POS tags in the immediate context of hw (we exclude the POS tag of hw).
Relation Extraction System | The offsets i and j denote the position (relative to hw) of the first and last POS tag respectively. |
Syntactico-Semantic Structures | • If u* is not empty, we require that it satisfies any of the following POS tag sequences: JJ+ \/ JJ and JJ?
Syntactico-Semantic Structures | These are (optional) POS tag sequences that normally start a valid noun phrase. |
Syntactico-Semantic Structures | • We use two patterns to differentiate between premodifier relations and possessive relations, by checking for the existence of POS tags PRP$, WP$, POS, and the word “’s”.
Discussion | Except in Row 8 and Row 11, when the two head nouns of an entity pair were combined as a semantic pair and when the POS tag was combined with the entity type, performance decreased.
Discussion | Comparing reference set (5) with reference set (3), the Head noun and the adjacent entity POS tag achieve better performance when used as singletons.
Discussion | In this paper, for a better demonstration of the constraint condition, we still use Position Sensitive as the default setting when using the Head noun and the adjacent entity POS tag.
Feature Construction | All the employed features are simply classified into five categories: Entity Type and Subtype, Head Noun, Position Feature, POS Tag and Omni-word Feature. |
Feature Construction | POS Tag: In our model, we use only the adjacent entity POS tags, which lie on the two sides of the entity mention.
Feature Construction | These POS tags are labelled by the ICTCLAS package.
Experiments | This sample is manually labeled with three annotations: capitalization, POS tags, and segmentation, according to the description of these annotations in Figure 1.
Experiments | Table 1: Summary of query annotation performance for capitalization (CAP), POS tagging (TAG) and segmentation. |
Experiments | In the case of POS tagging, the decisions are ternary, and hence we report the classification accuracy.
Independent Query Annotations | On the other hand, given a sentence from a corpus that is relevant to the query, such as “Hawaiian Falls is a family-friendly water-park”, the word “falls” is correctly identified by a standard POS tagger as a proper noun.
Independent Query Annotations | (2010), an estimate of p(C_i|r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
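A smoothed estimator of this kind can be sketched as a linear interpolation. This is only one common form of smoothing, shown for illustration; the mixing weight `lam` is hypothetical, and the cited work's exact smoothing scheme may differ.

```python
def smoothed_annotation_prob(p_sentence, p_corpus, lam=0.5):
    """Linearly interpolate the evidence from a retrieved sentence r
    with an estimate from a large n-gram corpus. lam is a hypothetical
    mixing weight between the two sources."""
    return lam * p_sentence + (1.0 - lam) * p_corpus
```

The interpolation lets reliable corpus statistics back off the noisy per-sentence evidence when the retrieved sentence is uninformative.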
Joint Query Annotation | Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging , phrase chunking, named entity recognition, and stopword indicators, to name just a few. |
Joint Query Annotation | For instance, imagine that we need to perform two annotations: capitalization and POS tagging . |
Query Annotation Example | In this scheme, each query is marked up using three annotations: capitalization, POS tags, and segmentation indicators.
Related Work | Most of the previous work on query annotation focuses on performing a particular annotation task (e.g., segmentation or POS tagging) in isolation.
Approaches | A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
Approaches | Brown Clusters We use fully unsupervised Brown clusters (Brown et al., 1992) in place of POS tags . |
Approaches | We define the DMV such that it generates sequences of word classes: either POS tags or Brown clusters as in Spitkovsky et al. |
Experiments | Our experiments are subtractive, beginning with all supervision available and then successively removing (a) dependency syntax, (b) morphological features, (c) POS tags, and (d) lemmas.
Experiments | The CoNLL-2009 Shared Task (Hajic et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese, and Spanish.
Experiments | We first compare our models trained as a pipeline, using all available supervision (syntax, morphology, POS tags, lemmas) from the CoNLL-2009 data.
Introduction | • Use of Brown clusters in place of POS tags for low-resource SRL.
Related Work | (2012) limit their exploration to a small set of basic features, and include high-resource supervision in the form of lemmas, POS tags, and morphology available from the CoNLL 2009 data.
Related Work | Our experiments also consider ‘longer’ pipelines that include earlier stages: a morphological analyzer, POS tagger, and lemmatizer.
Features | • Coordination In a coordinate structure, the two adjacent conjuncts usually agree with each other on POS tags and their span lengths.
Features | Therefore, we add different features to capture POS tag and span length consistency in a coordinate structure. |
Features | • Span Length This feature captures the distribution of the binned span length of each POS tag.
Introduction | When proposing a small move, i.e., sampling a head of the word, we can also jointly sample its POS tag from a set of alternatives provided by the tagger. |
Sampling-Based Dependency Parsing with Global Features | For instance, we can sample the POS tag, the dependency relation or morphology information.
Sampling-Based Dependency Parsing with Global Features | POS correction scenario in which only the predicted POS tags are provided in the testing phase, while both gold and predicted tags are available for the training set. |
Sampling-Based Dependency Parsing with Global Features | We extend our model such that it jointly learns how to predict a parse tree and also correct the predicted POS tags for a better parsing performance. |
Abstract | In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilingual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned target word, either jointly (joint model), or independently (independent model).
Abstract | Evaluations of Japanese-to-English translation on the NTCIR-9 data show that our induced Japanese POS tags for dependency trees improve the performance of a forest-to-string SMT system.
Introduction | However, dependency parsing, which is a popular choice for Japanese, can incorporate only shallow syntactic information, i.e., POS tags, compared with the richer syntactic phrasal categories in constituency parsing.
Introduction | Figure 1: Examples of Existing Japanese POS Tags and Dependency Structures |
Introduction | If we could discriminate POS tags for two cases, we might improve the performance of a Japanese-to-English SMT system. |
Abstract | In terms of robustness, we try using different types of external data to increase lexical coverage, and find that simple POS tags have the most effect, increasing coverage on unseen data by up to 45%. |
Abstract | Even using vanilla POS tags we achieve some efficiency gains, but when using detailed lexical types as supertags we manage to halve parsing time with minimal loss of coverage or precision. |
Background | Supertagging is the process of assigning probable ‘supertags’ to words before parsing to restrict parser ambiguity, where a supertag is a tag that includes more specific information than the typical POS tags.
Parser Restriction | In these experiments we look at two methods of restricting the parser, first by using POS tags and then using lexical types. |
Parser Restriction | We use TreeTagger (Schmid, 1994) to produce POS tags and then open class words are restricted if the POS tagger assigned a tag with a probability over a certain threshold. |
Parser Restriction | Table 1: Results obtained when restricting the parser lexicon according to the POS tag, where words are restricted according to a threshold of POS probabilities.
Background | The best performing model interpolates a word trigram model with a trigram model that chains a POS model with a supertag model, where the POS model conditions on the previous two POS tags, and the supertag model conditions on the previous two POS tags as well as the current one. |
The Approach | Clark (2002) notes in his parsing experiments that the POS tags of the surrounding words are highly informative. |
The Approach | As discussed below, a significant gain in hypertagging accuracy resulted from including features sensitive to the POS tags of a node’s parent, the node itself, and all of its arguments and modifiers. |
The Approach | Predicting these tags requires the use of a separate POS tagger, which operates in a manner similar to the hypertagger itself, though exploiting a slightly different set of features (e.g., including features corresponding to the four-character prefixes and suffixes of rare logical predication names).
A Generative PCFG Model | The entries in such a lexicon may be thought of as meaningful surface segments paired up with their PoS tags, l_i = (s_i, p_i), but note that a surface segment s need not be a space-delimited token.
A Generative PCFG Model | (1996) who consider the kind of probabilities a generative parser should get from a PoS tagger, and conclude that these should be P(w|t) “and nothing fancier”. In our setting, therefore, the lattice is not used to induce a probability distribution on a linear context, but rather, it is used as a common denominator of state-indexation of all segmentation possibilities of a surface form.
Model Preliminaries | A Hebrew surface token may have several readings, each corresponding to a sequence of segments and their corresponding PoS tags.
Model Preliminaries | We refer to different readings as different analyses whereby the segments are deterministic given the sequence of PoS tags . |
Model Preliminaries | We refer to a segment and its assigned PoS tag as a lexeme, and so analyses are in fact sequences of lexemes. |
Modern Hebrew Structure | Such discrepancies can be aligned via an intermediate level of PoS tags . |
Modern Hebrew Structure | PoS tags impose a unique morphological segmentation on surface tokens and present a unique valid yield for syntactic trees. |
Previous Work on Hebrew Processing | Tsarfaty (2006) used a morphological analyzer (Segal, 2000), a PoS tagger (Bar-Haim et al., 2005), and a general purpose parser (Schmid, 2000) in an integrated framework in which morphological and syntactic components interact to share information, leading to improved performance on the joint task. |
Background | To perform segmentation and tagging simultaneously in a uniform framework, according to Ng and Low (2004), the tag is composed of a word boundary part and a POS part, e.g., “B_NN” refers to the first character in a word with POS tag “NN”.
Background | As for the POS tag, we shall use the 33 tags in the Chinese Treebank.
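The composition of boundary and POS parts described above can be sketched as follows. This is an illustration under the assumption of a four-symbol B/M/E/S boundary alphabet (begin, middle, end, single-character word); the exact boundary scheme varies between papers.

```python
def character_tags(words_with_pos):
    """Compose per-character tags for joint segmentation and tagging:
    e.g. a two-character word tagged NN yields B_NN, E_NN, and a
    single-character word tagged NN yields S_NN."""
    tags = []
    for word, pos in words_with_pos:
        if len(word) == 1:
            tags.append(("S", pos))
        else:
            tags.append(("B", pos))
            tags.extend(("M", pos) for _ in word[1:-1])
            tags.append(("E", pos))
    return ["%s_%s" % (b, p) for b, p in tags]
```

With this encoding, joint segmentation and tagging reduces to a single character-level sequence labelling problem.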
Introduction | The traditional way of segmentation and tagging is performed in a pipeline approach, first segmenting a sentence into words, and then assigning each word a POS tag . |
Introduction | The pipeline approach is very simple to implement, but frequently causes error propagation, given that wrong segmentations in the earlier stage harm the subsequent POS tagging (Ng and Low, 2004).
Introduction | The joint approaches of word segmentation and POS tagging (joint S&T) are proposed to resolve these two tasks simultaneously. |
Method | In fact, the sparsity is also a common phenomenon among character-based CWS and POS tagging . |
Method | The performance measurement indicators for word segmentation and POS tagging (joint S&T) are the balanced F-score, F = 2PR/(P+R), the harmonic mean of precision (P) and recall (R), and out-of-vocabulary recall (OOV-R).
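The balanced F-score named above, the harmonic mean of precision and recall, is a one-liner:

```python
def f_score(p, r):
    # Balanced F-score F = 2PR / (P + R), the harmonic mean of
    # precision P and recall R; defined as 0 when both are 0.
    return 2.0 * p * r / (p + r) if p + r > 0 else 0.0
```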
Related Work | There are few explorations of semi-supervised approaches for CWS or POS tagging in previous works. |
Character-based Chinese Parsing | To produce character-level trees for Chinese NLP tasks, we develop a character-based parsing model, which can jointly perform word segmentation, POS tagging and phrase-structure parsing. |
Character-based Chinese Parsing | We make two extensions to their work to enable joint segmentation, POS tagging and phrase-structure parsing from the character level. |
Character-based Chinese Parsing | First, we split the original SHIFT action into SHIFT-SEPARATE(t) and SHIFT-APPEND, which jointly perform the word segmentation and POS tagging tasks.
Introduction | Compared to a pipeline system, the advantages of a joint system include reduction of error propagation, and the integration of segmentation, POS tagging and syntax features. |
Introduction | To analyze word structures in addition to phrase structures, our character-based parser naturally performs word segmentation, POS tagging and parsing jointly.
Introduction | We extend their shift-reduce framework, adding more transition actions for word segmentation and POS tagging, and defining novel features that capture character information.
Word Structures and Syntax Trees | They made use of this information to help joint word segmentation and POS tagging . |
Word Structures and Syntax Trees | In particular, we mark the original nodes that represent POS tags in CTB-style trees with “-t”, and insert our word structures as unary subnodes of the “-t” nodes. |
Abstract | In this paper we present an unsupervised algorithm for identifying verb arguments, where the only type of annotation required is POS tagging . |
Algorithm | This parser is unique in that it is able to induce a bracketing (unlabeled parsing) from raw text (without even using POS tags ) achieving state-of-the-art results. |
Algorithm | The only type of supervised annotation we use is POS tagging . |
Algorithm | We use the taggers MX-POST (Ratnaparkhi, 1996) for English and Tree-Tagger (Schmid, 1994) for Spanish, to obtain POS tags for our model. |
Introduction | A standard SRL algorithm requires thousands to tens of thousands of sentences annotated with POS tags, syntactic annotation and SRL annotation. |
Introduction | Rasooli and Faili (2012) and Bisk and Hockenmaier (2012) made some efforts to boost the verbocentricity of the inferred structures; however, both of the approaches require manual identification of the POS tags marking the verbs, which renders them useless when unsupervised POS tags are employed. |
Related Work | Our dependency model contained a submodel which directly prioritized subtrees that form reducible sequences of POS tags . |
Related Work | Reducibility scores of given POS tag sequences were estimated using a large corpus of Wikipedia articles. |
Related Work | The weakness of this approach was the fact that longer sequences of POS tags are very sparse and no reducibility scores could be estimated for them. |
STOP-probability estimation | Hereinafter, P_stop(c_h, dir) denotes the STOP-probability we want to estimate from a large corpus; c_h is the head's POS tag and dir is the direction in which the STOP probability is estimated. |
STOP-probability estimation | For each POS tag c_h in the given corpus, we first compute its left and right "raw" scores S_stop(c_h, left) and S_stop(c_h, right) as the relative number of times a word with POS tag c_h was in the first (or last) position in a reducible sequence found in the corpus. |
STOP-probability estimation | Their main purpose is to sort the POS tags according to their “reducibility”. |
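The raw-score computation described above can be sketched as follows. The data structures are illustrative, and normalizing by each tag's corpus frequency is our assumption about what "relative number of times" means here:

```python
from collections import Counter

def raw_stop_scores(reducible_seqs, tag_counts):
    """For each POS tag c, compute the relative number of times a word
    with tag c appeared in the first (left) or last (right) position
    of a reducible sequence, normalized by the tag's corpus count."""
    first = Counter(seq[0] for seq in reducible_seqs if seq)
    last = Counter(seq[-1] for seq in reducible_seqs if seq)
    scores = {}
    for tag, n in tag_counts.items():
        scores[tag] = (first[tag] / n, last[tag] / n)  # (left, right)
    return scores

# Toy corpus: reducible POS-tag sequences and overall tag counts
seqs = [["DT", "NN"], ["JJ", "NN"], ["RB"]]
counts = {"DT": 4, "NN": 6, "JJ": 2, "RB": 1}
print(raw_stop_scores(seqs, counts)["NN"])  # -> (0.0, 0.3333333333333333)
```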
Abstract | In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. |
Experiments | We use the standard split for dependency parsing and the split used by Ratnaparkhi (1996) for POS tagging. |
Experiments | For dependency parsing, POS tags of the training set are generated using 10-fold jackknifing. |
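The 10-fold jackknifing procedure mentioned above can be sketched as follows: the training set is split into 10 folds, and each fold is tagged by a model trained on the other nine. The `train_tagger` and `tag` functions are hypothetical placeholders, not from the cited work:

```python
def jackknife(sentences, train_tagger, tag, k=10):
    """Return automatic tags for every training sentence, where each
    fold is tagged by a model trained on the remaining k-1 folds."""
    folds = [sentences[i::k] for i in range(k)]
    tagged = []
    for i, held_out in enumerate(folds):
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train_tagger(rest)
        tagged.extend(tag(model, s) for s in held_out)
    return tagged
```

This way the training data receives automatic (rather than gold) tags, so the parser sees tag noise comparable to what it will encounter at test time.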
Experiments | For dependency parsing, we assume gold segmentation and POS tags for the input. |
Introduction | The proposed solution is general and can also be applied to other algorithms that exhibit spurious ambiguity, such as easy-first POS tagging (Ma et al., 2012) and transition-based dependency parsing with dynamic oracle (Goldberg and Nivre, 2012). |
Introduction | In this paper, we report experimental results on both easy-first dependency parsing and POS tagging (Ma et al., 2012). |
Introduction | We show that both easy-first POS tagging and dependency parsing can be improved significantly from beam search and global learning. |
Training | wp denotes the head word of p, tp denotes the POS tag of wp. |
Algorithm | We induce the number of POS tags of each word type at this step. |
Algorithm | Furthermore, they will have the same POS tags . |
Experiments | As a result, this method inaccurately induces POS tags for the occurrences of word types with high gold tag perplexity. |
Experiments | In other words, we assume that the number of different POS tags of each word type is equal to 2. |
Introduction | Part-of-speech (POS) tagging is an important preprocessing step for many natural language processing applications, because grammatical rules are not functions of individual words; instead, they are functions of word categories. |
Introduction | Unlike supervised POS tagging systems, POS induction systems make use of unsupervised methods. |
Introduction | Type-based methods suffer from POS ambiguity because one POS tag is assigned to each word type. |
Experiments | We investigate the use of smoothing in two test systems, conditional random field (CRF) models for POS tagging and chunking. |
Experiments | Our baseline CRF system for POS tagging follows the model described by Lafferty et al. |
Experiments | In addition to the transition, word-level, and orthographic features, we include features relating automatically-generated POS tags and the chunk labels. |
Introduction | effects of our smoothing techniques on two sequence-labeling tasks, POS tagging and chunking, to answer the following: I. |
Introduction | Our best smoothing technique improves a POS tagger by 11% on OOV words, and a chunker by an impressive 21% on OOV words. |
Abstract | First, to resolve the error propagation problem of the traditional pipeline approach, we incorporate POS tagging into the syntactic parsing process. |
Introduction | First, POS tagging is typically performed separately as a preliminary step, and POS tagging errors will propagate to the parsing process. |
Introduction | This problem is especially severe for languages where POS tagging accuracy is relatively low. This is the case for Chinese, where there are fewer contextual clues that can be used to inform the tagging process, and some of the tagging decisions are actually influenced by the syntactic structure of the sentence. |
Introduction | First, we integrate POS tagging into the parsing process and jointly optimize these two processes simultaneously. |
Joint POS Tagging and Parsing with Nonlocal Features | To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features. |
Joint POS Tagging and Parsing with Nonlocal Features | 3.1 Joint POS Tagging and Parsing |
Joint POS Tagging and Parsing with Nonlocal Features | POS tagging is often taken as a preliminary step for transition-based constituent parsing; therefore, the accuracy of POS tagging greatly affects parsing performance. |
Transition-based Constituent Parsing | Figure 1: Two constituent trees for an example sentence w0w1w2 with POS tags abc. |
Transition-based Constituent Parsing | For example, in Figure 1, for the input sentence w0w1w2 and its POS tags abc, our parser can construct two parse trees using the action sequences given below these trees. |
Abstract | We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. |
Experiments | For example, currently, most Chinese constituency and dependency parsers are trained on some version of CTB, using its segmentation and POS tagging as the de facto standards. |
Experiments | Therefore, we expect the knowledge adapted from PD to lead to a more precise CTB-style segmenter and POS tagger, which would in turn reduce error propagation to parsing (and translation). |
Introduction | Figure 1: Incompatible word segmentation and POS tagging standards between CTB (upper) and People's Daily (lower). |
Introduction | Our experiments show that adaptation from PD to CTB results in a significant improvement in segmentation and POS tagging , with error reductions of 30.2% and 14%, respectively. |
Segmentation and Tagging as Character Classification | While in Joint S&T, each word is further annotated with a POS tag: |
Segmentation and Tagging as Character Classification | where t_k (k = 1..m) denotes the POS tag for the word C_{e_{k-1}+1:e_k}. |
Segmentation and Tagging as Character Classification | In Ng and Low (2004), Joint S&T can also be treated as a character classification problem, where a boundary tag is combined with a POS tag in order to give the POS information of the word containing these characters. |
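The combined boundary/POS labeling can be sketched as follows, using B/I/E/S boundary tags joined with each word's POS tag (the exact tag naming is illustrative; Ng and Low (2004) use a similar four-way boundary scheme):

```python
def char_labels(words_with_tags):
    """Convert a segmented, POS-tagged sentence into per-character
    labels combining a boundary tag (B/I/E/S) with the POS tag."""
    labels = []
    for word, pos in words_with_tags:
        if len(word) == 1:
            labels.append(("S", pos))      # single-character word
        else:
            labels.append(("B", pos))      # word-initial character
            labels.extend(("I", pos) for _ in word[1:-1])  # internal
            labels.append(("E", pos))      # word-final character
    return [f"{b}-{p}" for b, p in labels]

print(char_labels([("中国", "NR"), ("人", "NN")]))
# -> ['B-NR', 'E-NR', 'S-NN']
```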
Abstract | In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging . |
Background | In the joint word segmentation and POS tagging process, the task is to predict a path |
Background | p is its POS tag, and a "#" symbol denotes the number of elements in each variable. |
Background | words found in the system’s word dictionary, have regular POS tags . |
Introduction | Word segmentation and POS tagging results are required as inputs to other NLP tasks, such as phrase chunking, dependency parsing, and machine translation. |
Introduction | Word segmentation and POS tagging in a joint process have received much attention in recent research and have shown improvements over a pipelined fashion (Ng and Low, 2004; Nakagawa and Uchimoto, 2007; Zhang and Clark, 2008; Jiang et al., 2008a; Jiang et al., 2008b). |
Introduction | In the joint word segmentation and POS tagging process, one serious problem is caused by unknown words, which are defined as words that are not found in a training corpus or in a system's word dictionary. |
Policies for correct path selection | We can directly estimate the statistics of known words from an annotated corpus where a sentence is already segmented into words and assigned POS tags . |
Policies for correct path selection | 3We consider a word and its POS tag a single entry. |
Abstract | We show for both English POS tagging and Chinese word segmentation that, with proper representation, a large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference. |
Abstract | "assign label t to word w" for POS tagging. |
Abstract | In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation. |
Abstract | We propose the first tagset designed for the task of character-level POS tagging . |
Abstract | We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging . |
Character-level POS Tagset | We propose a tagset for the task of character-level POS tagging . |
Chinese Morphological Analysis with Character-level POS | Previous studies have shown that jointly processing word segmentation and POS tagging is preferable to pipeline processing, which can propagate errors (Nakagawa and Uchimoto, 2007; Kruengkrai et al., 2009). |
Chinese Morphological Analysis with Character-level POS | Baseline features: For word-level nodes that represent known words, we use the symbols w, p and l to denote the word form, POS tag and length of the word, respectively. |
Chinese Morphological Analysis with Character-level POS | Proposed features: For word-level nodes, the function CP_pair(w) returns the pair of the character-level POS tags of the first and last characters of w, and CP_all(w) returns the sequence of character-level POS tags of w. If either the pair or the sequence of character-level POS is ambiguous, which means there are multiple paths in the sub-lattice of the word-level node, then the values on the current best path (with local context) during the Viterbi search will be returned. |
Evaluation | To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging . |
Evaluation | The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively. |
Introduction | with Character-level POS Tagging |
Introduction | We propose the first tagset designed for the task of character-level POS tagging , based on which we manually annotate the entire CTB5. |
Introduction | We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging . |
Experiments | But only sentence boundaries, POS tags and NER labels were kept as the annotation of the corpus. |
Introduction | IR can easily make use of this knowledge: for a when-question, IR retrieves sentences with tokens labeled as DATE by NER, or POS-tagged as CD. |
Introduction | Moreover, our approach extends easily beyond fixed answer types such as named entities: we are already using POS tags as a demonstration. |
Method | We let the trained QA system guide the query formulation when performing coupled retrieval with Indri (Strohman et al., 2005), given a corpus already annotated with POS tags and NER labels. |
Method | Since NER and POS tags are not lexicalized, they accumulate many more counts (i.e. |
Method | NER Types First We found NER labels better indicators of expected answer types than POS tags . |
A Latent Variable Parser | The Berkeley parser has been applied to the TüBa-D/Z corpus in the constituent parsing shared task of the ACL-2008 Workshop on Parsing German (Petrov and Klein, 2008), achieving F1-measures of 85.10% and 83.18% with and without gold-standard POS tags, respectively. |
Experiments | As part of our experiment design, we investigated the effect of providing gold POS tags to the parser, and the effect of incorporating edge labels into the nonterminal labels for training and parsing. |
Experiments | In all cases, gold annotations which include gold POS tags were used when training the parser. |
Experiments | This table shows the results after five iterations of grammar modification, parameterized over whether we provide gold POS tags for parsing, and edge labels for training and parsing. |
Introduction | the unlexicalized, latent variable-based Berkeley parser (Petrov et al., 2006). Without any language- or model-dependent adaptation, we achieve state-of-the-art results on the TüBa-D/Z corpus (Telljohann et al., 2004), with an F1-measure of 95.15% using gold POS tags. |
Introduction | It is found that the three techniques perform about equally well, with F1 of 94.1% using POS tags from the TnT tagger, and 98.4% with gold tags. |
Abstract | Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than the basic Viterbi and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009). |
Introduction | Now they are indispensable in a wide range of NLP tasks including chunking, POS tagging , NER and so on (Sha and Pereira, 2003; Tsuruoka and Tsujii, 2005; Lin and Wu, 2009). |
Introduction | For example, there are more than 40 and 2000 labels in POS tagging and supertagging, respectively (Brants, 2000; Matsuzaki et al., 2007). |
Introduction | As we shall see later, we need over 300 labels to reduce joint POS tagging and chunking to a single sequence labeling problem. |
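The cross-product reduction described above can be sketched minimally; the label inventories below are toy examples, not the ones used in the cited work, and in practice presumably only label pairs attested in the training data are kept, which is how the count stays near 300 rather than the full product:

```python
from itertools import product

# Reduce joint POS tagging and chunking to one sequence labeling task
# by taking the cross-product of the two label sets.
pos_tags = ["NN", "VB", "JJ", "DT"]
chunk_tags = ["B-NP", "I-NP", "B-VP", "O"]
joint_labels = [f"{c}|{p}" for c, p in product(chunk_tags, pos_tags)]
print(len(joint_labels))  # -> 16
```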
Experimental Setup | These datasets include manually annotated dependency trees, POS tags and morphological information. |
Experimental Setup | In contrast, assume we take the cross-product of the auxiliary word vector values, POS tags and lexical items of a word and its context, and add the crossed values into a normal model (in gbhmm). |
Introduction | This low dimensional syntactic abstraction can be thought of as a proxy to manually constructed POS tags . |
Introduction | For instance, on the English dataset, the low-rank model trained without POS tags achieves 90.49% on first-order parsing, while the baseline gets 86.70% if trained under the same conditions, and 90.58% if trained with 12 core POS tags . |
Problem Formulation | pos, form, lemma and morph stand for the fine POS tag , word form, word lemma and the morphology feature (provided in CoNLL format file) of the current word. |
Problem Formulation | For example, pos-p means the POS tag to the left of the current word in the sentence. |
Problem Formulation | Other possible features include, for example, the label of the arc h —> m, the POS tags between the head and the modifier, boolean flags which indicate the occurrence of in-between punctuation or conjunctions, etc. |
Results | The rationale is that given all other features, the model would induce representations that play a similar role to POS tags . |
Results | Table 4: The first three columns show parsing results when models are trained without POS tags . |
Results | the performance of a parser trained with 12 Core POS tags . |
Character-Level Dependency Tree | system, each word is initialized by the action SHW with a POS tag , before being incrementally modified by a sequence of intra-word actions, and finally being completed by the action PW. |
Character-Level Dependency Tree | L and R denote the two elements over which the dependencies are built; the subscripts lc1 and rc1 denote the leftmost and rightmost children, respectively; the subscripts lc2 and rc2 denote the second leftmost and second rightmost children, respectively; w denotes the word; t denotes the POS tag; c denotes the head character; ls_w and rs_w denote the smallest left and right subwords, respectively, as shown in Figure 2. |
Character-Level Dependency Tree | Since the first element of the queue can be shifted onto the stack by either SH or AR, it is more difficult to assign a POS tag to each word by using a single action. |
Conversion Process | Since we are applying these to CCGbank NP structures rather than the Penn Treebank, the POS tag based heuristics are sufficient to determine heads accurately. |
Conversion Process | Some POS tags require special behaviour. |
Conversion Process | Accordingly, we do not alter tokens with POS tags of DT and PRP$. Instead, their sibling node is given the category N and their parent node is made the head. |
Experiments | Table 3: Parsing results with gold-standard POS tags |
Experiments | Table 4: Parsing results with automatic POS tags |
Experiments | We have also experimented with using automatically assigned POS tags . |
NER features | Many of these features generalise the head words and/or POS tags that are already part of the feature set. |
NER features | There are already features in the model describing each combination of the children’s head words and POS tags , which we extend to include combinations with |
Experimental Setup | The first stage, ASR, yields an automatic transcription, which is followed by the POS tagging stage. |
Experimental Setup | The steps for automatic assessment of overall proficiency follow an analogous process (either including the POS tagger or not), depending on the objective measure being evaluated. |
Experimental Setup | 5.3.2 POS tagger |
Related Work | The idea of capturing differences in POS tag distributions for classification has been explored in several previous studies. |
Related Work | In the area of text-genre classification, POS tag distributions have been found to capture genre differences in text (Feldman et al., 2009; Marin et al., 2009); in a language testing context, it has been used in grammatical error detection and essay scoring (Chodorow and Leacock, 2000; Tetreault and Chodorow, 2008). |
Shallow-analysis approach to measuring syntactic complexity | Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced). |
Data and Tools | The set of POS tags needs to be consistent across languages and treebanks. |
Data and Tools | For this reason we use the universal POS tag set of Petrov et al. |
Data and Tools | POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to pro- |
Parsing experiments | For example, fdep contains lexicalized “in-between” features that depend on the head and modifier words as well as a word lying in between the two; in contrast, previous work has generally defined in-between features for POS tags only. |
Parsing experiments | As in previous work, English evaluation ignores any token whose gold-standard POS tag is one of { `` '' : , . }. |
Parsing experiments | First, we define 4-gram features that characterize the four relevant indices using words and POS tags; examples include POS 4-grams and mixed 4-grams with one word and three POS tags . |
Related work | These indices allow the use of arbitrary features predicated on the position of the grandparent (e. g., word identity, POS tag, contextual POS tags ) without affecting the asymptotic complexity of the parsing algorithm. |
Clustering phrase pairs directly using the K-means algorithm | Using a scheme based on source and target phrases while accounting for phrase size, with 36 word classes (the size of the Penn English POS tag set) for both languages, yields a grammar with (36 + 2 × 36²)² = 6.9M nonterminal labels. |
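The label-count arithmetic above can be checked directly. The reading of N + 2N² as one class or an ordered pair of classes per phrase side, squared over (source, target) pairs, is our interpretation of the formula:

```python
# With N word classes per language, each phrase side admits
# N + 2*N^2 labelings, and the full label is a (source, target)
# pair, squaring the count.
N = 36
per_side = N + 2 * N**2
total = per_side**2
print(per_side, total)  # -> 2628 6906384
```

So the grammar has about 6.9 million nonterminal labels, as the text states.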
Conclusion and discussion | Crucially, our methods only rely on “shallow” lexical tags, either generated by POS taggers or by automatic clustering of words into classes. |
Conclusion and discussion | Using automatically obtained word clusters instead of POS tags yields essentially the same results, thus making our methods applicable to all languages pairs with parallel corpora, whether syntactic resources are available for them or not. |
Conclusion and discussion | On the other extreme, the clustering based approach labels phrases based on the contained words alone.8 The POS grammar represents an intermediate point on this spectrum, since POS tags can change based on surrounding words in the sentence; and the position of the K-means model depends on the influence of the phrase contexts on the clustering process. |
Experiments | The source and target language parses for the syntax-augmented grammar, as well as the POS tags for our POS-based grammars were generated by the Stanford parser (Klein and Manning, 2003). |
Experiments | Our approach, using target POS tags (‘POS-tgt (no phr. |
Experiments | , 36 (the number of Penn Treebank POS tags, used for the 'POS' models, is 36). For 'Clust', we see a comfortably wide plateau of nearly-identical scores from N = 7, ... |
Hard rule labeling from word classes | We use the simple term ‘tag’ to stand for any kind of word-level analysis—a syntactic, statistical, or other means of grouping word types or tokens into classes, possibly based on their position and context in the sentence, POS tagging being the most obvious example. |
Related work | (2007) improve the statistical phrase-based MT model by injecting supertags, lexical information such as the POS tag of the word and its subcategorization information, into the phrase table, resulting in generalized phrases with placeholders in them. |
Approach | These targeted morphological features are effective during LP because words that share them are much more likely to actually share POS tags . |
Approach | Since the LP graph contains a node for each corpus token, and each node is labeled with a distribution over POS tags , the graph provides a corpus of sentences labeled with noisy tag distributions along with an expanded tag dictionary. |
Data | tokenized and labeled with POS tags by two linguistics graduate students, each of whom was studying one of the languages. |
Data | The KIN and MLG data have 12 and 23 distinct POS tags , respectively. |
Data | The PTB uses 45 distinct POS tags . |
Experiments | Moreover, since large gains in accuracy can be achieved by spending a small amount of time just annotating word types with POS tags, we are led to conclude that time should be spent annotating types or tokens instead of developing an FST. |
Introduction | Haghighi and Klein (2006) develop a model in which a POS-tagger is learned from a list of POS tags and just three “prototype” word types for each tag, but their approach requires a vector space to compute the distributional similarity between prototypes and other word types in the corpus. |
Introduction | We evaluate the effectiveness of our method by using linear-chain conditional random fields (CRFs) and three traditional NLP tasks, namely, text chunking (shallow parsing), named entity recognition, and POS tagging . |
Log-Linear Models | The model is used for a variety of sequence labeling tasks such as POS tagging , chunking, and named entity recognition. |
Log-Linear Models | We evaluate the effectiveness of our training algorithm using linear-chain CRF models and three NLP tasks: text chunking, named entity recognition, and POS tagging. |
Log-Linear Models | The features used in this experiment were unigrams and bigrams of neighboring words, and unigrams, bigrams and trigrams of neighboring POS tags . |
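The feature templates described above can be sketched as follows; the feature-string format is our own illustration, not the cited paper's:

```python
def ngram_features(words, tags, i):
    """Features at position i: unigrams and bigrams of neighboring
    words, and unigrams, bigrams and trigrams of neighboring POS tags."""
    feats = []
    for off in (-1, 0, 1):  # unigrams
        if 0 <= i + off < len(words):
            feats.append(f"w[{off}]={words[i + off]}")
            feats.append(f"t[{off}]={tags[i + off]}")
    for off in (-1, 0):  # bigrams
        if 0 <= i + off and i + off + 1 < len(words):
            feats.append(f"w[{off}]|w[{off+1}]={words[i+off]}|{words[i+off+1]}")
            feats.append(f"t[{off}]|t[{off+1}]={tags[i+off]}|{tags[i+off+1]}")
    if 1 <= i < len(words) - 1:  # tag trigram
        feats.append(f"t[-1]|t[0]|t[+1]={tags[i-1]}|{tags[i]}|{tags[i+1]}")
    return feats
```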
Experimental Setup | Relations were extracted using regular expressions over the output of a POS tagger and an NP chunker. |
Experimental Setup | We use a Maximum Entropy POS Tagger , trained on the Penn Treebank, and the WordNet lemmatizer, both implemented within the NLTK package (Loper and Bird, 2002). |
Experimental Setup | To obtain a coarse-grained set of POS tags , we collapse the tag set to 7 categories: nouns, verbs, adjectives, adverbs, prepositions, the word “to” and a category that includes all other words. |
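A sketch of such a tag-collapsing map follows; the exact assignment of Penn Treebank tags to the 7 categories is our assumption, not taken from the cited paper:

```python
# Collapse fine-grained PTB tags into 7 coarse categories:
# nouns, verbs, adjectives, adverbs, prepositions, "to", and other.
COARSE = {
    "NN": "noun", "NNS": "noun", "NNP": "noun", "NNPS": "noun",
    "VB": "verb", "VBD": "verb", "VBG": "verb", "VBN": "verb",
    "VBP": "verb", "VBZ": "verb",
    "JJ": "adj", "JJR": "adj", "JJS": "adj",
    "RB": "adv", "RBR": "adv", "RBS": "adv",
    "IN": "prep", "TO": "to",
}

def coarse_tag(ptb_tag: str) -> str:
    return COARSE.get(ptb_tag, "other")

print([coarse_tag(t) for t in ["NNS", "VBD", "TO", "DT"]])
# -> ['noun', 'verb', 'to', 'other']
```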
Our Proposal: A Latent LC Approach | 1We use a POS tagger to identify content words. |
Our Proposal: A Latent LC Approach | In addition, we use POS-based features that encode the most frequent POS tag for the word lemma and the second most frequent POS tag (according to R). |
Our Proposal: A Latent LC Approach | Information about the second most frequent POS tag can be important in identifying light verb constructions, such as “take a swim” or “give a smile”, where the object is derived from a verb. |
Abstract | The method is almost free of linguistic resources (except POS tags ), and requires no elaborated linguistic rules. |
Conclusion | almost knowledge-free (except POS tags ) framework. |
Conclusion | The method is almost free of linguistic resources (except POS tags ), and does not rely on elaborated linguistic rules. |
Introduction | This framework is fully unsupervised and purely data-driven, and requires very lightweight linguistic resources (i.e., only POS tags ). |
Methodology | In order to obtain lexical patterns, we can define regular expressions with POS tags 2 and apply the regular expressions on POS tagged texts. |
Methodology | 2Such expressions are very simple and easy to write because we only need to consider POS tags of adverbial and auxiliary word. |
Methodology | Our algorithm is similar in spirit to double propagation (Qiu et al., 2011); however, the differences are apparent: firstly, we use very lightweight linguistic information (only POS tags); secondly, our major contributions are statistical measures that address the following key issues: first, measuring the utility of lexical patterns; second, measuring the possibility of a candidate word being a new word. |
Dependency Parsing: Baseline | pos: POS tag of word |
Dependency Parsing: Baseline | cpos1: coarse POS, the first letter of the POS tag of the word |
Dependency Parsing: Baseline | cpos2: coarse POS, the first two letters of the POS tag of the word |
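The coarse-POS features in the feature table above can be sketched minimally (an illustration of the prefix-truncation idea, not the baseline's actual code):

```python
def coarse_pos_features(fine_tag: str) -> dict:
    """Baseline-style features: the full fine-grained POS tag plus the
    first one and first two letters as coarse POS (e.g., NNS -> N, NN)."""
    return {
        "pos": fine_tag,
        "cpos1": fine_tag[:1],
        "cpos2": fine_tag[:2],
    }

print(coarse_pos_features("NNS"))
# -> {'pos': 'NNS', 'cpos1': 'N', 'cpos2': 'NN'}
```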
Exploiting the Translated Treebank | Chinese words should be strictly segmented according to the guideline before POS tags and dependency relations are annotated. |
Exploiting the Translated Treebank | The difference is that rootscore counts how often the given POS tag occurs as ROOT, and pairscore counts how often a combination of two POS tags occurs in a dependency relationship. |
Treebank Translation and Dependency Transformation | Bind the POS tag and dependency relation of a word with the word itself; 2. |
Treebank Translation and Dependency Transformation | After the target sentence is generated, the attached POS tags and dependency information of each English word will also be transferred to each corresponding Chinese word. |
Ad hoc rule detection | Units of comparison To determine similarity, one can compare dependency relations, POS tags , or both. |
Ad hoc rule detection | Thus, we use the pairs of dependency relations and POS tags as the units of comparison. |
Additional information | One method which does not have this problem of overflagging uses a “lexicon” of POS tag pairs, examining relations between POS, irrespective of position. |
Evaluation | We use the gold standard POS tags for all experiments. |
Evaluation | For example, the parsed rule TA —> IG:IG RO has a correct dependency relation (IG) between the POS tags IG and its head RO, yet is assigned a whole rule score of 2 and a bigram score of 20. |
Evaluation | This is likely due to the fact that Alpino has the smallest label set of any of the corpora, with only 24 dependency labels and 12 POS tags (cf. |
Annotator disagreements across domains and languages | In this study, we had between 2 and 10 individual annotators with degrees in linguistics annotate different kinds of English text with POS tags, e.g., newswire text (PTB WSJ Section 00), transcripts of spoken language (from a database containing transcripts of conversations, Talkbank), as well as Twitter posts. |
Annotator disagreements across domains and languages | We instructed annotators to use the 12 universal POS tags of Petrov et al. |
Annotator disagreements across domains and languages | Experiments with variation n-grams on WSJ (Dickinson and Meurers, 2003) and the French data lead us to estimate that the fine-to-coarse mapping of POS tags disregards about 20% of observed tag-pair confusion types, most of which relate to fine-grained verb and noun distinctions, e.g., past participle versus past tense in "[..] criminal lawyers speculated/VBD vs. VBN that [..]". |
Related work | (2014) use small samples of doubly-annotated POS data to estimate annotator reliability and show how those metrics can be implemented in the loss function when inducing POS taggers to reflect confidence we can put in annotations. |
Related work | They show that not biasing the theory towards a single annotator but using a cost-sensitive learning scheme makes POS taggers more robust and more applicable for downstream tasks. |
Conclusion | WOE can run in two modes: a CRF extractor (WOEPOS) trained with shallow features like POS tags; a pattern classifier (WOEparse) learned from dependency path patterns. |
Related Work | Shallow or Deep Parsing: Shallow features, like POS tags , enable fast extraction over large-scale corpora (Davidov et al., 2007; Banko et al., 2007). |
Wikipedia-based Open IE | NLP Annotation: As we discuss fully in Section 4 (Experiments), we consider several variations of our system; one version, WOEparse, uses parser-based features, while another, WOEPOS , uses shallow features like POS tags , which may be more quickly computed. |
Wikipedia-based Open IE | Depending on which version is being trained, the preprocessor uses OpenNLP to supply POS tags and NP-chunk annotations, or uses the Stanford Parser to create a dependency parse. |
Wikipedia-based Open IE | We learn two kinds of extractors, one (WOEparse) using features from dependency-parse trees and the other (WOEPOS) limited to shallow features like POS tags . |
Conclusion | Our approach was superior to previous approaches across 12 multilingual cross-domain POS tagging datasets, with an average error reduction of 4% over a structured perceptron baseline. |
Experiments | POS tagging accuracy is known to be very sensitive to domain shifts. |
Experiments | (2011) report a POS tagging accuracy on social media data of 84% using a tagger that achieves an accuracy of about 97% on newspaper data. |
Experiments | While POS taggers can often recover the part of speech of a previously unseen word from the context it occurs in, this is harder than for previously seen words. |
Introduction | This paper considers the POS tagging problem, i.e. |
Introduction | Several authors have noted how POS tagging performance is sensitive to cross-domain shifts (Blitzer et al., 2006; Daume III, 2007; Jiang and Zhai, 2007), and while most authors have assumed known target distributions and pool unlabeled target data in order to automatically correct cross-domain bias (Jiang and Zhai, 2007; Foster et al., 2010), methods such as feature bagging (Sutton et al., 2006), learning with random adversaries (Globerson and Roweis, 2006) and LOO-regularization (Dekel and Shamir, 2008) have been proposed to improve performance on unknown target distributions. |
Introduction | Section 4 presents experiments on POS tagging and discusses how to evaluate cross-domain performance. |
Learning Algorithm | Features (1-5) are extracted for each role and capture their presence, first POS tag and word, length and position within the roles present for that instance.
Learning Algorithm | A1-postag is extracted for the following POS tags : DT, JJ, PRP, CD, RB, VB and WP; A1-keyword for the following words: any, anybody, anymore, anyone, anything, anytime, anywhere, certain, enough, full, many, much, other, some, specifics, too and until.
Learning Algorithm | These lists of POS tags and keywords were extracted after manual examination of training examples and aim at signaling whether this role corresponds to the focus.
Evaluation | Table 5 shows the result of the disambiguation when we only take into account the POS tag of the unknown tokens. |
Introduction | On the one hand, this tagset is much larger than the largest tagset used in English (from 17 tags in most unsupervised POS tagging experiments, to the 46 tags of the WSJ corpus and the roughly 150 tags of the LOB corpus).
Introduction | On average, each token in the 42M corpus is given 2.7 possible analyses by the analyzer (much higher than the average 1.41 POS tag ambiguity reported in English (Dermatas and Kokkinakis, 1995)). |
Previous Work | At the word level, a segmented word is attached to a POS; the character model is based on the observed characters and their classification: Begin of word (B), In the middle of a word (I), End of word (E), or a single-character word (S). They apply Baum-Welch training over a segmented corpus, where the segmentation of each word and its character classification is observed, and the POS tagging is ambiguous.
Previous Work | (of all words in a given sentence) and the POS tagging (of the known words) is based on a Viterbi search over a lattice composed of all possible word segmentations and the possible classifications of all observed characters. |
Previous Work | They report a very slight improvement on Hebrew and Arabic supervised POS taggers . |
Experiments | predicate x in w (e.g., (Boston, Boston)), and (iii) predicates for each POS tag in {JJ, NN, NNS} (e.g., (JJ, size), (JJ, area), etc.
Experiments | We also define an augmented lexicon L+ which includes a prototype word x for each predicate appearing in (iii) above (e.g., (large, size)), which cancels the predicates triggered by x's POS tag .
Experiments | SEMRESP requires a lexicon of 1.42 words per non-value predicate, Word-Net features, and syntactic parse trees; DCS requires only words for the domain-independent predicates (overall, around 0.5 words per non-value predicate), POS tags , and very simple indicator features. |
Experiments of Grammar Formalism Conversion | (2008) used POS tag information, dependency structures and dependency tags in test set for conversion. |
Experiments of Grammar Formalism Conversion | Similarly, we used POS tag information in the test set to restrict search space of the parser for generation of better N-best parses. |
Experiments of Parsing | CDT consists of 60k Chinese sentences, annotated with POS tag information and dependency structure information (including 28 POS tags, and 24 dependency tags) (Liu et al., 2006).
Experiments of Parsing | We did not use POS tag information as inputs to the parser in our conversion method due to the difficulty of conversion from CDT POS tags to CTB POS tags . |
Experiments of Parsing | We used the POS tagged People's Daily corpus (Jan. 1998~Jun.
Our Two-Step Solution | ” (a preposition, with “BA” as its POS tag in CTB), and the head of IP-OBJ is 3% [El ” . |
Introduction | - Part-of-speech (POS) tags and morphological features: POS tags indicate (or counter-indicate) the possible presence of a named entity at word level or at word sequence level. |
Related Work | Benajiba and Rosso (2007) improved their system by incorporating POS tags to improve NE boundary detection. |
Related Work | Benajiba and Rosso (2008) used CRF sequence labeling and incorporated many language specific features, namely POS tagging , base-phrase chunking, Arabic tokenization, and adjectives indicating nationality. |
Related Work | Using POS tagging generally improved recall at the expense of precision, leading to overall improvements in F-measure. |
Introduction | For the chunker and POS tagger , the drop-offs are less severe: 94.89 to 91.73, and 97.36 to 94.73. |
Introduction | We use an open source CRF software package to implement our CRF models.1 We use words, POS tags , chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system. |
Introduction | - POS before, after predicate: the POS tag of the tokens immediately preceding and following the predicate
Experiments | Performance of POS tagging is an important factor in our methods because they are based on word/POS sequences.
Experiments | Existing POS taggers might not perform well on nonnative English texts because they are normally developed to analyze native English texts. |
Methods | In this language model, content words in n-grams are replaced with their corresponding POS tags . |
Methods | Finally, words are replaced with their corresponding POS tags; for the following words, word tokens are used as their corresponding POS tags : coordinating conjunctions, determiners, prepositions, modals, predeterminers, possessives, pronouns, question adverbs. |
Methods | At this point, the special POS tags BOS and EOS are added at the beginning and end of each sentence, respectively. |
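The preprocessing pipeline described above (replace content words with POS tags, keep closed-class tokens, add sentence markers) can be sketched as follows; the closed-class keep-list here is an illustrative assumption, not the paper's exact list:

```python
# Sketch of the POS-generalized language model preprocessing: content words
# become their POS tags, words in a closed-class keep-list keep their surface
# form, and BOS/EOS markers frame each sentence. KEEP_AS_TOKEN is illustrative.
KEEP_AS_TOKEN = {"CC", "DT", "IN", "MD", "PDT", "POS", "PRP", "PRP$", "WRB"}

def generalize(tagged_sentence):
    """tagged_sentence: list of (word, pos) pairs."""
    out = ["BOS"]
    for word, pos in tagged_sentence:
        # closed-class words keep their token; content words become their tag
        out.append(word if pos in KEEP_AS_TOKEN else pos)
    out.append("EOS")
    return out

print(generalize([("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]))
# ['BOS', 'the', 'NN', 'VBZ', 'EOS']
```

An n-gram model would then be trained on these generalized sequences rather than on raw word sequences.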
Dependency parsing experiments | The only features that are not cached are the ones that include contextual POS tags , since their miss rate is relatively high. |
Dependency parsing for machine translation | - a predicted POS tag tj; - a dependency score sj.
Dependency parsing for machine translation | We write h-word, h-pos, m-word, m-pos to refer to head and modifier words and POS tags , and append a numerical value to shift the word offset either to the left or to the right (e.g., h-pos+1 is the POS to the right of the head word). |
Dependency parsing for machine translation | It is quite similar to the McDonald (2005a) feature set, except that it does not include the set of all POS tags that appear between each candidate head-modifier pair (i , j). |
Evaluation | The part-of-speech tags in all datasets were replaced with the universal POS tags of Petrov et al. |
Model Transfer | This may have a negative effect on the performance of a monolingual model, since most part-of-speech tagsets are more fine-grained than the universal POS tags considered here. |
Model Transfer | Since the finer-grained POS tags often reflect more language-specific phenomena, however, they would only be useful for very closely related languages in the cross-lingual setting. |
Model Transfer | If Synt is enabled too, it also uses the POS tags of the argument’s parent, children and siblings. |
Related Work | Cross-lingual annotation projection (Yarowsky et al., 2001) approaches have been applied extensively to a variety of tasks, including POS tagging (Xi and Hwa, 2005; Das and Petrov, 2011), morphology segmentation (Snyder and Barzilay, 2008), verb classification (Merlo et al., 2002), mention detection (Zitouni and Florian, 2008), LFG parsing (Wroblewska and Frank, 2009), information extraction (Kim et al., 2010), SRL (Pado and Lapata, 2009; van der Plas et al., 2011; Annesi and Basili, 2010; Tonelli and Pi-anta, 2008), dependency parsing (Naseem et al., 2012; Ganchev et al., 2009; Smith and Eisner, 2009; Hwa et al., 2005) or temporal relation pre- |
Training | 2. surrounding: lslex (the previous word), rslex (the next word), lspos (lslex’s POS tag), rspos (rslex’s POS tag ), lsparent (lslex’s parent), rsparent
Training | 3. nonlocal: lanchorslex (the previous anchor’s word), ranchorslex (the next anchor’s word), lanchorspos (lanchorslex’s POS tag), ranchorspos (ranchorslex’s POS tag ).
Training | of mosl_int_spos (mosl_int_slex’s POS tag ), mosl_ext_spos (mosl_ext_slex’s POS tag), mosr_int_slex (the actual word.
Abstract | Many would be better modeled by POS tag unigrams (with no word information) or by longer n-grams consisting of either words, POS tags , or a combination of the two. |
Abstract | Each n-gram is a sequence of words, POS tags or a combination of words and POS tags |
Abstract | or a POS tag . |
Experiments | We used the Tokyo tagger (Tsuruoka and Tsujii, 2005) to POS tag the English tokens, and generated parses using the first-order model of McDonald et al. |
Experiments | For Bulgarian we trained the Stanford POS tagger (Toutanova et al., 2003) on the Bul- |
Experiments | The Spanish Europarl data was POS tagged with the FreeLing language analyzer (Atserias et al., 2006). |
Experiments | Moreover, all POS tag features from English are duplicated with coarse-grained POS tags provided by CoNLL-X. |
Experiments | Before parsing, POS tags were assigned to the training set by using 20-way jackknifing. |
Experiments | For the automatic generation of POS tags , we used the domain-specific model of Choi and Palmer (2012a)’s tagger, which gave 97.5% accuracy on the English evaluation set (0.2% higher than Collins (2002)’s tagger). |
Related work | Bohnet and Nivre (2012) introduced a transition-based system that jointly performed POS tagging and dependency parsing. |
Experiments | Table 2: Domain Adaptation performance in F-measure on Semantic Tagging on Movie Target domain and POS tagging on QBank=QuestionBank.
Related Work and Motivation | In (Subramanya et al., 2010) an efficient iterative SSL method is described for syntactic tagging, using graph-based learning to smooth POS tag posteriors. |
Semi-Supervised Semantic Labeling | In (Subramanya et al., 2010), a new SSL method is described for adapting syntactic POS tagging of sentences in newswire articles along with search queries to a target domain of natural language (NL) questions. |
Semi-Supervised Semantic Labeling | The unlabeled POS tag posteriors are then smoothed using a graph-based learning algorithm. |
Semi-Supervised Semantic Labeling | Later, using Viterbi decoding, they select the 1-best POS tag sequence.
Decoding | where the first two terms are translation and language model probabilities, e(D) is the target string (English sentence) for derivation D, the third and fourth terms are the dependency language model probabilities on the target side computed with words and POS tags separately, De(D) is the target dependency tree of D, the fifth is the parsing probability of the source-side tree TC(D) ∈ FC, ill(D) is the penalty for the number of ill-formed dependency structures in D, and the last two terms are derivation and translation length penalties, respectively.
Decoding | In order to alleviate the problem of data sparseness, we also compute a dependency language model for POS tags over a dependency tree.
Decoding | the POS tag information on the target side for each constituency-to-dependency rule. |
Experiments | We also store the POS tag information for each word in dependency trees, and compute two different dependency language models for words and POS tags in dependency tree separately. |
CD | CCM learns to predict a set of brackets over a string (in practice, a string of POS tags ) by jointly estimating constituent and distituent strings and contexts using an iterative EM-like procedure (though, as noted by Smith and Eisner (2004), CCM is deficient as a generative model). |
Introduction | Recent work (Headden III et al., 2009; Cohen and Smith, 2009; Hanig, 2010; Spitkovsky et al., 2010) has largely built on the dependency model with valence of Klein and Manning (2004), and is characterized by its reliance on gold-standard part-of-speech (POS) annotations: the models are trained on and evaluated using sequences of POS tags rather than raw tokens.
Introduction | An exception which learns from raw text and makes no use of POS tags is the common cover links parser (CCL, Seginer 2007). |
Tasks and Benchmark | Importantly, until recently it was the only unsupervised raw text constituent parser to produce results competitive with systems which use gold POS tags (Klein and Manning, 2002; Klein and Manning, 2004; Bod, 2006), and the recent improved raw-text parsing results of Reichart and Rappoport (2010) make direct use of CCL without modification.
Tasks and Benchmark | Finally, CCL outperforms most published POS-based models when those models are trained on unsupervised word classes rather than gold POS tags . |
Bilingual subtree constraints | For the source part, we replace nouns and verbs using their POS tags (coarse grained tags). |
Bilingual subtree constraints | For example, we have the subtree pair: “(society):2-(fringe):0” and “fringes(W_2):0-of:1-society(W_1):2”, where “of” does not have a corresponding word, the POS tag of “(society)” is N, and the POS tag of “(fringe)” is N. The source part of the rule becomes “N:2-N:0” and the target part becomes “W_2:0-of:1-W_1:2”.
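The rule generalization in this example can be sketched as follows; the node encoding (word, coarse tag, head index) and the set of tags to generalize are illustrative assumptions:

```python
# Sketch of generalizing a source subtree into a rule: nouns and verbs are
# replaced by their coarse POS tags, other words keep their surface form,
# and each node carries its head index, as in "N:2-N:0" above.
GENERALIZE_TAGS = {"N", "V"}

def make_source_rule(nodes):
    """nodes: list of (word, coarse_pos, head_index) triples."""
    parts = []
    for word, pos, head in nodes:
        sym = pos if pos in GENERALIZE_TAGS else word
        parts.append(f"{sym}:{head}")
    return "-".join(parts)

print(make_source_rule([("society", "N", 2), ("fringe", "N", 0)]))
# N:2-N:0
```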
Experiments | For Chinese unannotated data, we used the XIN_CMN portion of Chinese Gigaword Version 2.0 (LDC2009T14) (Huang, 2009), which has approximately 311 million words whose segmentation and POS tags are given. |
Experiments | We used the MMA system (Kruengkrai et al., 2009) trained on the training data to perform word segmentation and POS tagging and used the Baseline Parser to parse all the sentences in the data. |
Experiments | The POS tags were assigned by the MXPOST tagger trained on training data. |
Learning Time Constraints | n-gram POS: The 4-gram and 3-gram of POS tags that end with the year
Previous Work | Kanhabua and Norvag (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags , collocations, and tf-idf scores. |
Timestamp Classifiers | Word Classes: include only nouns, verbs, and adjectives as labeled by a POS tagger |
Timestamp Classifiers | on POS tags and tf-idf scores. |
Timestamp Classifiers | Typed Dependency POS: Similar to Typed Dependency, this feature uses POS tags of the dependency relation’s governor. |
Baselines | Following is a list of features adopted in the two baselines, for both BaselineC4.5 and BaselineSVM, > Basic features: first token and its part-of-speech (POS) tag of the focus candidate; the number of tokens in the focus candidate; relative position of the focus candidate among all the roles present in the sentence; negated verb and its POS tag of the negative expression;
Baselines | > Syntactic features: the sequence of words from the beginning of the governing VP to the negated verb; the sequence of POS tags from the beginning of the governing VP to the negated verb; whether the governing VP contains a CC; whether the governing VP contains a RB. |
Baselines | > Semantic features: the syntactic label of semantic role A1; whether A1 contains POS tag DT, JJ, PRP, CD, RB, VB, and WP, as defined in Blanco and Moldovan (2011); whether A1 contains token any, anybody, anymore, anyone, anything, anytime, anywhere, certain, enough, full, many, much, other, some, specifics, too, and until, as defined in Blanco and Moldovan (2011); the syntactic label of the first semantic role in the sentence; the semantic label of the last semantic role in the sentence; the thematic role for A0/A1/A2/A3/A4 of the negated predicate.
Experiments | For both English and Chinese data, we used tenfold jackknifing (Collins, 2000) to automatically assign POS tags to the training data. |
Experiments | For English POS tagging, we adopted SVMTool, and for Chinese POS tagging
Experiments | we employed the Stanford POS tagger . |
Semi-supervised Parsing with Large Data | Word clusters are regarded as lexical intermediaries for dependency parsing (Koo et al., 2008) and POS tagging (Sun and Uszkoreit, 2012). |
Paraphrasing | Deletions: deleted lemma and POS tag
Paraphrasing | x_{i:j} and c_{i':j'} denote spans from x and c. pos(x_{i:j}) and lemma(x_{i:j}) denote the POS tag and lemma sequence of x_{i:j}.
Paraphrasing | For a pair (x, c), we also consider as candidate associations the set B (represented implicitly), which contains token pairs (x_i, c_{i'}) such that x_i and c_{i'} share the same lemma, the same POS tag , or are linked through a derivation link on WordNet (Fellbaum, 1998).
Use of external MWE resources | 6We use the version available in the POS tagger MElt (Denis and Sagot, 2009). |
Use of external MWE resources | The MWE analyzer is a CRF-based sequential labeler, which, given a tokenized text, jointly performs MWE segmentation and POS tagging (of simple tokens and of MWEs), both tasks mutually helping each other.
Use of external MWE resources | The MWE analyzer integrates, among others, features computed from the external lexicons described in section 5.1, which greatly improve POS tagging (Denis and Sagot, 2009) and MWE segmentation (Constant and Tel-lier, 2012). |
Experiments and Analysis | We build a CRF-based bigram part-of-speech (POS) tagger with the features described in (Li et al., 2012), and produce POS tags for all train/development/test/unlabeled sets (10-way jackknifing for training sets).
Experiments and Analysis | (2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts. |
Experiments and Analysis | Our approach can be combined with their work to utilize unlabeled data to improve both POS tagging and parsing simultaneously. |
Supervised Dependency Parsing | t_i denotes the POS tag of w_i. b is an index between h and m. dir(i, j) and dist(i, j) denote the direction and distance of the dependency (i, j).
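In this notation, a typical first-order feature set over a head-modifier arc can be sketched as follows; the template names and the distance bucketing are illustrative assumptions, not the paper's exact templates:

```python
# Hedged sketch of first-order dependency arc features: t_i is the POS tag
# of w_i, and each template is conjoined with the arc direction and a
# bucketed head-modifier distance.
def arc_features(words, tags, h, m):
    d = "R" if m > h else "L"
    dist = min(abs(h - m), 5)            # bucket long distances together
    ctx = f"{d}:{dist}"
    return [
        f"hw={words[h]}:{ctx}",              # head word
        f"ht={tags[h]}:{ctx}",               # head POS
        f"mw={words[m]}:{ctx}",              # modifier word
        f"ht-mt={tags[h]}-{tags[m]}:{ctx}",  # head/modifier tag bigram
    ]

feats = arc_features(["John", "saw", "Mary"], ["NNP", "VBD", "NNP"], 1, 2)
print(feats[0])  # hw=saw:R:1
```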
Argument Identification | - bag of words in a
Argument Identification | - bag of POS tags in a
Argument Identification | - the set of dependency labels of the predicate’s children
Argument Identification | - dependency path conjoined with the POS tag of a’s head
Experiments | Before parsing the data, it is tagged with a POS tagger trained with a conditional random field (Lafferty et al., 2001) with the following emission features: word, the word cluster, word suffixes of length l, 2 and 3, capitalization, whether it has a hyphen, digit and punctuation. |
Frame Identification with Embeddings | Let the lexical unit (the lemma conjoined with a coarse POS tag ) for the marked predicate be 6. |
Experiments | Examining translation rules extracted from the training data shows that there are 72,366 types of non-terminals with respect to 33 types of POS tags . |
Head-Driven HPB Translation Model | Instead of collapsing all non-terminals in the source language into a single symbol X as in Chiang (2007), given a word sequence f_i^j from position i to position j, we first find heads and then concatenate the POS tags of these heads as f_i^j's nonterminal symbol.
Head-Driven HPB Translation Model | We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. |
Introduction | Here, each Chinese word is attached with its POS tag and Pinyin. |
Background | The C&C supertagger is similar to the Ratnaparkhi (1996) tagger, using features based on words and POS tags in a five-word window surrounding the target word, and defining a local probability distribution over supertags for each word in the sentence, given the previous two supertags. |
Data | For supertagger evaluation, one thousand sentences were manually annotated with CCG lexical categories and POS tags . |
Introduction | Since the CCG lexical category set used by the supertagger is much larger than the Penn Treebank POS tag set, the accuracy of supertagging is much lower than POS tagging ; hence the CCG supertagger assigns multiple supertags1 to a word, when the local context does not provide enough information to decide on the correct supertag. |
Introduction | (2003) were unable to improve the accuracy of POS tagging using self-training. |
Experiments | For English, we use the automatically-assigned POS tags produced by an implementation of the POS tagger of Collins (2002). |
Experiments | While for Chinese, we just use the gold-standard POS tags following the tradition. |
Experiments | Both English and Chinese sentences are tagged by the implementations of the POS tagger of Collins (2002), which trained on WSJ and CTB 5.0 respectively.
Word-Pair Classification Model | Each feature is composed of some words and POS tags surrounding word i and/or word j, as well as an optional distance representation between these two words.
Introduction | The most ambiguous word has 7 different POS tags associated with it. |
Minimized models for supertagging | We also wish to scale our methods to larger data settings than the 24k word tokens in the test data used in the POS tagging task. |
Minimized models for supertagging | On the simpler task of unsupervised POS tagging with a dictionary, we compared our method versus directly solving the original integer program and found that the minimization (in terms of grammar size) achieved by our method is close to the optimal solution for the original objective and yields the same tagging accuracy far more efficiently.
Minimized models for supertagging | Ravi and Knight (2009) exploited this to iteratively improve their POS tag model: since the first minimization procedure is seeded with a noisy grammar and tag dictionary, iterating the IP procedure with progressively better grammars further improves the model. |
Testing SRL Performance | When trained on arguments identified via the unsupervised POS tagger , noun pattern features promoted agent interpretations of tran- |
Unsupervised Parsing | To implement this division into function and content words, we start with a list of function word POS tags and then find words that appear predominantly with these POS tags , using tagged WSJ data (Marcus et al., 1993).
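The split described above can be sketched as follows; the function-word tag list and the majority (0.5) threshold are illustrative assumptions:

```python
# Sketch of dividing the vocabulary into function and content words: a word
# counts as a function word if it appears predominantly with function-word
# POS tags in a tagged corpus.
from collections import Counter, defaultdict

FUNCTION_TAGS = {"DT", "IN", "CC", "TO", "MD", "PRP", "RP"}

def find_function_words(tagged_corpus):
    counts = defaultdict(Counter)
    for sent in tagged_corpus:
        for word, tag in sent:
            counts[word][tag] += 1
    func = set()
    for word, tag_counts in counts.items():
        n_func = sum(c for t, c in tag_counts.items() if t in FUNCTION_TAGS)
        if n_func / sum(tag_counts.values()) > 0.5:
            func.add(word)
    return func

corpus = [[("the", "DT"), ("run", "NN")], [("the", "DT"), ("run", "VB")]]
print(find_function_words(corpus))  # {'the'}
```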
Unsupervised Parsing | Smaller numbers are better, indicating less information lost in moving from the HMM states to the gold POS tags . |
Unsupervised Parsing | We first evaluate these parsers (the first stage of our SRL system) on unsupervised POS tagging . |
A Motivating Example | POS tags Excellent/JJ and/CC broad/JJ |
Sentiment Sensitive Thesaurus | We then apply a simple word filter based on POS tags to select content words (nouns, verbs, adjectives, and adverbs). |
Sentiment Sensitive Thesaurus | In addition to word-level sentiment features, we replace words with their POS tags to create |
Sentiment Sensitive Thesaurus | POS tags generalize the word-level sentiment features, thereby reducing feature sparseness. |
Composite language model | The SLM is based on statistical parsing techniques that allow syntactic analysis of sentences; it assigns a probability p(W, T) to every sentence W and every possible binary parse T. The terminals of T are the words of W with POS tags , and the nodes of T are annotated with phrase headwords and nonterminal labels.
Composite language model | A word-parse k-prefix has a set of exposed heads h_{-m}, ..., h_{-1}, with each head being a pair (headword, nonterminal label), or in the case of a root-only tree (word, POS tag ).
Composite language model | An m-th order SLM (m-SLM) has three operators to generate a sentence: WORD-PREDICTOR predicts the next word w_{k+1} based on the m leftmost exposed headwords h_{-m}, ..., h_{-1} in the word-parse k-prefix with probability p(w_{k+1} | h_{-m}, ..., h_{-1}), and then passes control to the TAGGER; the TAGGER predicts the POS tag t_{k+1} of the next word w_{k+1} based on the next word w_{k+1} and the POS tags of the m leftmost exposed headwords in the word-parse k-prefix with probability p(t_{k+1} | w_{k+1}, h_{-m}.tag, ..., h_{-1}.tag); the CONSTRUCTOR builds the partial parse T_k from T_{k-1}, w_k, and t_k in a series of moves ending with NULL, where a parse move a is made with probability p(a | h_{-m}, ..., h_{-1}); a ∈ A = {(unary, NTlabel), (adjoin-left, NTlabel), (adjoin-right, NTlabel), null}.
Training algorithm | The TAGGER and CONSTRUCTOR are conditional probabilistic models of the type p(u | z_1, ..., z_n) where u, z_1, ..., z_n belong to a mixed set of words, POS tags, NTtags, CONSTRUCTOR actions (u only), and z_1, ..., z_n form a linear Markov chain.
Abstract | We describe a novel method for the task of unsupervised POS tagging with a dictionary, one that uses integer programming to explicitly search for the smallest model that explains the data, and then uses EM to set parameter values. |
Introduction | The classic Expectation Maximization (EM) algorithm has been shown to perform poorly on POS tagging , when compared to other techniques, such as Bayesian methods. |
Introduction | (2008) depart from the Bayesian framework and show how EM can be used to learn good POS taggers for Hebrew and English, when provided with good initial conditions. |
What goes wrong with EM? | The overall POS tag distribution learnt by EM is relatively uniform, as noted by Johnson (2007), and it tends to assign an equal number of tokens to each
Discussion and Future Work | We can then award partial scores for related words, such as those identified as such by WordNet or those with the same POS tags . |
Experiments | However, its use of POS tags and synonym dictionaries prevents its use at the character-level. |
Experiments | We use the Stanford Chinese word segmenter (Tseng et al., 2005) and POS tagger (Toutanova et al., 2003) for preprocessing and Cilin for synonym |
Introduction | However, many different segmentation standards exist for different purposes, such as Microsoft Research Asia (MSRA) for Named Entity Recognition (NER), Chinese Treebank (CTB) for parsing and part-of-speech (POS) tagging, and City University of Hong Kong (CITYU) and Academia Sinica (AS) for general word segmentation and POS tagging .
Co-training strategy for prosodic event detection | As described in Section 4, we use two classifiers for the prosodic event detection task based on two different information sources: one is the acoustic evidence extracted from the speech signal of an utterance; the other is the lexical and syntactic evidence such as syllables, words, POS tags and phrasal boundary information. |
Previous work | (2007) applied a co-training method in POS tagging using an agreement-based selection strategy.
Prosodic event detection method | 0 Accent detection: syllable identity, lexical stress (exist or not), word boundary information (boundary or not), and POS tag . |
Prosodic event detection method | 0 IPB and Break index detection: POS tag , the ratio of syntactic phrases the word initiates, and the ratio of syntactic phrases the word terminates. |
Experiments | Since the dictionary is not explicitly annotated with PoS tags, we first took the intersection of the training corpus and the dictionary words, and assigned all the possible PoS tags to the words which appeared in the corpus.
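The dictionary construction described in this footnote can be sketched as follows; all names and the toy corpus are illustrative:

```python
# Sketch of building a tagged dictionary from an untagged one: keep only
# dictionary words seen in the training corpus and attach every PoS tag
# they were observed with there.
from collections import defaultdict

def build_tagged_dictionary(dictionary_words, tagged_corpus):
    observed = defaultdict(set)
    for sent in tagged_corpus:
        for word, tag in sent:
            observed[word].add(tag)
    # intersection of dictionary and corpus, with observed tags attached
    return {w: sorted(observed[w]) for w in dictionary_words if w in observed}

corpus = [[("bank", "NN"), ("banks", "VBZ")], [("bank", "VB")]]
print(build_tagged_dictionary({"bank", "river"}, corpus))
# {'bank': ['NN', 'VB']}
```

Words in the dictionary but absent from the corpus (here "river") receive no tags and are dropped, matching the intersection step in the footnote.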
Experiments | Proper noun performance for the Stanford segmenter is not shown since it does not assign PoS tags . |
Word Segmentation Model | Here, w_i and w_{i-1} denote the current and previous word in question, and t_i and t_{i-1} are level-j PoS tags assigned to them.
Word Segmentation Model | The Japanese dictionary and the corpus we used have 6 levels of PoS tag hierarchy, while the Chinese ones have only one level, which is why some of the PoS features are not included in Chinese.
Introduction | 3.1 and the POS tags come from a PCFG. |
Introduction | The standard RNN essentially ignores all POS tags and syntactic categories and each nonterminal node is associated with the same neural network (i.e., the weights across nodes are fully tied). |
Introduction | While this results in a powerful composition function that essentially depends on the words being combined, the number of model parameters explodes and the composition functions do not capture the syntactic commonalities between similar POS tags or syntactic categories. |
Experimental Evaluation | model is approximate, because we used different preprocessing tools: MXPOST for POS tagging (Ratnaparkhi, 1996), MSTParser for parsing (McDonald et al., 2005), and Dan Bikel’s interface (http://www.
QG for Paraphrase Modeling | For unobserved cases, the conditional probability is estimated by backing off to the parent POS tag and child direction. |
QG for Paraphrase Modeling | We estimate the distributions over dependency labels, POS tags , and named entity classes using the transformed treebank (footnote 4). |
QG for Paraphrase Modeling | The parameters θ to be learned include the class priors, the conditional distributions of the dependency labels given the various configurations, the POS tags given POS tags , the NE tags given NE
Introduction | In one of the first efforts to enrich the source in word-based SMT, Ueffing and Ney (2003) used part-of-speech (POS) tags, in order to deal with the verb conjugation of Spanish and Catalan; so, POS tags were used to identify the pronoun+verb sequence and splice these two words into one term. |
Introduction | In their presentation of the factored SMT models, Koehn and Hoang (2007) describe experiments for translating from English to German, Spanish and Czech, using morphology tags added on the morphologically rich side, along with POS tags . |
Methods for enriching input | The POS tag of this noun is then used to identify if it is plural or singular. |
Methods for enriching input | The word “aspects” is found, which has a POS tag that shows it is a plural noun. |
Clustering-based word representations | Ushioda (1996) presents an extension to the Brown clustering algorithm, and learns hierarchical clusterings of words as well as phrases, which they apply to POS tagging .
Clustering-based word representations | Li and McCallum (2005) use an HMM-LDA model to improve POS tagging and Chinese Word Segmentation. |
Clustering-based word representations | (2009) use an HMM to assign POS tags to words, which in turn improves the accuracy of the PCFG-based Hebrew parser.
Conditional Random Fields for Sequence Labeling | Many NLP tasks, such as POS tagging , chunking, or NER, are sequence labeling problems where a sequence of class labels y = (y_1,.
Conditional Random Fields for Sequence Labeling | Input units x_j are usually tokens, class labels y_j can be POS tags or entity classes.
Introduction | When used for sequence labeling tasks such as POS tagging , chunking, or named entity recogni- |
Experiments | For example, one template returns the top category on the stack plus its head word, together with the first word and its POS tag on the queue. |
Experiments | Another template returns the second category on the stack, together with the POS tag of its head word. |
Experiments | We use 10-fold cross validation for POS tagging and supertagging the training data, and automatically assigned POS tags for all experiments. |
A Latent Variable CCG Parser | In the supertagging literature, POS tagging and supertagging are distinguished — POS tags are the traditional Penn treebank tags (e. g. NN, VBZ and DT) and supertags are CCG categories. |
A Latent Variable CCG Parser | However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags , we can only evaluate the accuracy of the supertags. |
A Latent Variable CCG Parser | Despite the lack of POS tags in the Petrov parser, we can see that it performs slightly better than the Clark and Curran parser. |
Setting of the experiment | A decision tree (C4.5, Release 8) is used to detect false starts, trained on the POS tags and trigger-word status of the first and last four words of sentences from a training set. |
Setting of the experiment | For (both WH- and Yes/No) question identification, another C4.5 classifier was trained on 2,000 manually annotated sentences using utterance length, POS bigram occurrences, and the POS tags and trigger-word status of the first and last five words of an utterance.
Setting of the experiment | Taking ASR transcripts as input, we use the Brill tagger (Brill, 1995) to assign POS tags to each word. |
Conditional Random Fields | 3-grm: 10.74% / 14.3M vs. 14.59% / 0.3M; 5-grm: 8.48% / 132.5M vs. 11.54% / 2.5M. POS tagging |
Conditional Random Fields | For the POS tagging task, BCD appears impractically slow to train compared with the other approaches (SGD takes about 40 minutes to train, OWL-QN about 1 hour), due to the simultaneous increase in the sequence length and in the number of observations. |
Conditional Random Fields | Based on this observation, we have designed an incremental training strategy for the POS tagging task, where more specific features are progressively incorporated into the model if the corresponding less specific feature is active. |
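The incremental strategy just described adds a more specific feature only when its less specific counterpart is already active in the model. A minimal sketch of that gating step, with hypothetical feature names and a weight-based notion of "active":

```python
# Sketch of incremental feature incorporation: a specific feature template
# is admitted only if its less specific "backoff" feature is active
# (here: has a nonzero learned weight). All names are illustrative.

def expand_features(weights, specific_to_backoff):
    """Return the specific features whose backoff feature is active."""
    return [f for f, backoff in specific_to_backoff.items()
            if weights.get(backoff, 0.0) != 0.0]

weights = {"suffix=ing": 0.8, "suffix=ly": 0.0}
mapping = {"word=running": "suffix=ing", "word=quickly": "suffix=ly"}
print(expand_features(weights, mapping))  # ['word=running']
```

This keeps the active feature set small early in training and lets it grow only along directions the model has already found useful.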
Experiments | POS tag at beginning and end of the EDU |
Implementation | The dependency structure and POS tags are obtained from MaltParser (Nivre et al., 2007). |
Model | While such feature learning approaches have proven to increase robustness for parsing, POS tagging, and NER (Miller et al., 2004; Koo et al., 2008; Turian et al., 2010), they would seem to have an especially promising role for discourse, where training data is relatively sparse and ambiguity is considerable. |
Our Approach | As shown in Table 2, we classify the features used in WikiCiKE into three categories: format features, POS tag features and token features. |
Our Approach | POS tag features: POS tag of current token; POS tags of previous 5 tokens; POS tags of |
Abstract | On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speech tags, the best reported results for this task. |
Conclusion and Future Work | In future work we plan to integrate the POS tagger, which is crucial to parsing accuracy (Clark and Curran, 2004b). |
Experiments | To the best of our knowledge, the results obtained with BP and DD are the best reported results on this task using gold POS tags. |
A Class-based Model of Agreement | The coarse categories are the universal POS tag set described by Petrov et al. |
A Class-based Model of Agreement | For Arabic, we used the coarse POS tags plus definiteness and the so-called phi features (gender, number, and person). For example, the word for 'the car' would be tagged "Noun+Def+Sg+Fem". |
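Tags like "Noun+Def+Sg+Fem" compose a coarse POS category with morphological phi features. A sketch of that composition; the feature names, ordering, and function name are illustrative assumptions:

```python
# Sketch: building a class label from a coarse POS tag plus morphological
# "phi" features, in the "Noun+Def+Sg+Fem" style quoted above.

def class_tag(coarse, definite=False, number=None, gender=None):
    parts = [coarse]
    if definite:
        parts.append("Def")
    if number:                 # e.g. "Sg" or "Pl"
        parts.append(number)
    if gender:                 # e.g. "Fem" or "Masc"
        parts.append(gender)
    return "+".join(parts)

print(class_tag("Noun", definite=True, number="Sg", gender="Fem"))
# Noun+Def+Sg+Fem
```

Words lacking a feature simply omit that component, so the label set stays compact while still encoding agreement-relevant morphology.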
Discussion of Translation Results | For comparison, +POS indicates our class-based model trained on the 11 coarse POS tags only (e.g., “Noun”). |
Experiments | We apply 1-best and k-best sequential decoding algorithms to five NLP tagging tasks: Penn TreeBank (PTB) POS tagging, CoNLL 2000 joint POS tagging and chunking, CoNLL 2003 joint POS tagging, chunking and named entity tagging, HPSG supertagging (Matsuzaki et al., 2007), and a search query named entity recognition (NER) dataset. |
Experiments | As in (Kaji et al., 2010), we combine the POS tags and chunk tags to form joint tags for CoNLL 2000 dataset, e.g., NN|B-NP. |
Experiments | Similarly we combine the POS tags, chunk tags, and named entity tags to form joint tags for CoNLL 2003 dataset, e.g., PRP$|I-NP|O. |
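Forming joint tags of the kind shown above (e.g., NN|B-NP or PRP$|I-NP|O) is just a position-wise concatenation of parallel tag sequences:

```python
# Sketch: forming joint tags by concatenating per-task tag sequences
# with "|", as in "NN|B-NP" (POS|chunk) or "PRP$|I-NP|O" (POS|chunk|NER).

def join_tags(*tag_sequences):
    """Zip parallel tag sequences into one joint-tag sequence."""
    return ["|".join(tags) for tags in zip(*tag_sequences)]

pos = ["PRP$", "NN"]
chunk = ["I-NP", "I-NP"]
ner = ["O", "O"]
print(join_tags(pos, chunk, ner))  # ['PRP$|I-NP|O', 'NN|I-NP|O']
```

A single sequence labeler over the joint tag set then predicts all tasks at once, at the cost of a larger label space.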
Abstract | When tagging with CTAGS, one can use any statistical POS tagging method such as HMMs, Maximum Entropy Classifiers, Bayesian Networks, CRFs, etc., followed by the CTAG to MSD recovery. |
Abstract | [Figure: training a POS tagger with manual+automatic rules for MSD recovery; tagging pipeline: input data → labeling with CTAGs → MSD recovery → output data.] |
Abstract | Also, our POS tagger detected cases where the annotation in the Gold Standard was erroneous. |
Error Classification | For this reason, we include POS tag 1-, 2-, 3-, and 4-grams in the set of features we sorted in the previous paragraph. |
Error Classification | For each error ei, we select POS tag n-grams from the top thousand features of the information-gain-sorted list to count toward the Ap+i and Api aggregation features. |
Error Classification | This feature type may also help with Confusing Phrasing because the list of POS tag n-grams our annotator generated for its Ap+i contains useful features like DT NNS VBZ VBN (e.g., “these signals has been”), which captures noun-verb disagreement. |
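Extracting the POS-tag n-grams these snippets rank by information gain is straightforward; a sketch over n = 1..4, using the "DT NNS VBZ VBN" pattern cited above:

```python
# Sketch: extracting POS-tag n-grams (n = 1..4) from a tagged sentence,
# the kind of feature sorted by information gain in the passage above.

def pos_ngrams(tags, max_n=4):
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tags) - n + 1):
            grams.append(" ".join(tags[i:i + n]))
    return grams

tags = ["DT", "NNS", "VBZ", "VBN"]
print("DT NNS VBZ VBN" in pos_ngrams(tags))  # True: the disagreement pattern
```

Ranking these n-grams by information gain against the error labels then picks out patterns like the noun-verb disagreement above.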
Generating reference reordering from parallel sentences | the Model 1 probabilities between pairs of words linked in the alignment a, features that inspect source and target POS tags and parses (if available) and features that inspect the alignments of adjacent words in the source and target sentence. |
Generating reference reordering from parallel sentences | We conjoin the msd (minimum signed distance) with the POS tags to allow the model to capture the fact that the alignment error rate may be higher for some POS tags than others (e.g., we have observed verbs have a higher error rate in Urdu-English alignments). |
Reordering model | where θ is a learned vector of weights and Φ is a vector of binary feature functions that inspect the words and POS tags of the source sentence at and around positions m and n. We use the features (Φ) described in Visweswariah et al. |
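The linear score described above (a learned weight vector dotted with binary feature functions over the words and POS tags at and around positions m and n) can be sketched as follows; the specific feature templates are illustrative assumptions, not the paper's:

```python
# Sketch of a linear reordering score: a dot product of learned weights
# with binary features over words/POS tags at positions m and n.
# The feature templates below are illustrative assumptions.

def features(words, tags, m, n):
    return {
        f"pos_pair={tags[m]}_{tags[n]}",
        f"word_m={words[m]}",
        f"word_n={words[n]}",
        f"dir={'fwd' if n > m else 'bwd'}",
    }

def score(theta, words, tags, m, n):
    # With binary features, the dot product is the sum of active weights.
    return sum(theta.get(f, 0.0) for f in features(words, tags, m, n))

theta = {"pos_pair=VB_NN": 1.5, "dir=fwd": 0.5}
print(score(theta, ["eat", "rice"], ["VB", "NN"], 0, 1))  # 2.0
```

Representing the feature vector as a set of active indicator names keeps the dot product sparse: only fired features contribute weight.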
Experiments | We trained CRFs for opinion entity identification using the following features: indicators for words, POS tags, and lexicon features (the subjectivity strength of the word in the Subjectivity Lexicon). |
Model | Words and POS tags: the words contained in the candidate and their POS tags. |
Model | For features, we use words, POS tags, phrase types, lexicon and semantic frames (see Section 3.2.1 for details) to capture the properties of the opinion expression, and also features that capture the context of the opinion expression: |
Chinese Empty Category Prediction | leftmost child label or POS tag; rightmost child label or POS tag; label or POS tag of the head child; the number of child nodes |
Chinese Empty Category Prediction | left-sibling label or POS tag |
Chinese Empty Category Prediction | right-sibling label or POS tag |