Abstract | Overall, we can say that the improvements are small and not significant using automatic POS tags, contrary to previously published results using gold POS tags (Agirre et al., 2011).
Experimental Framework | We modified the system in order to add semantic features, combining them with wordforms and POS tags, on the parent and child nodes of each arc.
Introduction | using MaltParser on gold POS tags.
Introduction | In this work, we will investigate the effect of semantic information using predicted POS tags.
Related work | (2011) successfully introduced WordNet classes in a dependency parser, obtaining improvements on the full PTB using gold POS tags, trying different combinations of semantic classes.
Results | For all the tests, we used a perceptron POS-tagger (Collins, 2002), trained on WSJ sections 2-21, to assign POS tags automatically to both the training (using 10-way jackknifing) and test data, obtaining a POS tagging accuracy of 97.32% on the test data.
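As an illustrative sketch (not the authors' code), the 10-way jackknifing described above amounts to: split the training sentences into 10 folds and tag each fold with a model trained on the other 9, so the parser's training data carries realistic automatic tags. `train_tagger` is a hypothetical stand-in for any trainable tagger.

```python
# Sketch of k-way jackknifing for producing automatic POS tags on training
# data. `train_tagger` is a hypothetical callable returning an object with a
# .tag(sentence) method; only the fold logic is illustrated here.

def jackknife(sentences, train_tagger, k=10):
    """Tag each fold with a tagger trained on the remaining k-1 folds."""
    folds = [sentences[i::k] for i in range(k)]  # round-robin split
    tagged = []
    for i, held_out in enumerate(folds):
        # train on every fold except the held-out one
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        tagger = train_tagger(train)
        tagged.extend(tagger.tag(s) for s in held_out)
    return tagged
```

The round-robin split is one simple choice; contiguous folds work equally well as long as every sentence is tagged by a model that never saw it in training.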
Results | Overall, we see that the small improvements do not confirm the previous results on Penn2Malt, MaltParser and gold POS tags.
Results | One of the obstacles of automatic parsers is the presence of incorrect POS tags due to auto- |
Features | • Coordination: In a coordinate structure, the two adjacent conjuncts usually agree with each other on POS tags and their span lengths.
Features | Therefore, we add different features to capture POS tag and span length consistency in a coordinate structure. |
Features | • Span Length: This feature captures the distribution of the binned span length of each POS tag.
Introduction | When proposing a small move, i.e., sampling a head of the word, we can also jointly sample its POS tag from a set of alternatives provided by the tagger. |
Sampling-Based Dependency Parsing with Global Features | For instance, we can sample the POS tag, the dependency relation, or morphological information.
Sampling-Based Dependency Parsing with Global Features | POS correction scenario in which only the predicted POS tags are provided in the testing phase, while both gold and predicted tags are available for the training set. |
Sampling-Based Dependency Parsing with Global Features | We extend our model such that it jointly learns how to predict a parse tree and also correct the predicted POS tags for better parsing performance.
Distribution Prediction | As we go on to show in Section 6, this enables us to use the same distribution prediction method for both POS tagging and sentiment classification. |
Domain Adaptation | We consider two DA tasks: (a) cross-domain POS tagging (Section 4.1), and (b) cross-domain sentiment classification (Section 4.2). |
Domain Adaptation | 4.1 Cross-Domain POS Tagging |
Domain Adaptation | manually POS-tagged) sentence, we select its neighbours in the source domain as additional features.
Introduction | • Using the learnt distribution prediction model, we propose a method to learn a cross-domain POS tagger.
Related Work | words that appear in both the source and target domains) to adapt a POS tagger to a target domain. |
Related Work | Choi and Palmer (2012) propose a cross-domain POS tagging method by training two separate models: a generalised model and a domain-specific model. |
Related Work | Adding latent states to the smoothing model further improves the POS tagging accuracy (Huang and Yates, 2012). |
Abstract | First, to resolve the error propagation problem of the traditional pipeline approach, we incorporate POS tagging into the syntactic parsing process. |
Introduction | First, POS tagging is typically performed separately as a preliminary step, and POS tagging errors will propagate to the parsing process. |
Introduction | This problem is especially severe for languages where POS tagging accuracy is relatively low. This is the case for Chinese, where there are fewer contextual clues that can be used to inform the tagging process, and some of the tagging decisions are actually influenced by the syntactic structure of the sentence.
Introduction | First, we integrate POS tagging into the parsing process and jointly optimize these two processes simultaneously. |
Joint POS Tagging and Parsing with Nonlocal Features | To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features. |
Joint POS Tagging and Parsing with Nonlocal Features | 3.1 Joint POS Tagging and Parsing |
Joint POS Tagging and Parsing with Nonlocal Features | POS tagging is often taken as a preliminary step for transition-based constituent parsing; therefore, the accuracy of POS tagging greatly affects parsing performance.
Transition-based Constituent Parsing | Figure 1: Two constituent trees for an example sentence w0w1w2 with POS tags abc.
Transition-based Constituent Parsing | For example, in Figure 1, for the input sentence w0w1w2 and its POS tags abc, our parser can construct two parse trees using the action sequences given below these trees.
Discussion | The exceptions are Row 8 and Row 11: when the two head nouns of an entity pair were combined as a semantic pair, and when the POS tag was combined with the entity type, performance decreased.
Discussion | Comparing reference set (5) with reference set (3), the Head noun and the adjacent entity POS tag perform better when used as singletons.
Discussion | In this paper, for a better demonstration of the constraint condition, we still use Position Sensitive as the default setting to use the Head noun and the adjacent entity POS tag.
Feature Construction | All the employed features are simply classified into five categories: Entity Type and Subtype, Head Noun, Position Feature, POS Tag and Omni-word Feature. |
Feature Construction | POS Tag: In our model, we use only the adjacent entity POS tags, which lie on the two sides of the entity mention.
Feature Construction | These POS tags are labelled by the ICTCLAS package.
Approaches | A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
Approaches | Brown Clusters We use fully unsupervised Brown clusters (Brown et al., 1992) in place of POS tags . |
Approaches | We define the DMV such that it generates sequences of word classes: either POS tags or Brown clusters as in Spitkovsky et al. |
Experiments | Our experiments are subtractive, beginning with all supervision available and then successively removing (a) dependency syntax, (b) morphological features, (c) POS tags, and (d) lemmas.
Experiments | The CoNLL-2009 Shared Task (Hajic et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese, and Spanish.
Experiments | We first compare our models trained as a pipeline, using all available supervision (syntax, morphology, POS tags, lemmas) from the CoNLL-2009 data.
Introduction | • Use of Brown clusters in place of POS tags for low-resource SRL.
Related Work | (2012) limit their exploration to a small set of basic features, and include high-resource supervision in the form of lemmas, POS tags, and morphology available from the CoNLL 2009 data.
Related Work | Our experiments also consider ‘longer’ pipelines that include earlier stages: a morphological analyzer, POS tagger, and lemmatizer.
Abstract | We propose the first tagset designed for the task of character-level POS tagging.
Abstract | We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
Character-level POS Tagset | We propose a tagset for the task of character-level POS tagging.
Chinese Morphological Analysis with Character-level POS | Previous studies have shown that jointly processing word segmentation and POS tagging is preferable to pipeline processing, which can propagate errors (Nakagawa and Uchimoto, 2007; Kruengkrai et al., 2009).
Chinese Morphological Analysis with Character-level POS | Baseline features: For word-level nodes that represent known words, we use the symbols w, p and l to denote the word form, POS tag and length of the word, respectively. |
Chinese Morphological Analysis with Character-level POS | Proposed features: For word-level nodes, the function CPpair(w) returns the pair of the character-level POS tags of the first and last characters of w, and CPall(w) returns the sequence of character-level POS tags of w. If either the pair or the sequence of character-level POS is ambiguous, which means there are multiple paths in the sub-lattice of the word-level node, then the values on the current best path (with local context) during the Viterbi search will be returned.
Evaluation | To evaluate our proposed method, we have conducted two sets of experiments on CTB5: word segmentation, and joint word segmentation and word-level POS tagging . |
Evaluation | The results of the word segmentation experiment and the joint experiment of segmentation and POS tagging are shown in Table 5(a) and Table 5(b), respectively. |
Introduction | with Character-level POS Tagging
Introduction | We propose the first tagset designed for the task of character-level POS tagging, based on which we manually annotate the entire CTB5.
Introduction | We propose a method that performs character-level POS tagging jointly with word segmentation and word-level POS tagging.
Abstract | xi is the POS tag of wi.
Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract | While ideally we would want to use the word information in decoding as well, much of the syntax of a sentence is determined by the POS tags, and a relatively high level of accuracy can be achieved by learning, for example, a supervised parser from POS tag sequences.
Abstract | In this paper, we address the problem of web-domain POS tagging using a two-phase approach. |
Abstract | The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger.
Introduction | However, state-of-the-art POS taggers in the literature (Collins, 2002; Shen et al., 2007) are mainly optimized on the Penn Treebank (PTB), and when shifted to web data, tagging accuracies drop significantly (Petrov and McDonald, 2012).
Introduction | We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger . |
Introduction | We choose the easy-first tagging approach since it has been demonstrated to give higher accuracies than the standard left-to-right POS tagger (Shen et al., 2007; Ma et al., 2013). |
Learning from Web Text | This may partly be due to the fact that unlike computer vision tasks, the input structure of POS tagging or other sequential labelling tasks is relatively simple, and a single nonlinear layer is enough to model the interactions within the input (Wang and Manning, 2013). |
Neural Network for POS Disambiguation | The main challenge to designing the neural network structure is: on the one hand, we hope that the model can take the advantage of information provided by the learned WRRBM, which reflects general properties of web texts, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model’s discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996). |
Neural Network for POS Disambiguation | Under the output layer, the network consists of two modules: the web-feature module, which incorporates knowledge from the pre-trained WRRBM, and the sparse-feature module, which makes use of other POS tagging features. |
Neural Network for POS Disambiguation | For POS tagging, we found that a simple linear layer yields satisfactory accuracies.
Experimental Setup | These datasets include manually annotated dependency trees, POS tags and morphological information. |
Experimental Setup | In contrast, assume we take the crossproduct of the auxiliary word vector values, POS tags and lexical items of a word and its context, and add the crossed values into a normal model (in gbhmm). |
Introduction | This low dimensional syntactic abstraction can be thought of as a proxy to manually constructed POS tags . |
Introduction | For instance, on the English dataset, the low-rank model trained without POS tags achieves 90.49% on first-order parsing, while the baseline gets 86.70% if trained under the same conditions, and 90.58% if trained with 12 core POS tags . |
Problem Formulation | pos, form, lemma and morph stand for the fine POS tag, word form, word lemma and the morphology feature (provided in CoNLL format file) of the current word.
Problem Formulation | For example, pos-p means the POS tag to the left of the current word in the sentence. |
Problem Formulation | Other possible features include, for example, the label of the arc h → m, the POS tags between the head and the modifier, boolean flags which indicate the occurrence of in-between punctuation or conjunctions, etc.
Results | The rationale is that given all other features, the model would induce representations that play a similar role to POS tags . |
Results | Table 4: The first three columns show parsing results when models are trained without POS tags . |
Results | the performance of a parser trained with 12 core POS tags.
Character-Level Dependency Tree | system, each word is initialized by the action SHW with a POS tag, before being incrementally modified by a sequence of intra-word actions, and finally being completed by the action PW.
Character-Level Dependency Tree | L and R denote the two elements over which the dependencies are built; the subscripts lc1 and rc1 denote the leftmost and rightmost children, respectively; the subscripts lc2 and rc2 denote the second leftmost and second rightmost children, respectively; w denotes the word; t denotes the POS tag; g denotes the head character; ls_w and rs_w denote the smallest left and right subwords, respectively, as shown in Figure 2.
Character-Level Dependency Tree | Since the first element of the queue can be shifted onto the stack by either SH or AR, it is more difficult to assign a POS tag to each word by using a single action. |
Data and Tools | The set of POS tags needs to be consistent across languages and treebanks. |
Data and Tools | For this reason we use the universal POS tag set of Petrov et al. |
Data and Tools | POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to pro- |
Experimental Setup | The first stage, ASR, yields an automatic transcription, which is followed by the POS tagging stage. |
Experimental Setup | The steps for automatic assessment of overall proficiency follow an analogous process (either including the POS tagger or not), depending on the objective measure being evaluated. |
Experimental Setup | 5.3.2 POS tagger |
Related Work | The idea of capturing differences in POS tag distributions for classification has been explored in several previous studies. |
Related Work | In the area of text-genre classification, POS tag distributions have been found to capture genre differences in text (Feldman et al., 2009; Marin et al., 2009); in a language testing context, they have been used in grammatical error detection and essay scoring (Chodorow and Leacock, 2000; Tetreault and Chodorow, 2008).
Shallow-analysis approach to measuring syntactic complexity | Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced). |
Experimental Setup | Relations were extracted using regular expressions over the output of a POS tagger and an NP chunker. |
Experimental Setup | We use a Maximum Entropy POS Tagger, trained on the Penn Treebank, and the WordNet lemmatizer, both implemented within the NLTK package (Loper and Bird, 2002).
Experimental Setup | To obtain a coarse-grained set of POS tags, we collapse the tag set to 7 categories: nouns, verbs, adjectives, adverbs, prepositions, the word “to” and a category that includes all other words.
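As a hedged sketch of the collapse just described: the seven categories follow the excerpt, but the exact per-tag assignment below (which PTB tags fall into each bucket) is our assumption, not the paper's published table.

```python
# Collapse Penn Treebank POS tags into 7 coarse categories:
# nouns, verbs, adjectives, adverbs, prepositions, "to", and everything else.
# The prefix-based mapping is an illustrative assumption.

def coarse_tag(ptb_tag):
    if ptb_tag == "TO":                  # the word "to" gets its own category
        return "TO"
    for prefix, coarse in [("NN", "NOUN"), ("VB", "VERB"), ("JJ", "ADJ"),
                           ("RB", "ADV"), ("IN", "PREP")]:
        if ptb_tag.startswith(prefix):   # e.g. NNS, VBD, JJR all collapse
            return coarse
    return "OTHER"                       # DT, PRP, CC, punctuation, ...
```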
Our Proposal: A Latent LC Approach | We use a POS tagger to identify content words.
Our Proposal: A Latent LC Approach | In addition, we use POS-based features that encode the most frequent POS tag for the word lemma and the second most frequent POS tag (according to R). |
Our Proposal: A Latent LC Approach | Information about the second most frequent POS tag can be important in identifying light verb constructions, such as “take a swim” or “give a smile”, where the object is derived from a verb. |
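The lemma-level POS features described above can be sketched as follows. The resource R is modeled here as a plain list of (lemma, tag) observations, which is our simplification rather than the paper's actual resource.

```python
from collections import Counter

# Sketch of the POS-based features above: for each lemma, the most frequent
# and second-most-frequent POS tags observed in a reference corpus. For
# "swim", a second-most-frequent tag of VB (alongside NN) is exactly the
# signal that flags light verb constructions like "take a swim".

def top_two_tags(pairs, lemma):
    """Return (most frequent tag, second most frequent tag) for a lemma."""
    counts = Counter(tag for lem, tag in pairs if lem == lemma)
    ranked = [tag for tag, _ in counts.most_common(2)]
    ranked += [None] * (2 - len(ranked))  # pad if fewer than two tags seen
    return tuple(ranked)
```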
Abstract | The method is almost free of linguistic resources (except POS tags), and requires no elaborated linguistic rules.
Conclusion | almost knowledge-free (except POS tags) framework.
Conclusion | The method is almost free of linguistic resources (except POS tags), and does not rely on elaborated linguistic rules.
Introduction | This framework is fully unsupervised and purely data-driven, and requires very lightweight linguistic resources (i.e., only POS tags).
Methodology | In order to obtain lexical patterns, we can define regular expressions with POS tags and apply the regular expressions on POS-tagged texts.
Methodology | Such expressions are very simple and easy to write because we only need to consider the POS tags of adverbial and auxiliary words.
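A minimal sketch of such a lexical pattern as a regular expression over text in word/TAG form. The tag inventory (d = adverb, x = candidate word) and the pattern itself are illustrative assumptions, not the paper's actual expressions.

```python
import re

# A lexical pattern over POS-tagged text in word/TAG form: an adverb
# immediately followed by a candidate word. Tags "d" and "x" are assumed
# placeholders for the tagger's real adverb and candidate labels.
PATTERN = re.compile(r"(\S+)/d\s+(\S+)/x")

def extract_candidates(tagged_text):
    """Return the candidate words matched after an adverb."""
    return [m.group(2) for m in PATTERN.finditer(tagged_text)]
```

Applying the same idea to auxiliary words only requires a second pattern with the auxiliary tag in place of `/d`.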
Methodology | Our algorithm is similar in spirit to double propagation (Qiu et al., 2011); however, the differences are apparent: firstly, we use very lightweight linguistic information (only POS tags); secondly, our major contributions are statistical measures that address the following key issues: first, measuring the utility of lexical patterns; second, measuring the possibility of a candidate word being a new word.
Annotator disagreements across domains and languages | In this study, we had between 2 and 10 individual annotators with degrees in linguistics annotate different kinds of English text with POS tags, e.g., newswire text (PTB WSJ Section 00), transcripts of spoken language (from a database containing transcripts of conversations, Talkbank), as well as Twitter posts.
Annotator disagreements across domains and languages | We instructed annotators to use the 12 universal POS tags of Petrov et al. |
Annotator disagreements across domains and languages | Experiments with variation n-grams on WSJ (Dickinson and Meurers, 2003) and the French data lead us to estimate that the fine-to-coarse mapping of POS tags disregards about 20% of observed tag-pair confusion types, most of which relate to fine-grained verb and noun distinctions, e.g., past participle versus past in “[..] criminal lawyers speculated/VBD vs. VBN that [..]”.
Related work | (2014) use small samples of doubly-annotated POS data to estimate annotator reliability, and show how those metrics can be implemented in the loss function when inducing POS taggers to reflect the confidence we can place in annotations.
Related work | They show that not biasing the theory towards a single annotator but using a cost-sensitive learning scheme makes POS taggers more robust and more applicable for downstream tasks. |
Experiments and Analysis | We build a CRF-based bigram part-of-speech (POS) tagger with the features described in (Li et al., 2012), and produce POS tags for all train/development/test/unlabeled sets (10-way jackknifing for training sets).
Experiments and Analysis | (2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts. |
Experiments and Analysis | Our approach can be combined with their work to utilize unlabeled data to improve both POS tagging and parsing simultaneously. |
Supervised Dependency Parsing | ti denotes the POS tag of wi. b is an index between h and m. dir(i, j) and dist(i, j) denote the direction and distance of the dependency (i, j).
Argument Identification | • bag of words in a • bag of POS tags in a
Argument Identification | • the set of dependency labels of the predicate’s children • dependency path conjoined with the POS tag of a’s head
Experiments | Before parsing the data, it is tagged with a POS tagger trained with a conditional random field (Lafferty et al., 2001) with the following emission features: word, the word cluster, word suffixes of length 1, 2 and 3, capitalization, whether it has a hyphen, digit and punctuation.
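The emission features listed above can be sketched as a simple extractor. The feature names and the cluster lookup are illustrative assumptions, not the paper's exact templates.

```python
import string

# Sketch of the surface emission features described above: word form, word
# cluster, suffixes of length 1-3, capitalization, hyphen, digit, punctuation.
# `clusters` is a hypothetical word -> cluster-ID mapping (e.g. Brown clusters).

def emission_features(word, clusters=None):
    feats = {"word=" + word.lower()}
    if clusters:
        feats.add("cluster=" + clusters.get(word.lower(), "UNK"))
    for n in (1, 2, 3):                       # suffixes of length 1, 2 and 3
        feats.add(f"suffix{n}={word[-n:].lower()}")
    if word[:1].isupper():
        feats.add("capitalized")
    if "-" in word:
        feats.add("has-hyphen")
    if any(c.isdigit() for c in word):
        feats.add("has-digit")
    if any(c in string.punctuation for c in word):
        feats.add("has-punct")
    return feats
```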
Frame Identification with Embeddings | Let the lexical unit (the lemma conjoined with a coarse POS tag) for the marked predicate be ℓ.
Use of external MWE resources | We use the version available in the POS tagger MElt (Denis and Sagot, 2009).
Use of external MWE resources | The MWE analyzer is a CRF-based sequential labeler, which, given a tokenized text, jointly performs MWE segmentation and POS tagging (of simple tokens and of MWEs), both tasks mutually helping each other.
Use of external MWE resources | The MWE analyzer integrates, among others, features computed from the external lexicons described in section 5.1, which greatly improve POS tagging (Denis and Sagot, 2009) and MWE segmentation (Constant and Tellier, 2012).
Paraphrasing | Deletions Deleted lemma and POS tag |
Paraphrasing | xi:j and ci′:j′ denote spans from x and c. pos(xi:j) and lemma(xi:j) denote the POS tag and lemma sequence of xi:j.
Paraphrasing | For a pair (x, c), we also consider as candidate associations the set β (represented implicitly), which contains token pairs (xi, ci′) such that xi and ci′ share the same lemma, the same POS tag, or are linked through a derivation link on WordNet (Fellbaum, 1998).
Baselines | Following is a list of features adopted in the two baselines, BaselineC4.5 and BaselineSVM: > Basic features: the first token of the focus candidate and its part-of-speech (POS) tag; the number of tokens in the focus candidate; the relative position of the focus candidate among all the roles present in the sentence; the negated verb of the negative expression and its POS tag;
Baselines | > Syntactic features: the sequence of words from the beginning of the governing VP to the negated verb; the sequence of POS tags from the beginning of the governing VP to the negated verb; whether the governing VP contains a CC; whether the governing VP contains an RB.
Baselines | > Semantic features: the syntactic label of semantic role A1; whether A1 contains POS tags DT, JJ, PRP, CD, RB, VB, and WP, as defined in Blanco and Moldovan (2011); whether A1 contains the tokens any, anybody, anymore, anyone, anything, anytime, anywhere, certain, enough, full, many, much, other, some, specifics, too, and until, as defined in Blanco and Moldovan (2011); the syntactic label of the first semantic role in the sentence; the semantic label of the last semantic role in the sentence; the thematic role for A0/A1/A2/A3/A4 of the negated predicate.
Experiments | POS tag at beginning and end of the EDU |
Implementation | The dependency structure and POS tags are obtained from MaltParser (Nivre et al., 2007).
Model | While such feature learning approaches have proven to increase robustness for parsing, POS tagging, and NER (Miller et al., 2004; Koo et al., 2008; Turian et al., 2010), they would seem to have an especially promising role for discourse, where training data is relatively sparse and ambiguity is considerable.
Experiments | For example, one template returns the top category on the stack plus its head word, together with the first word and its POS tag on the queue. |
Experiments | Another template returns the second category on the stack, together with the POS tag of its head word. |
Experiments | We use 10-fold cross validation for POS tagging and supertagging the training data, and automatically assigned POS tags for all experiments. |