Algorithm | To estimate this joint distribution, PSH samples are extracted from the training corpus using unsupervised POS taggers (Clark, 2003; Abend et al., 2010) and an unsupervised parser (Seginer, 2007).
Algorithm | This parser is unique in its ability to induce a bracketing (unlabeled parsing) from raw text (without even using POS tags) with strong results.
Algorithm | We continue by tagging the corpus using Clark's unsupervised POS tagger (Clark, 2003) and the unsupervised Prototype Tagger (Abend et al., 2010).
Conclusion | The algorithm applies a state-of-the-art unsupervised parser and POS tagger to collect statistics from a large raw-text corpus.
Core-Adjunct in Previous Work | In addition, supervised models utilize supervised parsers and POS taggers, while the current state-of-the-art in unsupervised parsing and POS tagging is considerably worse than their supervised counterparts. |
Core-Adjunct in Previous Work | First, all works use manual or supervised syntactic annotations, usually including a POS tagger.
Experimental Setup | This scenario decouples the accuracy of the algorithm from the quality of the unsupervised POS tagging.
Experimental Setup | Finally, we experiment on a scenario where even argument identification on the test set is not provided, but performed by the algorithm of (Abend et al., 2009), which uses neither syntactic nor SRL annotation but does utilize a supervised POS tagger.
Introduction | However, no work has tackled this task in a fully unsupervised manner. Unsupervised models reduce reliance on the costly and error-prone manual multilayer annotation (POS tagging, parsing, core-adjunct tagging) commonly used for this task.
Abstract | Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than both basic Viterbi decoding and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009).
Introduction | Now they are indispensable in a wide range of NLP tasks including chunking, POS tagging, NER and so on (Sha and Pereira, 2003; Tsuruoka and Tsujii, 2005; Lin and Wu, 2009).
Introduction | For example, there are more than 40 and 2000 labels in POS tagging and supertagging, respectively (Brants, 2000; Matsuzaki et al., 2007). |
Introduction | As we shall see later, we need over 300 labels to reduce joint POS tagging and chunking to a single sequence labeling problem.
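The reduction mentioned above can be sketched as a cross-product of the two label inventories; the tag sets and helper names below are illustrative assumptions, not taken from the cited work.

```python
# Sketch: reduce joint POS tagging and chunking to a single sequence
# labeling task by forming cross-product labels (illustrative tag sets).

def join_labels(pos_tags, chunk_tags):
    """Combine parallel POS and chunk sequences into single joint labels."""
    return [f"{c}|{p}" for p, c in zip(pos_tags, chunk_tags)]

def split_labels(joint):
    """Recover the two original sequences from the joint labels."""
    chunks, pos = zip(*(lab.split("|") for lab in joint))
    return list(pos), list(chunks)

pos = ["DT", "NN", "VBZ"]
chunks = ["B-NP", "I-NP", "B-VP"]
joint = join_labels(pos, chunks)
assert joint == ["B-NP|DT", "I-NP|NN", "B-VP|VBZ"]
assert split_labels(joint) == (pos, chunks)
```

With roughly 40 POS tags crossed against a handful of chunk labels, an inventory of over 300 joint labels follows naturally, which is why label-set size matters for decoding speed.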
Parsing experiments | For example, fdep contains lexicalized “in-between” features that depend on the head and modifier words as well as a word lying in between the two; in contrast, previous work has generally defined in-between features for POS tags only. |
Parsing experiments | As in previous work, English evaluation ignores any token whose gold-standard POS tag is one of {`` '' : , .}.
Parsing experiments | First, we define 4-gram features that characterize the four relevant indices using words and POS tags; examples include POS 4-grams and mixed 4-grams with one word and three POS tags.
Related work | These indices allow the use of arbitrary features predicated on the position of the grandparent (e.g., word identity, POS tag, contextual POS tags) without affecting the asymptotic complexity of the parsing algorithm.
Ad hoc rule detection | Units of comparison To determine similarity, one can compare dependency relations, POS tags, or both.
Ad hoc rule detection | Thus, we use the pairs of dependency relations and POS tags as the units of comparison. |
Additional information | One method which does not have this problem of overflagging uses a "lexicon" of POS tag pairs, examining relations between POS tags, irrespective of position.
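A POS-pair lexicon of this kind can be sketched as follows; the toy data, function names, and threshold are illustrative assumptions rather than the cited method's details.

```python
# Sketch: build a "lexicon" of head/dependent POS-tag pairs from
# attested dependencies, then flag pairs never seen in the lexicon,
# irrespective of word position.

from collections import Counter

def build_pos_pair_lexicon(dependencies):
    """dependencies: iterable of (head_pos, dependent_pos) pairs."""
    return Counter(dependencies)

def flag_unseen(lexicon, pair, min_count=1):
    """Flag a head/dependent POS pair not attested often enough."""
    return lexicon[pair] < min_count

lexicon = build_pos_pair_lexicon([("VB", "NN"), ("VB", "RB"), ("NN", "DT")])
assert not flag_unseen(lexicon, ("VB", "NN"))   # attested pair
assert flag_unseen(lexicon, ("DT", "VB"))       # never observed
```

Because the lexicon abstracts away word order and position, it flags fewer legitimate but rare configurations than position-sensitive rules would.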
Evaluation | We use the gold standard POS tags for all experiments. |
Evaluation | For example, the parsed rule TA -> IG:IG RO has a correct dependency relation (IG) between the POS tag IG and its head RO, yet is assigned a whole-rule score of 2 and a bigram score of 20.
Evaluation | This is likely due to the fact that Alpino has the smallest label set of any of the corpora, with only 24 dependency labels and 12 POS tags (cf. |
Conclusion | WOE can run in two modes: a CRF extractor (WOEPOS) trained with shallow features like POS tags, and a pattern classifier (WOEparse) learned from dependency path patterns.
Related Work | Shallow or Deep Parsing: Shallow features, like POS tags, enable fast extraction over large-scale corpora (Davidov et al., 2007; Banko et al., 2007).
Wikipedia-based Open IE | NLP Annotation: As we discuss fully in Section 4 (Experiments), we consider several variations of our system; one version, WOEparse, uses parser-based features, while another, WOEPOS, uses shallow features like POS tags, which may be more quickly computed.
Wikipedia-based Open IE | Depending on which version is being trained, the preprocessor uses OpenNLP to supply POS tags and NP-chunk annotations, or uses the Stanford Parser to create a dependency parse.
Wikipedia-based Open IE | We learn two kinds of extractors, one (WOEparse) using features from dependency-parse trees and the other (WOEPOS) limited to shallow features like POS tags . |
Bilingual subtree constraints | For the source part, we replace nouns and verbs using their POS tags (coarse grained tags). |
Bilingual subtree constraints | For example, we have the subtree pair "社会(society):2-边缘(fringe):0" and "fringes(W_2):0-of:1-society(W_1):2", where "of" does not have a corresponding word, the POS tag of "社会(society)" is N, and the POS tag of "边缘(fringe)" is N. The source part of the rule becomes "N:2-N:0" and the target part becomes "W_2:0-of:1-W_1:2".
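The abstraction step described above (replacing nouns and verbs with their coarse POS tags while keeping head indices) can be sketched as follows; the token representation is an illustrative assumption.

```python
# Sketch: abstract a source subtree by replacing nouns (N) and verbs (V)
# with their coarse POS tags, keeping each token's head index, to turn
# a lexicalized subtree into a generalized rule pattern.

def abstract_subtree(tokens):
    """tokens: list of (word, coarse_pos, head_index) triples.
    Nouns and verbs are replaced by their POS tag; other words kept."""
    out = []
    for word, pos, head in tokens:
        surface = pos if pos in ("N", "V") else word
        out.append(f"{surface}:{head}")
    return "-".join(out)

# A two-noun subtree abstracts to "N:2-N:0", matching the rule form above.
assert abstract_subtree([("society", "N", 2), ("fringe", "N", 0)]) == "N:2-N:0"
```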
Experiments | For Chinese unannotated data, we used the XIN_CMN portion of Chinese Gigaword Version 2.0 (LDC2009T14) (Huang, 2009), which has approximately 311 million words whose segmentation and POS tags are given. |
Experiments | We used the MMA system (Kruengkrai et al., 2009) trained on the training data to perform word segmentation and POS tagging and used the Baseline Parser to parse all the sentences in the data. |
Experiments | The POS tags were assigned by the MXPOST tagger trained on the training data.
Introduction | For the chunker and POS tagger, the drop-offs are less severe: 94.89 to 91.73, and 97.36 to 94.73.
Introduction | We use an open-source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
Introduction | • POS before, after predicate: the POS tag of the tokens immediately preceding and following the predicate
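Token-level feature extraction of the kind just listed (word, POS, chunk, and predicate information at neighboring positions) can be sketched as follows; the field names and data layout are assumptions for illustration.

```python
# Sketch: extract features for position i from parallel sequences of
# words, POS tags, chunk labels, and a predicate indicator, including
# the immediately preceding and following positions.

def token_features(i, words, pos, chunks, is_pred):
    feats = {"w": words[i], "pos": pos[i], "chunk": chunks[i]}
    if i > 0:
        feats.update(w_prev=words[i - 1], pos_prev=pos[i - 1],
                     pred_prev=is_pred[i - 1])
    if i < len(words) - 1:
        feats.update(w_next=words[i + 1], pos_next=pos[i + 1],
                     pred_next=is_pred[i + 1])
    return feats

words = ["He", "bought", "shares"]
pos = ["PRP", "VBD", "NNS"]
chunks = ["B-NP", "B-VP", "B-NP"]
is_pred = [False, True, False]
f = token_features(0, words, pos, chunks, is_pred)
assert f["pos_next"] == "VBD" and f["pred_next"] is True
```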
Decoding | where the first two terms are the translation and language model probabilities, e(D) is the target string (English sentence) for derivation D, the third and fourth terms are the dependency language model probabilities on the target side, computed over words and POS tags separately, De(D) is the target dependency tree of D, the fifth term is the parsing probability of the source-side tree TC(D) ∈ FC, ill(D) is the penalty for the number of ill-formed dependency structures in D, and the last two terms are derivation and translation length penalties, respectively.
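A model score of this shape is a log-linear combination of feature values; the sketch below shows the general form with placeholder feature names and weights, which are illustrative assumptions rather than the cited system's actual parameters.

```python
# Sketch: a log-linear decoder score is a weighted sum of (log-domain)
# feature values: translation/language model log-probabilities,
# word- and POS-based dependency LM scores, and count-based penalties.

def derivation_score(log_feats, weights):
    """Weighted sum of log-domain feature values (higher is better)."""
    return sum(weights[name] * value for name, value in log_feats.items())

log_feats = {"tm": -2.3, "lm": -1.7, "deplm_word": -3.1, "deplm_pos": -1.2,
             "parse": -0.8, "ill_formed": 1.0, "deriv_len": 4.0, "trans_len": 6.0}
weights = {"tm": 1.0, "lm": 0.9, "deplm_word": 0.5, "deplm_pos": 0.5,
           "parse": 0.3, "ill_formed": -0.4, "deriv_len": -0.1, "trans_len": 0.2}
score = derivation_score(log_feats, weights)
assert derivation_score({"lm": -2.0}, {"lm": 0.5}) == -1.0
```

In practice the weights would be tuned (e.g., by MERT) on a development set; the decoder then searches for the derivation maximizing this score.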
Decoding | In order to alleviate the data sparseness problem, we also compute a dependency language model over POS tags on the dependency tree.
Decoding | the POS tag information on the target side for each constituency-to-dependency rule. |
Experiments | We also store the POS tag information for each word in the dependency trees, and compute two different dependency language models, over words and over POS tags, separately.
Testing SRL Performance | When trained on arguments identified via the unsupervised POS tagger, noun pattern features promoted agent interpretations of tran-
Unsupervised Parsing | To implement this division into function and content words, we start with a list of function-word POS tags and then find words that appear predominantly with these POS tags, using tagged WSJ data (Marcus et al., 1993).
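The "predominantly" criterion above can be sketched as a simple frequency ratio over a tagged corpus; the tag list and threshold below are illustrative assumptions, not the paper's actual values.

```python
# Sketch: find words that appear predominantly with function-word POS
# tags in a tagged corpus, using a frequency-ratio threshold.

from collections import Counter

FUNCTION_TAGS = {"DT", "IN", "CC", "TO", "MD"}  # illustrative subset

def function_words(tagged_corpus, threshold=0.5):
    """tagged_corpus: iterable of (word, pos) pairs; returns the set of
    words tagged with a function-word POS more than `threshold` of the time."""
    total, func = Counter(), Counter()
    for word, tag in tagged_corpus:
        total[word] += 1
        if tag in FUNCTION_TAGS:
            func[word] += 1
    return {w for w in total if func[w] / total[w] > threshold}

corpus = [("the", "DT"), ("the", "DT"), ("run", "VB"), ("to", "TO"), ("to", "IN")]
assert function_words(corpus) == {"the", "to"}
```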
Unsupervised Parsing | Smaller numbers are better, indicating less information lost in moving from the HMM states to the gold POS tags . |
Unsupervised Parsing | We first evaluate these parsers (the first stage of our SRL system) on unsupervised POS tagging.
Experiments | For English, we use the automatically-assigned POS tags produced by an implementation of the POS tagger of Collins (2002). |
Experiments | For Chinese, we use the gold-standard POS tags, following the tradition.
Experiments | Both English and Chinese sentences are tagged by implementations of the POS tagger of Collins (2002), trained on WSJ and CTB 5.0, respectively.
Word-Pair Classification Model | Each feature is composed of words and POS tags surrounding word i and/or word j, as well as an optional distance representation between the two words.
Background | The C&C supertagger is similar to the Ratnaparkhi (1996) tagger, using features based on words and POS tags in a five-word window surrounding the target word, and defining a local probability distribution over supertags for each word in the sentence, given the previous two supertags. |
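The context described above (a five-word window of words and POS tags plus the two previous supertags) can be sketched as a feature extractor; the feature names, padding symbols, and layout are assumptions for illustration, not the C&C supertagger's actual template.

```python
# Sketch: extract maximum-entropy-style context features for
# supertagging position i: words and POS tags in a five-word window
# around the target word, plus the previous two supertags.

def supertag_features(i, words, pos, prev_tags):
    """prev_tags holds the two previously assigned supertags
    ('<s>' padding at the sentence start)."""
    pad_w = ["<s>", "<s>"] + words + ["</s>", "</s>"]
    pad_p = ["<s>", "<s>"] + pos + ["</s>", "</s>"]
    j = i + 2  # offset into the padded sequences
    feats = {}
    for k, name in zip(range(j - 2, j + 3), ["w-2", "w-1", "w0", "w+1", "w+2"]):
        feats[name] = pad_w[k]
        feats["p" + name[1:]] = pad_p[k]
    feats["t-2,t-1"] = "|".join(prev_tags[-2:])
    return feats

f = supertag_features(0, ["Time", "flies"], ["NN", "VBZ"], ["<s>", "<s>"])
assert f["w0"] == "Time" and f["w+1"] == "flies" and f["w-1"] == "<s>"
```

Each position's features would feed a local distribution over supertags; keeping all categories above a probability cutoff yields the multi-tagging behavior discussed below.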
Data | For supertagger evaluation, one thousand sentences were manually annotated with CCG lexical categories and POS tags.
Introduction | Since the CCG lexical category set used by the supertagger is much larger than the Penn Treebank POS tag set, the accuracy of supertagging is much lower than POS tagging; hence the CCG supertagger assigns multiple supertags to a word when the local context does not provide enough information to decide on the correct supertag.
Introduction | (2003) were unable to improve the accuracy of POS tagging using self-training. |
Introduction | The most ambiguous word has 7 different POS tags associated with it. |
Minimized models for supertagging | We also wish to scale our methods to larger data settings than the 24k word tokens in the test data used in the POS tagging task. |
Minimized models for supertagging | On the simpler task of unsupervised POS tagging with a dictionary, we compared our method with directly solving the original integer program and found that the minimization (in terms of grammar size) achieved by our method is close to the optimal solution for the original objective and yields the same tagging accuracy far more efficiently.
Minimized models for supertagging | Ravi and Knight (2009) exploited this to iteratively improve their POS tag model: since the first minimization procedure is seeded with a noisy grammar and tag dictionary, iterating the IP procedure with progressively better grammars further improves the model. |
A Latent Variable CCG Parser | In the supertagging literature, POS tagging and supertagging are distinguished: POS tags are the traditional Penn Treebank tags (e.g., NN, VBZ and DT) and supertags are CCG categories.
A Latent Variable CCG Parser | However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags , we can only evaluate the accuracy of the supertags. |
A Latent Variable CCG Parser | Despite the lack of POS tags in the Petrov parser, we can see that it performs slightly better than the Clark and Curran parser. |
Conditional Random Fields | Table fragment (POS tagging): 3-grm 10.74% / 14.3M and 14.59% / 0.3M; 5-grm 8.48% / 132.5M and 11.54% / 2.5M.
Conditional Random Fields | For the POS tagging task, BCD appears impractically slow to train compared with the other approaches (SGD takes about 40 minutes to train, OWL-QN about 1 hour), due to the simultaneous increase in sequence length and in the number of observations.
Conditional Random Fields | Based on this observation, we have designed an incremental training strategy for the POS tagging task, where more specific features are progressively incorporated into the model if the corresponding less specific feature is active. |
Clustering-based word representations | Ushioda (1996) presents an extension to the Brown clustering algorithm, learning hierarchical clusterings of words as well as phrases, which are applied to POS tagging.
Clustering-based word representations | Li and McCallum (2005) use an HMM-LDA model to improve POS tagging and Chinese Word Segmentation. |
Clustering-based word representations | (2009) use an HMM to assign POS tags to words, which in turn improves the accuracy of the PCFG-based Hebrew parser.