Index of papers in Proc. ACL 2010 that mention
  • POS tags
Abend, Omri and Rappoport, Ari
Algorithm
To estimate this joint distribution, PSH samples are extracted from the training corpus using unsupervised POS taggers (Clark, 2003; Abend et al., 2010) and an unsupervised parser (Seginer, 2007).
Algorithm
This parser is unique in its ability to induce a bracketing (unlabeled parsing) from raw text (without even using POS tags) with strong results.
Algorithm
We continue by tagging the corpus using Clark’s unsupervised POS tagger (Clark, 2003) and the unsupervised Prototype Tagger (Abend et al., 2010).
Conclusion
The algorithm applies a state-of-the-art unsupervised parser and POS tagger to collect statistics from a large raw text corpus.
Core-Adjunct in Previous Work
In addition, supervised models utilize supervised parsers and POS taggers, while the current state-of-the-art in unsupervised parsing and POS tagging is considerably worse than their supervised counterparts.
Core-Adjunct in Previous Work
First, all works use manual or supervised syntactic annotations, usually including a POS tagger.
Experimental Setup
This scenario decouples the accuracy of the algorithm from the quality of the unsupervised POS tagging.
Experimental Setup
Finally, we experiment on a scenario where even argument identification on the test set is not provided, but performed by the algorithm of Abend et al. (2009), which uses neither syntactic nor SRL annotation but does utilize a supervised POS tagger.
Introduction
However, no work has tackled … Unsupervised models reduce reliance on the costly and error-prone manual multilayer annotation (POS tagging, parsing, core-adjunct tagging) commonly used for this task.
POS tags is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Kaji, Nobuhiro and Fujiwara, Yasuhiro and Yoshinaga, Naoki and Kitsuregawa, Masaru
Abstract
Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than the basic Viterbi and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009).
Introduction
Now they are indispensable in a wide range of NLP tasks including chunking, POS tagging, NER and so on (Sha and Pereira, 2003; Tsuruoka and Tsujii, 2005; Lin and Wu, 2009).
Introduction
For example, there are more than 40 and 2000 labels in POS tagging and supertagging, respectively (Brants, 2000; Matsuzaki et al., 2007).
Introduction
As we shall see later, we need over 300 labels to reduce joint POS tagging and chunking into the single sequence labeling problem.
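The excerpts above concern exact Viterbi decoding becoming expensive as the label set grows (40+ POS tags, 300+ joint tags, 2000+ supertags). A minimal sketch of the basic algorithm the paper speeds up, with illustrative score functions (not the paper's staggered decoder):

```python
def viterbi(obs, labels, emit, trans):
    """Standard Viterbi decoding. Cost is O(n * |labels|^2),
    which is why large label sets (e.g. joint POS+chunk tags or
    supertags) motivate faster decoders."""
    n = len(obs)
    best = [{} for _ in range(n)]   # best[i][y]: score of best path ending in y at i
    back = [{} for _ in range(n)]   # backpointers
    for y in labels:
        best[0][y] = emit(obs[0], y)
    for i in range(1, n):
        for y in labels:
            score, prev = max(
                (best[i - 1][p] + trans(p, y), p) for p in labels)
            best[i][y] = score + emit(obs[i], y)
            back[i][y] = prev
    last = max(labels, key=lambda y: best[n - 1][y])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With 2000 supertags the inner max alone touches 4M label pairs per token, which makes the quadratic factor concrete.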
POS tags is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Collins, Michael
Parsing experiments
For example, fdep contains lexicalized “in-between” features that depend on the head and modifier words as well as a word lying in between the two; in contrast, previous work has generally defined in-between features for POS tags only.
Parsing experiments
As in previous work, English evaluation ignores any token whose gold-standard POS tag is one of { ‘‘ '' : …
Parsing experiments
First, we define 4-gram features that characterize the four relevant indices using words and POS tags; examples include POS 4-grams and mixed 4-grams with one word and three POS tags.
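The 4-gram feature templates described above can be sketched as follows; the feature-name strings are illustrative, not the paper's actual template encoding:

```python
def fourgram_features(words, tags, idxs):
    """Sketch of 4-gram features over four relevant indices:
    one pure POS 4-gram, plus mixed 4-grams that replace the POS
    at one position with the word there (naming is illustrative)."""
    pos = tuple(tags[i] for i in idxs)
    feats = ["POS4:" + "_".join(pos)]
    for k, i in enumerate(idxs):
        mixed = list(pos)
        mixed[k] = words[i]          # one word, three POS tags
        feats.append("MIX4:" + "_".join(mixed))
    return feats
```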
Related work
These indices allow the use of arbitrary features predicated on the position of the grandparent (e.g., word identity, POS tag, contextual POS tags) without affecting the asymptotic complexity of the parsing algorithm.
POS tags is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Dickinson, Markus
Ad hoc rule detection
Units of comparison: To determine similarity, one can compare dependency relations, POS tags, or both.
Ad hoc rule detection
Thus, we use the pairs of dependency relations and POS tags as the units of comparison.
Additional information
One method which does not have this problem of overflagging uses a “lexicon” of POS tag pairs, examining relations between POS, irrespective of position.
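The POS-pair “lexicon” described above can be sketched as a simple count table over (dependent POS, head POS, relation) triples, position-independent; the data layout is an assumption:

```python
from collections import Counter

def build_pos_pair_lexicon(corpus):
    """Sketch of a lexicon of POS tag pairs: counts of the dependency
    relations observed between a dependent POS and a head POS,
    irrespective of linear position. Each sentence is a list of
    (dep_pos, head_pos, relation) triples (an assumed format)."""
    lex = Counter()
    for sent in corpus:
        for dep_pos, head_pos, rel in sent:
            lex[(dep_pos, head_pos, rel)] += 1
    return lex
```

Rare or unseen triples in such a lexicon are then candidates for flagging as ad hoc rules, without the overflagging that position-specific comparison causes.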
Evaluation
We use the gold standard POS tags for all experiments.
Evaluation
For example, the parsed rule TA → IG:IG RO has a correct dependency relation (IG) between the POS tags IG and its head RO, yet is assigned a whole rule score of 2 and a bigram score of 20.
Evaluation
This is likely due to the fact that Alpino has the smallest label set of any of the corpora, with only 24 dependency labels and 12 POS tags (cf.
POS tags is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Conclusion
WOE can run in two modes: a CRF extractor (WOEpos) trained with shallow features like POS tags; a pattern classifier (WOEparse) learned from dependency path patterns.
Related Work
Shallow or Deep Parsing: Shallow features, like POS tags, enable fast extraction over large-scale corpora (Davidov et al., 2007; Banko et al., 2007).
Wikipedia-based Open IE
NLP Annotation: As we discuss fully in Section 4 (Experiments), we consider several variations of our system; one version, WOEparse, uses parser-based features, while another, WOEpos, uses shallow features like POS tags, which may be more quickly computed.
Wikipedia-based Open IE
Depending on which version is being trained, the preprocessor uses OpenNLP to supply POS tags and NP-chunk annotations, or uses the Stanford Parser to create a dependency parse.
Wikipedia-based Open IE
We learn two kinds of extractors, one (WOEparse) using features from dependency-parse trees and the other (WOEpos) limited to shallow features like POS tags.
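The shallow features the excerpts contrast with parse-based ones can be sketched as a per-token window of words and POS tags; the exact feature set here is illustrative, not the paper's:

```python
def shallow_features(words, tags, i):
    """Illustrative shallow features (current word, its POS tag, and
    neighboring POS tags) of the kind a fast CRF extractor can use
    instead of dependency-parse features. Boundary markers <S>/</S>
    are an assumption."""
    return {
        "w": words[i],
        "t": tags[i],
        "t-1": tags[i - 1] if i > 0 else "<S>",
        "t+1": tags[i + 1] if i < len(tags) - 1 else "</S>",
    }
```

Features like these need only a POS tagger, which is why the shallow mode scales to large corpora where full parsing would be too slow.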
POS tags is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Kazama, Jun'ichi and Torisawa, Kentaro
Bilingual subtree constraints
For the source part, we replace nouns and verbs using their POS tags (coarse grained tags).
Bilingual subtree constraints
For example, we have the subtree pair: “社会(society):2-边缘(fringe):0” and “fringes(W_2):0-of:1-society(W_1):2”, where “of” does not have a corresponding word, the POS tag of “社会(society)” is N, and the POS tag of “边缘(fringe)” is N. The source part of the rule becomes “N:2-N:0” and the target part becomes “W_2:0-of:1-W_1:2”.
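The source-side generalization described above (replacing nouns and verbs by their coarse POS tags) can be sketched as follows; the token format and tag inventory are assumptions for illustration:

```python
def generalize_source(tokens):
    """Sketch of the source-side rule generalization: nouns and verbs
    are replaced by their coarse POS tags, other words are kept.
    `tokens` is a list of (word, coarse_pos, head_index) triples,
    an assumed encoding of the subtree."""
    out = []
    for word, pos, head in tokens:
        sym = pos if pos in ("N", "V") else word  # generalize N and V only
        out.append(f"{sym}:{head}")
    return "-".join(out)
```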
Experiments
For Chinese unannotated data, we used the XIN_CMN portion of Chinese Gigaword Version 2.0 (LDC2009T14) (Huang, 2009), which has approximately 311 million words whose segmentation and POS tags are given.
Experiments
We used the MMA system (Kruengkrai et al., 2009) trained on the training data to perform word segmentation and POS tagging and used the Baseline Parser to parse all the sentences in the data.
Experiments
The POS tags were assigned by the MXPOST tagger trained on training data.
POS tags is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
For the chunker and POS tagger, the drop-offs are less severe: 94.89 to 91.73, and 97.36 to 94.73.
Introduction
We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
Introduction
• POS before, after predicate: the POS tag of the tokens immediately preceding and following the predicate
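The predicate-context feature described in this bullet can be sketched directly; the feature names and boundary markers are illustrative:

```python
def predicate_context_pos(tags, pred_i):
    """POS tags of the tokens immediately preceding and following the
    predicate at index pred_i. Sentence-boundary markers are an
    assumption for predicates at the edges."""
    before = tags[pred_i - 1] if pred_i > 0 else "<S>"
    after = tags[pred_i + 1] if pred_i < len(tags) - 1 else "</S>"
    return {"pos_before_pred": before, "pos_after_pred": after}
```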
POS tags is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Decoding
where the first two terms are translation and language model probabilities, e(D) is the target string (English sentence) for derivation D, the third and fourth terms are the dependency language model probabilities on the target side, computed with words and POS tags separately, De(D) is the target dependency tree of D, the fifth term is the parsing probability of the source-side tree Tc(D) ∈ Fc, ill(D) is the penalty for the number of ill-formed dependency structures in D, and the last two terms are derivation and translation length penalties, respectively.
Decoding
In order to alleviate the data sparseness problem, we also compute a dependency language model over POS tags on the dependency tree.
Decoding
the POS tag information on the target side for each constituency-to-dependency rule.
Experiments
We also store the POS tag information for each word in dependency trees, and compute two different dependency language models, over words and over POS tags in the dependency tree, separately.
POS tags is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Connor, Michael and Gertner, Yael and Fisher, Cynthia and Roth, Dan
Testing SRL Performance
When trained on arguments identified via the unsupervised POS tagger, noun pattern features promoted agent interpretations of tran-
Unsupervised Parsing
To implement this division into function and content words, we start with a list of function word POS tags and then find words that appear predominantly with these POS tags, using tagged WSJ data (Marcus et al., 1993).
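The "appear predominantly with these POS tags" step can be sketched as a simple ratio test over tagged data; the 0.5 threshold is an assumption, not the paper's value:

```python
from collections import Counter

def split_function_content(tagged_corpus, function_tags, threshold=0.5):
    """Sketch: count how often each word type occurs with a
    function-word POS tag; words whose ratio exceeds the threshold
    are treated as function words. `tagged_corpus` is a list of
    (word, tag) pairs."""
    total, func = Counter(), Counter()
    for word, tag in tagged_corpus:
        total[word] += 1
        if tag in function_tags:
            func[word] += 1
    return {w for w in total if func[w] / total[w] > threshold}
```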
Unsupervised Parsing
Smaller numbers are better, indicating less information lost in moving from the HMM states to the gold POS tags .
Unsupervised Parsing
We first evaluate these parsers (the first stage of our SRL system) on unsupervised POS tagging .
POS tags is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jiang, Wenbin and Liu, Qun
Experiments
For English, we use the automatically-assigned POS tags produced by an implementation of the POS tagger of Collins (2002).
Experiments
For Chinese, we just use the gold-standard POS tags, following tradition.
Experiments
Both English and Chinese sentences are tagged by implementations of the POS tagger of Collins (2002), which are trained on WSJ and CTB 5.0 respectively.
Word-Pair Classification Model
Each feature is composed of some words and POS tags surrounding word i and/or word j, as well as an optional distance representation between these two words.
POS tags is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Background
The C&C supertagger is similar to the Ratnaparkhi (1996) tagger, using features based on words and POS tags in a five-word window surrounding the target word, and defining a local probability distribution over supertags for each word in the sentence, given the previous two supertags.
Data
For supertagger evaluation, one thousand sentences were manually annotated with CCG lexical categories and POS tags.
Introduction
Since the CCG lexical category set used by the supertagger is much larger than the Penn Treebank POS tag set, the accuracy of supertagging is much lower than POS tagging; hence the CCG supertagger assigns multiple supertags to a word, when the local context does not provide enough information to decide on the correct supertag.
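The multi-tagging behaviour described above is commonly implemented with a probability-ratio cutoff: keep every supertag whose probability is within a factor β of the best one, so harder words receive more supertags. A minimal sketch, with an illustrative β value:

```python
def multitag(probs, beta=0.1):
    """Sketch of beta-cutoff multi-tagging: retain every supertag
    whose probability is at least beta times the best probability.
    `probs` maps supertags to probabilities; beta=0.1 is illustrative."""
    best = max(probs.values())
    return {tag for tag, p in probs.items() if p >= beta * best}
```

Lowering β widens the ambiguity per word, trading tagging precision for coverage that the downstream parser can resolve.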
Introduction
(2003) were unable to improve the accuracy of POS tagging using self-training.
POS tags is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Baldridge, Jason and Knight, Kevin
Introduction
The most ambiguous word has 7 different POS tags associated with it.
Minimized models for supertagging
We also wish to scale our methods to larger data settings than the 24k word tokens in the test data used in the POS tagging task.
Minimized models for supertagging
On the simpler task of unsupervised POS tagging with a dictionary, we compared our method versus directly solving the original IP, and found that the minimization (in terms of grammar size) achieved by our method is close to the optimal solution for the original objective and yields the same tagging accuracy far more efficiently.
Minimized models for supertagging
Ravi and Knight (2009) exploited this to iteratively improve their POS tag model: since the first minimization procedure is seeded with a noisy grammar and tag dictionary, iterating the IP procedure with progressively better grammars further improves the model.
POS tags is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Fowler, Timothy A. D. and Penn, Gerald
A Latent Variable CCG Parser
In the supertagging literature, POS tagging and supertagging are distinguished: POS tags are the traditional Penn treebank tags (e.g., NN, VBZ and DT) and supertags are CCG categories.
A Latent Variable CCG Parser
However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags , we can only evaluate the accuracy of the supertags.
A Latent Variable CCG Parser
Despite the lack of POS tags in the Petrov parser, we can see that it performs slightly better than the Clark and Curran parser.
POS tags is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lavergne, Thomas and Cappé, Olivier and Yvon, François
Conditional Random Fields
3-grm   10.74%   14.3M   14.59%   0.3M
5-grm    8.48%  132.5M   11.54%   2.5M
POS tagging
Conditional Random Fields
For the POS tagging task, BCD appears to be impractically slower to train than the other approaches (SGD takes about 40 minutes to train, OWL-QN about 1 hour) due to the simultaneous increase in the sequence length and in the number of observations.
Conditional Random Fields
Based on this observation, we have designed an incremental training strategy for the POS tagging task, where more specific features are progressively incorporated into the model if the corresponding less specific feature is active.
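The incremental strategy described above (a more specific feature enters the model only when its less specific counterpart is active) can be sketched with a hypothetical backoff mapping from each feature to its less specific parent:

```python
def expand_features(active, candidates, backoff):
    """Sketch of incremental feature activation: a candidate feature
    is added only if its less specific backoff feature is already
    active in the model. `backoff` maps a feature to its parent
    (a hypothetical encoding, not the paper's)."""
    return [f for f in candidates if backoff(f) in active]
```

For instance, a word-conditioned tag feature would only be considered once the bare tag feature has a nonzero weight, keeping the candidate set small at every pass.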
POS tags is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Clustering-based word representations
Ushioda (1996) presents an extension to the Brown clustering algorithm, learning hierarchical clusterings of words as well as phrases, which are applied to POS tagging.
Clustering-based word representations
Li and McCallum (2005) use an HMM-LDA model to improve POS tagging and Chinese Word Segmentation.
Clustering-based word representations
(2009) use an HMM to assign POS tags to words, which in turn improves the accuracy of the PCFG-based Hebrew parser.
POS tags is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: