Experimental Setup | We also compare our model against a discriminative reranker.
Experimental Setup | The reranker operates over the |
Experimental Setup | We then train the reranker by running 10 epochs of cost-augmented MIRA. |
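A single cost-augmented MIRA pass over the training data can be sketched as follows. The candidate/feature/cost layout and the aggressiveness constant C are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of one epoch of cost-augmented 1-best MIRA for n-best reranking.
# Each training instance is (candidates, gold_index); each candidate is a
# dict with a sparse "feats" vector and a "cost" (e.g. 1 - F1 vs. gold).
def mira_epoch(data, w, C=0.01):
    for candidates, gold in data:
        feats = [c["feats"] for c in candidates]
        costs = [c["cost"] for c in candidates]

        def score(f):
            return sum(w.get(k, 0.0) * v for k, v in f.items())

        # Cost-augmented decoding: the violator is the candidate that is
        # both high-scoring and high-cost.
        viol = max(range(len(candidates)),
                   key=lambda i: score(feats[i]) + costs[i])
        if viol == gold:
            continue
        # Feature difference: gold minus violator.
        delta = dict(feats[gold])
        for k, v in feats[viol].items():
            delta[k] = delta.get(k, 0.0) - v
        margin = score(feats[viol]) + costs[viol] - score(feats[gold])
        norm = sum(v * v for v in delta.values())
        if norm > 0 and margin > 0:
            # Smallest update that fixes the violation, capped at C.
            tau = min(C, margin / norm)
            for k, v in delta.items():
                w[k] = w.get(k, 0.0) + tau * v
    return w
```

Running this for 10 epochs over the n-best lists corresponds to the training regime described above.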
Features | Global Features: We used features shown to be promising in prior reranking work (Charniak and Johnson, 2005; Collins, 2000; Huang, 2008).
Introduction | They first appeared in the context of reranking (Collins, 2000), where a simple parser is used to generate a candidate list which is then reranked according to the scoring function. |
Introduction | Our method provides a more effective mechanism for handling global features than reranking, outperforming it by 1.3%.
Related Work | The first successful approach in this arena was reranking (Collins, 2000; Charniak and Johnson, 2005) on constituency parsing.
Related Work | Reranking can be combined with an arbitrary scoring function, and thus can easily incorporate global features over the entire parse tree. |
Related Work | Its main disadvantage is that the output parse can only be one of the few parses passed to the reranker.
Results | The MST parser is trained in projective mode for reranking because generating a top-k list from the second-order non-projective model is intractable.
Abstract | To do so we follow an n-best list reranking approach that exploits recent advances in learning to rank techniques. |
Discriminative Reranking for OCR | 2.2 Ensemble reranking |
Discriminative Reranking for OCR | In addition to the above mentioned approaches, we couple simple feature selection and reranking models combination via a straightforward ensemble learning method similar to stacked generalization (Wolpert, 1992) and Combiner (Chan and Stolfo, 1993). |
Discriminative Reranking for OCR | These features are used by the baseline system5 as well as by the various reranking methods. |
Experiments | Table 2 presents the WER for our baseline hypothesis, the best hypothesis in the list (our oracle) and our best reranking results, which we describe in detail in §3.2.
Experiments | on the reranking performance for one of our best reranking models, namely RankSVM. |
Experiments | 3.2 Reranking results |
Introduction | A straightforward alternative which we advocate in this paper is to use the available information to rerank the hypotheses in the n-best lists. |
Introduction | Discriminative reranking allows each hypothesis to be represented as an arbitrary set of features without the need to explicitly model their interactions. |
Introduction | We describe our features and reranking approach in §2, and we present our experiments and results in §3. |
Abstract | We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. |
Abstract | Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. |
Background | The baseline generative model we use for reranking employs the unsupervised PCFG induction approach introduced by Kim and Mooney (2012). |
Introduction | Since their system employs a generative model, discriminative reranking (Collins, 2000) could potentially improve its performance.
Introduction | By training a discriminative classifier that uses global features of complete parses to identify correct interpretations, a reranker can significantly improve the accuracy of a generative model. |
Introduction | Reranking has been successfully employed to improve syntactic parsing (Collins, 2002b), semantic parsing (Lu et al., 2008; Ge and Mooney, 2006), semantic role labeling (Toutanova et al., 2005), and named entity recognition (Collins, 2002c). |
Experiments and evaluation | Figure 1 gives the results of the reranked thesaurus for these entries in terms of R-precision and MAP against reference W5 for various values of G. Although the level of these measures does not change a lot for G > 5, the graph of Figure 1 shows that G = 15 appears to be an optimal value. |
Experiments and evaluation | 4.3 Evaluation of the reranked thesaurus |
Experiments and evaluation | Table 4 gives the evaluation of the application of our reranking method to the initial thesaurus according to the same principles as in section 4.1. |
Improving a distributional thesaurus | - reranking of the entry’s neighbors according to bad neighbors.
Improving a distributional thesaurus | As mentioned in section 3.1, the starting point of our reranking process is the definition of a model for determining to what extent a word in a sentence, which is not supposed to be known in the context of this task, corresponds or not to a reference word E. This task can also be viewed as a tagging task in which the occurrences of a target word T are labeled with two tags: E and notE. |
Improving a distributional thesaurus | 3.4 Identification of bad neighbors and thesaurus reranking |
Abstract | Using parse accuracy in a simple reranking strategy for self-monitoring, we find that with a state-of-the-art averaged perceptron realization ranking model, BLEU scores cannot be improved with any of the well-known Treebank parsers we tested, since these parsers too often make errors that human readers would be unlikely to make. |
Abstract | Moreover, via a targeted manual analysis, we demonstrate that the SVM reranker frequently manages to avoid vicious ambiguities, while its ranking errors tend to affect fluency much more often than adequacy. |
Introduction | To do so—in a nutshell—we enumerate an n-best list of realizations and rerank them if necessary to avoid vicious ambiguities, as determined by one or more automatic parsers. |
Introduction | Consequently, we examine two reranking strategies, one a simple baseline approach and the other using an SVM reranker (Joachims, 2002).
Introduction | Our simple reranking strategy for self-monitoring is to rerank the realizer’s n-best list by parse accuracy, preserving the original order in case of ties. |
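The tie-preserving strategy just described can be sketched in a few lines; the names are hypothetical. Because Python's sort is stable, sorting only by negated parse accuracy keeps the realizer's original order among tied hypotheses.

```python
def rerank_by_parse_accuracy(nbest, accuracy):
    """Rerank a realizer's n-best list by parse accuracy.

    nbest: hypotheses in the realizer's original rank order.
    accuracy: callable mapping a hypothesis to its parse-accuracy score.
    A stable sort preserves the original order in case of ties.
    """
    return sorted(nbest, key=lambda h: -accuracy(h))
```

For example, if two hypotheses obtain the same parse accuracy, the one the realizer originally ranked higher stays ahead.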
Abstract | Conventional n-best reranking techniques often suffer from the limited scope of the n-best list, which rules out many potentially good alternatives. |
Abstract | We instead propose forest reranking, a method that reranks a packed forest of exponentially many parses. |
Abstract | Our final result, an F-score of 91.7, outperforms both 50-best and 100-best reranking baselines, and is better than any previously reported system trained on the Treebank.
Introduction | Discriminative reranking has become a popular technique for many NLP problems, in particular, parsing (Collins, 2000) and machine translation (Shen et al., 2005). |
Introduction | Typically, this method first generates a list of top-n candidates from a baseline system, and then reranks this n-best list with arbitrary features that are either not computable or intractable to compute within the baseline system.
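The generic inference step of this scheme is simply an argmax over the n-best list under a linear model; the feature map and weights below are hypothetical stand-ins for whatever nonlocal features a given system uses.

```python
def rerank_nbest(candidates, features, w):
    """Return the candidate whose (possibly nonlocal) feature vector
    scores highest under sparse linear weights w.

    candidates: the baseline system's top-n outputs.
    features: callable mapping a candidate to a sparse feature dict.
    w: dict of feature weights (missing features score 0).
    """
    def score(c):
        return sum(w.get(k, 0.0) * v for k, v in features(c).items())
    return max(candidates, key=score)
```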
Introduction | [figure label: conventional reranking (features only at the root) vs. DP-based discriminative parsing]
Packed Forests as Hypergraphs | Such a Treebank-style forest is easier to work with for reranking, since many features can be directly expressed in it.
Abstract | We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. |
Approach | The proposed answer reranking component is embedded in the QA framework illustrated in Figure 1. |
Approach | CQA: In this scenario, the task is defined as reranking all the user-posted answers for a particular question to boost the community-selected best answer to the top position. |
Approach | These answer candidates are then passed to the answer reranking component, the focus of this work. |
Introduction | We propose a novel answer reranking (AR) model that combines lexical semantics (LS) with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers and surface text information, and a deep one based on the Rhetorical Structure Theory (RST) discourse framework (Mann and Thompson, 1988). |
Related Work | First, most NF QA approaches tend to use multiple similarity models (information retrieval or alignment) as features in discriminative rerankers (Riezler et al., 2007; Higashinaka and Isozaki, 2008; Verberne et al., 2010; Surdeanu et al., 2011). |
Related Work | (2011) extracted 47 cue phrases such as because from a small collection of web documents, and used the cosine similarity between an answer candidate and a bag of words containing these cue phrases as a single feature in their reranking model for non-factoid why QA. |
Related Work | This classifier was then used to extract instances of causal relations in answer candidates, which were turned into features in a reranking model for Japanese why QA.
Abstract | The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. |
Introduction | via Discriminative Reranking
Introduction | The performance of this baseline system could be potentially further improved using discriminative reranking (Collins, 2000). |
Introduction | Typically, this method first creates a list of n-best candidates from a generative model, and then reranks them with arbitrary features (both local and global) that are either not computable or intractable to compute within the |
Problem Formulation | The hypergraph representation allows us to decompose the feature functions and compute them piecemeal at each hyperarc (or sub-derivation), rather than at the root node as in conventional n-best list reranking.
Related Work | Discriminative reranking has been employed in many NLP tasks such as syntactic parsing (Charniak and Johnson, 2005; Huang, 2008), machine translation (Shen et al., 2004; Li and Khudanpur, 2009) and semantic parsing (Ge and Mooney, 2006).
Related Work | Our model is closest to Huang (2008) who also performs forest reranking on a hypergraph, using both local and nonlocal features, whose weights are tuned with the averaged perceptron algorithm (Collins, 2002). |
Related Work | We adapt forest reranking to generation and introduce several task-specific features that boost performance. |
Abstract | Secondly, integrating multiword expressions in the parser grammar followed by a reranker specific to such expressions slightly improves all evaluation metrics. |
Introduction | Our proposal is to evaluate two discriminative strategies in a real constituency parsing context: (a) pre-grouping MWE before parsing; this would be done with a state-of-the-art recognizer based on Conditional Random Fields; (b) parsing with a grammar including MWE identification and then reranking the output parses thanks to a Maximum Entropy model integrating MWE-dedicated features. |
MWE-dedicated Features | In order to make these models comparable, we use two comparable sets of feature templates: one adapted to sequence labelling (CRF-based MWER) and the other one adapted to reranking (MaxEnt-based reranker).
MWE-dedicated Features | The reranker templates are instantiated only for the nodes of the candidate parse tree, which are leaves dominated by an MWE node (i.e.
MWE-dedicated Features | - RERANKER: for each leaf (in position n)
Two strategies, two discriminative models | 3.2 Reranking |
Two strategies, two discriminative models | Discriminative reranking consists in reranking the n-best parses of a baseline parser with a discriminative model, hence integrating features associated with each node of the candidate parses. |
Two strategies, two discriminative models | Formally, given a sentence s, the reranker selects the best candidate parse p among a set of candidates P(s) with respect to a scoring function Vθ:
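The scoring rule elided after the colon presumably takes the standard linear form (a reconstruction from context, not the paper's own equation):

```latex
\hat{p} \;=\; \operatorname*{arg\,max}_{p \,\in\, P(s)} V_\theta(p),
\qquad
V_\theta(p) \;=\; \theta^\top f(p)
```

where f(p) is the feature vector of candidate parse p and θ the learned weight vector.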
Quantitative Evaluation of Lexicons | approach is to rerank the results from stage 1, instead of doing actual binary classification. |
Quantitative Evaluation of Lexicons | 6.1 Reranking using a lexicon |
Quantitative Evaluation of Lexicons | To rerank a list of posts retrieved for a given topic, we opt to use the method that showed best performance at TREC 2008. |
Related Work | In stage (2) one commonly uses either a binary classifier to distinguish between opinionated and non-opinionated documents or applies reranking of the initial result list using some opinion score. |
Related Work | The best performing opinion finding system at TREC 2008 is a two-stage approach using reranking in stage (2) (Lee et al., 2008). |
Related Work | This opinion score is combined with the relevance score, and posts are reranked according to this new score. |
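The score-combination step described here can be sketched as a simple linear interpolation followed by a sort; the mixing weight alpha and the assumption that both scores are normalized to [0, 1] are illustrative, not taken from the cited system.

```python
def rerank_with_opinion(posts, alpha=0.7):
    """Combine relevance and opinion scores, then rerank.

    posts: list of (doc_id, relevance, opinion) tuples, with both
    scores assumed normalized to [0, 1].
    alpha: hypothetical weight on relevance vs. opinionatedness.
    """
    combined = lambda p: alpha * p[1] + (1 - alpha) * p[2]
    return sorted(posts, key=combined, reverse=True)
```

With a high alpha the original relevance ranking dominates; lowering alpha lets strongly opinionated posts move up.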
A Distributional Model for Argument Classification | We thus propose to model the reranking phase (RR) as a HMM sequence labeling task. |
Conclusions | the estimation of lexico-grammatical preferences through distributional analysis over unlabeled data), estimation (through syntactic or lexical backoff where necessary) and reranking.
Empirical Analysis | In these experiments we evaluate the quality of the argument classification step against the lexical knowledge acquired from unlabeled texts and the reranking step. |
Empirical Analysis | The Global Prior model is obtained by applying reranking (Section 3.2) to the best n = 10 candidates provided by the Local Prior model. |
Empirical Analysis | 6) and the HMM-based reranking characterize the final two configurations. |
Related Work | This approach effectively introduces a new step in SRL, also called Joint Reranking (RR), e.g.
Abstract | LSH accounts for neighbor candidate pruning, while ITQ provides an efficient and effective reranking over the neighbor pool captured by LSH. |
Document Retrieval with Hashing | In this section, we first provide an overview of applying hashing techniques to a document retrieval task, and then introduce two unsupervised hashing algorithms: LSH acts as a neighbor-candidate filter, while ITQ works towards precise reranking over the candidate pool returned by LSH. |
Document Retrieval with Hashing | Hamming Reranking |
Document Retrieval with Hashing | In this framework, LSH accounts for neighbor candidate pruning, while ITQ provides an efficient and effective reranking over the neighbor pool captured by LSH. |
Experiments | Another crucial observation is that with ITQ reranking, a small number of LSH hash tables is needed in the pruning step.
Experiments | Since the LSH pruning time can be ignored, the search time of the two-stage hashing scheme equals the time of Hamming-distance reranking in ITQ codes for all candidates produced by the LSH pruning step, e.g., LSH (48 bits, 4 tables) +
Experiments | Figure 2(f) shows the ITQ data reranking percentage for different LSH bit lengths and table numbers.
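The second stage of this scheme, Hamming-distance reranking over the LSH-pruned pool, can be sketched as follows; binary codes are represented as Python ints, and the candidate layout is a hypothetical stand-in for ITQ's output.

```python
def hamming_rerank(query_code, candidates, k):
    """Rerank LSH-pruned candidates by Hamming distance to the query.

    query_code: ITQ binary code of the query, as an int.
    candidates: list of (doc_id, code) pairs surviving LSH pruning.
    Returns the k candidates closest in Hamming distance.
    """
    def hamming(a, b):
        # XOR marks disagreeing bits; count the set bits.
        return bin(a ^ b).count("1")
    return sorted(candidates, key=lambda c: hamming(query_code, c[1]))[:k]
```

Because the pool is already small after pruning, this exact pass is cheap, which matches the observation that few hash tables suffice.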
Experiments of Parsing | We used Charniak’s maximum entropy inspired parser and their reranker (Charniak and Johnson, 2005) for target grammar parsing, called a generative parser (GP) and a reranking parser (RP) respectively. |
Experiments of Parsing | Table 5: Results of the generative parser (GP) and the reranking parser (RP) on the test set, when trained on only CTB training set or an optimal combination of CTB training set and CDTPS . |
Experiments of Parsing | Finally we evaluated two parsing models, the generative parser and the reranking parser, on the test set, with results shown in Table 5. |
Introduction | When coupled with self-training technique, a reranking parser with CTB and converted CDT as labeled data achieves 85.2% f-score on CTB test set, an absolute 1.0% improvement (6% error reduction) over the previous best result for Chinese parsing. |
Morphology-based Vocabulary Expansion | Reranking Models Given that the size of the expanded vocabulary can be quite large and it may include a lot of over-generation, we rerank the expanded set of words before taking the top n words to use in downstream processes. |
Morphology-based Vocabulary Expansion | We consider four reranking conditions which we describe below. |
Morphology-based Vocabulary Expansion | Reranked Expansion |
Conclusion and Future Work | We further examined the results of doing a simple reranking process, constraining the output parse to put paired punctuation in the same clause. |
Conclusion and Future Work | This reranking was found to result in a minor performance gain. |
Experiments | 4.4 Reranking for Paired Punctuation |
Experiments | To rectify this problem, we performed a simple post-hoc reranking of the 50-best parses produced by the best parameter settings (+ Gold tags, - Edge labels), selecting the first parse that places paired punctuation in the same clause, or returning the best parse if none of the 50 parses satisfy the constraint.
Experiments | Overall, 38 sentences were parsed with paired punctuation in different clauses, of which 16 were reranked.
Introduction | A further reranking of the parser output based on a constraint involving paired punctuation produces a slight additional performance gain. |
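This constraint-based selection amounts to a one-pass scan over the k-best list; the constraint checker below is a hypothetical placeholder for the actual paired-punctuation test.

```python
def rerank_paired_punct(parses, punct_in_same_clause):
    """Post-hoc rerank of a k-best list under a hard constraint.

    parses: k-best parses, best first.
    punct_in_same_clause: callable returning True if a parse places
    paired punctuation in the same clause (hypothetical checker).
    Returns the first constraint-satisfying parse, else the 1-best.
    """
    for p in parses:
        if punct_in_same_clause(p):
            return p
    return parses[0]
```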
Abstract | (2006), and 3.4% over a nonlocal constituent reranker.
Analysis | Table 3: Parsing results for reranking 50-best lists of Berkeley parser (Dev is WSJ section 22 and Test is WSJ section 23, all lengths). |
Introduction | For constituent parsing, we rerank the output of the Berkeley parser (Petrov et al., 2006). |
Introduction | For constituent parsing, we use a reranking framework (Charniak and Johnson, 2005; Collins and Koo, 2005; Collins, 2000) and show 9.2% relative error reduction over the Berkeley parser baseline. |
Parsing Experiments | We then add them to a constituent parser in a reranking approach. |
Parsing Experiments | We also verify that our features contribute on top of standard reranking features.
Parsing Experiments | Because the underlying parser does not factor along lexical attachments, we instead adopt the discriminative reranking framework, where we generate the top-k candidates from the baseline system and then rerank this k-best list using (generally nonlocal) features. |
Abstract | We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds of thousands of features in total. |
Conclusion | The straightforward maximum entropy reranking approach is able to significantly outperform previous computational approaches by allowing for a richer model of the prenominal modifier ordering process.
Introduction | By mapping a set of features across the training data and using a maximum entropy reranking model, we can learn optimal weights for these features and then order each set of modifiers in the test data according to our features and the learned weights. |
Introduction | In Section 3 we present the details of our maximum entropy reranking approach. |
Model | We treat the problem of prenominal modifier ordering as a reranking problem. |
Model | At test time, we choose an ordering x ∈ π(B) using a maximum entropy reranking approach (Collins and Koo, 2005).
Related Work | In this next section, we describe our maximum entropy reranking approach that tries to develop a more comprehensive model of the modifier ordering process to avoid the sparsity issues that previous approaches
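A minimal sketch of maximum entropy reranking over modifier permutations follows; the feature map and weights are hypothetical, and the explicit softmax is shown only for completeness, since the argmax alone decides the ordering.

```python
import itertools
import math

def maxent_order(modifiers, features, w):
    """Choose a prenominal modifier ordering by log-linear reranking.

    modifiers: the unordered set of modifiers for one noun.
    features: callable mapping a permutation to a sparse feature dict.
    w: learned feature weights (hypothetical names).
    """
    def score(perm):
        return sum(w.get(k, 0.0) * v for k, v in features(perm).items())

    perms = list(itertools.permutations(modifiers))
    # Log-linear (maxent) probabilities over all candidate orderings.
    z = sum(math.exp(score(p)) for p in perms)
    probs = {p: math.exp(score(p)) / z for p in perms}
    return max(probs, key=probs.get)
```

Enumerating all permutations is feasible here because prenominal modifier sets are small (typically two to four modifiers).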
Evaluation Methodology | [table header: KSDEP, CONLL, RERANK, NO-RERANK, BERKELEY, STANFORD, ENJU, ENJU-GENIA]
Evaluation Methodology | For the other parsers, we input the concatenation of WSJ and GENIA for the retraining, while the reranker of RERANK was not retrained due to its cost.
Evaluation Methodology | Since the parsers other than NO-RERANK and RERANK require an external POS tagger, a WSJ-trained POS tagger is used with WSJ-trained parsers, and geniatagger (Tsuruoka et al., 2005) is used with GENIA-retrained parsers. |
Experiments | Among these parsers, RERANK performed slightly better than the other parsers, although the difference in f-score is small and it requires a much higher parsing cost.
Experiments | retraining yielded only slight improvements for RERANK, BERKELEY, and STANFORD, while larger improvements were observed for MST, KSDEP, NO-RERANK, and ENJU.
Syntactic Parsers and Their Representations | RERANK Charniak and Johnson (2005)’s rerank-ing parser. |
Syntactic Parsers and Their Representations | The reranker of this parser receives n-best parse results from NO-RERANK, and selects the most likely result by using a maximum entropy model with manually engineered features.
Abstract | Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
Experiment | It should be noted that discriminative reranking parsers such as (Charniak and Johnson, 2005) and (Huang, 2008) are constructed on a generative parser.
Experiment | The reranking parser takes the k-best lists of candidate trees or a packed forest produced by a baseline parser (usually a generative model), and then reranks the candidates using arbitrary features. |
Experiment | Hence, we can expect that combining our SR-TSG model with a discriminative reranking parser would provide better performance than SR-TSG alone. |
Introduction | Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers. |
Experiments | We report machine translation reranking results in Section 5.4. |
Experiments | The latter report results for two binary classifiers: RERANK uses the reranking features of Charniak and Johnson (2005), and TSG uses |
Experiments | All generative models improve, but TREELET-RULE remains the best, now outperforming the RERANK system, though of course it is likely that RERANK would improve if it could be scaled up to more training data. |
Introduction | We also show fluency improvements in a preliminary machine translation reranking experiment. |
Abstract | Using a reranking parser and a Lexical-Functional Grammar (LFG) annotation, we produce a set of dependency triples for each summary. |
Discussion and future work | Its core modules were updated as well: Minipar was replaced with the Charniak-Johnson reranking parser (Charniak and Johnson, 2005), Named Entity identification was added, and the BE extraction is conducted using a set of Tregex rules (Levy and Andrew, 2006).
Discussion and future work | Since our method, presented in this paper, also uses the reranking parser, as well as WordNet, it would be interesting to compare both methods directly in terms of the performance of the dependency extraction procedure. |
Introduction | (2004) applied to the output of the reranking parser of Charniak and Johnson (2005), whereas in BE (in the version presented here) dependencies are generated by the Minipar parser (Lin, 1995).
Lexical-Functional Grammar and the LFG parser | First, a summary is parsed with the Charniak-Johnson reranking parser (Charniak and Johnson, 2005) to obtain the phrase-structure tree.
Experiment | We group parsing systems into three categories: single systems, reranking systems and semi-supervised systems. |
Experiment | Our Nonlocal&Cluster system further improved the parsing F1 to 86.3%, and it outperforms all reranking systems and semi-supervised systems.
Experiment | *Huang (2009) adapted the parse reranker to CTB5. |
Joint POS Tagging and Parsing with Nonlocal Features | But almost all previous work considered nonlocal features only in parse reranking frameworks. |
Related Work | However, almost all of the previous work use nonlocal features at the parse reranking stage. |
Introduction | Therefore, we use hypergraph reranking (Huang and Chiang, 2007; Huang, 2008), which proves to be effective for integrating nonlocal features into dynamic programming, to alleviate this problem. |
Introduction | 3 In the second pass, we use the hypergraph reranking algorithm (Huang, 2008) to find promising translations using additional dependency features (i.e., features 8-10 in the list). |
Introduction | Table 3 shows the effect of hypergraph reranking.
Experiments | We used the Charniak parser (Charniak et al., 2005) for our experiment, and we used the proposed algorithm to train the reranking feature weights. |
Experiments | For comparison, we also investigated training the reranker with Perceptron and MIRA. |
Experiments | There are around V = 1.33 million features in total defined for reranking, and the n-best size for reranking is set to 50.
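For comparison with the perceptron baseline mentioned above, a plain (unaveraged) perceptron training loop for a reranker might look like the following; the data layout is hypothetical, and the paper's own algorithm differs.

```python
def perceptron_train(data, feats, epochs=10):
    """Train reranking weights with the structured perceptron.

    data: list of (candidates, oracle_index) pairs, where the oracle
    is the best candidate in the n-best list.
    feats: callable mapping a candidate to a sparse feature dict.
    The averaged variant would additionally keep a running sum of w.
    """
    w = {}
    for _ in range(epochs):
        for candidates, oracle in data:
            def score(c):
                return sum(w.get(k, 0.0) * v for k, v in feats(c).items())
            pred = max(range(len(candidates)),
                       key=lambda i: score(candidates[i]))
            if pred != oracle:
                # Promote the oracle's features, demote the prediction's.
                for k, v in feats(candidates[oracle]).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feats(candidates[pred]).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```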
Introduction | In the reranking framework, in principle, all the models in the previous category can be used, because in reranking we have all the information (source and target words/phrases, alignment) about the translation process.
Introduction | One disadvantage of carrying out reordering in reranking is that the representativeness of the N-best list is often questionable.
Introduction | There have been nonlocal approaches as well, such as tree-substitution parsers (Bod, 1993; Sima’an, 2000), neural net parsers (Henderson, 2003), and rerankers (Collins and Koo, 2005; Charniak and Johnson, 2005; Huang, 2008). |
Other Languages | it does not use a reranking step or post-hoc combination of parser results. |
Other Languages | Their best parser, and the best overall parser from the shared task, is a reranked product of “Replaced” Berkeley parsers.
Conclusion and Future work | Reranking could also potentially improve the results (Ge and Mooney, 2006; Lu et al., 2008). |
Experimental Evaluation | available): SCISSOR (Ge and Mooney, 2005), an integrated syntactic-semantic parser; KRISP (Kate and Mooney, 2006), an SVM-based parser using string kernels; WASP (Wong and Mooney, 2006; Wong and Mooney, 2007), a system based on synchronous grammars; Z&C (Zettlemoyer and Collins, 2007), a probabilistic parser based on relaxed CCG grammars; and LU (Lu et al., 2008), a generative model with discriminative reranking.
Experimental Evaluation | Note that some of these approaches require additional human supervision, knowledge, or engineered features that are unavailable to the other systems; namely, SCISSOR requires gold-standard SAPTs, Z&C requires hand-built template grammar rules, LU requires a reranking model using specially designed global features, and our approach requires an existing syntactic parser. |
Related Work | (2006), who applied a reranked parser to a large unsupervised corpus in order to obtain additional training data for the parser; this self-training approach was shown to be quite effective in practice.
Related Work | However, their approach depends on the usage of a high-quality parse reranker , whereas the method described here simply augments the features of an existing parser. |
Related Work | Note that our two approaches are compatible in that we could also design a reranker and apply self-training techniques on top of the cluster-based features. |