Abstract | We investigate different ways of learning structured perceptron models for coreference resolution when using nonlocal features and beam search.
Introducing Nonlocal Features | In order to keep several alternatives available during search, we extend the best-first decoder with beam search.
Introducing Nonlocal Features | Beam search works incrementally by keeping an agenda of state items. |
Introducing Nonlocal Features | The beam search decoder can be plugged into the training algorithm, replacing the calls to arg max. |
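Introducing Nonlocal Features | As a minimal sketch of such an agenda-based decoder (not the paper's implementation), the following keeps the k best partial hypotheses at each step; the `actions` set and the incremental `score` function are hypothetical stand-ins for the transition system and model.

```python
def beam_search(n_steps, actions, score, k):
    """Agenda-based beam search over action sequences of length n_steps.

    `actions` and `score` are hypothetical placeholders: the legal actions
    at each step and the model's incremental score for applying an action
    to a partial hypothesis (prefix).
    """
    agenda = [((), 0.0)]  # agenda of state items: (prefix, total score)
    for _ in range(n_steps):
        expanded = [(prefix + (a,), s + score(prefix, a))
                    for prefix, s in agenda
                    for a in actions]
        # Prune the agenda back to the k highest-scoring partial hypotheses.
        agenda = sorted(expanded, key=lambda item: item[1], reverse=True)[:k]
    return max(agenda, key=lambda item: item[1])[0]  # approximate arg max
```

Introducing Nonlocal Features | Plugged into perceptron training, the sequence returned here simply takes the place of the exact arg max in the update.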
Introduction | We show that for the task of coreference resolution the straightforward combination of beam search and early update (Collins and Roark, 2004) falls short of models with more limited feature sets that allow for exact search.
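Introduction | Below is a minimal sketch of beam search with early update (Collins and Roark, 2004) on a single training example; the sparse feature map `phi`, the per-step `actions`, and the weight dictionary `w` are hypothetical placeholders rather than the paper's actual model.

```python
def perceptron_early_update(x, gold, actions, phi, w, k):
    """One perceptron step with early update: decode with a beam of size k
    and, as soon as the gold prefix falls out of the beam, update toward
    the gold prefix and away from the best partial hypothesis.
    `gold` is a tuple of actions; `phi(x, prefix)` returns a sparse dict."""
    def score(prefix):
        return sum(w.get(f, 0.0) * v for f, v in phi(x, prefix).items())

    def update(plus, minus):
        for f, v in phi(x, plus).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in phi(x, minus).items():
            w[f] = w.get(f, 0.0) - v

    agenda = [()]
    for t in range(len(gold)):
        expanded = [p + (a,) for p in agenda for a in actions]
        agenda = sorted(expanded, key=score, reverse=True)[:k]
        if gold[:t + 1] not in agenda:   # gold prefix fell off the beam
            update(gold[:t + 1], agenda[0])
            return                       # early update: stop decoding here
    if agenda[0] != gold:                # standard full-sequence update
        update(gold, agenda[0])
```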
Related Work | (2004) also apply beam search at test time, but use a static assignment of antecedents and learn a log-linear model using batch learning.
Experimental Setup | Parameters and Solver: In our experiments we set k in our beam search algorithm (Section 5) to 200, and l to 20.
Inference | Therefore, we approximate this computation using beam search.
Inference | During learning we compute the second term in the gradient (Equation 2) using our beam search approximation. |
Learning | Section 5 describes how we approximate the two terms of the gradient using beam search.
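Learning | As an illustration (not the paper's exact Equation 2), one common way to approximate the expectation term of a log-linear gradient is to renormalize over the hypotheses that survived beam search; the sparse feature map `phi` and the `(hypothesis, score)` beam are hypothetical stand-ins.

```python
import math

def expected_features(beam, phi):
    """Approximate E[phi(y)], the expectation term of a log-linear
    gradient, using only the hypotheses kept by beam search.
    `beam` is a list of (hypothesis, model score) pairs."""
    z = sum(math.exp(s) for _, s in beam)  # partition function over the beam
    expectation = {}
    for y, s in beam:
        p = math.exp(s) / z                # beam-normalized probability
        for f, v in phi(y).items():
            expectation[f] = expectation.get(f, 0.0) + p * v
    return expectation
```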
Mapping Word Problems to Equations | We use a beam search inference procedure to approximately compute Equation 1, as described in Section 5. |
Problem Description | During composition we retain intermediate paths like M3, utilizing lazy composition (Mohri and Pereira, 1998) in order to facilitate beam search through the multi-alignment.
Problem Description | However, performing a beam search over the composed WFST in Equation 2 allows us to accommodate such constraints across multiple sequences.
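Problem Description | A minimal sketch of such a constrained beam search over a (lazily) composed WFST follows; `arcs(state)`, which yields outgoing transitions on demand, and `constraint(path, label)`, which rejects expansions violating cross-sequence restrictions, are hypothetical stand-ins for the machinery described above.

```python
def constrained_beam_search(arcs, start, finals, constraint, k, max_len=1000):
    """Beam search through a composed WFST whose arcs are generated lazily.
    Weights are tropical (costs), so lower is better."""
    beam = [(0.0, start, ())]   # (cost, state, emitted labels so far)
    best = None                 # best complete (final-state) path seen
    for _ in range(max_len):
        expanded = [(cost + w, nxt, path + (label,))
                    for cost, state, path in beam
                    for label, w, nxt in arcs(state)
                    if constraint(path, label)]
        if not expanded:
            break
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:k]     # keep only the k cheapest partial paths
        for cost, state, path in beam:
            if state in finals and (best is None or cost < best[0]):
                best = (cost, state, path)
    return best[2] if best else beam[0][2]
```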
Problem Description | Using the coreference and temporal relation scores to obtain the combined aligned sequence, the WFST-based representation with beam search across all sequences achieves an accuracy of 78.9%.
RNN-based Alignment Model | Thus, the Viterbi alignment is computed approximately using heuristic beam search.
Training | To reduce computation, we employ NCE, which uses randomly sampled sentences from all target language sentences in Q as e′, and calculate the expected values by beam search with beam width W to truncate alignments with low scores.
Training | GEN is a subset of all possible word alignments (I), generated by beam search.
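Training | A minimal sketch of this beam-truncated expectation is shown below; `beam_align` (which returns GEN, the alignments kept by a width-W beam) and the model `score` are hypothetical stand-ins for the aligner and scorer, and the sampled sentence plays the role of e′.

```python
import math
import random

def nce_beam_expectation(f, e_pool, beam_align, score, W):
    """Expected model score over GEN, the beam-truncated alignment set,
    for one randomly sampled target sentence e' (as used in NCE)."""
    e_prime = random.choice(e_pool)  # randomly sampled target sentence e'
    gen = beam_align(f, e_prime, W)  # GEN: alignments surviving the beam
    z = sum(math.exp(score(f, e_prime, a)) for a in gen)
    # Expectation over GEN only; low-scoring alignments were truncated.
    return sum(math.exp(score(f, e_prime, a)) / z * score(f, e_prime, a)
               for a in gen)
```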