Experiments | For selectional branching, the margin threshold m and the beam size b need to be tuned (Section 3.3).
Experiments | For this development set, beam sizes of 64 and 80 gave exactly the same result, so we kept the larger one (b = 80).
Experiments | Figure 3: Parsing accuracies with respect to margins and beam sizes on the English development set.
Introduction | Thus, it is preferred if the beam size is not fixed but proportional to the number of low confidence predictions that a greedy parser makes, in which case, fewer transition sequences need to be explored to produce the same or similar parse output. |
Selectional branching | Thus, it is preferred if the beam size is not fixed but proportional to the number of low confidence predictions made for the one-best sequence. |
Selectional branching | The selectional branching method presented here performs at most d · t − e transitions, where t is the maximum number of transitions performed to generate a transition sequence, d = min(b, |A| + 1), b is the beam size, |A| is the number of low confidence predictions made for the one-best sequence, and e = d(d − 1)/2.
Selectional branching | Compared to beam search, which always performs b · t transitions, selectional branching is guaranteed to perform fewer transitions given the same beam size because d ≤ b and e > 0 except for d = 1, in which case no branching happens.
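The comparison above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the form e = d(d − 1)/2 is an assumption of this sketch (it is zero exactly when d = 1, which matches the statement that no branching happens in that case).

```python
def beam_search_transitions(b: int, t: int) -> int:
    """Transitions performed by standard beam search: always b * t."""
    return b * t


def selectional_branching_bound(b: int, t: int, num_low_conf: int) -> int:
    """Upper bound d * t - e on transitions for selectional branching.

    b            -- beam size
    t            -- maximum number of transitions per sequence
    num_low_conf -- |A|, low confidence predictions on the one-best sequence
    """
    d = min(b, num_low_conf + 1)
    # Assumption of this sketch: e = d(d - 1) / 2, which is 0 when d = 1.
    e = d * (d - 1) // 2
    return d * t - e


if __name__ == "__main__":
    b, t = 80, 50  # illustrative values only
    for num_low_conf in (0, 3, 100):
        bound = selectional_branching_bound(b, t, num_low_conf)
        print(num_low_conf, bound, beam_search_transitions(b, t))
```

With few low confidence predictions, d stays far below b, so the bound is much smaller than b · t; when |A| + 1 ≥ b, d equals b and the saving comes only from e.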
Experiments | Figure 6 shows the training curves of the averaged perceptron with respect to the performance on the development set when the beam size is 4. |
Experiments | 4.4 Impact of beam size |
Experiments | The beam size is an important hyperparameter in both training and testing.
Joint Framework for Event Extraction | In Section 4.5 we will show that the standard perceptron introduces many invalid updates, especially with smaller beam sizes, as also observed by Huang et al.
Joint Framework for Event Extraction | Then the K-best partial configurations are selected for the beam, assuming the beam size is K.
Joint Framework for Event Extraction | K: Beam size.
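The K-best selection step can be illustrated generically. This is a minimal sketch of one beam search step under stated assumptions, not the paper's implementation: `select_k_best`, `beam_step`, and the toy `expand` function are hypothetical names, and configurations are represented as plain tuples with additive scores.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# A scored partial configuration: (score, configuration).
Scored = Tuple[float, tuple]


def select_k_best(candidates: Iterable[Scored], k: int) -> List[Scored]:
    """Keep the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])


def beam_step(beam: List[Scored],
              expand: Callable[[Scored], Iterable[Scored]],
              k: int) -> List[Scored]:
    """Expand every configuration in the beam, then keep the K best."""
    candidates = [ext for item in beam for ext in expand(item)]
    return select_k_best(candidates, k)


if __name__ == "__main__":
    # Toy scoring (assumption): each label adds a fixed amount to the score.
    label_scores = {"A": 1.0, "B": 0.5, "C": 0.125}

    def expand(item: Scored) -> List[Scored]:
        score, config = item
        return [(score + s, config + (lab,)) for lab, s in label_scores.items()]

    beam = [(0.0, ())]
    beam = beam_step(beam, expand, k=2)
    print(beam)
```

A smaller K prunes more candidates at each step, which is why the choice of beam size affects both speed and the quality of the updates during training.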
Experiments | Figure 6: Accuracies against the training epoch for joint segmentation and tagging as well as joint phrase-structure parsing using beam sizes 1, 4, 16 and 64, respectively.
Experiments | Figure 6 shows the accuracies of our model using different beam sizes with respect to the training epoch. |
Experiments | The performance of our model increases as the beam size increases. |
Introduction | With linear-time complexity, our parser is highly efficient, processing over 30 sentences per second with a beam size of 16. |
Experimental Setup | Beam size is fixed at 2000.⁴ Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Results | ⁴ We looked at various beam sizes on the held-out data, and observed that the performance peaks around this value.
Sentence Compression | postorder) as a sequence of nodes in T, the set L of possible node labels, a scoring function S for evaluating each sentence compression hypothesis, and a beam size N. Specifically, O is a permutation on the set {0, 1, .
Sentence Compression | Om, L = {RET, REM, PAR}, hypothesis scorer S, beam size N