Experiments | For selectional branching, the margin threshold m and the beam size b need to be tuned (Section 3.3).
Experiments | For this development set, beam sizes of 64 and 80 gave exactly the same result, so we kept the larger one (b = 80).
Experiments | Figure 3: Parsing accuracies with respect to margins and beam sizes on the English development set. |
Introduction | Thus, it is preferable that the beam size not be fixed but proportional to the number of low-confidence predictions that a greedy parser makes; in that case, fewer transition sequences need to be explored to produce the same or similar parse output.
Selectional branching | Thus, it is preferable that the beam size not be fixed but proportional to the number of low-confidence predictions made for the one-best sequence.
Selectional branching | The selectional branching method presented here performs at most d · t − e transitions, where t is the maximum number of transitions performed to generate a transition sequence, d = min(b, |λ| + 1), b is the beam size, |λ| is the number of low-confidence predictions made for the one-best sequence, and e is the number of transitions saved because each additional branch resumes from an intermediate state rather than starting from the beginning.
Selectional branching | Compared to beam search, which always performs b · t transitions, selectional branching is guaranteed to perform fewer transitions given the same beam size because d ≤ b and e > 0 except for d = 1, in which case no branching happens.
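Selectional branching | As an illustration of the bound above, a minimal sketch (function names are hypothetical; e is read as the transitions saved because each extra branch resumes from an intermediate state rather than parsing from scratch):

    def beam_search_transitions(b: int, t: int) -> int:
        # Beam search always carries b sequences for t transitions each.
        return b * t

    def selectional_branching_transitions(t: int, b: int, n_low_conf: int,
                                          branch_points: list[int]) -> int:
        # d sequences are explored in total; the j-th extra branch resumes at
        # transition index branch_points[j], reusing that many transitions.
        d = min(b, n_low_conf + 1)
        e = sum(branch_points[:d - 1])
        return d * t - e

    # e.g. t=30, b=80, 3 low-confidence predictions at indices 5, 12, 20:
    # d = 4, e = 37, so at most 4*30 - 37 = 83 transitions vs. 80*30 = 2400.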
Experiments | Figure 6 shows the training curves of the averaged perceptron, measured as performance on the development set, when the beam size is 4.
Experiments | 4.4 Impact of beam size |
Experiments | The beam size is an important hyperparameter in both training and testing.
Joint Framework for Event Extraction | In Section 4.5 we will show that the standard perceptron introduces many invalid updates, especially with smaller beam sizes, as also observed by Huang et al.
Joint Framework for Event Extraction | Then the K-best partial configurations are selected into the beam, assuming the beam size is K.
Joint Framework for Event Extraction | K: Beam size.
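Joint Framework for Event Extraction | A generic sketch of this K-best selection step (the expand and score callables are task-specific placeholders, not from the paper):

    import heapq
    from typing import Callable, Iterable, TypeVar

    C = TypeVar("C")  # a partial configuration

    def beam_step(beam: list[C],
                  expand: Callable[[C], Iterable[C]],
                  score: Callable[[C], float],
                  K: int) -> list[C]:
        # Expand every partial configuration in the beam,
        # then keep only the K highest-scoring successors.
        candidates = [succ for config in beam for succ in expand(config)]
        return heapq.nlargest(K, candidates, key=score)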
Model | We have some parameters to tune: the parsing feature weight θp, the beam size, and the number of training epochs.
Model | In this experiment, the external dictionaries are not used, and a beam size of 32 is used.
Model | Table 3 shows the performance and speed of the full joint model (with no dictionaries) on CTB-Sc-l with respect to the beam size.
Experiments | length, comparing with the top-down, 2nd-MST, and shift-reduce parsers (beam size: 8, prediction size: 5)
Experiments | During training, we fixed the prediction size and beam size to 5 and 16, respectively, judged by preliminary experiments.
Experiments | After 25 iterations of perceptron training, we achieved 92.94 unlabeled accuracy for the top-down parser with the FIRST function and 93.01 unlabeled accuracy for the shift-reduce parser on development data, setting the beam size to 8 for both parsers and the prediction size to 5 in the top-down parser.
Introduction | The complexity becomes O(n² × b), where b is the beam size.
Experiments | Figure 6: Accuracies against the training epoch for joint segmentation and tagging, as well as joint phrase-structure parsing, using beam sizes 1, 4, 16 and 64, respectively.
Experiments | Figure 6 shows the accuracies of our model using different beam sizes with respect to the training epoch. |
Experiments | The performance of our model increases as the beam size increases. |
Introduction | With linear-time complexity, our parser is highly efficient, processing over 30 sentences per second with a beam size of 16. |
Evaluation | The results for W-DITG are listed in Table 1. Tests 1 and 2 show that with the same beam size (i.e.
Evaluation | To enable TTT to achieve a similar F-score or F-score upper bound, the beam size has to be doubled, and the time cost is more than twice the original (cf.
Evaluation | This also explains (in Tables 2 and 3) why DPDI with beam size 10 leads to higher BLEU than TTT with beam size 20, even though both pruning methods lead to roughly the same alignment F-score.
Pruning in ITG Parsing | The third type of pruning is equivalent to limiting the beam size of alignment hypotheses in each hypernode.
The DPDI Framework | Note that the beam size (max number of E-spans) for each F-span is 10. |
Experimental Setup | Unless otherwise stated we use 25 iterations of perceptron training and a beam size of 20. |
Introducing Nonlocal Features | The subset of size k (the beam size) of the highest-scoring expansions is retained and put back into the agenda for the next step.
Introducing Nonlocal Features | Algorithm 2 Beam search and early update
    Input: Data set D, epochs T, beam size k. Output: weight vector w.
     1: w ← 0
     2: for t ∈ 1..T do
     3:   for ⟨M_i, A_i, y_i⟩ ∈ D do
     4:     Agenda_G ← Agenda_P ← ∅; Δ_acc ← ∅; loss_acc ← 0
     5:     for j ∈ 1..n do
     6:       Agenda_G ← EXPAND(Agenda_G, A_j, m_j, k)   ▷ gold-constrained
     7:       Agenda_P ← EXPAND(Agenda_P, A_j, m_j, k)
     8:       if ¬ CONTAINSCORRECT(Agenda_P) then
     9:         y ← EXTRACTBEST(Agenda_G); ŷ ← EXTRACTBEST(Agenda_P)
    10:         Δ_acc ← Δ_acc + Φ(y) − Φ(ŷ); loss_acc ← loss_acc + LOSS(ŷ)
    11:         Agenda_P ← Agenda_G          ▷ early update: resume from gold
    12:     ŷ ← EXTRACTBEST(Agenda_P)
    13:     if ¬ CORRECT(ŷ) then
    14:       y ← EXTRACTBEST(Agenda_G)
    15:       Δ_acc ← Δ_acc + Φ(y) − Φ(ŷ); loss_acc ← loss_acc + LOSS(ŷ)
    16:     if Δ_acc ≠ ∅ then update w w.r.t. Δ_acc and loss_acc
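Introducing Nonlocal Features | A minimal Python sketch of this training loop, assuming hypothetical placeholder callables (expand, phi, is_correct) and omitting the accumulated loss weighting for brevity:

    from collections import defaultdict

    def train_early_update(data, T, k, expand, phi, is_correct):
        # data: iterable of (start_item, n_steps, gold) training instances
        w = defaultdict(float)
        for _ in range(T):
            for start, n_steps, gold in data:
                agenda_g = [start]          # gold-constrained agenda
                agenda_p = [start]          # prediction agenda
                delta = defaultdict(float)  # accumulated update direction
                for j in range(n_steps):
                    agenda_g = expand(agenda_g, j, w, k, gold)
                    agenda_p = expand(agenda_p, j, w, k, None)
                    if not any(is_correct(h, gold) for h in agenda_p):
                        y, y_hat = agenda_g[0], agenda_p[0]  # agendas sorted by score
                        for f, v in phi(y).items():
                            delta[f] += v
                        for f, v in phi(y_hat).items():
                            delta[f] -= v
                        agenda_p = list(agenda_g)  # early update: resume from gold
                y_hat = agenda_p[0]
                if not is_correct(y_hat, gold):
                    y = agenda_g[0]
                    for f, v in phi(y).items():
                        delta[f] += v
                    for f, v in phi(y_hat).items():
                        delta[f] -= v
                for f, v in delta.items():  # apply the accumulated update
                    w[f] += v
        return w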
Results | the English development set as a function of the number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Related Work | The HHMM was run with a beam size of 2000.
Related Work | HHMM parser beam sizes are indicated for the syntactic LM. |
Related Work | Figure 9: Results for Ur-En devtest (only sentences with 1-20 words) with HHMM beam size of 2000 and Moses settings of distortion limit 10, stack size 200, and ttable_limit 20.
Corporate Acquisitions | We used a default beam size k = 10.
Seminar Extraction Task | Table 1 shows the results of our full model using beam size k = 10, as well as model variants.
Structured Learning | As detailed, only a set of top-scoring tuples of size k (the beam size) is maintained per relation r ∈ T during candidate generation.
Structured Learning | The beam size k controls the tradeoff between performance and cost.
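Structured Learning | A short sketch of this per-relation pruning (names are hypothetical; score stands in for the model's tuple scorer):

    import heapq

    def prune_per_relation(candidates_by_relation, score, k):
        # Keep only the k highest-scoring candidate tuples for each relation r.
        return {rel: heapq.nlargest(k, tuples, key=score)
                for rel, tuples in candidates_by_relation.items()}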
Experimental Setup | Beam size is fixed at 2000.⁴ Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) with SRILM (Stolcke, 2002).
Results | ⁴ We looked at various beam sizes on the held-out data and observed that performance peaks around this value.
Sentence Compression | postorder) as a sequence of nodes in T, the set L of possible node labels, a scoring function S for evaluating each sentence compression hypothesis, and a beam size N. Specifically, O is a permutation on the set {0, 1, …
Sentence Compression | Om, L = {RET, REM, PAR}, hypothesis scorer S, beam size N
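Sentence Compression | A hedged sketch of the beam search these inputs imply: each hypothesis assigns one label from L to every node visited so far in the ordering O, and only the N best hypotheses survive each step (score is a placeholder for the scorer S):

    import heapq

    LABELS = ("RET", "REM", "PAR")  # the label set L

    def compress(order, score, N):
        beam = [()]  # a hypothesis is a tuple of (node, label) decisions
        for node in order:
            candidates = [hyp + ((node, lab),)
                          for hyp in beam for lab in LABELS]
            beam = heapq.nlargest(N, candidates, key=score)
        return max(beam, key=score)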
Experiments | By default, a beam size of 20 is used for all decoders in the experiments.
Experiments | For partial hypothesis re-ranking, obtaining more top-ranked results requires increasing the beam size, which is not affordable for large beam sizes in our experiments.
Experiments | We work around this issue by approximating beam sizes larger than 20, enlarging the beam only for the span covering the entire source sentence.
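Experiments | A minimal sketch of this workaround (the names and the enlarged size are illustrative, not from the paper):

    def span_beam_size(i, j, n, default=20, full_span=100):
        # Enlarge the beam only for the span [0, n) covering the whole
        # source sentence; every other span keeps the default beam.
        return full_span if (i, j) == (0, n) else default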
Background | The lower bound can be initialized to the best sequence score found by beam search with a beam size of 1.
Proposed Algorithms | The search is similar to beam search with a beam size of 1.
Proposed Algorithms | We initialize the lower bound lb with the k-th best score from beam search (with beam size k) at line 1.
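Proposed Algorithms | A sketch of this initialization, assuming hypothetical placeholder callables and that beam search completes at least one sequence:

    import heapq

    def init_lower_bound(start, expand, score, is_complete, k):
        beam, finished = [start], []
        while beam:
            candidates = []
            for hyp in beam:
                for succ in expand(hyp):
                    (finished if is_complete(succ) else candidates).append(succ)
            beam = heapq.nlargest(k, candidates, key=score)
        # lb = the k-th best completed score (or the worst one found,
        # if fewer than k sequences completed)
        top = heapq.nlargest(k, (score(h) for h in finished))
        return top[-1]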
Introduction | As the performance of our KB-QA system relies heavily on the k-best beam approximation, we evaluate the impact of the beam size and list the comparison results in Figure 6. |
Introduction | Figure 6: Impact of beam size on accuracy.
Introduction | We can see that using a small k can achieve better results than the baseline, for which the beam size is set to 200.
Algorithm 3.1 The Model | k: beam size.
Experiments | In general, a larger beam size can yield better performance but increases training and decoding time.
Experiments | As a tradeoff, we set the beam size to 8 throughout the experiments.
Experiment | We set the beam size k to 16, which strikes a good balance between efficiency and accuracy.
Joint POS Tagging and Parsing with Nonlocal Features | Input: A word-segmented sentence, beam size k. Output: A constituent parse tree. |
Transition-based Constituent Parsing | Input: A POS-tagged sentence, beam size k. Output: A constituent parse tree. |
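Transition-based Constituent Parsing | A generic sketch of the beam-search decoding loop these input/output lines describe (all callables are hypothetical placeholders; the best final state is assumed to hold the parse tree):

    import heapq

    def parse(tagged_sentence, k, initial, expand, score, is_final):
        beam = [initial(tagged_sentence)]
        while not all(is_final(s) for s in beam):
            candidates = [t for s in beam
                          for t in ([s] if is_final(s) else expand(s))]
            beam = heapq.nlargest(k, candidates, key=score)
        return max(beam, key=score)  # best final state holds the parse tree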