Index of papers in Proc. ACL that mention
  • beam size
Choi, Jinho D. and McCallum, Andrew
Experiments
For selectional branching, the margin threshold m and the beam size b need to be tuned (Section 3.3).
Experiments
For this development set, the beam size of 64 and 80 gave the exact same result, so we kept the one with a larger beam size (b = 80).
Experiments
Figure 3: Parsing accuracies with respect to margins and beam sizes on the English development set.
Introduction
Thus, it is preferred if the beam size is not fixed but proportional to the number of low confidence predictions that a greedy parser makes, in which case, fewer transition sequences need to be explored to produce the same or similar parse output.
Selectional branching
Thus, it is preferred if the beam size is not fixed but proportional to the number of low confidence predictions made for the one-best sequence.
Selectional branching
The selectional branching method presented here performs at most d · t − e transitions, where t is the maximum number of transitions performed to generate a transition sequence, d = min(b, |A|+1), b is the beam size, |A| is the number of low confidence predictions made for the one-best sequence, and e ≥ 0.
Selectional branching
Compared to beam search that always performs b · t transitions, selectional branching is guaranteed to perform fewer transitions given the same beam size because d ≤ b and e > 0 except for d = 1, in which case, no branching happens.
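The transition-count argument above can be sketched numerically; the helper below is purely illustrative (the function name and the simplification e = 0 are assumptions, not the authors' code):

```python
def selectional_branching_budget(b, num_low_confidence, t):
    """Upper bound on transitions performed by selectional branching.

    b: beam size; num_low_confidence: |A|, the number of low confidence
    predictions made for the one-best sequence; t: the maximum number of
    transitions per sequence.  Beam search always performs b * t
    transitions; selectional branching performs at most d * t - e with
    d = min(b, |A| + 1), so taking e = 0 gives a conservative bound.
    """
    d = min(b, num_low_confidence + 1)
    return d * t

# A confident greedy pass (few low confidence predictions) explores far
# fewer transitions than full beam search with the same beam size:
print(selectional_branching_budget(b=80, num_low_confidence=2, t=50))  # 150
print(80 * 50)                                                         # 4000
```

When num_low_confidence + 1 reaches b, the bound degrades gracefully to the beam-search budget b · t.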
beam size is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng and Huang, Liang
Experiments
Figure 6 shows the training curves of the averaged perceptron with respect to the performance on the development set when the beam size is 4.
Experiments
4.4 Impact of beam size
Experiments
The beam size is an important hyperparameter in both training and test.
Joint Framework for Event Extraction
In Section 4.5 we will show that the standard perceptron introduces many invalid updates especially with smaller beam sizes , also observed by Huang et al.
Joint Framework for Event Extraction
Then the K-best partial configurations are selected to the beam, assuming the beam size is K.
Joint Framework for Event Extraction
K: Beam size.
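The K-best selection step described above can be sketched as follows; `expand` and `score` are hypothetical problem-specific callbacks, not part of the paper:

```python
import heapq

def beam_step(beam, expand, score, K):
    """Expand every partial configuration in the beam and keep only the
    K best-scoring successors (one step of beam search)."""
    candidates = [succ for config in beam for succ in expand(config)]
    return heapq.nlargest(K, candidates, key=score)

# Toy usage: configurations are label strings, each step appends "a" or
# "b", and the score simply counts occurrences of "a".
expand = lambda c: [c + "a", c + "b"]
score = lambda c: c.count("a")
beam = [""]
for _ in range(2):
    beam = beam_step(beam, expand, score, K=2)
print(beam)  # ['aa', 'ab']
```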
beam size is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Model
We have some parameters to tune: the parsing feature weight, beam size, and training epoch.
Model
In this experiment, the external dictionaries are not used, and the beam size of 32 is used.
Model
Table 3 shows the performance and speed of the full joint model (with no dictionaries) on CTB-Sc-l with respect to the beam size.
beam size is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Hayashi, Katsuhiko and Watanabe, Taro and Asahara, Masayuki and Matsumoto, Yuji
Experiments
length, comparing with top-down, 2nd-MST and shift-reduce parsers (beam size: 8, pred size: 5)
Experiments
During training, we fixed the prediction size and beam size to 5 and 16, respectively, judged by pre-
Experiments
After 25 iterations of perceptron training, we achieved 92.94 unlabeled accuracy for top-down parser with the FIRST function and 93.01 unlabeled accuracy for shift-reduce parser on development data by setting the beam size to 8 for both parsers and the prediction size to 5 in top-down parser.
Introduction
The complexity becomes O(n^2 × b), where b is the beam size.
beam size is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Meishan and Zhang, Yue and Che, Wanxiang and Liu, Ting
Experiments
Figure 6: Accuracies against the training epoch for joint segmentation and tagging as well as joint phrase-structure parsing using beam sizes 1, 4, 16 and 64, respectively.
Experiments
Figure 6 shows the accuracies of our model using different beam sizes with respect to the training epoch.
Experiments
The performance of our model increases as the beam size increases.
Introduction
With linear-time complexity, our parser is highly efficient, processing over 30 sentences per second with a beam size of 16.
beam size is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Zhou, Ming
Evaluation
The results for W-DITG are listed in Table 1. Tests 1 and 2 show that with the same beam size (i.e.
Evaluation
To enable TTT to achieve a similar F-score or F-score upper bound, the beam size has to be doubled and the time cost is more than twice the original (c.f.
Evaluation
This also explains (in Tables 2 and 3) why DPDI with beam size 10 leads to higher Bleu than TTT with beam size 20, even though both pruning methods lead to roughly the same alignment F-score
Pruning in ITG Parsing
The third type of pruning is equivalent to minimizing the beam size of alignment hypotheses in each hypernode.
The DPDI Framework
Note that the beam size (max number of E-spans) for each F-span is 10.
beam size is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Experimental Setup
Unless otherwise stated we use 25 iterations of perceptron training and a beam size of 20.
Introducing Nonlocal Features
The subset of size k (the beam size) of the highest scoring expansions is retained and put back into the agenda for the next step.
Introducing Nonlocal Features
Algorithm 2 Beam search and early update. Input: Data set D, epochs T, beam size k. Output: weight vector w. 1: w = 0 2: for t ∈ 1..T do
Introducing Nonlocal Features
Input: Data set D, iterations T, beam size k. Output: weight vector w.
w = 0
for t ∈ 1..T do
  for ⟨M_i, A_i, y_i⟩ ∈ D do
    Agenda_g = Agenda_p = ∅; Δ_acc = ∅; loss_acc = 0
    for j ∈ 1..n do
      Agenda_g = EXPAND(Agenda_g, A_j, m_j, k)
      Agenda_p = EXPAND(Agenda_p, A_j, m_j, k)
      if ¬ CONTAINSCORRECT(Agenda_p) then
        ŷ = EXTRACTBEST(Agenda_g)
        ỹ = EXTRACTBEST(Agenda_p)
        Δ_acc = Δ_acc + …; loss_acc = loss_acc + LOSS(ỹ)
        Agenda_p = Agenda_g
    ỹ = EXTRACTBEST(Agenda_p)
    if ¬ CORRECT(ỹ) then
      ŷ = EXTRACTBEST(Agenda_g)
      Δ_acc = Δ_acc + …; loss_acc = loss_acc + LOSS(ỹ)
    if Δ_acc ≠ ∅ then update w.r.t.
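The early-update loop quoted above can be sketched in Python; this is a simplified rendering with hypothetical `expand` and `features` callbacks, not the authors' latent-antecedent version:

```python
def early_update_train(data, expand, features, T, k):
    """Structured perceptron with beam search and early update (sketch).

    data: (input, gold_prefixes) pairs, where gold_prefixes[j] is the
    correct partial hypothesis after step j.  As soon as the gold item
    falls out of the k-item beam, the weights are updated and the rest
    of the sequence is abandoned (the "early update").
    """
    w = {}
    score = lambda h: sum(w.get(f, 0.0) for f in features(h))
    for _ in range(T):
        for x, gold_prefixes in data:
            beam = [()]
            for j, gold in enumerate(gold_prefixes):
                candidates = [s for h in beam for s in expand(h, x, j)]
                beam = sorted(candidates, key=score, reverse=True)[:k]
                if gold not in beam:            # gold fell out: early update
                    for f in features(gold):
                        w[f] = w.get(f, 0.0) + 1.0
                    for f in features(beam[0]):
                        w[f] = w.get(f, 0.0) - 1.0
                    break
    return w

# Toy tagging task: two positions, labels X/Y, gold sequence (X, Y).
expand = lambda h, x, j: [h + ("X",), h + ("Y",)]
features = lambda h: [(i, lab) for i, lab in enumerate(h)]
data = [("input", [("X",), ("X", "Y")])]
w = early_update_train(data, expand, features, T=2, k=1)
print(w[(1, "Y")])  # 1.0
```

Updating only on the accumulated difference up to the point where the gold item leaves the beam is what keeps the updates valid with small beam sizes.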
Results
the English development set as a function of the number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
beam size is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen
Related Work
HHMM was run with beam size of 2000.
Related Work
HHMM parser beam sizes are indicated for the syntactic LM.
Related Work
Figure 9: Results for Ur-En devtest (only sentences with 1-20 words) with HHMM beam size of 2000 and Moses settings of distortion limit 10, stack size 200, and ttable_limit 20.
beam size is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Minkov, Einat and Zettlemoyer, Luke
Corporate Acquisitions
We used a default beam size k = 10.
Seminar Extraction Task
Table 1 shows the results of our full model using beam size k = 10, as well as model variants.
Structured Learning
As detailed, only a set of top scoring tuples of size k (the beam size) is maintained per relation r ∈ T during candidate generation.
Structured Learning
The beam size k allows controlling the tradeoff between performance and cost.
beam size is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Experimental Setup
Beam size is fixed at 2000.⁴ Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) by SRILM (Stolcke, 2002).
Results
⁴We looked at various beam sizes on the heldout data, and observed that the performance peaks around this value.
Sentence Compression
postorder) as a sequence of nodes in T, the set L of possible node labels, a scoring function S for evaluating each sentence compression hypothesis, and a beam size N. Specifically, O is a permutation on the set {0, 1, .
Sentence Compression
Om, L = {RET, REM, PAR}, hypothesis scorer S, beam size N
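The compression loop suggested by the input line above (label each node with RET, REM, or PAR while keeping the N best hypotheses) could look like this; the toy scorer is a hypothetical stand-in for the paper's hypothesis scorer S:

```python
import heapq

LABELS = ["RET", "REM", "PAR"]  # the label set L from the paper

def compress(nodes, score, N):
    """Assign one label per node, keeping the N best partial label
    sequences (the beam) after every node."""
    beam = [()]
    for _ in nodes:
        candidates = [h + (lab,) for h in beam for lab in LABELS]
        beam = heapq.nlargest(N, candidates, key=score)
    return beam[0]

# Toy scorer that simply prefers retaining nodes:
best = compress(["node0", "node1"], score=lambda h: h.count("RET"), N=3)
print(best)  # ('RET', 'RET')
```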
beam size is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Experiments
By default, the beam size of 20 is used for all decoders in the experiments.
Experiments
For partial hypothesis re-ranking, obtaining more top-ranked results requires increasing the beam size, which is not affordable for large numbers in experiments.
Experiments
We work around this issue by approximating beam sizes larger than 20 by only enlarging the beam size for the span covering the entire source sentence.
beam size is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Zhiheng and Chang, Yi and Long, Bo and Crespo, Jean-Francois and Dong, Anlei and Keerthi, Sathiya and Wu, Su-Lin
Background
The lower bound can be initialized to the best sequence score using a beam search (with beam size being 1).
Proposed Algorithms
The search is similar to the beam search with beam size being 1.
Proposed Algorithms
We initialize the lower bound lb with the k-th best score from beam search (with beam size being k) at line 1.
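The initialization idea in these snippets (a beam of size 1 is just a greedy pass, and its score is a valid lower bound for exact search; the k-best variant keeps the k-th best score instead) can be sketched as follows, with `step_score` and the toy model as illustrative assumptions:

```python
def greedy_lower_bound(n_steps, labels, step_score):
    """Score of the sequence found by beam search with beam size 1
    (i.e. a greedy pass).  Since this is the score of an actual
    sequence, it lower-bounds the optimal sequence score, so an exact
    search can prune any partial sequence whose optimistic bound falls
    below it."""
    total, prev = 0.0, None
    for j in range(n_steps):
        lab = max(labels, key=lambda y: step_score(j, prev, y))
        total += step_score(j, prev, lab)
        prev = lab
    return total

# Toy model: step_score(j, prev, y) rewards alternating labels.
score = lambda j, prev, y: 1.0 if y != prev else 0.0
lb = greedy_lower_bound(3, ["A", "B"], score)
print(lb)  # 3.0
```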
beam size is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bao, Junwei and Duan, Nan and Zhou, Ming and Zhao, Tiejun
Introduction
As the performance of our KB-QA system relies heavily on the k-best beam approximation, we evaluate the impact of the beam size and list the comparison results in Figure 6.
Introduction
Figure 6: Impacts of beam size on accuracy.
Introduction
We can see that using a small k can achieve better results than baseline, where the beam size is set to be 200.
beam size is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Algorithm 3.1 The Model
k: beam size.
Experiments
In general a larger beam size can yield better performance but increase training and decoding time.
Experiments
As a tradeoff, we set the beam size as 8 throughout the experiments.
beam size is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Experiment
We set the beam size k to 16, which brings a good balance between efficiency and accuracy.
Joint POS Tagging and Parsing with Nonlocal Features
Input: A word-segmented sentence, beam size k. Output: A constituent parse tree.
Transition-based Constituent Parsing
Input: A POS-tagged sentence, beam size k. Output: A constituent parse tree.
beam size is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: