Experiments | For selectional branching, the margin threshold m and the beam size b need to be tuned (Section 3.3).
Experiments | For this development set, beam sizes of 64 and 80 gave exactly the same result, so we kept the larger one (b = 80).
Experiments | Figure 3: Parsing accuracies with respect to margins and beam sizes on the English development set. |
Introduction | Thus, it is preferable that the beam size not be fixed but proportional to the number of low-confidence predictions that a greedy parser makes; in that case, fewer transition sequences need to be explored to produce the same or similar parse output.
Selectional branching | Thus, it is preferable that the beam size not be fixed but proportional to the number of low-confidence predictions made for the one-best sequence.
Selectional branching | The selectional branching method presented here performs at most d · t − e transitions, where t is the maximum number of transitions performed to generate a transition sequence, d = min(b, |λ| + 1), b is the beam size, |λ| is the number of low-confidence predictions made for the one-best sequence, and e is the number of transitions saved because each additional branch resumes from an intermediate state rather than starting from the beginning.
Selectional branching | Compared to beam search, which always performs b · t transitions, selectional branching is guaranteed to perform fewer transitions given the same beam size because d ≤ b and e > 0 except for d = 1, in which case no branching happens.
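Selectional branching | As an illustration of the bound above, a minimal sketch (function names are hypothetical; e is read as the transitions saved because each extra branch resumes from an intermediate state rather than parsing from scratch):

    def beam_search_transitions(b: int, t: int) -> int:
        # Beam search always carries b sequences for t transitions each.
        return b * t

    def selectional_branching_transitions(t: int, b: int, n_low_conf: int,
                                          branch_points: list[int]) -> int:
        # d sequences are explored in total; the j-th extra branch resumes at
        # transition index branch_points[j], reusing that many transitions.
        d = min(b, n_low_conf + 1)
        e = sum(branch_points[:d - 1])
        return d * t - e

    # e.g. t=30, b=80, 3 low-confidence predictions at indices 5, 12, 20:
    # d = 4, e = 37, so at most 4*30 - 37 = 83 transitions vs. 80*30 = 2400.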
Experiments | Figure 6 shows the training curves of the averaged perceptron, measured as performance on the development set, when the beam size is 4.
Experiments | 4.4 Impact of beam size |
Experiments | The beam size is an important hyperparameter in both training and testing.
Joint Framework for Event Extraction | In Section 4.5 we will show that the standard perceptron introduces many invalid updates, especially with smaller beam sizes, as also observed by Huang et al.
Joint Framework for Event Extraction | Then the K-best partial configurations are selected into the beam, assuming the beam size is K.
Joint Framework for Event Extraction | K: Beam size.
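Joint Framework for Event Extraction | A generic sketch of this K-best selection step (the expand and score callables are task-specific placeholders, not from the paper):

    import heapq
    from typing import Callable, Iterable, TypeVar

    C = TypeVar("C")  # a partial configuration

    def beam_step(beam: list[C],
                  expand: Callable[[C], Iterable[C]],
                  score: Callable[[C], float],
                  K: int) -> list[C]:
        # Expand every partial configuration in the beam,
        # then keep only the K highest-scoring successors.
        candidates = [succ for config in beam for succ in expand(config)]
        return heapq.nlargest(K, candidates, key=score)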
Model | We have some parameters to tune: the parsing feature weight θp, the beam size, and the number of training epochs.
Model | In this experiment, the external dictionaries are not used, and a beam size of 32 is used.
Model | Table 3 shows the performance and speed of the full joint model (with no dictionaries) on CTB-Sc-l with respect to the beam size.
Experiments | length, comparing with the top-down, 2nd-MST, and shift-reduce parsers (beam size: 8, prediction size: 5)
Experiments | During training, we fixed the prediction size and beam size to 5 and 16, respectively, judged by preliminary experiments.
Experiments | After 25 iterations of perceptron training, we achieved 92.94 unlabeled accuracy for the top-down parser with the FIRST function and 93.01 unlabeled accuracy for the shift-reduce parser on development data, setting the beam size to 8 for both parsers and the prediction size to 5 in the top-down parser.
Introduction | The complexity becomes O(n² × b), where b is the beam size.
Experiments | Figure 6: Accuracies against the training epoch for joint segmentation and tagging, as well as joint phrase-structure parsing, using beam sizes 1, 4, 16 and 64, respectively.
Experiments | Figure 6 shows the accuracies of our model using different beam sizes with respect to the training epoch. |
Experiments | The performance of our model increases as the beam size increases. |
Introduction | With linear-time complexity, our parser is highly efficient, processing over 30 sentences per second with a beam size of 16. |
Evaluation | The results for W-DITG are listed in Table 1. Tests 1 and 2 show that with the same beam size (i.e.
Evaluation | To enable TTT to achieve a similar F-score or F-score upper bound, the beam size has to be doubled, and the time cost is more than twice the original (cf.
Evaluation | This also explains (in Tables 2 and 3) why DPDI with beam size 10 leads to higher BLEU than TTT with beam size 20, even though both pruning methods lead to roughly the same alignment F-score.
Pruning in ITG Parsing | The third type of pruning is equivalent to limiting the beam size of alignment hypotheses in each hypernode.
The DPDI Framework | Note that the beam size (max number of E-spans) for each F-span is 10. |
Experimental Setup | Unless otherwise stated we use 25 iterations of perceptron training and a beam size of 20. |
Introducing Nonlocal Features | The subset of size k (the beam size) of the highest-scoring expansions is retained and put back into the agenda for the next step.
Introducing Nonlocal Features | Algorithm 2 Beam search and early update
    Input: Data set D, epochs T, beam size k. Output: weight vector w.
     1: w ← 0
     2: for t ∈ 1..T do
     3:   for ⟨M_i, A_i, y_i⟩ ∈ D do
     4:     Agenda_G ← Agenda_P ← ∅; Δ_acc ← ∅; loss_acc ← 0
     5:     for j ∈ 1..n do
     6:       Agenda_G ← EXPAND(Agenda_G, A_j, m_j, k)   ▷ gold-constrained
     7:       Agenda_P ← EXPAND(Agenda_P, A_j, m_j, k)
     8:       if ¬ CONTAINSCORRECT(Agenda_P) then
     9:         y ← EXTRACTBEST(Agenda_G); ŷ ← EXTRACTBEST(Agenda_P)
    10:         Δ_acc ← Δ_acc + Φ(y) − Φ(ŷ); loss_acc ← loss_acc + LOSS(ŷ)
    11:         Agenda_P ← Agenda_G          ▷ early update: resume from gold
    12:     ŷ ← EXTRACTBEST(Agenda_P)
    13:     if ¬ CORRECT(ŷ) then
    14:       y ← EXTRACTBEST(Agenda_G)
    15:       Δ_acc ← Δ_acc + Φ(y) − Φ(ŷ); loss_acc ← loss_acc + LOSS(ŷ)
    16:     if Δ_acc ≠ ∅ then update w w.r.t. Δ_acc and loss_acc
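Introducing Nonlocal Features | A minimal Python sketch of this training loop, assuming hypothetical placeholder callables (expand, phi, is_correct) and omitting the accumulated loss weighting for brevity:

    from collections import defaultdict

    def train_early_update(data, T, k, expand, phi, is_correct):
        # data: iterable of (start_item, n_steps, gold) training instances
        w = defaultdict(float)
        for _ in range(T):
            for start, n_steps, gold in data:
                agenda_g = [start]          # gold-constrained agenda
                agenda_p = [start]          # prediction agenda
                delta = defaultdict(float)  # accumulated update direction
                for j in range(n_steps):
                    agenda_g = expand(agenda_g, j, w, k, gold)
                    agenda_p = expand(agenda_p, j, w, k, None)
                    if not any(is_correct(h, gold) for h in agenda_p):
                        y, y_hat = agenda_g[0], agenda_p[0]  # agendas sorted by score
                        for f, v in phi(y).items():
                            delta[f] += v
                        for f, v in phi(y_hat).items():
                            delta[f] -= v
                        agenda_p = list(agenda_g)  # early update: resume from gold
                y_hat = agenda_p[0]
                if not is_correct(y_hat, gold):
                    y = agenda_g[0]
                    for f, v in phi(y).items():
                        delta[f] += v
                    for f, v in phi(y_hat).items():
                        delta[f] -= v
                for f, v in delta.items():  # apply the accumulated update
                    w[f] += v
        return w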
Results | the English development set as a function of the number of training iterations with two different beam sizes, 20 and 100, over the local and nonlocal feature sets.
Related Work | The HHMM was run with a beam size of 2000.
Related Work | HHMM parser beam sizes are indicated for the syntactic LM. |
Related Work | Figure 9: Results for Ur-En devtest (only sentences with 1-20 words) with HHMM beam size of 2000 and Moses settings of distortion limit 10, stack size 200, and ttable_limit 20.
Corporate Acquisitions | We used a default beam size k = 10.
Seminar Extraction Task | Table 1 shows the results of our full model using beam size k = 10, as well as model variants.
Structured Learning | As detailed, only a set of top-scoring tuples of size k (the beam size) is maintained per relation r ∈ T during candidate generation.
Structured Learning | The beam size k controls the tradeoff between performance and cost.
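Structured Learning | A short sketch of this per-relation pruning (names are hypothetical; score stands in for the model's tuple scorer):

    import heapq

    def prune_per_relation(candidates_by_relation, score, k):
        # Keep only the k highest-scoring candidate tuples for each relation r.
        return {rel: heapq.nlargest(k, tuples, key=score)
                for rel, tuples in candidates_by_relation.items()}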
Experimental Setup | Beam size is fixed at 2000.⁴ Sentence compressions are evaluated by a 5-gram language model trained on Gigaword (Graff, 2003) with SRILM (Stolcke, 2002).
Results | ⁴ We looked at various beam sizes on the held-out data and observed that performance peaks around this value.
Sentence Compression | postorder) as a sequence of nodes in T, the set L of possible node labels, a scoring function S for evaluating each sentence compression hypothesis, and a beam size N. Specifically, O is a permutation on the set {0, 1, …
Sentence Compression | Om, L = {RET, REM, PAR}, hypothesis scorer S, beam size N
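Sentence Compression | A hedged sketch of the beam search these inputs imply: each hypothesis assigns one label from L to every node visited so far in the ordering O, and only the N best hypotheses survive each step (score is a placeholder for the scorer S):

    import heapq

    LABELS = ("RET", "REM", "PAR")  # the label set L

    def compress(order, score, N):
        beam = [()]  # a hypothesis is a tuple of (node, label) decisions
        for node in order:
            candidates = [hyp + ((node, lab),)
                          for hyp in beam for lab in LABELS]
            beam = heapq.nlargest(N, candidates, key=score)
        return max(beam, key=score)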
Experiments | By default, a beam size of 20 is used for all decoders in the experiments.
Experiments | For partial hypothesis re-ranking, obtaining more top-ranked results requires increasing the beam size, which is not affordable for large beam sizes in our experiments.
Experiments | We work around this issue by approximating beam sizes larger than 20, enlarging the beam only for the span covering the entire source sentence.
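Experiments | A minimal sketch of this workaround (the names and the enlarged size are illustrative, not from the paper):

    def span_beam_size(i, j, n, default=20, full_span=100):
        # Enlarge the beam only for the span [0, n) covering the whole
        # source sentence; every other span keeps the default beam.
        return full_span if (i, j) == (0, n) else default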
Background | The lower bound can be initialized to the best sequence score found by beam search with a beam size of 1.
Proposed Algorithms | The search is similar to beam search with a beam size of 1.
Proposed Algorithms | We initialize the lower bound lb with the k-th best score from beam search (with beam size k) at line 1.
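Proposed Algorithms | A sketch of this initialization, assuming hypothetical placeholder callables and that beam search completes at least one sequence:

    import heapq

    def init_lower_bound(start, expand, score, is_complete, k):
        beam, finished = [start], []
        while beam:
            candidates = []
            for hyp in beam:
                for succ in expand(hyp):
                    (finished if is_complete(succ) else candidates).append(succ)
            beam = heapq.nlargest(k, candidates, key=score)
        # lb = the k-th best completed score (or the worst one found,
        # if fewer than k sequences completed)
        top = heapq.nlargest(k, (score(h) for h in finished))
        return top[-1]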
Introduction | As the performance of our KB-QA system relies heavily on the k-best beam approximation, we evaluate the impact of the beam size and list the comparison results in Figure 6. |
Introduction | Figure 6: Impact of beam size on accuracy.
Introduction | We can see that using a small k can achieve better results than the baseline, for which the beam size is set to 200.
Algorithm 3.1 The Model | k: beam size.
Experiments | In general, a larger beam size can yield better performance but increases training and decoding time.
Experiments | As a tradeoff, we set the beam size to 8 throughout the experiments.
Experiment | We set the beam size k to 16, which strikes a good balance between efficiency and accuracy.
Joint POS Tagging and Parsing with Nonlocal Features | Input: A word-segmented sentence, beam size k. Output: A constituent parse tree. |
Transition-based Constituent Parsing | Input: A POS-tagged sentence, beam size k. Output: A constituent parse tree. |
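Transition-based Constituent Parsing | A generic sketch of the beam-search decoding loop these input/output lines describe (all callables are hypothetical placeholders; the best final state is assumed to hold the parse tree):

    import heapq

    def parse(tagged_sentence, k, initial, expand, score, is_final):
        beam = [initial(tagged_sentence)]
        while not all(is_final(s) for s in beam):
            candidates = [t for s in beam
                          for t in ([s] if is_final(s) else expand(s))]
            beam = heapq.nlargest(k, candidates, key=score)
        return max(beam, key=score)  # best final state holds the parse tree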