Abstract | Furthermore, a new tensor factorization approach is proposed to speed up the model and avoid overfitting. |
Conclusion | Moreover, we propose a tensor factorization approach that effectively improves the model efficiency and avoids the risk of overfitting. |
Introduction | by the design of features, and the number of features could be so large that the resulting models are too large for practical use and prone to overfitting on the training corpus.
Introduction | Moreover, we propose a tensor factorization approach that effectively improves the model efficiency and prevents overfitting. |
Introduction | Not only does this approach improve the efficiency of our model, but it also avoids the risk of overfitting. |
Max-Margin Tensor Neural Network | Moreover, the additional tensor could bring millions of parameters to the model, which makes the model suffer from the risk of overfitting. |
Max-Margin Tensor Neural Network | As long as the factorization rank r is small enough, the factorized tensor operation would be much faster than the un-factorized one and the number of free parameters would also be much smaller, which prevents the model from overfitting. |
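The parameter and speed savings of this kind of factorization can be illustrated with a toy bilinear score; the names P, Q, r and all values below are illustrative, not the paper's notation.

```python
# Toy sketch: a bilinear score x^T W y with a full d x d tensor slice W
# needs d*d parameters, while the factorized form W = P Q with P (d x r)
# and Q (r x d) needs only 2*d*r parameters and lets us compute the score
# as (x^T P)(Q y) in O(d*r) instead of O(d^2) time.

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def bilinear_full(x, W, y):            # x^T W y, O(d^2)
    Wy = matvec(W, y)
    return sum(x[i] * Wy[i] for i in range(len(x)))

def bilinear_factored(x, P, Q, y):     # (x^T P)(Q y), O(d*r)
    Qy = matvec(Q, y)
    xP = [sum(x[i] * P[i][k] for i in range(len(x))) for k in range(len(Qy))]
    return sum(a * b for a, b in zip(xP, Qy))

d, r = 6, 2
P = [[0.1 * (i + k + 1) for k in range(r)] for i in range(d)]
Q = [[0.2 * (k - j) for j in range(d)] for k in range(r)]
W = [[sum(P[i][k] * Q[k][j] for k in range(r)) for j in range(d)] for i in range(d)]
x = [(-1.0) ** i * (i + 1) for i in range(d)]
y = [(i - 2.0) / 3.0 for i in range(d)]

print(abs(bilinear_full(x, W, y) - bilinear_factored(x, P, Q, y)) < 1e-9)
print(d * d, "vs", 2 * d * r)          # 36 full parameters vs 24 factorized
```

The gap between d*d and 2*d*r grows quickly with d, which is where both the speedup and the reduced overfitting risk come from.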
Related Work | However, given the small size of their tensor matrix, they do not face the high time cost and overfitting problems that we face in modeling a sequence labeling task like Chinese word segmentation.
Related Work | That’s why we propose to decrease the computational cost and avoid overfitting with tensor factorization.
Related Work | By introducing tensor factorization into the neural network model for sequence labeling tasks, model training and inference are sped up and overfitting is prevented.
Conclusions | We address overfitting issues by cross-validated climbing of the training data likelihood and propose solutions to increase the efficiency and accuracy of decoding.
Introduction | Estimating such grammars under a Maximum Likelihood criterion is known to be plagued by strong overfitting leading to degenerate estimates (DeNero et al., 2006). |
Introduction | In contrast, our learning objective not only avoids overfitting the training data but, most importantly, learns joint stochastic synchronous grammars which directly aim at generalisation towards yet unseen instances. |
Learning Translation Structure | On the other hand, estimating the parameters under Maximum-Likelihood Estimation (MLE) for the latent translation structure model is bound to overfit towards memorising whole sentence-pairs as discussed in (Mylonakis and Sima’an, 2010), with the resulting grammar estimate not being able to
Learning Translation Structure | However, apart from overfitting towards long phrase-pairs, a grammar with millions of structural rules is also liable to overfit towards degenerate latent structures which, while fitting the training data well, have limited applicability to unseen sentences. |
Learning Translation Structure | The CV criterion, apart from avoiding overfitting, results in discarding the structural rules which are only found in a single part of the training corpus, leading to a more compact grammar while still retaining millions of structural rules that are more likely to generalise.
Related Work | We show that a translation system based on such a joint model can perform competitively in comparison with conditional probability models, when it is augmented with a rich latent hierarchical structure trained adequately to avoid overfitting. |
Related Work | Cohn and Blunsom (2009) sample rules of the form proposed in (Galley et al., 2004) from a Bayesian model, employing Dirichlet Process priors favouring smaller rules to avoid overfitting. |
Conclusion | On top of these hard constraints, the sparse prior of VB helps make the model less prone to overfitting to infrequent phrase pairs, and thus improves the quality of the phrase pairs the model learns. |
Experiments | Using EM, because of overfitting, AER first drops and then increases again as the number of iterations varies from 1 to 10.
Experiments | The gain is especially large on the test data set, indicating VB is less prone to overfitting.
Introduction | In this direction, Expectation Maximization at the phrase level was proposed by Marcu and Wong (2002), who, however, experienced two major difficulties: computational complexity and controlling overfitting.
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because as EM attempts to maximize the likelihood of its training data, it prefers to directly explain a sentence pair with a single phrase pair. |
Introduction | We address the tendency of EM to overfit by using Bayesian methods, where sparse priors assign greater mass to parameter vectors with fewer nonzero values, therefore favoring shorter, more frequent phrases.
Variational Bayes for ITG | If we do not put any constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair. |
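The sparsifying effect of a Dirichlet prior under VB can be sketched with the standard mean-field update for a multinomial, where the MLE normalization is replaced by exponentiated digamma terms; the counts and the prior value below are invented toy values, not from the cited experiments.

```python
import math

# Mean-field VB update for a multinomial with a symmetric Dirichlet prior:
# theta_k is proportional to exp(psi(c_k + alpha)); with alpha < 1 this
# shrinks rare events far more than frequent ones, unlike plain MLE.

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x,
    # then an asymptotic series once x >= 6.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (
        1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

counts = {"the": 50, "of": 30, "rare": 1}   # toy event counts
alpha = 0.001                               # sparse symmetric Dirichlet prior
total = sum(counts.values())

mle = {w: c / total for w, c in counts.items()}
norm = digamma(total + alpha * len(counts))
vb = {w: math.exp(digamma(c + alpha) - norm) for w, c in counts.items()}

print(vb["rare"] < mle["rare"])        # the rare event's mass is heavily shrunk
print(vb["the"] / mle["the"])          # frequent events are nearly untouched
```

Because exp(psi(x)) behaves roughly like x - 0.5 for large x but drops off sharply for small x, mass is pulled away from rarely observed events, which is exactly the anti-memorization effect discussed above.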
Conclusion | Our investigation with variational Bayes showed that the improvement is due both to finding sparse grammars (mitigating overfitting) and to searching over the space of all grammars (mitigating narrowness).
Evaluation | EM gives a strong baseline, since it already uses rules whose depth and number of frontier nodes are limited by stipulation, which helps with the overfitting we have mentioned; surprisingly, it outperforms its discriminative counterpart in both precision and recall (and consequently RelF1).
Evaluation | We conclude that the mitigation of both factors (narrowness and overfitting) contributes to the performance gain of GS.
Introduction | In summary, previous methods suffer from problems of narrowness of search, having to restrict the space of possible rules, and overfitting in preferring overly specific grammars. |
Introduction | We pursue the use of hierarchical probabilistic models incorporating sparse priors to simultaneously solve both the narrowness and overfitting problems. |
Introduction | Segmentation is achieved by introducing a prior bias towards grammars that are compact representations of the data, namely by enforcing simplicity and sparsity: preferring simple rules (smaller segments) unless the use of a complex rule is evidenced by the data (through repetition), and thus mitigating the overfitting problem. |
The STSG Model | (Eisner, 2003) However, as noted earlier, EM is subject to the narrowness and overfitting problems. |
Experiments | The approaches that depend entirely on the labeled data are likely to run into overfitting.
Experiments | Linear SVM performed better than the other two, since the large-margin constraint together with the linear model constraint can alleviate overfitting.
Introduction | When we build a naive model to detect relations, the model tends to overfit the labeled data.
Relation Extraction with Manifold Models | Integration of the unlabeled data can help solve overfitting problems when the labeled data is not sufficient. |
Relation Extraction with Manifold Models | The second term bounds the mapping function f and prevents overfitting.
Relation Extraction with Manifold Models | The algorithm exploits unlabeled data, which helps prevent overfitting.
Experiments | Without cross-training we observe a reduction in performance, due to overfitting.
Extraction with Lexicons | However, there is a danger of overfitting, which we discuss in Section 4.2.4.
Extraction with Lexicons | 4.2.4 Preventing Lexicon Overfitting |
Extraction with Lexicons | If we now train the CRF on the same examples that generated the lexicon features, then the CRF will likely overfit, and weight the lexicon features too highly!
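The cross-training remedy for this can be sketched as a leave-one-fold-out scheme: the lexicon feature for each fold is computed from a lexicon built only on the other folds, so the learner never sees a lexicon that memorizes its own training entities. All names and the toy folds below are illustrative, not from the paper.

```python
# Sketch of cross-training for lexicon features: each fold's lexicon
# feature is computed from a lexicon built on the OTHER folds only.

def build_lexicon(folds):
    lex = set()
    for fold in folds:
        for tokens, entities in fold:
            lex.update(entities)
    return lex

def lexicon_features(folds):
    feats = []
    for i, fold in enumerate(folds):
        others = folds[:i] + folds[i + 1:]
        lex = build_lexicon(others)          # held-out lexicon for fold i
        for tokens, _ in fold:
            feats.append([tok in lex for tok in tokens])
    return feats

# Toy folds of (tokens, gold entity set) pairs.
folds = [
    [(["Paris", "is", "nice"], {"Paris"})],
    [(["Berlin", "rocks"], {"Berlin"})],
    [(["visit", "Paris"], {"Paris"})],
]
feats = lexicon_features(folds)
print(feats[0])  # "Paris" fires only because it also occurs in fold 2
```

The key point is that a token fires the lexicon feature only if it appears as an entity somewhere outside its own fold, mimicking test-time conditions.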
Related Work | Crucial to LUCHS’s different setting is also the need to avoid overfitting.
Abstract | These methods tend to overfit when the available training corpus is limited, especially if the number of features is large or the number of values for a feature is large.
Conclusion | This is probably due to a reduction in overfitting.
Introduction | In an effort to reduce overfitting, they use a combination of a Gaussian prior and early stopping.
Introduction | This is due to overfitting, which is a serious problem in most NLP tasks in resource-poor languages where annotated data is scarce.
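The Gaussian-prior-plus-early-stopping combination can be sketched on a toy one-parameter logistic regression; this is a minimal illustration of the two mechanisms, not the cited system's implementation, and all data values are invented.

```python
import math

# Gradient ascent on an L2-penalized (Gaussian prior) log-likelihood,
# with early stopping when held-out likelihood stops improving.

def loglik(w, data):
    return sum(math.log(1.0 / (1.0 + math.exp(-y * w * x))) for x, y in data)

def train(train_data, dev_data, sigma2=1.0, lr=0.1, max_iters=200):
    w, best_w, best_dev = 0.0, 0.0, float("-inf")
    for _ in range(max_iters):
        # Gradient of sum_i log sigma(y_i w x_i) is sum_i y_i x_i / (1 + e^{y_i w x_i}).
        grad = sum(y * x / (1.0 + math.exp(y * w * x)) for x, y in train_data)
        grad -= w / sigma2                 # Gaussian prior N(0, sigma2)
        w += lr * grad
        dev = loglik(w, dev_data)
        if dev <= best_dev:                # early stopping on held-out data
            break
        best_dev, best_w = dev, w
    return best_w

train_data = [(1.0, 1), (2.0, 1), (-1.0, -1), (-1.5, -1), (0.5, -1)]
dev_data = [(1.2, 1), (-0.8, -1)]
w = train(train_data, dev_data)
print(w > 0)   # learns the right sign on this toy data
```

The prior keeps the weight bounded even on noisy points like (0.5, -1), and the dev-set check halts training before the model chases them.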
Maximum Entropy Based Model for Hindi NER | From the above discussion it is clear that the system suffers from overfitting if a large number of features are used to train the system. |
Copula Models for Text Regression | On the other hand, once such assumptions are removed, another problem arises: the models might be prone to errors and suffer from the overfitting issue.
Copula Models for Text Regression | Therefore, coping with the tradeoff between expressiveness and overfitting seems rather important in statistical approaches that capture stochastic dependency.
Copula Models for Text Regression | This is of crucial importance to modeling text data: instead of using the classic bag-of-words representation with raw counts, we are now working with uniform marginal CDFs, which helps cope with the overfitting issue due to noise and data sparsity.
Discussions | The second issue is about overfitting.
Experiments | On the pre-2009 dataset, we see that the linear regression and linear SVM perform reasonably well, but the Gaussian kernel SVM performs less well, probably due to overfitting.
Experiments | Notice that there is a large performance improvement after the first step (which alone is a linear solver), but overfitting occurs after step 11. |
Experiments | This might be a result of overfitting the model to a single response variable which usually has a smooth behaviour. |
Experiments | In contrast, the multitask learning property of BGL reduces this type of overfitting by providing more statistical evidence for the terms and users, thus yielding not only better inference performance but also a more accurate model.
Methods | Although flexible, this approach would be doomed to failure due to the sheer size of the resulting feature set, and the propensity to overfit all but the largest of training sets. |
Methods | The ℓ1-norm regularisation has found many applications in several scientific fields, as it encourages sparse solutions which reduce the possibility of overfitting and enhance the interpretability of the inferred model (Hastie et al., 2009).
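Why the ℓ1 penalty yields sparse solutions can be seen from its proximal operator, soft-thresholding, which sets small coordinates exactly to zero rather than merely shrinking them; the weight vector below is an invented illustration.

```python
# Soft-thresholding, the proximal operator of the l1 penalty:
# each coordinate is shrunk toward zero by lam, and coordinates
# smaller than lam in magnitude become exactly zero.

def soft_threshold(w, lam):
    out = []
    for v in w:
        mag = abs(v) - lam
        out.append(0.0 if mag <= 0 else (mag if v > 0 else -mag))
    return out

w = [0.8, -0.05, 0.3, 0.02, -1.2]
sparse = soft_threshold(w, 0.1)
print(sparse)
print(sum(1 for v in sparse if v == 0.0))  # number of coordinates zeroed out
```

An ℓ2 penalty, by contrast, rescales every coordinate but never produces exact zeros, which is why ℓ1 is the one associated with feature selection and interpretability.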
Introduction | In speech and language processing, smoothing is essential to reduce overfitting , and Kneser-Ney (KN) smoothing (Kneser and Ney, 1995; Chen and Goodman, 1999) has consistently proven to be among the best-performing and most widely used methods. |
Word Alignment | It also contains most of the model’s parameters and is where overfitting occurs most. |
Word Alignment | However, MLE is prone to overfitting, one symptom of which is the “garbage collection” phenomenon where a rare English word is wrongly aligned to many French words.
Word Alignment | To reduce overfitting, we use expected KN smoothing during the M step.
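For reference, interpolated Kneser-Ney with absolute discounting can be sketched for bigrams as follows; this is the standard textbook formulation over observed counts, not the expected-count variant run inside EM in the cited work, and the corpus is a toy string.

```python
from collections import Counter

# Interpolated Kneser-Ney for bigrams: subtract an absolute discount D
# from each seen bigram count, and redistribute the freed mass via
# continuation probabilities (how many distinct contexts a word follows).

def kn_bigram_model(tokens, D=0.75):
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])              # c(u)
    continuations = Counter(w for (_, w) in bigrams)   # N1+(., w)
    followers = Counter(u for (u, _) in bigrams)       # N1+(u, .)
    total_types = len(bigrams)                         # N1+(., .)

    def prob(u, w):
        c_u = context_counts[u]
        p_cont = continuations[w] / total_types
        discounted = max(bigrams[(u, w)] - D, 0.0) / c_u
        backoff_weight = D * followers[u] / c_u
        return discounted + backoff_weight * p_cont

    return prob

tokens = "the cat sat on the mat the cat ran".split()
p = kn_bigram_model(tokens)
total = sum(p("the", w) for w in set(tokens))
print(round(total, 6))   # the conditional distribution sums to one
```

The continuation counts are what distinguish KN from plain absolute discounting: a word seen after many different contexts gets more backoff mass than one welded to a single context.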
Background | PLSA solves the polysemy problem; however, it is not considered a fully generative model of documents and it is known to be prone to overfitting (Blei et al., 2003).
Background | LDA performs better than PLSA for small datasets since it avoids overfitting and it supports polysemy (Blei et al., 2003). |
Experiments | LDA was chosen to generate the topic models of clinical reports due to its being a generative probabilistic model for documents and its robustness to overfitting.
Experiments | SVM was chosen as the classification algorithm, as it was shown to perform well in text classification tasks (Joachims, 1998; Yang and Liu, 1999) and it is robust to overfitting (Sebastiani, 2002).
Conclusion | We have extended the IBM models and the HMM model by the addition of an ℓ0 prior to the word-to-word translation model, which compacts the word-to-word translation table, reducing overfitting and, in particular, the “garbage collection” effect.
Method | Maximum likelihood training is prone to overfitting , especially in models with many parameters. |
Method | In word alignment, one well-known manifestation of overfitting is that rare words can act as “garbage collectors” |
Method | We have previously proposed another simple remedy to overfitting in the context of unsupervised part-of-speech tagging (Vaswani et al., 2010), which is to minimize the size of the model using a smoothed ℓ0 prior.
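The smoothed ℓ0 idea can be sketched with a common differentiable surrogate for the count of nonzero parameters; this is a generic illustration under assumed notation (theta, beta), and the exact functional form used in the cited paper may differ.

```python
import math

# A differentiable surrogate for the l0 "norm" (number of nonzero
# parameters) on nonnegative parameters theta_i:
#     ||theta||_0  ~  sum_i (1 - exp(-theta_i / beta))
# As beta -> 0 the surrogate approaches the exact count of nonzeros,
# while for beta > 0 it stays smooth and can be used inside gradient-based
# training to push the model toward fewer active parameters.

def smoothed_l0(theta, beta):
    return sum(1.0 - math.exp(-t / beta) for t in theta)

theta = [0.9, 0.0, 0.4, 0.0, 0.05]   # toy parameter vector, 3 nonzeros
for beta in (0.1, 0.01, 0.001):
    print(beta, round(smoothed_l0(theta, beta), 3))
# Smaller beta makes the penalty approach the true nonzero count (3 here).
```

Minimizing such a penalty shrinks the effective model size, which is the mechanism behind compacting the translation table and suppressing garbage collection.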
A Distributional Model for Argument Classification | First, we propose a model that does not depend on complex syntactic information in order to minimize the risk of overfitting.
Abstract | The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects. |
Introduction | Notice how this is also a general problem of statistical learning processes, as large fine-grained feature sets are more exposed to the risks of overfitting.
Related Work | While these approaches increase the expressive power of the models to capture more general linguistic properties, they rely on complex feature sets, demand more training data, and increase the overall exposure to overfitting effects.
System Architecture | The second term is a regularizer for reducing overfitting.
System Architecture | To avoid overfitting, we only collect the word unigrams and bigrams whose frequency is larger than 2 in the training set.
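A frequency cut-off of this kind can be sketched with a counter over the training sentences; the function name, threshold handling, and toy sentences below are illustrative, not from the paper.

```python
from collections import Counter

# Keep only word unigrams and bigrams seen more than twice in the
# training set; everything rarer is pruned from the feature space.

def collect_ngram_features(sentences, min_count=3):
    counts = Counter()
    for sent in sentences:
        counts.update(sent)                       # unigrams
        counts.update(zip(sent, sent[1:]))        # bigrams
    return {g for g, c in counts.items() if c >= min_count}

sentences = [
    ["good", "movie"], ["good", "movie"], ["good", "plot"],
    ["bad", "movie"], ["good", "movie"],
]
feats = collect_ngram_features(sentences)
print(("good", "movie") in feats)   # bigram seen 3 times: kept
print("bad" in feats)               # seen once: pruned
```

Pruning singleton and doubleton n-grams removes exactly the features a model could only memorize, which is why such thresholds act as a cheap overfitting control.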
System Architecture | To reduce overfitting, we employed an L2 Gaussian weight prior (Chen and Rosenfeld, 1999) for all training methods.
Introduction | Another possible reason why large training data did not yet show the expected improvements in discriminative SMT is a special overfitting problem of current popular online learning techniques. |
Introduction | Selecting features jointly across shards and averaging does counter the overfitting effect that is inherent to stochastic updating. |
Joint Feature Selection in Distributed Stochastic Learning | Our algorithm 4 (IterSelSGD) introduces feature selection into distributed learning for increased efficiency and as a more radical measure against overfitting.
Introduction | This constraint prevents each model from overfitting to a particular direction and leads to global optimization across alignment directions. |
Training | In addition, an ℓ2 regularization term is added to the objective to prevent the model from overfitting the training data.
Training | The proposed constraint penalizes overfitting to a particular direction and enables two directional models to optimize across alignment directions globally. |
Abbreviator with Nonlocal Information | The first term expresses the conditional log-likelihood of the training data, and the second term represents a regularizer that reduces the overfitting problem in parameter estimation. |
Abbreviator with Nonlocal Information | Since the number of characters in Chinese (more than 10K) is much larger than the number of letters in English (26), in order to avoid a possible overfitting problem, we did not apply these feature templates to Chinese abbreviations.
Experiments | To reduce overfitting, we employed an L2 Gaussian weight prior (Chen and Rosenfeld, 1999), with the objective function: L(Θ) = Σ_i log P(y_i | x_i, Θ) − ||Θ||²/σ². During training and validation, we set σ = 1 for the DPLVM generators.
Context ordering | By biasing the decision tree learner toward questions that are intuitively of greater utility, we make it less prone to overfitting on small data samples. |
Results | The idea of lowering the specificity of letter class questions as the context length increases is due to Kienappel and Kneser (2001), and is intended to avoid overfitting.
Results | Our expectation was that context ordering would be particularly helpful during the early rounds of active learning, when there is a greater risk of overfitting on the small training sets. |