Conclusion | On top of these hard constraints, the sparse prior of VB helps make the model less prone to overfitting to infrequent phrase pairs, and thus improves the quality of the phrase pairs the model learns. |
Experiments | With EM, AER first drops and then, because of overfitting, rises again as the number of iterations increases from 1 to 10. |
Experiments | The gain is especially large on the test data set, indicating that VB is less prone to overfitting. |
Introduction | In this direction, Expectation Maximization at the phrase level was proposed by Marcu and Wong (2002), who, however, encountered two major difficulties: computational complexity and controlling overfitting. |
Introduction | Computational complexity arises from the exponentially large number of decompositions of a sentence pair into phrase pairs; overfitting is a problem because, as EM attempts to maximize the likelihood of its training data, it prefers to explain a sentence pair directly with a single phrase pair. |
Introduction | We address the tendency of EM to overfit by using Bayesian methods, where sparse priors assign greater mass to parameter vectors with fewer nonzero values, thereby favoring shorter, more frequent phrases. |
Variational Bayes for ITG | If we place no constraint on the distribution of phrases, EM overfits the data by memorizing every sentence pair. |
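As a rough illustration (a minimal sketch, not the paper's actual implementation), the standard variational Bayes update for a multinomial with a symmetric sparse Dirichlet prior replaces EM's normalized-count estimate with a digamma-damped estimate; a small concentration parameter discounts rarely observed phrase pairs much more aggressively than EM does. The phrase-pair names and the choice of `alpha` below are illustrative.

```python
import math

def digamma(x):
    """Asymptotic approximation of the digamma function psi(x), x > 0."""
    result = 0.0
    while x < 6.0:           # shift into the asymptotic regime
        result -= 1.0 / x    # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    r = 1.0 / x
    result += math.log(x) - 0.5 * r
    r2 = r * r
    result -= r2 * (1.0 / 12 - r2 * (1.0 / 120 - r2 / 252))
    return result

def em_estimate(counts):
    """EM M-step: plain normalized expected counts."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def vb_estimate(counts, alpha=0.01):
    """VB update for a multinomial with a symmetric Dirichlet(alpha) prior:
    weight(k) = exp(psi(c_k + alpha)) / exp(psi(C + K * alpha)).
    Small alpha shrinks low-count events far more than high-count ones."""
    K = len(counts)
    total = sum(counts.values())
    denom = math.exp(digamma(total + K * alpha))
    return {k: math.exp(digamma(c + alpha)) / denom for k, c in counts.items()}

# Toy expected counts for two hypothetical phrase pairs.
counts = {"frequent_pair": 50.0, "rare_pair": 1.0}
em = em_estimate(counts)
vb = vb_estimate(counts)
# Relative to EM, VB suppresses the rare pair's share of the mass.
```

Note that the VB weights are deliberately sub-normalized (they sum to less than one), which is what implicitly penalizes distributions spread over many rare events.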
Abstract | These methods tend to overfit when the available training corpus is limited, especially if the number of features, or the number of values a feature can take, is large. |
Conclusion | This is probably due to a reduction in overfitting. |
Introduction | In an effort to reduce overfitting, they use a combination of a Gaussian prior and early stopping. |
Introduction | This is due to overfitting, which is a serious problem in most NLP tasks for resource-poor languages, where annotated data is scarce. |
Maximum Entropy Based Model for Hindi NER | From the above discussion, it is clear that the system suffers from overfitting when a large number of features is used to train it. |
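To make the Gaussian-prior remedy concrete, here is a minimal sketch (not the paper's system) of maximum-entropy-style training on a binary task: placing a Gaussian prior N(0, sigma^2) on the weights is equivalent to adding an L2 penalty to the log-likelihood, and a tighter prior (smaller sigma^2) shrinks the weights, limiting how strongly the model can commit to any one feature. The toy data and parameter values are assumptions for illustration.

```python
import math

def train_logreg(data, dim, sigma2=1.0, lr=0.1, epochs=200):
    """Gradient ascent on the log-likelihood of a logistic model with a
    Gaussian prior N(0, sigma2) on the weights (i.e., L2 regularization)."""
    w = [0.0] * dim
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:  # x: feature vector, y: 0/1 label
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                grad[i] += (y - p) * x[i]
        for i in range(dim):
            grad[i] -= w[i] / sigma2  # prior term pulls weights toward zero
            w[i] += lr * grad[i] / n
        # (early stopping on held-out likelihood would also fit here)
    return w

# Toy data: feature 0 is informative, feature 1 is noise.
data = [([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 1.0], 0), ([0.0, 0.0], 0)]
w_weak = train_logreg(data, 2, sigma2=100.0)  # nearly unregularized
w_strong = train_logreg(data, 2, sigma2=0.1)  # tight prior, shrunken weights
```

With a weak prior the weight on the separating feature keeps growing toward the maximum-likelihood solution; the tight prior caps it, which is exactly the overfitting control the excerpt describes.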