Conclusion | Our investigation with variational Bayes showed that the improvement is due both to finding sparse grammars (mitigating overfitting) and to searching over the space of all grammars (mitigating narrowness).
Evaluation | EM provides a strong baseline, since its rules are already limited in depth and number of frontier nodes by stipulation, which helps with the overfitting we have mentioned; surprisingly, it outperforms its discriminative counterpart in both precision and recall (and consequently RelF1).
Evaluation | We conclude that mitigating both factors (narrowness and overfitting) contributes to the performance gain of GS.
Introduction | In summary, previous methods suffer from narrowness of search, having to restrict the space of possible rules, and from overfitting, preferring overly specific grammars.
Introduction | We pursue the use of hierarchical probabilistic models incorporating sparse priors to simultaneously solve both the narrowness and overfitting problems. |
Introduction | Segmentation is achieved by introducing a prior bias towards grammars that are compact representations of the data, namely by enforcing simplicity and sparsity: preferring simple rules (smaller segments) unless the use of a complex rule is evidenced by the data (through repetition), and thus mitigating the overfitting problem. |
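The sparsity preference described above can be illustrated with a minimal sketch. Under a symmetric Dirichlet prior with concentration alpha < 1, the collapsed (Dirichlet-multinomial) marginal likelihood rewards a grammar that reuses a few rules often over one that spreads the same counts across many rules; a complex rule is only worth keeping if repetition in the data pays for it. The counts and alpha below are illustrative assumptions, not values from the paper.

```python
import math

def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of rule-usage counts under a symmetric
    Dirichlet(alpha) prior (collapsed, order-independent form)."""
    k = len(counts)
    n = sum(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))

# Two hypothetical grammars explaining the same 12 rule uses:
sparse = [6, 6, 0, 0, 0, 0]   # few rules, each reused often
diffuse = [2, 2, 2, 2, 2, 2]  # many rules, each used rarely

alpha = 0.1  # sparse prior (alpha < 1) prefers concentrated counts
assert log_dirichlet_multinomial(sparse, alpha) > \
       log_dirichlet_multinomial(diffuse, alpha)
```

With alpha > 1 the preference reverses, which is why the sparse (alpha < 1) setting matters for penalizing overly large grammars.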
The STSG Model | (Eisner, 2003). However, as noted earlier, EM is subject to the narrowness and overfitting problems.
Experiments | Without cross-training we observe a reduction in performance, due to overfitting.
Extraction with Lexicons | However, there is a danger of overfitting, which we discuss in Section 4.2.4.
Extraction with Lexicons | 4.2.4 Preventing Lexicon Overfitting |
Extraction with Lexicons | If we now train the CRF on the same examples that generated the lexicon features, then the CRF will likely overfit, and weight the lexicon features too highly!
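The cross-training idea implied here can be sketched as follows: partition the training sentences into k folds and, for each sentence, compute its lexicon-membership feature from a lexicon built only on the *other* folds, so no feature is derived from that sentence's own gold labels. All names below (`cross_trained_lexicon_features`, the toy data) are hypothetical illustrations, not the actual LUCHS implementation.

```python
from typing import List, Set

def cross_trained_lexicon_features(
    examples: List[List[str]],   # tokenized training sentences
    gold_spans: List[Set[str]],  # gold entity strings per sentence
    k: int = 5,
) -> List[List[bool]]:
    """For each token, a lexicon-membership feature computed from a
    lexicon built only on sentences outside this sentence's fold,
    so gold labels never leak into their own features (sketch)."""
    features = []
    for i, tokens in enumerate(examples):
        fold = i % k
        lexicon: Set[str] = set()
        for j, spans in enumerate(gold_spans):
            if j % k != fold:  # exclude the current sentence's fold
                lexicon |= spans
        features.append([tok in lexicon for tok in tokens])
    return features

# Toy data: "London" occurs as gold only in its own fold, so its
# feature never fires; "Paris" occurs across folds, so it does.
feats = cross_trained_lexicon_features(
    [["Paris", "is", "nice"], ["visit", "London"],
     ["Paris", "again"], ["Paris", "rocks"]],
    [{"Paris"}, {"London"}, {"Paris"}, {"Paris"}],
    k=2,
)
```

A CRF trained with such features can no longer memorize its own annotations through the lexicon, which is the overfitting risk the passage warns about.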
Related Work | Also crucial to LUCHS’s different setting is the need to avoid overfitting.
A Distributional Model for Argument Classification | First, we propose a model that does not depend on complex syntactic information, in order to minimize the risk of overfitting.
Abstract | The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects. |
Introduction | Notice that this is also a general problem of statistical learning, as large, fine-grained feature sets are more exposed to the risk of overfitting.
Related Work | While these approaches increase the expressive power of the models to capture more general linguistic properties, they rely on complex feature sets, demand more training data, and increase the overall exposure to overfitting.