Abstract | The model outperforms most systems participating in the English track of the CoNLL’12 shared task.
Evaluation | We use the data provided for the English track of the CoNLL’12 shared task on multilingual coreference resolution (Pradhan et al., 2012), which is a subset of the upcoming OntoNotes 5.0 release and comes with various annotation layers provided by state-of-the-art NLP tools.
Evaluation | We evaluate the model in a setting that corresponds to the shared task’s closed track, i.e. |
Evaluation | We evaluate our system with the coreference resolution evaluation metrics that were used for the CoNLL shared tasks on coreference, which are MUC (Vilain et al., 1995), B3 (Bagga and Baldwin, 1998) and CEAFe (Luo, 2005). |
Introduction | Quite recently, however, rule-based approaches regained popularity due to Stanford’s multi-pass sieve approach which exhibits state-of-the-art performance on many standard coreference data sets (Raghunathan et al., 2010) and also won the CoNLL-2011 shared task on coreference resolution (Lee et al., 2011; Pradhan et al., 2011). |
Introduction | On the English data of the CoNLL’12 shared task, the model outperforms most systems that participated in the shared task.
Related Work | These approaches participated in the recent CoNLL’11 shared task (Pradhan et al., 2011; Sapena et al., 2011; Cai et al., 2011b) with excellent results.
Related Work | (2012) and ranked second in the English track at the CoNLL’12 shared task (Pradhan et al., 2012).
Related Work | The top performing system at the CoNLL’12 shared task (Fernandes et al., 2012)
Abstract | Experimental results on the Helping Our Own shared task show that our method is competitive with state-of-the-art systems. |
Conclusion | Experiments on the HOO 2011 shared task show that ILP inference achieves state-of-the-art performance on grammatical error correction.
Experiments | We follow the evaluation setup in the HOO 2011 shared task on grammatical error correction (Dale and Kilgarriff, 2011).
Experiments | The development set and test set in the shared task consist of conference and workshop papers published by the Association for Computational Linguistics (ACL).
Experiments | In the HOO 2011 shared task, participants can submit system edits directly or the corrected plain-text system output.
Inference with Second Order Variables | Corrections are called edits in the HOO 2011 shared task.
Introduction | The task has received much attention in recent years, and was the focus of two shared tasks on grammatical error correction in 2011 and 2012 (Dale and Kilgarriff, 2011; Dale et al., 2012). |
Introduction | We evaluate our proposed ILP approach on the test data from the Helping Our Own (HOO) 2011 shared task (Dale and Kilgarriff, 2011).
Multitask Quality Estimation 4.1 Experimental Setup | These were used by a highly competitive baseline entry in the WMT12 shared task, and were extracted here using the system provided by that shared task. They include simple counts, e.g., the number of tokens in sentences, as well as source and target language model probabilities.
Multitask Quality Estimation 4.1 Experimental Setup | This is generally a very strong baseline: in the WMT12 QE shared task, only five out of 19 submissions were able to significantly outperform it, and only by including many complex additional features, tree kernels, etc.
Multitask Quality Estimation 4.1 Experimental Setup | WMT12: Single task We start by comparing GP regression with alternative approaches using the WMT12 dataset on the standard task of predicting a weighted mean quality rating (as it was done in the WMT12 QE shared task).
Quality Estimation | For an overview of various algorithms and features we refer the reader to the WMT12 shared task on QE (Callison-Burch et al., 2012). |
Quality Estimation | WMT12: This dataset was distributed as part of the WMT12 shared task on QE (Callison-Burch et al., 2012). |
Conclusion | Our transitive system is more effective at using properties than a pairwise system and a previous entity-level system, and it achieves performance comparable to that of the Stanford coreference resolution system, the winner of the CoNLL 2011 shared task.
Experiments | We use the datasets, experimental setup, and scoring program from the CoNLL 2011 shared task (Pradhan et al., 2011), based on the OntoNotes corpus (Hovy et al., 2006). |
Experiments | Unfortunately, their publicly-available system is closed-source and performs poorly on the CoNLL shared task dataset, so direct comparison is difficult.
Introduction | We evaluate our system on the dataset from the CoNLL 2011 shared task using three different types of properties: synthetic oracle properties, entity phi features (number, gender, animacy, and NER type), and properties derived from unsupervised clusters targeting semantic type information. |
Introduction | Our final system is competitive with the winner of the CoNLL 2011 shared task (Lee et al., 2011). |
Abstract | Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features.
Experiments | We evaluated the system on the English part of the corpus provided in the CoNLL-2012 Shared Task (Pradhan et al., 2012), referred to as CoNLL-2012 here. |
Experiments | These metrics were recently used in the CoNLL-2011 and -2012 Shared Tasks.
Experiments | The best classifier-decoder combination reaches a score of 67.19, which would place it above the mean score (66.41) of the systems that took part in the CoNLL-2012 Shared Task (gold mentions track).
Introduction | As will be shown based on a variety of experiments on the CoNLL-2012 Shared Task English datasets, these improvements are consistent across different evaluation metrics and for the most part independent of the clustering decoder that was used. |
Experiments | To do so, we use one of the top performing systems from the CoNLL 2012 shared task (Martschat et al., 2012). |
Experiments | These two tasks were performed on documents extracted from the English test part of the CoNLL 2012 shared task (Pradhan et al., 2012). |
Experiments | The system was trained on the English training part of the CoNLL 2012 shared task filtered in the same way as the test part. |