Index of papers in Proc. ACL 2014 that mention
  • CoNLL
Björkelund, Anders and Kuhn, Jonas
Background
Nevertheless, the two best systems in the latest CoNLL Shared Task on coreference resolution (Pradhan et al., 2012) were both variants of the mention-pair model.
Experimental Setup
We apply our model to the CoNLL 2012 Shared Task data, which includes a training, development, and test set split for three languages: Arabic, Chinese and English.
Experimental Setup
We evaluate our system using the CoNLL 2012 scorer, which computes several coreference metrics: MUC (Vilain et al., 1995), B3 (Bagga and Baldwin, 1998), and CEAFe and CEAFm (Luo, 2005).
Experimental Setup
We also report the CoNLL average (also known as MELA; Denis and Baldridge (2009)), i.e., the arithmetic mean of MUC, B3, and CEAFe.
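The CoNLL average is simple arithmetic; a minimal sketch in Python, with invented example scores rather than results from the paper:

# The CoNLL average (MELA) is the arithmetic mean of the MUC, B3,
# and CEAFe F1 scores.
def conll_average(muc_f1, b3_f1, ceafe_f1):
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0

print(round(conll_average(70.2, 58.3, 54.1), 2))  # -> 60.87 (illustrative)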
Features
As a baseline we use the features from Björkelund and Farkas (2012), whose system ranked second in the 2012 CoNLL Shared Task and is publicly available.
Features
Feature templates were incrementally added or removed in order to optimize the mean of MUC, B3, and CEAFe (i.e., the CoNLL average).
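A hedged sketch of such an incremental template search, as greedy forward selection against a development-set score; the evaluate callback and the candidate templates are placeholders, not the authors' actual code:

# Greedy forward selection of feature templates, keeping a template
# only if it improves the development-set CoNLL average.
def greedy_template_search(candidates, evaluate):
    selected = []
    best = evaluate(selected)  # score with the current template set
    while True:
        remaining = [t for t in candidates if t not in selected]
        if not remaining:
            break
        # Try each remaining template on top of the selected set.
        top_score, top_t = max((evaluate(selected + [t]), t) for t in remaining)
        if top_score <= best:  # no remaining template helps
            break
        selected.append(top_t)
        best = top_score
    return selected, best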
Introduction
The combination of this modification with nonlocal features leads to further improvements in the clustering accuracy, as we show in evaluation results on all languages from the CoNLL 2012 Shared Task: Arabic, Chinese, and English.
Results
Figure 3 shows the CoNLL average on
Results
Available at http://conll.
Results
Table 1 displays the differences in F-measures and CoNLL average between the local and nonlocal systems when applied to the development sets for each language.
CoNLL is mentioned in 19 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Abstract
We perform experiments on three data sets: versions 1.0 and 2.0 of the Google Universal Dependency Treebanks, and treebanks from the CoNLL shared tasks, across ten languages.
Data and Tools
The treebanks from the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice.
Data and Tools
However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and the heterogeneous representations used in the CoNLL shared-task treebanks weaken any conclusions that can be drawn.
Data and Tools
For comparison with previous studies, nevertheless, we also run experiments on CoNLL treebanks (see Section 4.4 for more details).
Experiments
4.4 Experiments on CoNLL Treebanks
Experiments
To make a thorough empirical comparison with previous studies, we also evaluate our system without unlabeled data (-U) on treebanks from the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
Experiments
Table 6: Parsing results on treebanks from the CoNLL shared tasks for eight target languages.
CoNLL is mentioned in 8 sentences in this paper.
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Experimental Setup
Datasets We test our dependency model on 14 languages, including the English dataset from the CoNLL 2008 shared task and all 13 datasets from the CoNLL 2006 shared task (Buchholz and Marsi, 2006; Surdeanu et al., 2008).
Introduction
The model was evaluated on 14 languages, using dependency data from CoNLL 2008 and CoNLL 2006.
Problem Formulation
pos, form, lemma, and morph stand for the fine POS tag, word form, word lemma, and the morphology feature (provided in the CoNLL format file) of the current word.
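For concreteness, a sketch of pulling those fields out of a tab-separated CoNLL-X token line; the sample line is made up, and column positions follow the CoNLL-X convention (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL):

# Read the fields named above from one CoNLL-X formatted token line.
line = "3\tloves\tlove\tVERB\tVBZ\tnum=sg|per=3\t0\troot"
cols = line.split("\t")
form = cols[1]    # word form        -> "loves"
lemma = cols[2]   # word lemma       -> "love"
pos = cols[4]     # fine POS tag     -> "VBZ"
morph = cols[5]   # morphology feats -> "num=sg|per=3"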
Results
Overall Performance Table 2 shows the performance of our model and the baselines on 14 CoNLL datasets.
Results
Figure 1 shows the average UAS on CoNLL test datasets after each training epoch.
Results
Figure 1: Average UAS on CoNLL test sets after different epochs.
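UAS itself is straightforward; a minimal sketch, with invented head indices:

# Unlabeled attachment score: the fraction of tokens whose predicted
# head index matches the gold head index.
def uas(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

print(uas([2, 0, 2, 3], [2, 0, 3, 3]))  # -> 0.75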
CoNLL is mentioned in 7 sentences in this paper.
Luo, Xiaoqiang and Pradhan, Sameer and Recasens, Marta and Hovy, Eduard
Abstract
The proposed BLANC falls back seamlessly to the original one if system mentions are identical to gold mentions, and it is shown to strongly correlate with existing metrics on the 2011 and 2012 CoNLL data.
BLANC for Imperfect Response Mentions
We have updated the publicly available CoNLL coreference scorer with the proposed BLANC, and used it to compute BLANC scores for all the CoNLL 2011 (Pradhan et al., 2011) and 2012 (Pradhan et al., 2012) participants in the official track, where participants had to automatically predict the mentions.
BLANC for Imperfect Response Mentions
Table 3: Pearson’s r correlation coefficients between the proposed BLANC and the other coreference measures based on the CoNLL 2011/2012 results.
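A sketch of the underlying computation, assuming one score per participating system for each measure; the score vectors below are hypothetical, not the shared-task results:

# Pearson's r between the proposed BLANC and another measure, taken
# over the same set of systems.
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

blanc = [55.1, 48.7, 60.3, 42.9]    # hypothetical per-system scores
muc_f1 = [58.0, 50.2, 63.5, 45.1]
print(round(pearson_r(blanc, muc_f1), 3))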
BLANC for Imperfect Response Mentions
Figure 1: Correlation plot between the proposed BLANC and the other measures based on the CoNLL 2011/2012 results.
Introduction
The proposed BLANC is applied to the CoNLL 2011 and 2012 shared task participants, and the scores and their correlations with existing metrics are shown in Section 5.
CoNLL is mentioned in 7 sentences in this paper.
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Abstract
The model outperforms the state of the art when evaluated on 14 languages from the non-projective CoNLL datasets.
Experimental Setup
Datasets We evaluate our model on standard benchmark corpora — CoNLL 2006 and CoNLL 2008 (Buchholz and Marsi, 2006; Surdeanu et al., 2008) — which include dependency treebanks for 14 different languages.
Experimental Setup
We use all sentences in CoNLL datasets during training and testing.
Experimental Setup
We report UAS excluding punctuation on CoNLL datasets, following Martins et al.
CoNLL is mentioned in 6 sentences in this paper.
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Experiments
We do not report results on Japanese, as that data was made freely available only to researchers who competed in CoNLL 2009.
Experiments
This covers all CoNLL languages except Czech, for which feature sets were not made publicly available in either work.
Experiments
Table 5: F1 for SRL approaches (without sense disambiguation) in matched and mismatched train/test settings for CoNLL 2005 span and 2008 head supervision.
Related Work
(2012) limit their exploration to a small set of basic features, and included high-resource supervision in the form of lemmas, POS tags, and morphology available from the CoNLL 2009 data.
CoNLL is mentioned in 5 sentences in this paper.