Index of papers in Proc. ACL that mention
  • shared task
Wu, Yuanbin and Ng, Hwee Tou
Abstract
Experimental results on the Helping Our Own shared task show that our method is competitive with state-of-the-art systems.
Conclusion
Experiments on the HOO 2011 shared task show that ILP inference achieves state-of-the-art performance on grammatical error correction.
Experiments
We follow the evaluation setup in the HOO 2011 shared task on grammatical error correction (Dale and Kilgarriff, 2011).
Experiments
The development set and test set in the shared task consist of conference and workshop papers taken from the Association for Computational Linguistics (ACL).
Experiments
In the HOO 2011 shared task, participants can submit system edits directly or the corrected plain-text system output.
Inference with Second Order Variables
Corrections are called edits in the HOO 2011 shared task.
Introduction
The task has received much attention in recent years, and was the focus of two shared tasks on grammatical error correction in 2011 and 2012 (Dale and Kilgarriff, 2011; Dale et al., 2012).
Introduction
We evaluate our proposed ILP approach on the test data from the Helping Our Own (HOO) 2011 shared task (Dale and Kilgarriff, 2011).
shared task is mentioned in 11 sentences in this paper.
Zou, Bowei and Zhou, Guodong and Zhu, Qiaoming
Abstract
Evaluation on the *SEM 2012 shared task corpus indicates the usefulness of contextual discourse information in negation focus identification and justifies the effectiveness of our graph model in capturing such global information.
Baselines
Negation focus identification in *SEM’2012 shared tasks is restricted to verbal negations annotated with MNEG in PropBank, with only the constituent belonging to a semantic role selected as negation focus.
Baselines
1 In *SEM’2013, the shared task was changed to focus on "Semantic Textual Similarity".
Baselines
For better illustration of the importance of contextual discourse information, Table 1 shows the statistics of intra- and inter-sentence information necessary for manual negation focus identification, with 100 instances randomly extracted from the held-out dataset of the *SEM'2012 shared task corpus.
Introduction
Evaluation on the *SEM 2012 shared task corpus (Morante and Blanco, 2012) justifies our approach over several strong baselines.
Related Work
Due to the increasing demand on deep understanding of natural language text, negation recognition has been drawing more and more attention in recent years, with a series of shared tasks and workshops, however, with focus on cue detection and scope resolution, such as the BioNLP 2009 shared task for negative event detection (Kim et al., 2009) and the ACL 2010 Workshop for scope resolution of negation and speculation (Morante and Sporleder, 2010), followed by a special issue of Computational Linguistics (Morante and Sporleder, 2012) for modality and negation.
Related Work
However, although Morante and Blanco (2012) proposed negation focus identification as one of the *SEM’2012 shared tasks, only one team (Rosenberg and Bergler, 2012)1 participated in this task.
shared task is mentioned in 15 sentences in this paper.
Packard, Woodley and Bender, Emily M. and Read, Jonathon and Oepen, Stephan and Dridan, Rebecca
Abstract
In this work, we revisit Shared Task 1 from the 2012 *SEM Conference: the automated analysis of negation.
Introduction
Owing to its immediate utility in the curation of scholarly results, the analysis of negation and so-called hedges in biomedical research literature has been the focus of several workshops, as well as the Shared Task at the 2011 Conference on Computational Language Learning (CoNLL).
Introduction
1Our running example is a truncated variant of an item from the Shared Task training data.
Introduction
Though the task-specific concept of scope of negation is not the same as the notion of quantifier and operator scope in mainstream underspecified semantics, we nonetheless find that reviewing the 2012 *SEM Shared Task annotations with reference to an explicit encoding of semantic predicate-argument structure suggests a simple and straightforward operationalization of their concept of negation scope.
Related Work
(2012) describe some amount of tailoring of the Boxer lexicon to include more of the Shared Task scope cues among those that produce the negation operator in the DRSs, but otherwise the system appears to directly take the notion of scope of negation from the DRS and project it out to the string, with one caveat: As with the logical-forms representations we use, the DRS logical forms do not include function words as predicates in the semantics.
Related Work
Since the Shared Task gold standard annotations included such arguably semantically vacuous (see Bender, 2013, p. 107) words in the scope, further heuristics are needed to repair the string-based annotations coming from the DRS-based system.
System Description
From these underspecified representations of possible scopal configurations, a scope resolution component can spell out the full range of fully-connected logical forms (Koller and Thater, 2005), but it turns out that such enumeration is not relevant here: the notion of scope encoded in the Shared Task annotations is not concerned with the relative scope of quantifiers and negation, such as the two possible readings of (2) represented informally below:5
System Description
However, as shown below, the information about fixed scopal elements in an underspecified MRS is sufficient to model the Shared Task annotations.
System Description
5 In other words, a possible semantic interpretation of the (string-based) Shared Task annotation guidelines and data is in terms of a quantifier-free approach to meaning representation, or in terms of one where quantifier scope need not be made explicit (as once suggested by, among others, Alshawi, 1992).
shared task is mentioned in 20 sentences in this paper.
Candito, Marie and Constant, Matthieu
Abstract
In this paper, we investigate various strategies to predict both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Architectures for MWE Analysis and Parsing
We compare these four architectures with each other and also with two simpler architectures used by (Constant et al., 2013) within the SPMRL 13 Shared Task, in which regular and irregular MWEs are not distinguished:
Conclusion
We experimented with strategies to predict both MWE analysis and dependency structure, and tested them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Data: MWEs in Dependency Trees
The Shared Task used an enhanced version of the constituency-to-dependency conversion of Candito et al.
Experiments
Moreover, we provide in Table 5 a comparison of our best architecture with reg/irregular MWE distinction with other architectures that do not make this distinction, namely the two best comparable systems designed for the SPMRL Shared Task (Seddah et al., 2013): the pipeline simple parser based on Mate-tools of Constant et al.
Introduction
While the realistic scenario of syntactic parsing with automatic MWE recognition (either done jointly or in a pipeline) has already been investigated in constituency parsing (Green et al., 2011; Constant et al., 2012; Green et al., 2013), the French dataset of the SPMRL 2013 Shared Task (Seddah et al., 2013) only recently provided the opportunity to evaluate this scenario within the framework of dependency syntax.2 In such a scenario, a system predicts dependency trees with marked groupings of tokens into MWEs.
Introduction
In this paper, we investigate various strategies for predicting from a tokenized sentence both MWEs and syntactic dependencies, using the French dataset of the SPMRL 13 Shared Task.
Introduction
2The main focus of the Shared Task was on predicting both morphological and syntactic analysis for morphologically-rich languages.
Related work
To our knowledge, the first works3 on predicting both MWEs and dependency trees are those presented to the SPMRL 2013 Shared Task that provided scores for French (which is the only dataset containing MWEs).
Related work
(2013) proposed to combine pipeline and joint systems in a reparser (Sagae and Lavie, 2006), and ranked first at the Shared Task.
Related work
It uses no features or treatment specific to MWEs, as it focuses on the general aim of the Shared Task, namely coping with prediction of morphological and syntactic analysis.
shared task is mentioned in 11 sentences in this paper.
Martschat, Sebastian
Abstract
The model outperforms most systems participating in the English track of the CoNLL’12 shared task.
Evaluation
We use the data provided for the English track of the CoNLL’12 shared task on multilingual coreference resolution (Pradhan et al., 2012) which is a subset of the upcoming OntoNotes 5.0 release and comes with various annotation layers provided by state-of-the-art NLP tools.
Evaluation
We evaluate the model in a setting that corresponds to the shared task’s closed track, i.e.
Evaluation
We evaluate our system with the coreference resolution evaluation metrics that were used for the CoNLL shared tasks on coreference, which are MUC (Vilain et al., 1995), B3 (Bagga and Baldwin, 1998) and CEAFe (Luo, 2005).
Introduction
Quite recently, however, rule-based approaches regained popularity due to Stanford’s multi-pass sieve approach which exhibits state-of-the-art performance on many standard coreference data sets (Raghunathan et al., 2010) and also won the CoNLL-2011 shared task on coreference resolution (Lee et al., 2011; Pradhan et al., 2011).
Introduction
On the English data of the CoNLL’12 shared task, the model outperforms most systems which participated in the shared task.
Related Work
These approaches participated in the recent CoNLL’11 shared task (Pradhan et al., 2011; Sapena et al., 2011; Cai et al., 2011b) with excellent results.
Related Work
(2012) and ranked second in the English track at the CoNLL’12 shared task (Pradhan et al., 2012).
Related Work
The top performing system at the CoNLL’12 shared task (Fernandes et al., 2012)
shared task is mentioned in 16 sentences in this paper.
Zhang, Yi and Wang, Rui
Dependency Parsing with HPSG
For these rules, we refer to the conversion of the Penn Treebank into dependency structures used in the CoNLL 2008 Shared Task , and mark the heads of these rules in a way that will arrive at a compatible dependency backbone.
Dependency Parsing with HPSG
the CoNLL shared task dependency structures, minor systematic differences still exist for some phenomena.
Experiment Results & Error Analyses
To evaluate the performance of our different dependency parsing models, we tested our approaches on several dependency treebanks for English in a similar spirit to the CoNLL 2006-2008 Shared Tasks.
Experiment Results & Error Analyses
In previous years of CoNLL Shared Tasks, several datasets have been created for the purpose of dependency parser evaluation.
Experiment Results & Error Analyses
The same dataset has been used for the domain adaptation track of the CoNLL 2007 Shared Task .
Introduction
In the meantime, the successful continuation of the CoNLL Shared Tasks since 2006 (Buchholz and Marsi, 2006; Nivre et al., 2007a; Surdeanu et al., 2008) has witnessed how easy it has become to train a statistical syntactic dependency parser provided that there is an annotated treebank.
Parser Domain Adaptation
In recent years, two statistical dependency parsing systems, MaltParser (Nivre et al., 2007b) and MSTParser (McDonald et al., 2005b), representing different threads of research in data-driven machine learning approaches, have obtained high publicity for their state-of-the-art performances in open competitions such as the CoNLL Shared Tasks.
shared task is mentioned in 12 sentences in this paper.
Björkelund, Anders and Kuhn, Jonas
Abstract
Our model obtains the best results to date on recent shared task data for Arabic, Chinese, and English.
Background
Nevertheless, the two best systems in the latest CoNLL Shared Task on coreference resolution (Pradhan et al., 2012) were both variants of the mention-pair model.
Conclusion
We evaluated our system on all three languages from the CoNLL 2012 Shared Task and present the best results to date on these data sets.
Experimental Setup
We apply our model to the CoNLL 2012 Shared Task data, which includes a training, development, and test set split for three languages: Arabic, Chinese and English.
Features
As a baseline we use the features from Björkelund and Farkas (2012), who ranked second in the 2012 CoNLL shared task and whose system is publicly available.
Introduction
The combination of this modification with nonlocal features leads to further improvements in the clustering accuracy, as we show in evaluation results on all languages from the CoNLL 2012 Shared Task: Arabic, Chinese, and English.
Related Work
Latent antecedents have recently gained popularity and were used by two systems in the CoNLL 2012 Shared Task, including the winning system (Fernandes et al., 2012; Chang et al., 2012).
Results
As a general baseline, we also include Björkelund and Farkas’ (2012) system (denoted B&F), which was the second best system in the shared task.
shared task is mentioned in 8 sentences in this paper.
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Our feature template definitions build from those used by the top performing systems in the CoNLL-2009 Shared Task, Zhao et al.
Experiments
To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
Experiments
The CoNLL-2009 Shared Task (Hajic et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese,4 and Spanish.
Experiments
The CoNLL-2005 and -2008 Shared Task datasets provide English SRL annotation, and for cross dataset comparability we consider only verbal predicates (more details in § 4.4).
shared task is mentioned in 8 sentences in this paper.
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted.
Experiments
For English/Arabic, English/Hindi and English/Tamil, our system is better than most of the semi-supervised systems presented at the NEWS 2010 shared task for transliteration mining.
Experiments
On the English/Russian data set, our system achieves 76% F-measure, which is not as good as the systems that participated in the shared task.
Experiments
The Wikipedia InterLanguage Links shared task data contains a much larger proportion of transliterations than a parallel corpus.
Introduction
We compare our unsupervised transliteration mining method with the semi-supervised systems presented at the NEWS 2010 shared task on transliteration mining (Kumaran et al., 2010) using four language pairs.
shared task is mentioned in 6 sentences in this paper.
Turchi, Marco and Anastasopoulos, Antonios and C. de Souza, José G. and Negri, Matteo
Evaluation framework
• One artificial setting (§5) obtained from the WMT12 QE shared task data, in which training/test instances are arranged to reflect homogeneous distributions of the HTER labels.
Evaluation framework
To measure the adaptability of our model to a given test set we compute the Mean Absolute Error (MAE), a metric for regression problems also used in the WMT QE shared tasks.
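As a minimal sketch of the MAE metric referred to above (not the official WMT QE evaluation script; the function name and the HTER-style example values are hypothetical):

def mean_absolute_error(reference, predicted):
    """Average absolute difference between reference and predicted
    per-sentence quality scores (e.g., HTER labels)."""
    assert len(reference) == len(predicted) and reference
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)

# Hypothetical HTER-style labels in [0, 1]
print(mean_absolute_error([0.20, 0.50, 0.10], [0.30, 0.40, 0.10]))  # ~0.0667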
Evaluation framework
The results of previous WMT QE shared tasks have shown that these baseline features are particularly competitive in the regression task (with only few systems able to beat them at WMT12).
Online QE for CAT environments
The tool, which implements a large number of features proposed by participants in the WMT QE shared tasks, has been modified to process one sentence at a time as requested for integration in a CAT environment;
Related work
In the last couple of years, research in the field received a strong boost from the shared tasks organized within the WMT workshop on SMT,2 which is also the framework of our first experiment in §5.
Related work
3For a comprehensive overview of the QE approaches proposed so far we refer the reader to the WMT12 and WMT13 QE shared task reports (Callison-Burch et al., 2012; Bojar et al., 2013).
shared task is mentioned in 6 sentences in this paper.
Nivre, Joakim and McDonald, Ryan
Abstract
By letting one model generate features for the other, we consistently improve accuracy for both models, resulting in a significant improvement of the state of the art when evaluated on data sets from the CoNLL-X shared task .
Experiments
In this section, we present an experimental evaluation of the two guided models based on data from the CoNLL-X shared task, followed by a comparative error analysis including both the base models and the guided models.
Experiments
The data for the experiments are training and test sets for all thirteen languages from the CoNLL-X shared task on multilingual dependency parsing, with training sets ranging in size from 29,000 tokens (Slovene) to 1,249,000 tokens (Czech).
Experiments
Models are evaluated by their labeled attachment score (LAS) on the test set, i.e., the percentage of tokens that are assigned both the correct head and the correct label, using the evaluation software from the CoNLL-X shared task with default settings.4 Statistical significance was assessed using Dan Bikel’s randomized parsing evaluation comparator with the default setting of 10,000 iterations.5
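To make the LAS definition above concrete, here is an illustrative sketch (not the official CoNLL-X evaluation software, and ignoring details such as punctuation handling; the function name and the toy trees are hypothetical):

def labeled_attachment_score(gold, predicted):
    """LAS: percentage of tokens whose predicted head index and
    dependency label both match the gold annotation."""
    assert len(gold) == len(predicted) and gold
    correct = sum(1 for (gh, gl), (ph, pl) in zip(gold, predicted)
                  if gh == ph and gl == pl)
    return 100.0 * correct / len(gold)

# Toy example: one token gets the wrong label, so 3 of 4 tokens are correct.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "punct")]
print(labeled_attachment_score(gold, pred))  # 75.0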
Introduction
Both models have been used to achieve state-of-the-art accuracy for a wide range of languages, as shown in the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007), but McDonald and Nivre (2007) showed that a detailed error analysis reveals important differences in the distribution of errors associated with the two models.
Related Work
(2007) to combine six transition-based parsers in the best performing system in the CoNLL 2007 shared task .
shared task is mentioned in 6 sentences in this paper.
Cohn, Trevor and Specia, Lucia
Multitask Quality Estimation 4.1 Experimental Setup
These were used by a highly competitive baseline entry in the WMT12 shared task, and were extracted here using the system provided by that shared task.6 They include simple counts, e.g., the tokens in sentences, as well as source and target language model probabilities.
Multitask Quality Estimation 4.1 Experimental Setup
This is generally a very strong baseline: in the WMT12 QE shared task, only five out of 19 submissions were able to significantly outperform it, and only by including many complex additional features, tree kernels, etc.
Multitask Quality Estimation 4.1 Experimental Setup
WMT12: Single task. We start by comparing GP regression with alternative approaches using the WMT12 dataset on the standard task of predicting a weighted mean quality rating (as was done in the WMT12 QE shared task).
Quality Estimation
For an overview of various algorithms and features we refer the reader to the WMT12 shared task on QE (Callison-Burch et al., 2012).
Quality Estimation
WMT12: This dataset was distributed as part of the WMT12 shared task on QE (Callison-Burch et al., 2012).
shared task is mentioned in 5 sentences in this paper.
Durrett, Greg and Hall, David and Klein, Dan
Conclusion
Our transitive system is more effective at using properties than a pairwise system and a previous entity-level system, and it achieves performance comparable to that of the Stanford coreference resolution system, the winner of the CoNLL 2011 shared task .
Experiments
We use the datasets, experimental setup, and scoring program from the CoNLL 2011 shared task (Pradhan et al., 2011), based on the OntoNotes corpus (Hovy et al., 2006).
Experiments
5 Unfortunately, their publicly-available system is closed-source and performs poorly on the CoNLL shared task dataset, so direct comparison is difficult.
Introduction
We evaluate our system on the dataset from the CoNLL 2011 shared task using three different types of properties: synthetic oracle properties, entity phi features (number, gender, animacy, and NER type), and properties derived from unsupervised clusters targeting semantic type information.
Introduction
Our final system is competitive with the winner of the CoNLL 2011 shared task (Lee et al., 2011).
shared task is mentioned in 5 sentences in this paper.
Lassalle, Emmanuel and Denis, Pascal
Abstract
Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features.
Experiments
We evaluated the system on the English part of the corpus provided in the CoNLL-2012 Shared Task (Pradhan et al., 2012), referred to as CoNLL-2012 here.
Experiments
These metrics were recently used in the CoNLL-2011 and -2012 Shared Tasks.
Experiments
The best classifier-decoder combination reaches a score of 67.19, which would place it above the mean score (66.41) of the systems that took part in the CoNLL-2012 Shared Task (gold mentions track).
Introduction
As will be shown based on a variety of experiments on the CoNLL-2012 Shared Task English datasets, these improvements are consistent across different evaluation metrics and for the most part independent of the clustering decoder that was used.
shared task is mentioned in 5 sentences in this paper.
Huang, Fei and Yates, Alexander
Introduction
In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
Introduction
We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Marquez, 2005).
Introduction
Like the best systems from the CoNLL 2005 shared task (Punyakanok et al., 2008; Pradhan et al., 2005), they also use features from multiple parses to remain robust in the face of parser error.
shared task is mentioned in 4 sentences in this paper.
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Experimental Setup
For the CATiB dataset, we report UAS including punctuation in order to be consistent with the published results in the 2013 SPMRL shared task (Seddah et al., 2013).
Introduction
This is better than the best published results in the 2013 SPMRL shared task (Seddah et al., 2013), including parser ensembles.
Results
To put these numbers into perspective, the bottom part of Table 3 shows the accuracy of the best systems from the 2013 SPMRL shared task on Arabic parsing using predicted information (Seddah et al., 2013).
Results
Bottom part shows UAS of the best systems in the SPMRL shared task.
shared task is mentioned in 4 sentences in this paper.
Hall, David and Durrett, Greg and Klein, Dan
Abstract
On the SPMRL 2013 multilingual constituency parsing shared task (Seddah et al., 2013), our system outperforms the top single parser system of Björkelund et al.
Introduction
Our parser is also able to generalize well across languages with little tuning: it achieves state-of-the-art results on multilingual parsing, scoring higher than the best single-parser system from the SPMRL 2013 Shared Task on a range of languages, as well as on the competition’s average F1 metric.
Other Languages
We evaluate on the constituency treebanks from the Statistical Parsing of Morphologically Rich Languages Shared Task (Seddah et al., 2013).
Other Languages
5 Their best parser, and the best overall parser from the shared task, is a reranked product of "Replaced" Berkeley parsers.
shared task is mentioned in 4 sentences in this paper.
Gómez-Rodríguez, Carlos and Nivre, Joakim
Abstract
In this paper, we present a transition system for 2-planar dependency trees — trees that can be decomposed into at most two planar graphs — and show that it can be used to implement a classifier-based parser that runs in linear time and outperforms a state-of-the-art transition-based parser on four data sets from the CoNLL-X shared task.
Empirical Evaluation
In order to get a first estimate of the empirical accuracy that can be obtained with transition-based 2-planar parsing, we have evaluated the parser on four data sets from the CoNLL-X shared task (Buchholz and Marsi, 2006): Czech, Danish, German and Portuguese.
Empirical Evaluation
(2006b) in the original shared task, where the pseudo-projective version of MaltParser was one of the two top performing systems (Buchholz and Marsi, 2006).
Introduction
Although the contributions of this paper are mainly theoretical, we also present an empirical evaluation of the 2-planar parser, showing that it outperforms the projective parser on four data sets from the CoNLL-X shared task (Buchholz and Marsi, 2006).
shared task is mentioned in 4 sentences in this paper.
Guzmán, Francisco and Joty, Shafiq and Màrquez, Lluís and Nakov, Preslav
Experimental Setup
In our experiments, we used the data available for the WMT12 and the WMT11 metrics shared tasks for translations into English.3 This included the output from the systems that participated in the WMT12 and the WMT11 MT evaluation campaigns, both consisting of 3,003 sentences, for four different language pairs: Czech-English (CS-EN), French-English (FR-EN), German-English (DE-EN), and Spanish-English (ES-EN); as well as a dataset with the English references.
Experimental Setup
Table 1: Number of systems (systs), judgments (ranks), unique sentences (sents), and different judges (judges) for the different language pairs, for the human evaluation of the WMT12 and WMT11 shared tasks.
Introduction
We first design two discourse-aware similarity measures, which use DTs generated by a publicly-available discourse parser (Joty et al., 2012); then, we show that they can help improve a number of MT evaluation metrics at the segment- and at the system-level in the context of the WMT11 and the WMT12 metrics shared tasks (Callison-Burch et al., 2011; Callison-Burch et al., 2012).
shared task is mentioned in 3 sentences in this paper.
Huang, Fei and Yates, Alexander
Experiments
Following the CoNLL shared task from 2000, we use sections 15-18 of the Penn Treebank for our labeled training data for the supervised sequence labeler in all experiments (Tjong et al., 2000).
Experiments
The chunker’s accuracy is roughly in the middle of the range of results for the original CoNLL 2000 shared task (Tjong et al., 2000).
Experiments
For our experiment on domain adaptation, we focus on NP chunking and POS tagging, and we use the labeled training data from the CoNLL 2000 shared task as before.
shared task is mentioned in 3 sentences in this paper.
Hasan, Kazi Saidul and Ng, Vincent
Analysis
3 A more detailed analysis of the results of the SemEval-2010 shared task and the approaches adopted by the participating systems can be found in Kim et al.
Evaluation
To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) to score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
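As a rough sketch of the exact-match scoring described above (not the official SemEval-2010 scorer; the function name, the toy keyphrases, and the assumption that phrases are already normalized, e.g. stemmed and lowercased, are all hypothetical):

def exact_match_prf(gold_keyphrases, predicted_keyphrases):
    """Precision, recall, and F-score under exact string match,
    assuming keyphrases are already normalized."""
    gold, pred = set(gold_keyphrases), set(predicted_keyphrases)
    matched = len(gold & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Toy example: one of two predicted keyphrases matches the gold standard.
print(exact_match_prf({"dependency parsing", "shared task"},
                      {"dependency parsing", "coreference"}))  # (0.5, 0.5, 0.5)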
Evaluation
For example, KP-Miner (El-Beltagy and Rafea, 2010), an unsupervised system, ranked third in the SemEval-2010 shared task with an F-score of 25.2, which is comparable to the best supervised system scoring 27.5.
shared task is mentioned in 3 sentences in this paper.
Luo, Xiaoqiang and Pradhan, Sameer and Recasens, Marta and Hovy, Eduard
BLANC for Imperfect Response Mentions
Table 1: The proposed BLANC scores of the CoNLL-2011 shared task participants.
BLANC for Imperfect Response Mentions
Table 2: The proposed BLANC scores of the CoNLL-2012 shared task participants.
Introduction
The proposed BLANC is applied to the CoNLL 2011 and 2012 shared task participants, and the scores and its correlations with existing metrics are shown in Section 5.
shared task is mentioned in 3 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Data and Tools
We evaluate our approach on three target languages from CoNLL shared task treebanks, which do not appear in Google Universal Treebanks.
Experiments
To make a thorough empirical comparison with previous studies, we also evaluate our system without unlabeled data (-U) on treebanks from CoNLL shared task on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
Experiments
Table 6: Parsing results on treebanks from CoNLL shared tasks for eight target languages.
shared task is mentioned in 3 sentences in this paper.
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Abstract
Experiments on the SANCL 2012 shared task show that our approach achieves 93.15% average tagging accuracy, which is the best accuracy reported so far on this data set, higher than those given by ensembled syntactic parsers.
Experiments
Our experiments are conducted on the data set provided by the SANCL 2012 shared task, which aims at building a single robust syntactic analysis system across the web-domain.
Introduction
We conduct experiments on the official data set provided by the SANCL 2012 shared task (Petrov and McDonald, 2012).
shared task is mentioned in 3 sentences in this paper.
Abend, Omri and Reichart, Roi and Rappoport, Ari
Related Work
PB is a standard corpus for SRL evaluation and was used in the CoNLL SRL shared tasks of 2004 (Carreras and Marquez, 2004) and 2005 (Carreras and Marquez, 2005).
Related Work
The CoNLL shared tasks of 2004 and 2005 were devoted to SRL, and studied the influence of different syntactic annotations and domain changes on SRL results.
Related Work
Supervised clause detection was also tackled as a separate task, notably in the CoNLL 2001 shared task (Tjong Kim Sang and Dejean, 2001).
shared task is mentioned in 3 sentences in this paper.
Guinaudeau, Camille and Strube, Michael
Experiments
To do so, we use one of the top performing systems from the CoNLL 2012 shared task (Martschat et al., 2012).
Experiments
These two tasks were performed on documents extracted from the English test part of the CoNLL 2012 shared task (Pradhan et al., 2012).
Experiments
The system was trained on the English training part of the CoNLL 2012 shared task filtered in the same way as the test part.
shared task is mentioned in 3 sentences in this paper.
Zapirain, Beñat and Agirre, Eneko and Màrquez, Lluís
Experimental Setting 3.1 Datasets
The data used in this work is the benchmark corpus provided by the SRL shared task of CoNLL-2005 (Carreras and Marquez, 2005).
Experimental Setting 3.1 Datasets
The system achieves very good performance in the CoNLL-2005 shared task dataset and in the SRL subtask of the SemEval-2007 English lexical sample task (Zapirain et al., 2007).
On the Generalization of Role Sets
This is the setting used in the CoNLL 2005 shared task (Carreras and Marquez, 2005).
shared task is mentioned in 3 sentences in this paper.