Index of papers in Proc. ACL 2014 that mention
  • Shared Task
Candito, Marie and Constant, Matthieu
Abstract
In this paper, we investigate various strategies to perform both syntactic dependency parsing and contiguous multiword expression (MWE) recognition, testing them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Architectures for MWE Analysis and Parsing
We compare these four architectures with one another and also with two simpler architectures used by Constant et al. (2013) within the SPMRL 2013 Shared Task, in which regular and irregular MWEs are not distinguished:
Conclusion
We experimented with strategies to predict both MWE analysis and dependency structure, and tested them on the dependency version of the French Treebank (Abeillé and Barrier, 2004), as instantiated in the SPMRL Shared Task (Seddah et al., 2013).
Data: MWEs in Dependency Trees
The Shared Task used an enhanced version of the constituency-to-dependency conversion of Candito et al.
Experiments
Moreover, we provide in Table 5 a comparison of our best architecture with the regular/irregular MWE distinction against other architectures that do not make this distinction, namely the two best comparable systems designed for the SPMRL Shared Task (Seddah et al., 2013): the simple pipeline parser based on Mate-tools of Constant et al.
Introduction
While the realistic scenario of syntactic parsing with automatic MWE recognition (either done jointly or in a pipeline) has already been investigated in constituency parsing (Green et al., 2011; Constant et al., 2012; Green et al., 2013), the French dataset of the SPMRL 2013 Shared Task (Seddah et al., 2013) only recently provided the opportunity to evaluate this scenario within the framework of dependency syntax.2 In such a scenario, a system predicts dependency trees with marked groupings of tokens into MWEs.
Introduction
In this paper, we investigate various strategies for predicting from a tokenized sentence both MWEs and syntactic dependencies, using the French dataset of the SPMRL 2013 Shared Task.
Introduction
2 The main focus of the Shared Task was on predicting both morphological and syntactic analysis for morphologically-rich languages.
Related work
To our knowledge, the first works on predicting both MWEs and dependency trees are those presented at the SPMRL 2013 Shared Task that provided scores for French (which is the only dataset containing MWEs).
Related work
(2013) proposed to combine pipeline and joint systems in a reparser (Sagae and Lavie, 2006), and ranked first at the Shared Task.
Related work
It uses no features or treatment specific to MWEs, as it focuses on the general aim of the Shared Task, namely coping with the prediction of morphological and syntactic analysis.
Shared Task is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Packard, Woodley and Bender, Emily M. and Read, Jonathon and Oepen, Stephan and Dridan, Rebecca
Abstract
In this work, we revisit Shared Task 1 from the 2012 *SEM Conference: the automated analysis of negation.
Introduction
Owing to its immediate utility in the curation of scholarly results, the analysis of negation and so-called hedges in biomedical research literature has been the focus of several workshops, as well as the Shared Task at the 2011 Conference on Computational Language Learning (CoNLL).
Introduction
1 Our running example is a truncated variant of an item from the Shared Task training data.
Introduction
Though the task-specific concept of scope of negation is not the same as the notion of quantifier and operator scope in mainstream underspecified semantics, we nonetheless find that reviewing the 2012 *SEM Shared Task annotations with reference to an explicit encoding of semantic predicate-argument structure suggests a simple and straightforward operationalization of their concept of negation scope.
Related Work
(2012) describe some amount of tailoring of the Boxer lexicon to include more of the Shared Task scope cues among those that produce the negation operator in the DRSs, but otherwise the system appears to directly take the notion of scope of negation from the DRS and project it out to the string, with one caveat: As with the logical-forms representations we use, the DRS logical forms do not include function words as predicates in the semantics.
Related Work
Since the Shared Task gold standard annotations included such arguably semantically vacuous (see Bender, 2013, p. 107) words in the scope, further heuristics are needed to repair the string-based annotations coming from the DRS-based system.
System Description
From these underspecified representations of possible scopal configurations, a scope resolution component can spell out the full range of fully-connected logical forms (Koller and Thater, 2005), but it turns out that such enumeration is not relevant here: the notion of scope encoded in the Shared Task annotations is not concerned with the relative scope of quantifiers and negation, such as the two possible readings of (2) represented informally below:5
System Description
However, as shown below, the information about fixed scopal elements in an underspecified MRS is sufficient to model the Shared Task annotations.
System Description
5 In other words, a possible semantic interpretation of the (string-based) Shared Task annotation guidelines and data is in terms of a quantifier-free approach to meaning representation, or in terms of one where quantifier scope need not be made explicit (as once suggested by, among others, Alshawi, 1992).
Shared Task is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Zou, Bowei and Zhou, Guodong and Zhu, Qiaoming
Abstract
Evaluation on the *SEM 2012 shared task corpus indicates the usefulness of contextual discourse information in negation focus identification and justifies the effectiveness of our graph model in capturing such global information.
Baselines
Negation focus identification in the *SEM’2012 shared task is restricted to verbal negations annotated with MNEG in PropBank, with only the constituent belonging to a semantic role selected as the negation focus.
Baselines
1 In *SEM’2013, the shared task changed its focus to "Semantic Textual Similarity".
Baselines
To better illustrate the importance of contextual discourse information, Table 1 shows the statistics of intra- and inter-sentence information necessary for manual negation focus identification, based on 100 instances randomly extracted from the held-out dataset of the *SEM'2012 shared task corpus.
Introduction
Evaluation on the *SEM 2012 shared task corpus (Morante and Blanco, 2012) justifies our approach over several strong baselines.
Related Work
Due to the increasing demand for deep understanding of natural language text, negation recognition has drawn more and more attention in recent years, with a series of shared tasks and workshops focusing, however, on cue detection and scope resolution, such as the BioNLP 2009 shared task on negative event detection (Kim et al., 2009) and the ACL 2010 Workshop on scope resolution of negation and speculation (Morante and Sporleder, 2010), followed by a special issue of Computational Linguistics on modality and negation (Morante and Sporleder, 2012).
Related Work
However, although Morante and Blanco (2012) proposed negation focus identification as one of the *SEM’2012 shared tasks, only one team (Rosenberg and Bergler, 2012)1 participated in this task.
Shared Task is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Björkelund, Anders and Kuhn, Jonas
Abstract
Our model obtains the best results to date on recent shared task data for Arabic, Chinese, and English.
Background
Nevertheless, the two best systems in the latest CoNLL Shared Task on coreference resolution (Pradhan et al., 2012) were both variants of the mention-pair model.
Conclusion
We evaluated our system on all three languages from the CoNLL 2012 Shared Task and present the best results to date on these data sets.
Experimental Setup
We apply our model to the CoNLL 2012 Shared Task data, which includes a training, development, and test set split for three languages: Arabic, Chinese and English.
Features
As a baseline we use the features from Björkelund and Farkas (2012), whose system ranked second in the 2012 CoNLL shared task and is publicly available.
Introduction
The combination of this modification with nonlocal features leads to further improvements in the clustering accuracy, as we show in evaluation results on all languages from the CoNLL 2012 Shared Task: Arabic, Chinese, and English.
Related Work
Latent antecedents have recently gained popularity and were used by two systems in the CoNLL 2012 Shared Task, including the winning system (Fernandes et al., 2012; Chang et al., 2012).
Results
As a general baseline, we also include Björkelund and Farkas’ (2012) system (denoted B&F), which was the second best system in the shared task.
Shared Task is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Our feature template definitions build from those used by the top-performing systems in the CoNLL-2009 Shared Task, Zhao et al.
Experiments
To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
Experiments
The CoNLL-2009 Shared Task (Hajič et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese, and Spanish.
Experiments
The CoNLL-2005 and -2008 Shared Task datasets provide English SRL annotation, and for cross dataset comparability we consider only verbal predicates (more details in § 4.4).
Shared Task is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Turchi, Marco and Anastasopoulos, Antonios and C. de Souza, José G. and Negri, Matteo
Evaluation framework
  • One artificial setting (§5) obtained from the WMT12 QE shared task data, in which training/test instances are arranged to reflect homogeneous distributions of the HTER labels.
Evaluation framework
To measure the adaptability of our model to a given test set we compute the Mean Absolute Error (MAE), a metric for regression problems also used in the WMT QE shared tasks.
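As a rough illustration only (not part of the paper), MAE is the average absolute difference between predicted and gold quality scores; a minimal Python sketch with made-up HTER values, assuming scores in [0, 1]:

    def mean_absolute_error(gold, predicted):
        # MAE = (1/n) * sum_i |gold_i - predicted_i|
        assert len(gold) == len(predicted)
        return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)

    # Hypothetical HTER labels, not taken from the WMT12 QE data.
    gold = [0.12, 0.40, 0.05, 0.33]
    predicted = [0.20, 0.35, 0.00, 0.50]
    print(mean_absolute_error(gold, predicted))  # ~0.0875

Lower MAE means the predicted quality scores track the gold HTER labels more closely.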
Evaluation framework
The results of previous WMT QE shared tasks have shown that these baseline features are particularly competitive in the regression task (with only a few systems able to beat them at WMT12).
Online QE for CAT environments
The tool, which implements a large number of features proposed by participants in the WMT QE shared tasks, has been modified to process one sentence at a time as requested for integration in a CAT environment;
Related work
In the last couple of years, research in the field received a strong boost from the shared tasks organized within the WMT workshop on SMT, which is also the framework of our first experiment in §5.
Related work
3 For a comprehensive overview of the QE approaches proposed so far, we refer the reader to the WMT12 and WMT13 QE shared task reports (Callison-Burch et al., 2012; Bojar et al., 2013).
Shared Task is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Durrett, Greg and Klein, Dan
Abstract
On the SPMRL 2013 multilingual constituency parsing shared task (Seddah et al., 2013), our system outperforms the top single-parser system of Björkelund et al.
Introduction
Our parser is also able to generalize well across languages with little tuning: it achieves state-of-the-art results on multilingual parsing, scoring higher than the best single-parser system from the SPMRL 2013 Shared Task on a range of languages, as well as on the competition’s average F1 metric.
Other Languages
We evaluate on the constituency treebanks from the Statistical Parsing of Morphologically Rich Languages Shared Task (Seddah et al., 2013).
Other Languages
5 Their best parser, and the best overall parser from the shared task, is a reranked product of “Replaced” Berkeley parsers.
Shared Task is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yuan and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi and Globerson, Amir
Experimental Setup
For the CATiB dataset, we report UAS including punctuation in order to be consistent with the published results in the 2013 SPMRL shared task (Seddah et al., 2013).
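For reference, UAS is the percentage of tokens whose predicted head matches the gold head; a minimal sketch with hypothetical head indices (not CATiB data), with a flag to include or exclude punctuation tokens as the excerpt above discusses:

    def uas(gold_heads, pred_heads, is_punct=None, include_punct=True):
        # Unlabeled Attachment Score: % of counted tokens with the correct head.
        correct = total = 0
        for i, (g, p) in enumerate(zip(gold_heads, pred_heads)):
            if not include_punct and is_punct is not None and is_punct[i]:
                continue
            total += 1
            correct += (g == p)
        return 100.0 * correct / total

    # Hypothetical 6-token sentence: heads by position, 0 = root.
    gold = [2, 0, 2, 5, 3, 3]
    pred = [2, 0, 2, 5, 2, 3]
    print(uas(gold, pred))  # 83.33...

Reporting UAS "including punctuation", as the excerpt notes, simply means no tokens are skipped in the count.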
Introduction
This is better than the best published results in the 2013 SPMRL shared task (Seddah et al., 2013), including parser ensembles.
Results
To put these numbers into perspective, the bottom part of Table 3 shows the accuracy of the best systems from the 2013 SPMRL shared task on Arabic parsing using predicted information (Seddah et al., 2013).
Results
Bottom part shows UAS of the best systems in the SPMRL shared task.
Shared Task is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Guzmán, Francisco and Joty, Shafiq and Màrquez, Lluís and Nakov, Preslav
Experimental Setup
In our experiments, we used the data available for the WMT12 and the WMT11 metrics shared tasks for translations into English. This included the output from the systems that participated in the WMT12 and the WMT11 MT evaluation campaigns, both consisting of 3,003 sentences, for four different language pairs: Czech-English (CS-EN), French-English (FR-EN), German-English (DE-EN), and Spanish-English (ES-EN); as well as a dataset with the English references.
Experimental Setup
Table 1: Number of systems (systs), judgments (ranks), unique sentences (sents), and different judges (judges) for the different language pairs, for the human evaluation of the WMT12 and WMT11 shared tasks.
Introduction
We first design two discourse-aware similarity measures, which use DTs generated by a publicly-available discourse parser (Joty et al., 2012); then, we show that they can help improve a number of MT evaluation metrics at the segment- and at the system-level in the context of the WMT11 and the WMT12 metrics shared tasks (Callison-Burch et al., 2011; Callison-Burch et al., 2012).
Shared Task is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hasan, Kazi Saidul and Ng, Vincent
Analysis
3 A more detailed analysis of the results of the SemEval-2010 shared task and the approaches adopted by the participating systems can be found in Kim et al.
Evaluation
To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) to score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
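As a sketch of that scoring procedure (assumptions: exact match after lowercasing and whitespace normalization only; the official shared-task scorer additionally matches on stemmed forms, which is omitted here), in Python:

    def prf_exact_match(gold_keyphrases, predicted_keyphrases):
        # Precision/recall/F-score over keyphrase sets under exact string match.
        norm = lambda s: " ".join(s.lower().split())
        gold = {norm(k) for k in gold_keyphrases}
        pred = {norm(k) for k in predicted_keyphrases}
        matched = len(gold & pred)
        p = matched / len(pred) if pred else 0.0
        r = matched / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Hypothetical keyphrase lists, not from the SemEval-2010 data.
    gold = ["keyphrase extraction", "graph ranking", "tf-idf"]
    pred = ["keyphrase extraction", "tf-idf", "topic model"]
    print(prf_exact_match(gold, pred))  # (0.666..., 0.666..., 0.666...)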
Evaluation
For example, KP-Miner (El-Beltagy and Rafea, 2010), an unsupervised system, ranked third in the SemEval-2010 shared task with an F-score of 25.2, which is comparable to the best supervised system's score of 27.5.
Shared Task is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Luo, Xiaoqiang and Pradhan, Sameer and Recasens, Marta and Hovy, Eduard
BLANC for Imperfect Response Mentions
Table 1: The proposed BLANC scores of the CoNLL-2011 shared task participants.
BLANC for Imperfect Response Mentions
Table 2: The proposed BLANC scores of the CoNLL-2012 shared task participants.
Introduction
The proposed BLANC is applied to the CoNLL 2011 and 2012 shared task participants, and the scores and their correlations with existing metrics are shown in Section 5.
Shared Task is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ma, Xuezhe and Xia, Fei
Data and Tools
We evaluate our approach on three target languages from the CoNLL shared task treebanks, which do not appear in the Google Universal Treebanks.
Experiments
To make a thorough empirical comparison with previous studies, we also evaluate our system without unlabeled data (-U) on treebanks from the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
Experiments
Table 6: Parsing results on treebanks from the CoNLL shared tasks for eight target languages.
Shared Task is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Abstract
Experiments on the SANCL 2012 shared task show that our approach achieves 93.15% average tagging accuracy, which is the best accuracy reported so far on this data set, higher than those given by ensembled syntactic parsers.
Experiments
Our experiments are conducted on the data set provided by the SANCL 2012 shared task, which aims at building a single robust syntactic analysis system across the web domain.
Introduction
We conduct experiments on the official data set provided by the SANCL 2012 shared task (Petrov and McDonald, 2012).
Shared Task is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: