Low-Resource Semantic Role Labeling
Matthew R. Gormley, Margaret Mitchell, Benjamin Van Durme, and Mark Dredze

Article Structure

Abstract

We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).

Introduction

The goal of semantic role labeling (SRL) is to identify predicates and arguments and label their semantic contribution in a sentence.

Related Work

Our work builds upon research in both semantic role labeling and unsupervised grammar induction (Klein and Manning, 2004; Spitkovsky et al., 2010a).

Approaches

We consider an array of models, varying: 1.

Experiments

We are interested in the effects of varied supervision using pipeline and joint training for SRL.

Discussion and Future Work

We have compared various approaches for low-resource semantic role labeling at the state-of-the-art level.

Topics

grammar induction

Appears in 27 sentences as: Grammar Induction (4) Grammar induction (1) grammar induction (24)
In Low-Resource Semantic Role Labeling
  1. This work highlights a new application of unsupervised grammar induction and demonstrates several approaches to SRL in the absence of supervised syntax.
    Page 1, “Abstract”
  2. To better understand the effect of the low-resource grammars and features used in these models, we further include comparisons with (1) models that use higher-resource versions of the same features; (2) state-of-the-art high-resource models; and (3) previous work on low-resource grammar induction.
    Page 1, “Introduction”
  3. • New application of unsupervised grammar induction: low-resource SRL.
    Page 2, “Introduction”
  4. • Constrained grammar induction using SRL for distant supervision.
    Page 2, “Introduction”
  5. In the pipeline models, we develop a novel approach to unsupervised grammar induction and explore performance using SRL as distant supervision.
    Page 2, “Introduction”
  6. Examining the learning curve of the joint and pipeline models in two languages demonstrates that a small number of labeled SRL examples may be essential for good end-task performance, but that the choice of a good model for grammar induction has an even greater impact.
    Page 2, “Introduction”
  7. Our work builds upon research in both semantic role labeling and unsupervised grammar induction (Klein and Manning, 2004; Spitkovsky et al., 2010a).
    Page 2, “Related Work”
  8. There has not yet been a comparison of techniques for SRL that do not rely on a syntactic treebank, and no exploration of probabilistic models for unsupervised grammar induction within an SRL pipeline that we have been able to find.
    Page 2, “Related Work”
  9. Train Time, Constrained Grammar Induction: Observed Constraints
    Page 3, “Related Work”
  10. Grammar induction work has further demonstrated that distant supervision in the form of ACE-style relations (Naseem and Barzilay, 2011) or HTML markup (Spitkovsky et al., 2010b) can lead to considerable gains.
    Page 3, “Related Work”
  11. We do not reimplement these approaches within the SRL pipeline here, but provide comparison of these methods against our grammar induction approach in isolation in § 4.5.
    Page 3, “Related Work”

semantic role

Appears in 19 sentences as: semantic role (15) semantic roles (4)
In Low-Resource Semantic Role Labeling
  1. We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).
    Page 1, “Abstract”
  2. The goal of semantic role labeling (SRL) is to identify predicates and arguments and label their semantic contribution in a sentence.
    Page 1, “Introduction”
  3. The problem of SRL for low-resource languages is an important one to solve, as solutions pave the way for a wide range of applications: Accurate identification of the semantic roles of entities is a critical step for any application sensitive to semantics, from information retrieval to machine translation to question answering.
    Page 1, “Introduction”
  4. We examine approaches in a joint setting where we marginalize over latent syntax to find the optimal semantic role assignment; and a pipeline setting where we first induce an unsupervised grammar.
    Page 1, “Introduction”
  5. We find that the joint approach is a viable alternative for making reasonable semantic role predictions, outperforming the pipeline models.
    Page 1, “Introduction”
  6. Our work builds upon research in both semantic role labeling and unsupervised grammar induction (Klein and Manning, 2004; Spitkovsky et al., 2010a).
    Page 2, “Related Work”
  7. Previous related approaches to semantic role labeling include joint classification of semantic arguments (Toutanova et al., 2005; Johansson and Nugues, 2008), latent syntax induction (Boxwell et al., 2011; Naradowsky et al., 2012), and feature engineering for SRL (Zhao et al., 2009; Björkelund et al., 2009).
    Page 2, “Related Work”
  8. (2013) extend this idea by coupling predictions of a dependency parser with predictions from a semantic role labeler.
    Page 2, “Related Work”
  9. (2011) describe a method for training a semantic role labeler by extracting features from a packed CCG parse chart, where the parse weights are given by a simple ruleset.
    Page 2, “Related Work”
  10. Related work for the unsupervised learning of dependency structures separately from semantic roles primarily comes from Klein and Manning (2004), who introduced the Dependency Model with Valence (DMV).
    Page 2, “Related Work”
  11. A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
    Page 3, “Approaches”

dependency parsing

Appears in 16 sentences as: dependency parse (3) dependency parser (2) dependency parses (2) dependency parsing (9)
In Low-Resource Semantic Role Labeling
  1. • Simpler joint CRF for syntactic and semantic dependency parsing than previously reported.
    Page 2, “Introduction”
  2. (2013) extend this idea by coupling predictions of a dependency parser with predictions from a semantic role labeler.
    Page 2, “Related Work”
  3. (2012) marginalize over latent syntactic dependency parses.
    Page 2, “Related Work”
  4. Recent work in fully unsupervised dependency parsing has supplanted these methods with even higher accuracies (Spitkovsky et al., 2013) by arranging optimizers into networks that suggest informed restarts based on previously identified local optima.
    Page 3, “Related Work”
  5. A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
    Page 3, “Approaches”
  6. Brown clusters have been used to good effect for various NLP tasks such as named entity recognition (Miller et al., 2004) and dependency parsing (Koo et al., 2008; Spitkovsky et al., 2011).
    Page 3, “Approaches”
  7. Unlike syntactic dependency parsing, the graph is not required to be a tree, nor even a connected graph.
    Page 4, “Approaches”
  8. Figure 2: Factor graph for the joint syntactic/semantic dependency parsing model.
    Page 4, “Approaches”
  9. In this section, we define a simple model for joint syntactic and semantic dependency parsing.
    Page 4, “Approaches”
  10. This model extends the CRF model in Section 3.1 to include the projective syntactic dependency parse for a sentence.
    Page 4, “Approaches”
  11. Unlike the semantic dependencies, these syntactic variables must be coupled so that they produce a projective dependency parse; this requires an additional global constraint factor to ensure that this is the case (Smith and Eisner, 2008).
    Page 4, “Approaches”
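
Taken together, items 1, 8, 9, and 11 describe a CRF over semantic dependency variables and latent syntactic edge variables, with a single global factor restricting the syntactic variables to a projective tree. A hedged reconstruction of the implied distribution, in our notation rather than the paper's:

    p(y^{\mathrm{sem}}, y^{\mathrm{syn}} \mid x) \;=\; \frac{1}{Z(x)}\, \mathrm{PTree}(y^{\mathrm{syn}}) \prod_{\alpha} \exp\!\big(\theta \cdot f_\alpha(y_\alpha, x)\big)

The joint approach then predicts by marginalizing over the latent syntax,

    p(y^{\mathrm{sem}} \mid x) \;=\; \sum_{y^{\mathrm{syn}}} p(y^{\mathrm{sem}}, y^{\mathrm{syn}} \mid x),

which is why its features are limited to parts of the syntax over which this sum can be computed efficiently (see item 6 under "joint models" below).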

POS tags

Appears in 15 sentences as: POS tag (1) POS tagger (2) POS tags (12)
In Low-Resource Semantic Role Labeling
  1. • Use of Brown clusters in place of POS tags for low-resource SRL.
    Page 2, “Introduction”
  2. (2012) limit their exploration to a small set of basic features, and included high-resource supervision in the form of lemmas, POS tags, and morphology available from the CoNLL 2009 data.
    Page 2, “Related Work”
  3. Our experiments also consider ‘longer’ pipelines that include earlier stages: a morphological analyzer, POS tagger, and lemmatizer.
    Page 3, “Related Work”
  4. A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
    Page 3, “Approaches”
  5. Brown Clusters: We use fully unsupervised Brown clusters (Brown et al., 1992) in place of POS tags.
    Page 3, “Approaches”
  6. We define the DMV such that it generates sequences of word classes: either POS tags or Brown clusters as in Spitkovsky et al.
    Page 3, “Approaches”
  7. First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate’s POS tag and an argument’s word) are important for state-of-the-art performance (Zhao et al., 2009; Björkelund et al., 2009).
    Page 5, “Approaches”
  8. Second, for syntactic dependency parsing, combining Brown cluster features with word forms or POS tags yields high accuracy even with little training data (Koo et al., 2008).
    Page 5, “Approaches”
  9. Our experiments are subtractive, beginning with all supervision available and then successively removing (a) dependency syntax, (b) morphological features, (c) POS tags, and (d) lemmas.
    Page 6, “Experiments”
  10. The CoNLL-2009 Shared Task (Hajic et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese, and Spanish.
    Page 6, “Experiments”
  11. We first compare our models trained as a pipeline, using all available supervision (syntax, morphology, POS tags, lemmas) from the CoNLL-2009 data.
    Page 6, “Experiments”

feature templates

Appears in 12 sentences as: Feature Template (1) feature template (4) Feature templates (1) feature templates (6)
In Low-Resource Semantic Role Labeling
  1. (2009) features, who use feature templates from combinations of word properties, syntactic positions including head and children, and semantic properties; and features from Björkelund et al.
    Page 3, “Related Work”
  2. We create binary indicator features for each model using feature templates.
    Page 5, “Approaches”
  3. Our feature template definitions build from those used by the top-performing systems in the CoNLL-2009 Shared Task, Zhao et al.
    Page 5, “Approaches”
  4. Template Creation: Feature templates are defined over triples of (property, positions, order).
    Page 5, “Approaches”
  5. Constructing all feature template unigrams and bigrams would yield an unwieldy number of features.
    Page 5, “Approaches”
  6. where T_m is the m-th feature template, f is a particular instantiation of that template, and y_α is an assignment to the variables in factor α.
    Page 5, “Approaches”
  7. This is simply the mutual information of the feature template instantiation with the variable assignment.
    Page 5, “Approaches”
  8. Unlike such approaches, this filtering approach easily scales to many features since we can decompose the memory usage over feature templates.
    Page 6, “Approaches”
  9. 4.2 Feature Template Sets
    Page 6, “Experiments”
  10. Further, the best-performing low-resource features found in this work are those based on coarse feature templates and selected by information gain.
    Page 7, “Experiments”
  11. #FT indicates the number of feature templates used (unigrams+bigrams).
    Page 8, “Experiments”
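
Items 4, 6, and 7 describe the selection criterion; item 6's equation is missing here. Reading item 7 ("simply the mutual information") back into item 6's notation gives the standard form, a reconstruction rather than a verbatim quote of the paper's equation:

    \mathrm{IG}(T_m, \alpha) \;=\; \sum_{f \in T_m} \sum_{y_\alpha} p(f, y_\alpha) \,\log \frac{p(f, y_\alpha)}{p(f)\, p(y_\alpha)}

with the probabilities estimated empirically from the training data; the top N template bigrams under this score are retained.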

role labeling

Appears in 11 sentences as: role labeler (3) role labeling (8)
In Low-Resource Semantic Role Labeling
  1. We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).
    Page 1, “Abstract”
  2. The goal of semantic role labeling (SRL) is to identify predicates and arguments and label their semantic contribution in a sentence.
    Page 1, “Introduction”
  3. Our work builds upon research in both semantic role labeling and unsupervised grammar induction (Klein and Manning, 2004; Spitkovsky et al., 2010a).
    Page 2, “Related Work”
  4. Previous related approaches to semantic role labeling include joint classification of semantic arguments (Toutanova et al., 2005; Johansson and Nugues, 2008), latent syntax induction (Boxwell et al., 2011; Naradowsky et al., 2012), and feature engineering for SRL (Zhao et al., 2009; Björkelund et al., 2009).
    Page 2, “Related Work”
  5. (2013) extend this idea by coupling predictions of a dependency parser with predictions from a semantic role labeler.
    Page 2, “Related Work”
  6. (2011) describe a method for training a semantic role labeler by extracting features from a packed CCG parse chart, where the parse weights are given by a simple ruleset.
    Page 2, “Related Work”
  7. A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
    Page 3, “Approaches”
  8. Dependency-based semantic role labeling can be described as a simple structured prediction problem: the predicted structure is a labeled directed graph, where nodes correspond to words in the sentence.
    Page 4, “Approaches”
  9. Semantic Dependency Model: As described above, semantic role labeling can be cast as a structured prediction problem where the structure is a labeled semantic dependency graph.
    Page 4, “Approaches”
  10. To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
    Page 6, “Experiments”
  11. We have compared various approaches for low-resource semantic role labeling at the state-of-the-art level.
    Page 9, “Discussion and Future Work”

semantic role labeling

Appears in 11 sentences as: semantic role labeler (3) semantic role labeling (8)
In Low-Resource Semantic Role Labeling
  1. We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).
    Page 1, “Abstract”
  2. The goal of semantic role labeling (SRL) is to identify predicates and arguments and label their semantic contribution in a sentence.
    Page 1, “Introduction”
  3. Our work builds upon research in both semantic role labeling and unsupervised grammar induction (Klein and Manning, 2004; Spitkovsky et al., 2010a).
    Page 2, “Related Work”
  4. Previous related approaches to semantic role labeling include joint classification of semantic arguments (Toutanova et al., 2005; Johansson and Nugues, 2008), latent syntax induction (Boxwell et al., 2011; Naradowsky et al., 2012), and feature engineering for SRL (Zhao et al., 2009; Björkelund et al., 2009).
    Page 2, “Related Work”
  5. (2013) extend this idea by coupling predictions of a dependency parser with predictions from a semantic role labeler.
    Page 2, “Related Work”
  6. (2011) describe a method for training a semantic role labeler by extracting features from a packed CCG parse chart, where the parse weights are given by a simple ruleset.
    Page 2, “Related Work”
  7. A typical pipeline consists of a POS tagger, dependency parser, and semantic role labeler.
    Page 3, “Approaches”
  8. Dependency-based semantic role labeling can be described as a simple structured prediction problem: the predicted structure is a labeled directed graph, where nodes correspond to words in the sentence.
    Page 4, “Approaches”
  9. Semantic Dependency Model: As described above, semantic role labeling can be cast as a structured prediction problem where the structure is a labeled semantic dependency graph.
    Page 4, “Approaches”
  10. To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
    Page 6, “Experiments”
  11. We have compared various approaches for low-resource semantic role labeling at the state-of-the-art level.
    Page 9, “Discussion and Future Work”

joint models

Appears in 10 sentences as: joint model (4) joint models (6)
In Low-Resource Semantic Role Labeling
  1. • Comparison of pipeline and joint models for SRL.
    Page 1, “Introduction”
  2. The joint models use a non-loopy conditional random field (CRF) with a global factor constraining latent syntactic edge variables to form a tree.
    Page 2, “Introduction”
  3. Even without dependency path features, the joint models best the pipeline-trained models, achieving state-of-the-art performance in the low-resource setting (§ 4.4).
    Page 2, “Introduction”
  4. In both pipeline and joint models, we use features adapted from state-of-the-art approaches to SRL.
    Page 3, “Related Work”
  5. Figure 2 shows the factor graph for this joint model.
    Page 4, “Approaches”
  6. This highlights an important advantage of the pipeline-trained model: the features can consider any part of the syntax (e.g., arbitrary subtrees), whereas the joint model is limited to those features over which it can efficiently marginalize (e.g., short dependency paths).
    Page 6, “Experiments”
  7. In the low-resource setting of the CoNLL-2009 Shared Task without syntactic supervision, our joint model (Joint) with marginalized syntax obtains state-of-the-art results with features IGC described in § 4.2.
    Page 7, “Experiments”
  8. These results begin to answer a key research question in this work: The joint models outperform the pipeline models in the low-resource setting.
    Page 7, “Experiments”
  9. We find that we can outperform prior work in the low-resource setting by coupling the selection of feature templates based on information gain with a joint model that marginalizes over latent syntax.
    Page 9, “Discussion and Future Work”
  10. Our discriminative joint models treat latent syntax as a structured-feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood—optionally with distant supervision.
    Page 9, “Discussion and Future Work”

bigrams

Appears in 8 sentences as: bigram (2) bigrams (7)
In Low-Resource Semantic Role Labeling
  1. The clusters are formed by a greedy hierarchical clustering algorithm that finds an assignment of words to classes by maximizing the likelihood of the training data under a latent-class bigram model.
    Page 3, “Approaches”
  2. First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate’s POS tag and an argument’s word) are important for state-of-the-art performance (Zhao et al., 2009; Björkelund et al., 2009).
    Page 5, “Approaches”
  3. We consider both template unigrams and bigrams, combining two templates in sequence.
    Page 5, “Approaches”
  4. Constructing all feature template unigrams and bigrams would yield an unwieldy number of features.
    Page 5, “Approaches”
  5. We therefore determine the top N template bigrams for a dataset and factor α according to an information gain measure (Martins et al., 2011):
    Page 5, “Approaches”
  6. Each of IGC and IGB also includes 32 template bigrams selected by information gain on 1000 sentences; we select a different set of template bigrams for each dataset.
    Page 6, “Experiments”
  7. However, the original unigram Björkelund features (B_{de,en,es,zh}), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB).
    Page 6, “Experiments”
  8. In Czech, we disallowed template bigrams involving path-grams.
    Page 6, “Experiments”
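
Item 1 compresses the Brown clustering objective into a single sentence. The latent-class bigram likelihood it refers to has the standard form from Brown et al. (1992), shown here for reference rather than quoted from the paper. With C(w) denoting the class of word w:

    P(w_1, \ldots, w_n) \;=\; \prod_{i=1}^{n} p\big(C(w_i) \mid C(w_{i-1})\big)\; p\big(w_i \mid C(w_i)\big)

The greedy hierarchical algorithm merges classes so as to sacrifice as little of this likelihood as possible, which is equivalent to maximizing the average mutual information between adjacent classes.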

treebank

Appears in 8 sentences as: Treebank (2) treebank (4) treebanks (2)
In Low-Resource Semantic Role Labeling
  1. We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL).
    Page 1, “Abstract”
  2. However, richly annotated data such as that provided in parsing treebanks is expensive to produce, and may be tied to specific domains (e.g., newswire).
    Page 1, “Introduction”
  3. (2012) observe that syntax may be treated as latent when a treebank is not available.
    Page 2, “Related Work”
  4. (2011) require an oracle CCG tag dictionary extracted from a treebank.
    Page 2, “Related Work”
  5. There has not yet been a comparison of techniques for SRL that do not rely on a syntactic treebank, and no exploration of probabilistic models for unsupervised grammar induction within an SRL pipeline that we have been able to find.
    Page 2, “Related Work”
  6. To compare with prior approaches that use semantic supervision for grammar induction, we utilize Section 23 of the WSJ portion of the Penn Treebank (Marcus et al., 1993).
    Page 6, “Experiments”
  7. We contrast low-resource (D) and high-resource settings (E), where the latter uses a treebank.
    Page 7, “Experiments”
  8. We therefore turn to an analysis of other approaches to grammar induction in Table 8, evaluated on the Penn Treebank.
    Page 9, “Experiments”

Shared Task

Appears in 8 sentences as: Shared Task (7) Shared task (1)
In Low-Resource Semantic Role Labeling
  1. Our feature template definitions build from those used by the top-performing systems in the CoNLL-2009 Shared Task, Zhao et al.
    Page 5, “Approaches”
  2. To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
    Page 6, “Experiments”
  3. The CoNLL-2009 Shared Task (Hajic et al., 2009) dataset contains POS tags, lemmas, morphological features, syntactic dependencies, predicate senses, and semantic role annotations for 7 languages: Catalan, Chinese, Czech, English, German, Japanese, and Spanish.
    Page 6, “Experiments”
  4. The CoNLL-2005 and -2008 Shared Task datasets provide English SRL annotation, and for cross-dataset comparability we consider only verbal predicates (more details in § 4.4).
    Page 6, “Experiments”
  5. Table 4(b) contrasts our high-resource results for the task of SRL and sense disambiguation with the top systems in the CoNLL-2009 Shared Task, giving further insight into the performance of the simple information gain feature selection technique.
    Page 6, “Experiments”
  6. In the low-resource setting of the CoNLL-2009 Shared Task without syntactic supervision, our joint model (Joint) with marginalized syntax obtains state-of-the-art results with features IGC described in § 4.2.
    Page 7, “Experiments”
  7. They report results on Prop-CCGbank (Boxwell and White, 2008), which uses the same training/testing splits as the CoNLL-2005 Shared Task .
    Page 7, “Experiments”
  8. * Indicates the supervised parser outputs provided by the CoNLL’09 Shared Task.
    Page 9, “Experiments”

CRF

Appears in 7 sentences as: CRF (7)
In Low-Resource Semantic Role Labeling
  1. • Simpler joint CRF for syntactic and semantic dependency parsing than previously reported.
    Page 2, “Introduction”
  2. The joint models use a non-loopy conditional random field (CRF) with a global factor constraining latent syntactic edge variables to form a tree.
    Page 2, “Introduction”
  3. We define a conditional random field (CRF) (Lafferty et al., 2001) for this task.
    Page 4, “Approaches”
  4. This model extends the CRF model in Section 3.1 to include the projective syntactic dependency parse for a sentence.
    Page 4, “Approaches”
  5. We train our CRF models by maximizing conditional log-likelihood using stochastic gradient descent with an adaptive learning rate (AdaGrad) (Duchi et al., 2011) over mini-batches.
    Page 5, “Approaches”
  6. 3.3 Features for CRF Models
    Page 5, “Approaches”
  7. Similarly, improving the non-convex optimization of our latent-variable CRF (Marginalized) may offer further gains.
    Page 9, “Experiments”
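
Item 5 names the training procedure. Below is a minimal sketch of AdaGrad over mini-batches (Duchi et al., 2011); grad_fn stands in for the gradient of the CRF's negative conditional log-likelihood, and the function name and hyperparameter values are illustrative, not taken from the paper.

    import numpy as np

    def adagrad(grad_fn, theta, batches, eta=0.1, eps=1e-8, n_epochs=10):
        """Minimize a loss (here: the CRF's negative conditional
        log-likelihood) with per-coordinate adaptive learning rates."""
        G = np.zeros_like(theta)  # running sum of squared gradients
        for _ in range(n_epochs):
            for batch in batches:
                g = grad_fn(theta, batch)  # mini-batch gradient
                G += g * g                 # accumulate squared gradient
                theta = theta - eta * g / (np.sqrt(G) + eps)  # adaptive step
        return theta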

distant supervision

Appears in 7 sentences as: Distant Supervision (1) distant supervision (6)
In Low-Resource Semantic Role Labeling
  1. In the pipeline models, we develop a novel approach to unsupervised grammar induction and explore performance using SRL as distant supervision.
    Page 2, “Introduction”
  2. In our low-resource pipelines, we assume that the syntactic parser is given no labeled parses—however, it may optionally utilize the semantic parses as distant supervision.
    Page 3, “Related Work”
  3. Grammar induction work has further demonstrated that distant supervision in the form of ACE-style relations (Naseem and Barzilay, 2011) or HTML markup (Spitkovsky et al., 2010b) can lead to considerable gains.
    Page 3, “Related Work”
  4. Table 8: Unlabeled directed dependency accuracy (WSJ∞) and the distant supervision used by each method:
         Model               WSJ∞   Distant Supervision
         SAJM'10             44.8   none
         SAJ'13              64.4   none
         SJA'10              50.4   HTML
         NB'11               59.4   ACE05
         DMV (bc)            24.8   none
         DMV+C (bc)          44.8   SRL
         Marginalized, IGC   48.8   SRL
         Marginalized, IGB   58.9   SRL
    Page 9, “Experiments”
  5. Interestingly, the marginalized grammars best the DMV grammar induction method; however, this difference is less pronounced when the DMV is constrained using SRL labels as distant supervision.
    Page 9, “Experiments”
  6. We contrast with methods using distant supervision (Naseem and Barzilay, 2011; Spitkovsky et al., 2010b) and fully unsupervised dependency parsing (Spitkovsky et al., 2013).
    Page 9, “Experiments”
  7. Our discriminative joint models treat latent syntax as a structured-feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood—optionally with distant supervision.
    Page 9, “Discussion and Future Work”

feature sets

Appears in 7 sentences as: feature set (3) feature sets (4)
In Low-Resource Semantic Role Labeling
  1. Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1).
    Page 6, “Experiments”
  2. We compare against the language-specific feature sets detailed in the literature on high-resource top-performing SRL systems: from Björkelund et al.
    Page 6, “Experiments”
  3. (2009), these are feature sets for German, English, Spanish, and Chinese, obtained by weeks of forward selection (B_{de,en,es,zh}); and from Zhao et al.
    Page 6, “Experiments”
  4. Table 4(a) shows the results of our model with gold syntax and a richer feature set than that of Naradowsky et al.
    Page 6, “Experiments”
  5. We find that IGB obtains higher F1 than the original Björkelund feature sets (B_{de,en,es,zh}) in the low-resource pipeline setting with constrained grammar induction (DMV+C).
    Page 6, “Experiments”
  6. This covers all CoNLL languages but Czech, where feature sets were not made publicly available in either work.
    Page 6, “Experiments”
  7. For these experiments, we utilize the coarse-grained feature set (IGC), which includes Brown clusters.
    Page 8, “Experiments”

sense disambiguation

Appears in 7 sentences as: sense disambiguation (7)
In Low-Resource Semantic Role Labeling
  1. We optionally include additional variables that perform word sense disambiguation for each predicate.
    Page 4, “Approaches”
  2. To compare to prior work (i.e., submissions to the CoNLL-2009 Shared Task), we also consider the joint task of semantic role labeling and predicate sense disambiguation.
    Page 6, “Experiments”
  3. Table 4(b) contrasts our high-resource results for the task of SRL and sense disambiguation with the top systems in the CoNLL-2009 Shared Task, giving further insight into the performance of the simple information gain feature selection technique.
    Page 6, “Experiments”
  4. Table 5: F1 for SRL approaches (without sense disambiguation) in matched and mismatched train/test settings for CoNLL 2005 span and 2008 head supervision.
    Page 7, “Experiments”
  5. (2011), who evaluate on SRL in isolation (without sense disambiguation, as in CoNLL-2009).
    Page 7, “Experiments”
  6. Each row contains the F1 for SRL only (without sense disambiguation) where the supervision type of that row and all above it have been removed.
    Page 8, “Experiments”
  7. F1 of SRL only (without sense disambiguation), shown as the number of training sentences is increased.
    Page 8, “Experiments”

Viterbi

Appears in 5 sentences as: Viterbi (5)
In Low-Resource Semantic Role Labeling
  1. (2010a) show that Viterbi (hard) EM training of the DMV with simple uniform initialization of the model parameters yields higher accuracy models than standard soft-EM
    Page 2, “Related Work”
  2. In Viterbi EM, the E-step finds the maximum likelihood corpus parse given the current model parameters.
    Page 3, “Related Work”
  3. Unsupervised Grammar Induction: Our first method for grammar induction is fully unsupervised Viterbi EM training of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), with uniform initialization of the model parameters.
    Page 3, “Approaches”
  4. Constrained Grammar Induction: Our second method, which we will refer to as DMV+C, induces grammar in a distantly supervised fashion by using a constrained parser in the E-step of Viterbi EM.
    Page 4, “Approaches”
  5. We contrast the DMV trained with Viterbi EM+uniform initialization (DMV), our constrained DMV (DMV+C), and our model’s MBR decoding of latent syntax (Marginalized) with other recent work: Spitkovsky et al.
    Page 9, “Experiments”
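
Items 1-4 outline Viterbi (hard) EM for the DMV, with DMV+C adding SRL-derived constraints to the E-step. A schematic of that loop, assuming hypothetical helpers best_parse (a Viterbi parser under the current parameters, optionally restricted to parses consistent with the constraints) and reestimate (maximum likelihood re-estimation from the chosen parses); neither helper name comes from the paper.

    def viterbi_em(corpus, params, best_parse, reestimate,
                   n_iters=20, constraints=None):
        """Hard EM for a generative dependency grammar such as the DMV.

        E-step: pick the single highest-probability parse of each sentence
        (DMV+C restricts this search to parses consistent with SRL-derived
        constraints, i.e., SRL as distant supervision).
        M-step: re-estimate the model parameters from those parses.
        Uniform initialization of params follows Spitkovsky et al. (2010a).
        """
        for _ in range(n_iters):
            parses = [best_parse(s, params, constraints) for s in corpus]  # E-step
            params = reestimate(parses)                                    # M-step
        return params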

CoNLL

Appears in 5 sentences as: CoNLL (4) CoNLL’ (1)
In Low-Resource Semantic Role Labeling
  1. (2012) limit their exploration to a small set of basic features, and included high-resource supervision in the form of lemmas, POS tags, and morphology available from the CoNLL 2009 data.
    Page 2, “Related Work”
  2. We do not report results on Japanese as that data was only made freely available to researchers that competed in CoNLL 2009.
    Page 6, “Experiments”
  3. This covers all CoNLL languages but Czech, where feature sets were not made publicly available in either work.
    Page 6, “Experiments”
  4. Table 5: F1 for SRL approaches (without sense disambiguation) in matched and mismatched train/test settings for CoNLL 2005 span and 2008 head supervision.
    Page 7, “Experiments”
  5. Table 7: Unlabeled directed dependency accuracy on the CoNLL’09 test set in low-resource settings.
    Page 9, “Experiments”

dependency path

Appears in 4 sentences as: dependency path (3) dependency paths (1)
In Low-Resource Semantic Role Labeling
  1. Even without dependency path features, the joint models best the pipeline-trained models, achieving state-of-the-art performance in the low-resource setting (§ 4.4).
    Page 2, “Introduction”
  2. (2009), who utilize features on syntactic siblings and the dependency path concatenated with the direction of each edge.
    Page 3, “Related Work”
  3. (2009), we include the notion of verb and noun supports and sections of the dependency path.
    Page 5, “Approaches”
  4. This highlights an important advantage of the pipeline-trained model: the features can consider any part of the syntax (e.g., arbitrary subtrees), whereas the joint model is limited to those features over which it can efficiently marginalize (e.g., short dependency paths).
    Page 6, “Experiments”

unigrams

Appears in 4 sentences as: unigram (2) unigrams (3)
In Low-Resource Semantic Role Labeling
  1. We consider both template unigrams and bigrams, combining two templates in sequence.
    Page 5, “Approaches”
  2. Constructing all feature template unigrams and bigrams would yield an unwieldy number of features.
    Page 5, “Approaches”
  3. Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1).
    Page 6, “Experiments”
  4. However, the original unigram Björkelund features (B_{de,en,es,zh}), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB).
    Page 6, “Experiments”

CCG

Appears in 4 sentences as: CCG (4)
In Low-Resource Semantic Role Labeling
  1. (2011) describe a method for training a semantic role labeler by extracting features from a packed CCG parse chart, where the parse weights are given by a simple ruleset.
    Page 2, “Related Work”
  2. (2011) require an oracle CCG tag dictionary extracted from a treebank.
    Page 2, “Related Work”
  3. The comparison is imperfect for two reasons: first, the CCGBank contains only 99.44% of the original PTB sentences (Hockenmaier and Steedman, 2007); second, because PropBank was annotated over CFGs, after converting to CCG only 99.977% of the argument spans were exact matches (Boxwell and White, 2008).
    Page 7, “Experiments”
  4. (2011) (B’11) uses additional supervision in the form of a CCG tag dictionary derived from supervised data with (tdc) and without (tc) a cutoff.
    Page 8, “Experiments”

semantic relationship

Appears in 3 sentences as: semantic relationship (3)
In Low-Resource Semantic Role Labeling
  1. The label on the edge indicates the type of semantic relationship .
    Page 4, “Approaches”
  2. Because each word in a sentence may be in a semantic relationship with any other word (including itself), a sentence of length n has n² possible edges.
    Page 4, “Approaches”
  3. In this way, we jointly perform identification (determining whether a semantic relationship exists) and classification (determining the semantic label).
    Page 4, “Approaches”
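
A minimal sketch of the prediction space items 2 and 3 describe: every ordered (parent, child) pair of words, self-loops included, is a candidate edge; labeling an edge None performs identification, while choosing among the remaining labels performs classification. The role inventory below is illustrative only.

    ROLES = [None, "ARG0", "ARG1", "ARGM-TMP"]  # None marks "no semantic relationship"

    def candidate_edges(n):
        """All n * n possible semantic edges for a sentence of length n."""
        return [(parent, child) for parent in range(n) for child in range(n)]

    # A predicted structure assigns one label to every candidate edge; for a
    # 3-word sentence there are 9 edges to label.
    prediction = {edge: None for edge in candidate_edges(3)}
    prediction[(0, 2)] = "ARG1"  # word 0 is a predicate taking word 2 as its ARG1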

model parameters

Appears in 3 sentences as: model parameters (3)
In Low-Resource Semantic Role Labeling
  1. (2010a) show that Viterbi (hard) EM training of the DMV with simple uniform initialization of the model parameters yields higher accuracy models than standard soft-EM
    Page 2, “Related Work”
  2. In Viterbi EM, the E-step finds the maximum likelihood corpus parse given the current model parameters.
    Page 3, “Related Work”
  3. Unsupervised Grammar Induction: Our first method for grammar induction is fully unsupervised Viterbi EM training of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), with uniform initialization of the model parameters.
    Page 3, “Approaches”

syntactic parser

Appears in 3 sentences as: syntactic parser (1) syntactic parses (1) syntactically parses (1)
In Low-Resource Semantic Role Labeling
  1. In this simple pipeline, the first stage syntactically parses the corpus, and the second stage predicts semantic predicate-argument structure for each sentence using the labels of the first stage as features.
    Page 3, “Related Work”
  2. In our low-resource pipelines, we assume that the syntactic parser is given no labeled parses—however, it may optionally utilize the semantic parses as distant supervision.
    Page 3, “Related Work”
  3. In Section 3.1, we introduced pipeline-trained models for SRL, which used grammar induction to predict unlabeled syntactic parses.
    Page 4, “Approaches”
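
Item 1 describes the two-stage pipeline, and items 2-3 its low-resource variant. A schematic under those assumptions (the function names are placeholders, not the paper's code):

    def pipeline_srl(corpus, srl_training_data, induce_grammar, parse, train_srl):
        """Stage 1: induce a grammar with no labeled parses (optionally
        using the semantic parses as distant supervision) and parse the
        corpus. Stage 2: train an SRL model whose features may inspect
        any part of the stage-1 parses."""
        grammar = induce_grammar(corpus)  # no treebank required
        parses = {sent: parse(sent, grammar) for sent in corpus}
        return train_srl(srl_training_data, parses)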

unlabeled data

Appears in 3 sentences as: unlabeled data (3)
In Low-Resource Semantic Role Labeling
  1. We utilize unlabeled data in both generative and discriminative models for dependency syntax and in generative word clustering.
    Page 9, “Discussion and Future Work”
  2. Our discriminative joint models treat latent syntax as a structured-feature to be optimized for the end-task of SRL, while our other grammar induction techniques optimize for unlabeled data likelihood—optionally with distant supervision.
    Page 9, “Discussion and Future Work”
  3. We observe that careful use of these unlabeled data resources can improve performance on the end task.
    Page 9, “Discussion and Future Work”
