Open-Domain Semantic Role Labeling by Modeling Word Spans
Huang, Fei and Yates, Alexander

Article Structure

Abstract

Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data.

Introduction

In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).

Topics

CoNLL

Appears in 9 sentences as: CoNLL (10)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  2. Yet the baseline from CoNLL 2005 suggests that the fiction texts are actually easier than the newswire texts.
    Page 1, “Introduction”
  3. We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005).
    Page 2, “Introduction”
  4. We perform our tests on the Brown corpus (Kučera and Francis, 1967) test data from CoNLL 2005, consisting of 3 sections (ck01-ck03) of propbanked Brown corpus data.
    Page 2, “Introduction”
  5. Like the best systems from the CoNLL 2005 shared task (Punyakanok et al., 2008; Pradhan et al., 2005), they also use features from multiple parses to remain robust in the face of parser error.
    Page 3, “Introduction”
  6. In order to perform true open-domain SRL, we must first consider a task which is not formally part of the CoNLL shared task: the task of identifying predicates in a given sentence.
    Page 3, “Introduction”
  7. Our first step is therefore to build an SRL system that relies on partial parsing, as was done in CoNLL 2004 (Carreras and Màrquez, 2004).
    Page 4, “Introduction”
  8. Results for the Span-HMM models on the CoNLL 2005 Brown corpus are shown in Table 3.
    Page 7, “Introduction”
  9. Deschacht and Moens (2009) use a latent-variable language model to provide features for an SRL system, and they show on CoNLL 2008 data that they can significantly improve performance when little labeled training data is available.
    Page 9, “Introduction”


semantic role

Appears in 9 sentences as: Semantic Role (1) Semantic role (1) semantic role (7)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text.
    Page 1, “Abstract”
  2. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system.
    Page 1, “Abstract”
  3. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
    Page 1, “Abstract”
  4. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  5. We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005).
    Page 2, “Introduction”
  6. Owing to the established difficulty of the Brown test set and the different domains of the Brown test and WSJ training data, this dataset makes for an excellent testbed for open-domain semantic role labeling.
    Page 3, “Introduction”
  7. 5 Semantic Role Labeling with HMM-based Representations
    Page 3, “Introduction”
  8. A question for future work is whether a less sparse model of the spans themselves, such as a Naïve Bayes model for the span node, would yield a better clustering for producing features for semantic role labeling.
    Page 9, “Introduction”
  9. Fürstenau and Lapata (2009b; 2009a) use semi-supervised techniques to automatically annotate data for previously unseen predicates with semantic role information.
    Page 9, “Introduction”


in-domain

Appears in 8 sentences as: in-domain (8)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  2. We aim to build an open-domain supervised SRL system; that is, one whose performance on out-of-domain tests approaches the same level of performance as that of state-of-the-art systems on in-domain tests.
    Page 1, “Introduction”
  3. Experiments on out-of-domain test sets show that our learned representations can dramatically improve out-of-domain performance, and narrow the gap between in-domain and out-of-domain performance by half.
    Page 1, “Introduction”
  4. The data also includes a second test set of in-domain text (section 23 of the Treebank), which we refer to as the WSJ test set and use as a reference point.
    Page 3, “Introduction”
  5. For predicates that never or rarely appear in training, the HMM features increase F1 by 4.2, and they increase the overall F1 of the system by 3.5 to 93.5, which approaches the F1 of 94.7 that the Baseline system achieves on the in-domain WSJ test set.
    Page 3, “Introduction”
  6. Table 4: Multi-Span-HMM has a much smaller drop-off in F1 than comparable systems on out-of-domain test data vs in-domain test data.
    Page 7, “Introduction”
  7. Collobert and Weston (2008) use deep learning techniques based on semi-supervised embeddings to improve an SRL system, though their tests are on in-domain data.
    Page 9, “Introduction”
  8. The disparity in performance between in-domain and out-of-domain tests is by no means restricted to SRL.
    Page 9, “Introduction”


semantic role labeling

Appears in 8 sentences as: Semantic Role Labeling (1) Semantic role labeling (1) semantic role labeling (6)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text.
    Page 1, “Abstract”
  2. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system.
    Page 1, “Abstract”
  3. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
    Page 1, “Abstract”
  4. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  5. We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005).
    Page 2, “Introduction”
  6. Owing to the established difficulty of the Brown test set and the different domains of the Brown test and WSJ training data, this dataset makes for an excellent testbed for open-domain semantic role labeling.
    Page 3, “Introduction”
  7. 5 Semantic Role Labeling with HMM-based Representations
    Page 3, “Introduction”
  8. A question for future work is whether a less sparse model of the spans themselves, such as a Naïve Bayes model for the span node, would yield a better clustering for producing features for semantic role labeling.
    Page 9, “Introduction”


role labeling

Appears in 8 sentences as: Role Labeling (1) role labeling (7)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text.
    Page 1, “Abstract”
  2. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system.
    Page 1, “Abstract”
  3. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
    Page 1, “Abstract”
  4. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  5. We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005).
    Page 2, “Introduction”
  6. Owing to the established difficulty of the Brown test set and the different domains of the Brown test and WSJ training data, this dataset makes for an excellent testbed for open-domain semantic role labeling.
    Page 3, “Introduction”
  7. 5 Semantic Role Labeling with HMM-based Representations
    Page 3, “Introduction”
  8. A question for future work is whether a less sparse model of the spans themselves, such as a Naïve Bayes model for the span node, would yield a better clustering for producing features for semantic role labeling.
    Page 9, “Introduction”


CRF

Appears in 7 sentences as: CRF (9)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. To address the task of open-domain predicate identification, we construct a Conditional Random Field (CRF) (Lafferty et al., 2001) model with target labels of B-Pred, I-Pred, and O-Pred (for the beginning, interior, and outside of a predicate).
    Page 3, “Introduction”
  2. We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
    Page 3, “Introduction”
  3. We refer to the CRF model with these features as our Baseline SRL system; in what follows we extend the Baseline model with more sophisticated features.
    Page 4, “Introduction”
  4. Despite partial success in improving our baseline SRL system with path features, these features still suffer from data sparsity — many paths in the test set are never or very rarely observed during training, so the CRF model has little or no data points from which to estimate accurate parameters for these features.
    Page 5, “Introduction”
  5. While these new features may be very helpful, ideally we would like our learned representations to produce multiple useful features for the CRF model, so that the CRF can combine the signals from each feature to learn a sophisticated model.
    Page 7, “Introduction”
  6. We call the CRF model with these N Span-HMM features the Multi-Span-HMM model (MSH); in experiments we use N = 5.
    Page 7, “Introduction”
  7. Developing techniques that can incrementally adapt to new domains without the computational expense of retraining the CRF model every time would help make open-domain SRL more practical.
    Page 9, “Introduction”
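
To make the setup in sentences 1 and 2 above concrete, here is a minimal sketch of BIO-style predicate identification with a linear-chain CRF. The paper names no package, so sklearn-crfsuite is a stand-in, and the feature set is a simplified version of the Baseline features (words, POS tags, chunk labels); the neighboring-label information is handled by the CRF's transition features.

```python
# Hypothetical sketch: predicate identification as BIO tagging with a
# linear-chain CRF. sklearn-crfsuite stands in for the unnamed "open source
# CRF software package"; the features approximate the Baseline set.
import sklearn_crfsuite

def token_features(sent, i):
    """Features for token i; sent is a list of (word, POS tag, chunk label)."""
    word, pos, chunk = sent[i]
    feats = {"word": word.lower(), "pos": pos, "chunk": chunk}
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next_pos"] = sent[i + 1][1]
    else:
        feats["EOS"] = True
    return feats

# Toy training sentence with B-Pred/I-Pred/O-Pred labels.
train_sents = [[("The", "DT", "B-NP"), ("cat", "NN", "I-NP"),
                ("sat", "VBD", "B-VP"), ("down", "RP", "B-PRT")]]
train_labels = [["O-Pred", "O-Pred", "B-Pred", "O-Pred"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))  # e.g. [['O-Pred', 'O-Pred', 'B-Pred', 'O-Pred']]
```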


language models

Appears in 7 sentences as: language model (1) Language Models (1) language models (5)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
    Page 1, “Abstract”
  2. Using latent-variable language models, we learn representations of texts that provide novel kinds of features to our supervised learning algorithms.
    Page 1, “Introduction”
  3. The next section provides background information on learning representations for NLP tasks using latent-variable language models.
    Page 1, “Introduction”
  4. 2 Open-Domain Representations Using Latent-Variable Language Models
    Page 2, “Introduction”
  5. show how to build systems that learn new representations for open-domain NLP using latent-variable language models like Hidden Markov Models (HMMs).
    Page 2, “Introduction”
  6. Deschacht and Moens (2009) use a latent-variable language model to provide features for an SRL system, and they show on CoNLL 2008 data that they can significantly improve performance when little labeled training data is available.
    Page 9, “Introduction”
  7. They use HMM language models trained on unlabeled text, much like we use in our baseline systems, but they do not consider models of word spans, which we found to be most beneficial.
    Page 9, “Introduction”
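
Sentence 7 above describes the core recipe: train an HMM language model on unlabeled text, then hand its latent states to the supervised learner as features. A minimal sketch, assuming hmmlearn as the HMM implementation (the paper does not name one) and a toy corpus:

```python
# Hypothetical sketch: learn an HMM representation from unlabeled text and
# read off one latent-state feature per token. hmmlearn is our choice of
# library; the real system used far more text and up to 80 latent states.
import numpy as np
from hmmlearn import hmm

corpus = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
vocab = {w: i for i, w in enumerate(sorted({w for s in corpus for w in s}))}

# hmmlearn expects one concatenated column of integer word ids plus lengths.
X = np.concatenate([[vocab[w] for w in s] for s in corpus]).reshape(-1, 1)
lengths = [len(s) for s in corpus]

model = hmm.CategoricalHMM(n_components=10, n_iter=20, random_state=0)
model.fit(X, lengths)
states = model.predict(X, lengths)  # Viterbi-optimal state per token
print(states)
```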


Baseline system

Appears in 6 sentences as: Baseline system (3) baseline system (1) baseline systems (2)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
    Page 3, “Introduction”
  2. For predicates that never or rarely appear in training, the HMM features increase F1 by 4.2, and they increase the overall F1 of the system by 3.5 to 93.5, which approaches the F1 of 94.7 that the Baseline system achieves on the in-domain WSJ test set.
    Page 3, “Introduction”
  3. Table 2 shows the performance of our three baseline systems.
    Page 5, “Introduction”
  4. Most of our improvement over the Baseline system comes from the core arguments A0 and A1, but also from a few adjunct types like AM-TMP and AM-LOC.
    Page 7, “Introduction”
  5. Our baseline system decreases in F-score from 81.5 to 78.9 for argument identification, but suffers a much larger 8% drop in argument classification.
    Page 7, “Introduction”
  6. They use HMM language models trained on unlabeled text, much like we use in our baseline systems, but they do not consider models of word spans, which we found to be most beneficial.
    Page 9, “Introduction”


POS tags

Appears in 5 sentences as: POS tag (1) POS tagger (1) POS tagging (1) POS tags (2)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. For the chunker and POS tagger, the drop-offs are less severe: 94.89 to 91.73, and 97.36 to 94.73.
    Page 3, “Introduction”
  2. We use an open source CRF software package to implement our CRF models. We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system.
    Page 3, “Introduction”
  3. • POS before, after predicate: the POS tag of the tokens immediately preceding and following the predicate
    Page 4, “Introduction”
  4. Given an input sentence labeled with POS tags and chunks, we construct path features for a token w_i by concatenating words (or tags or chunk labels) between w_i and the predicate.
    Page 5, “Introduction”
  5. Numerous domain adaptation techniques have been developed to address this problem, including self-training (McClosky et al., 2006) and instance weighting (Bacchiani et al., 2006) for parser adaptation and structural correspondence learning for POS tagging (Blitzer et al., 2006).
    Page 9, “Introduction”
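
Sentence 4 above is the heart of the parse-free path features. A minimal sketch of one way to realize it (the direction marker and length cap are our assumptions, not the paper's specification):

```python
# Hypothetical sketch of parse-free "path" features: for token i, concatenate
# the POS tags (or words, or chunk labels) between the token and the predicate.
def path_feature(tags, i, pred_idx, max_len=6):
    """Concatenate tags between position i and the predicate at pred_idx."""
    lo, hi = sorted((i, pred_idx))
    path = tags[lo:hi + 1]
    if len(path) > max_len:          # back off on very long, sparse paths
        return "LONG_PATH"
    direction = "R" if i < pred_idx else "L"
    return direction + ":" + "-".join(path)

tags = ["DT", "NN", "VBD", "DT", "JJ", "NN"]  # predicate at index 2 ("VBD")
print(path_feature(tags, 5, 2))               # -> 'L:VBD-DT-JJ-NN'
```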


precision and recall

Appears in 4 sentences as: precision and recall (4)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. do not report (NR) separate values for precision and recall on this dataset.
    Page 5, “Introduction”
  2. Differences in both precision and recall between the baseline and the other systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
    Page 5, “Introduction”
  3. Differences in both precision and recall between the baseline and the Span-HMM systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
    Page 7, “Introduction”
  4. Using multiple independent copies of the Span-HMMs provides a small (0.7) gain in precision and recall.
    Page 7, “Introduction”
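
For readers who want to reproduce the significance test named in sentences 2 and 3 above, here is a minimal sketch using scipy; treating the 2x2 table as correct vs. incorrect predictions for the two systems is our assumption about how the counts were formed.

```python
# Sketch of a two-tailed Fisher's exact test on a 2x2 contingency table.
# The toy counts below are illustrative, not the paper's data.
from scipy.stats import fisher_exact

# rows: system A, system B; columns: correct, incorrect predictions
table = [[820, 180],
         [770, 230]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.3f}, p = {p_value:.4f}")
```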


part-of-speech

Appears in 4 sentences as: part-of-speech (4)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Similar representations have proven useful in domain-adaptation for part-of-speech tagging and phrase chunking (Huang and Yates, 2009).
    Page 1, “Introduction”
  2. Every sentence in the dataset is automatically annotated with a number of NLP pipeline systems, including part-of-speech (POS) tags, phrase chunk labels (Carreras and Màrquez, 2003), named-entity tags, and full parse information by multiple parsers.
    Page 3, “Introduction”
  3. As with our other HMM-based models, we use the largest number of latent states that will allow the resulting model to fit in our machine’s memory — our previous experiments on representations for part-of-speech tagging suggest that more latent states are usually better.
    Page 6, “Introduction”
  4. Past research in a variety of NLP tasks has shown that parsers (Gildea, 2001), chunkers (Huang and Yates, 2009), part-of-speech taggers (Blitzer et al., 2006), named-entity taggers (Downey et al., 2007a), and word sense disambiguation systems (Escudero et al., 2000) all suffer from a similar drop-off in performance on out-of-domain tests.
    Page 9, “Introduction”


shared task

Appears in 4 sentences as: shared task (3) shared tasks (1)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction).
    Page 1, “Introduction”
  2. We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005).
    Page 2, “Introduction”
  3. Like the best systems from the CoNLL 2005 shared task (Punyakanok et al., 2008; Pradhan et al., 2005), they also use features from multiple parses to remain robust in the face of parser error.
    Page 3, “Introduction”
  4. In order to perform true open-domain SRL, we must first consider a task which is not formally part of the CoNLL shared task: the task of identifying predicates in a given sentence.
    Page 3, “Introduction”


Viterbi

Appears in 4 sentences as: Viterbi (4)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. The Viterbi algorithm (Rabiner, 1989) can then be used to produce the optimal sequence of latent states y_i for a given instance X.
    Page 2, “Introduction”
  2. To learn an open-domain representation, we then trained an 80-state HMM on the unlabeled texts of the training and Brown test data, and used the Viterbi optimum states of each word as categorical features.
    Page 3, “Introduction”
  3. Span-HMMs can be used to provide a single categorical value for any span of a sentence using the usual Viterbi algorithm for HMMs.
    Page 5, “Introduction”
  4. We determine the Viterbi optimal state of this span node, and use that state as the value of the new feature.
    Page 5, “Introduction”
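
Sentences 1-4 above all lean on the same routine: Viterbi decoding of an HMM, whose optimal state per word (or per span node) becomes a categorical feature. A self-contained sketch in log space (the parameters below are random placeholders, not learned values):

```python
import numpy as np

# Minimal Viterbi decoder (Rabiner, 1989) for an HMM with K latent states:
# given log start probabilities, a log transition matrix, and per-position
# log emission probabilities, recover the most probable state sequence.
# Log space avoids numerical underflow on long sentences.
def viterbi(log_pi, log_A, log_emit):
    """log_pi: (K,); log_A: (K, K); log_emit: (T, K) for one sentence."""
    T, K = log_emit.shape
    delta = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # (K, K): from-state x to-state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    states = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                 # follow backpointers
        states.append(int(back[t][states[-1]]))
    return states[::-1]

rng = np.random.default_rng(0)
K, T = 4, 5
log_pi = np.log(np.full(K, 1.0 / K))
A = rng.dirichlet(np.ones(K), size=K)             # each row sums to 1
emit = rng.dirichlet(np.ones(K), size=T)          # per-position emissions
print(viterbi(log_pi, np.log(A), np.log(emit)))   # most probable state sequence
```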


part-of-speech tagging

Appears in 3 sentences as: part-of-speech taggers (1) part-of-speech tagging (2)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Similar representations have proven useful in domain-adaptation for part-of-speech tagging and phrase chunking (Huang and Yates, 2009).
    Page 1, “Introduction”
  2. As with our other HMM-based models, we use the largest number of latent states that will allow the resulting model to fit in our machine’s memory — our previous experiments on representations for part-of-speech tagging suggest that more latent states are usually better.
    Page 6, “Introduction”
  3. Past research in a variety of NLP tasks has shown that parsers (Gildea, 2001), chunkers (Huang and Yates, 2009), part-of-speech taggers (Blitzer et al., 2006), named-entity taggers (Downey et al., 2007a), and word sense disambiguation systems (Escudero et al., 2000) all suffer from a similar drop-off in performance on out-of-domain tests.
    Page 9, “Introduction”


parse trees

Appears in 3 sentences as: parse tree (1) parse trees (2)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. We then gradually add in less-sparse alternatives for the syntactic features that previous systems derive from parse trees.
    Page 4, “Introduction”
  2. In standard SRL systems, these path features usually consist of a sequence of constituent parse nodes representing the shortest path through the parse tree between a word and the predicate (Gildea and Jurafsky, 2002).
    Page 5, “Introduction”
  3. We substitute paths that do not depend on parse trees .
    Page 5, “Introduction”


latent variable

Appears in 3 sentences as: latent variable (3)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. An HMM is a generative probabilistic model that generates each word x_i in the corpus conditioned on a latent variable Y_i.
    Page 2, “Introduction”
  2. Each Y_i in the model takes on integral values from 1 to K, and each one is generated by the latent variable for the preceding word, Y_{i-1}.
    Page 2, “Introduction”
  3. In response, we introduce latent variable models of word spans, or sequences of words.
    Page 5, “Introduction”
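
In the notation of sentences 1 and 2 above, the generative story corresponds to the standard HMM factorization (our formulation, written out for completeness rather than quoted from the paper):

```latex
P(x_1, \dots, x_N, Y_1, \dots, Y_N)
  = \prod_{i=1}^{N} P(Y_i \mid Y_{i-1}) \, P(x_i \mid Y_i),
  \qquad Y_i \in \{1, \dots, K\},
```

where P(Y_1 | Y_0) denotes the initial-state distribution. The Viterbi algorithm then maximizes this joint probability over the latent sequence for a fixed sentence.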


semi-supervised

Appears in 3 sentences as: semi-supervised (3)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Fürstenau and Lapata (2009b; 2009a) use semi-supervised techniques to automatically annotate data for previously unseen predicates with semantic role information.
    Page 9, “Introduction”
  2. Collobert and Weston (2008) use deep learning techniques based on semi-supervised embeddings to improve an SRL system, though their tests are on in-domain data.
    Page 9, “Introduction”
  3. Unsupervised SRL systems (Swier and Stevenson, 2004; Grenager and Manning, 2006; Abend et al., 2009) can naturally be ported to new domains with little trouble, but their accuracy thus far falls short of state-of-the-art supervised and semi-supervised systems.
    Page 9, “Introduction”


statistically significant

Appears in 3 sentences as: statistically significant (3)
In Open-Domain Semantic Role Labeling by Modeling Word Spans
  1. Differences in both precision and recall between the baseline and the other systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
    Page 5, “Introduction”
  2. Differences in both precision and recall between the baseline and the Span-HMM systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
    Page 7, “Introduction”
  3. were not statistically significant, except that the difference in precision between the Multi-Span-HMM and the Span-HMM-Base10 is significant at p < 0.1.
    Page 7, “Introduction”
