Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick

Article Structure

Abstract

The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified.

Introduction

The integration of Multiword Expressions (MWE) in real-life applications is crucial because such expressions exhibit a certain level of idiomaticity.

Multiword expressions

2.1 Overview

Two strategies, two discriminative models

3.1 Pre-grouping Multiword Expressions

Resources

4.1 Corpus

MWE-dedicated Features

The two discriminative models described in section 3 require MWE-dedicated features.

Evaluation

6.1 Experiment Setup

Conclusions and Future Work

In this paper, we evaluated two discriminative strategies to integrate Multiword Expression Recognition in probabilistic parsing: (a) pre-grouping MWEs with a state-of-the-art recognizer and (b) MWE identification with a reranker after parsing.

Topics

reranker

Appears in 20 sentences as: RERANKER (1) Reranker (1) reranker (11) Reranking (1) reranking (8)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. Secondly, integrating multiword expressions in the parser grammar followed by a reranker specific to such expressions slightly improves all evaluation metrics.
    Page 1, “Abstract”
  2. Our proposal is to evaluate two discriminative strategies in a real constituency parsing context: (a) pre-grouping MWEs before parsing, which is done with a state-of-the-art recognizer based on Conditional Random Fields; (b) parsing with a grammar including MWE identification and then reranking the output parses with a Maximum Entropy model integrating MWE-dedicated features.
    Page 1, “Introduction”
  3. 3.2 Reranking
    Page 3, “Two strategies, two discriminative models”
  4. Discriminative reranking consists in reranking the n-best parses of a baseline parser with a discriminative model, hence integrating features associated with each node of the candidate parses.
    Page 3, “Two strategies, two discriminative models”
  5. Formally, given a sentence s, the reranker selects the best candidate parse p among a set of candidates P(s) with respect to a scoring function Vθ, i.e. it returns the parse maximizing Vθ(p) over P(s) (see the sketch after this list).
    Page 3, “Two strategies, two discriminative models”
  6. In this paper, we slightly deviate from the original reranker usage, by focusing on improving MWER in the context of parsing.
    Page 4, “Two strategies, two discriminative models”
  7. In order to make these models comparable, we use two comparable sets of feature templates: one adapted to sequence labelling (CRF-based MWER) and the other one adapted to reranking (MaxEnt-based reranker).
    Page 5, “MWE-dedicated Features”
  8. The reranker templates are instantiated only for the nodes of the candidate parse tree, which are leaves dominated by a MWE node (i.e.
    Page 5, “MWE-dedicated Features”
  9. • RERANKER: for each leaf (in position n)
    Page 5, “MWE-dedicated Features”
  10. The reranker models integrate features associated with each MWE node, the value of which is the compound itself.
    Page 5, “MWE-dedicated Features”
  11. Table 2: Feature templates (f) used both in the MWER and the reranker models: n is the current position in the sentence, w(i) is the word at position i and t(i) is the part-of-speech tag of w(i); if the word at absolute position i is part of a compound in the Shortest Path Segmentation, mwt(i) and mws(i) are respectively the part-of-speech tag and the internal structure of the compound, and mwpos(i) indicates its relative position in the compound (B or I).
    Page 6, “MWE-dedicated Features”
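
A minimal sketch of this reranking step, assuming a sparse feature extractor f and a learned weight vector θ (both placeholders rather than the paper's actual MaxEnt implementation): the reranker simply returns the candidate parse in P(s) that maximizes the linear score θ · f(p).

```python
# Minimal sketch of discriminative reranking: among the n-best candidate
# parses P(s), return the one maximizing a linear score theta . f(p).
# `extract_mwe_features` and `theta` are placeholders, not the paper's model.
from typing import Callable, Dict, List


def rerank(candidates: List[object],
           extract_mwe_features: Callable[[object], Dict[str, float]],
           theta: Dict[str, float]) -> object:
    def score(parse: object) -> float:
        feats = extract_mwe_features(parse)        # sparse feature vector f(p)
        return sum(theta.get(name, 0.0) * value    # dot product theta . f(p)
                   for name, value in feats.items())
    return max(candidates, key=score)
```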

part-of-speech

Appears in 12 sentences as: part-of-speech (14)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. In this paper, we focus on contiguous MWEs that form a lexical unit which can be marked by a part-of-speech tag (e.g. at night is an adverb, because of is a preposition).
    Page 2, “Multiword expressions”
  2. Constant and Sigogne (2011) proposed to combine MWE segmentation and part-of-speech tagging into a single sequence labelling task (a small conversion sketch follows this list) by assigning to each token a tag of the form TAG+X where TAG is the part-of-speech (POS) of the lexical unit the token belongs to and X is either B (i.e.
    Page 3, “Two strategies, two discriminative models”
  3. Compounds are identified with a specific nonterminal symbol ”MWX” where X is the part-of-speech of the expression.
    Page 4, “Resources”
  4. They have a flat structure made of the part-of-speech of their components as shown in figure 1.
    Page 4, “Resources”
  5. The nonterminal tagset is composed of 14 part-of-speech labels and 24 phrasal ones (including 11 MWE labels).
    Page 4, “Resources”
  6. In both, lexical entries are composed of an inflected form, a lemma, a part-of-speech and morphological features.
    Page 4, “Resources”
  7. We use part-of-speech unigrams and bigrams in order to capture MWEs with irregular syntactic structures that might indicate the idiomaticity of a word sequence.
    Page 5, “MWE-dedicated Features”
  8. We also integrated mixed bigrams made up of a word and a part-of-speech .
    Page 5, “MWE-dedicated Features”
  9. We associate each word with its part-of-speech tags found in our external morphological lexicon.
    Page 5, “MWE-dedicated Features”
  10. This segmentation is also a source of features: a word belonging to a compound segment is assigned different properties such as the segment part-of-speech mwt and its syntactic structure mws encoded in the lexical resource, its relative position mwpos in the segment ('B' or 'I').
    Page 6, “MWE-dedicated Features”
  11. Table 2: Feature templates (f) used both in the MWER and the reranker models: n is the current position in the sentence, w(i) is the word at position i and t(i) is the part-of-speech tag of w(i); if the word at absolute position i is part of a compound in the Shortest Path Segmentation, mwt(i) and mws(i) are respectively the part-of-speech tag and the internal structure of the compound, and mwpos(i) indicates its relative position in the compound (B or I).
    Page 6, “MWE-dedicated Features”
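
A small conversion sketch for the TAG+X labelling scheme described in item 2 above, under the simplifying assumption that a sentence is given as a list of (POS, tokens) lexical units; the input format and function name are illustrative only.

```python
# Minimal sketch of the TAG+X scheme: every token gets the POS of the lexical
# unit it belongs to, suffixed with B (first token) or I (continuation).
# The (pos, tokens) input format is an assumption made for illustration.
from typing import List, Tuple


def to_tagx_labels(units: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    labelled = []
    for pos, tokens in units:
        for i, token in enumerate(tokens):
            labelled.append((token, f"{pos}+{'B' if i == 0 else 'I'}"))
    return labelled


# "coup de pied" (kick) as a single nominal unit:
print(to_tagx_labels([("DET", ["un"]), ("N", ["coup", "de", "pied"])]))
# [('un', 'DET+B'), ('coup', 'N+B'), ('de', 'N+I'), ('pied', 'N+I')]
```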

treebank

Appears in 11 sentences as: Treebank (5) treebank (6)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. The grammar was trained with a reference treebank where MWEs were annotated with a specific nonterminal node.
    Page 1, “Introduction”
  2. The experiments were carried out on the French Treebank (Abeillé et al., 2003) where MWEs are annotated.
    Page 1, “Introduction”
  3. (2011) confirmed these bad results on the French Treebank.
    Page 2, “Multiword expressions”
  4. They show a general tagging accuracy of 94% on the French Treebank.
    Page 2, “Multiword expressions”
  5. To do so, the MWEs in the training treebank were annotated with specific nonterminal nodes.
    Page 3, “Multiword expressions”
  6. The vector θ is estimated during the training stage from a reference treebank and the baseline parser outputs.
    Page 4, “Two strategies, two discriminative models”
  7. The French Treebank is composed of 435,860 lexical units (34,178 types).
    Page 4, “Resources”
  8. In order to compare compounds in these lexical resources with the ones in the French Treebank, we applied the dictionaries and the lexicon extracted from the training corpus to the development corpus.
    Page 4, “Resources”
  9. The authors provided us with a list of 17,315 candidate nominal collocations occurring in the French treebank with their log-likelihood and their internal flat structure.
    Page 5, “Resources”
  10. In our collocation resource, each candidate collocation of the French treebank is associated with its internal syntactic structure and its association score (log-likelihood).
    Page 6, “MWE-dedicated Features”
  11. The authors are very grateful to Spence Green for his useful help on the treebank, and to Jennifer Thewissen for her careful proofreading.
    Page 9, “Conclusions and Future Work”

statistically significant

Appears in 7 sentences as: statistical significance (2) statistically significant (6)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. However, it has no statistically significant impact in terms of F-score as incorrect multiword expression recognition has important side effects on parsing.
    Page 1, “Abstract”
  2. In order to establish the statistical significance of results between two parsing experiments in terms of F1 and UAS, we used a unidirectional t-test for two independent samples.
    Page 7, “Evaluation”
  3. The statistical significance between two MWE identification experiments was established by using McNemar's test (Gillick and Cox, 1989); see the sketch after this list.
    Page 7, “Evaluation”
  4. The results of the two experiments are considered statistically significant with the computed value p < 0.01.
    Page 7, “Evaluation”
  5. The differences between all systems are statistically significant with respect to McNemar's test (Gillick and Cox, 1989), except lex/all and all/coll; lex/coll is "borderline".
    Page 7, “Evaluation”
  6. Furthermore, pre-grouping has no statistically significant impact on the F-score, whereas reranking leads to a statistically significant improvement (except for collocations).
    Page 8, “Evaluation”
  7. Both strategies also lead to a statistically significant UAS increase.
    Page 8, “Evaluation”
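
A minimal sketch of McNemar's test on paired system outputs, assuming each system's per-item correctness is available as a boolean list; this is the generic continuity-corrected chi-square version, not necessarily the exact variant used by the authors.

```python
# McNemar's test sketch: b = items system A gets right and B wrong, c = the
# converse; continuity-corrected chi-square statistic with one degree of
# freedom.  Generic version for illustration only.
from scipy.stats import chi2


def mcnemar_p(correct_a, correct_b):
    b = sum(a and not b_ok for a, b_ok in zip(correct_a, correct_b))
    c = sum(b_ok and not a for a, b_ok in zip(correct_a, correct_b))
    if b + c == 0:
        return 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)   # continuity correction
    return chi2.sf(stat, df=1)


# Boolean lists: did each system identify MWE i correctly?
print(mcnemar_p([True, True, False, True], [True, False, False, False]))
```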

bigrams

Appears in 5 sentences as: bigram (1) bigrams (4)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. We use word unigrams and bigrams in order to capture multiwords present in the training section and to extract lexical cues to discover new MWEs.
    Page 5, “MWE-dedicated Features”
  2. For instance, the bigram coup de is often the prefix of compounds such as coup de pied (kick), coup de foudre (love at first sight), coup de main (help).
    Page 5, “MWE-dedicated Features”
  3. We use part-of-speech unigrams and bigrams in order to capture MWEs with irregular syntactic structures that might indicate the idiomaticity of a word sequence (see the sketch after this list).
    Page 5, “MWE-dedicated Features”
  4. We also integrated mixed bigrams made up of a word and a part-of-speech.
    Page 5, “MWE-dedicated Features”
  5. We also add label bigrams.
    Page 5, “MWE-dedicated Features”
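
A minimal sketch of how such n-gram feature templates could be instantiated at a sentence position n; the window size and feature-name conventions are assumptions, not the paper's exact templates.

```python
# Sketch of n-gram feature instantiation at position n: word and POS
# unigrams in a small window, plus word, POS and mixed word/POS bigrams.
# Names and window size are illustrative, not the paper's exact templates.
from typing import Dict, List


def ngram_features(words: List[str], tags: List[str], n: int) -> Dict[str, int]:
    feats: Dict[str, int] = {}
    for offset in (-1, 0, 1):                      # small window around n
        i = n + offset
        if 0 <= i < len(words):
            feats[f"w[{offset}]={words[i]}"] = 1   # word unigram
            feats[f"t[{offset}]={tags[i]}"] = 1    # POS unigram
    if n >= 1:
        feats[f"w[-1]w[0]={words[n-1]}|{words[n]}"] = 1   # word bigram, e.g. "coup|de"
        feats[f"t[-1]t[0]={tags[n-1]}|{tags[n]}"] = 1     # POS bigram
        feats[f"w[-1]t[0]={words[n-1]}|{tags[n]}"] = 1    # mixed word/POS bigram
    return feats
```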

CRF

Appears in 5 sentences as: CRF (5)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. For such a task, we used Linear chain Conditional Random Fields (CRF) that are discriminative prob-
    Page 3, “Two strategies, two discriminative models”
  2. In order to deal with unknown words and special tokens, we incorporate standard tagging features in the CRF: lowercase forms of the words, word prefixes of length 1 to 4, word suffixes of length 1 to 4, whether the word is capitalized, whether the token has a digit, whether it is a hyphen (a feature-extraction sketch follows this list).
    Page 5, “MWE-dedicated Features”
  3. We first tested a standalone MWE recognizer based on CRF.
    Page 6, “Evaluation”
  4. The CRF recognizer relies on the software Wapiti (Lavergne et al., 2010) to train and apply the model, and on the software Unitex (Paumier, 2011) to apply lexical resources.
    Page 6, “Evaluation”
  5. Table 3: MWE identification with CRF: base are the features corresponding to token properties and word n-grams.
    Page 7, “Evaluation”
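
A minimal sketch of the token-shape features listed in item 2 above; the feature names are illustrative, and the actual model in the paper is trained with Wapiti rather than with this code.

```python
# Sketch of token-shape features for robustness to unknown words: lowercased
# form, prefixes/suffixes of length 1 to 4, and capitalization, digit and
# hyphen tests.  Feature names are illustrative only.
def token_shape_features(word: str) -> dict:
    feats = {f"lower={word.lower()}": 1}
    for k in range(1, 5):
        feats[f"prefix{k}={word[:k]}"] = 1
        feats[f"suffix{k}={word[-k:]}"] = 1
    feats["is_capitalized"] = int(word[:1].isupper())
    feats["has_digit"] = int(any(ch.isdigit() for ch in word))
    feats["is_hyphen"] = int(word == "-")
    return feats


print(token_shape_features("Porte-monnaie"))
```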

part-of-speech tag

Appears in 5 sentences as: part-of-speech tag (3) part-of-speech tagger (1) part-of-speech tagging (1) part-of-speech tags (1)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. In this paper, we focus on contiguous MWEs that form a lexical unit which can be marked by a part-of-speech tag (e.g. at night is an adverb, because of is a preposition).
    Page 2, “Multiword expressions”
  2. Constant and Sigogne (2011) proposed to combine MWE segmentation and part-of-speech tagging into a single sequence labelling task by assigning to each token a tag of the form TAG+X where TAG is the part-of-speech (POS) of the lexical unit the token belongs to and X is either B (i.e.
    Page 3, “Two strategies, two discriminative models”
  3. We associate each word with its part-of-speech tags found in our external morphological lexicon.
    Page 5, “MWE-dedicated Features”
  4. Table 2: Feature templates (f) used both in the MWER and the reranker models: n is the current position in the sentence, w(i) is the word at position i and t(i) is the part-of-speech tag of w(i); if the word at absolute position i is part of a compound in the Shortest Path Segmentation, mwt(i) and mws(i) are respectively the part-of-speech tag and the internal structure of the compound, and mwpos(i) indicates its relative position in the compound (B or I).
    Page 6, “MWE-dedicated Features”
  5. The part-of-speech tagger used to extract POS features was lgtagger (Constant and Sigogne, 2011).
    Page 6, “Evaluation”

UAS

Appears in 5 sentences as: UAS (5)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. The unlabeled attachment score (UAS) evaluates the quality of unlabeled
    Page 6, “Evaluation”
  2. In order to establish the statistical significance of results between two parsing experiments in terms of F1 and UAS, we used a unidirectional t-test for two independent samples.
    Page 7, “Evaluation”
  3. Parser | F1 | LA | UAS | F1(MWE) |
    Page 7, “Evaluation”
  4. Firstly, we note that the accuracy of the best realistic parsers is much lower than that of a parser with a golden MWE segmentation (-2.65 and -5.92 respectively in terms of F-score and UAS), which shows the importance of not neglecting MWE recognition in the framework of parsing (a small UAS sketch follows this list).
    Page 8, “Evaluation”
  5. Both strategies also lead to a statistically significant UAS increase.
    Page 8, “Evaluation”
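
A minimal sketch of the UAS metric mentioned above, assuming gold and predicted dependency heads are available as parallel lists of head indices (the input format is an assumption for illustration).

```python
# Unlabeled attachment score (UAS) sketch: the fraction of tokens whose
# predicted head index equals the gold head index.
from typing import Sequence


def uas(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    assert len(gold_heads) == len(pred_heads)
    if not gold_heads:
        return 0.0
    return sum(g == p for g, p in zip(gold_heads, pred_heads)) / len(gold_heads)


print(uas([2, 0, 2, 3], [2, 0, 3, 3]))  # 3 of 4 heads correct -> 0.75
```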

sequence labelling

Appears in 4 sentences as: sequence labelling (3) sequential labelling (1)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. MWER can be seen as a sequence labelling task (like chunking) by using an IOB-like annotation scheme (Ramshaw and Marcus, 1995).
    Page 3, “Two strategies, two discriminative models”
  2. Constant and Sigogne (2011) proposed to combine MWE segmentation and part-of-speech tagging into a single sequence labelling task by assigning to each token a tag of the form TAG+X where TAG is the part-of-speech (POS) of the lexical unit the token belongs to and X is either B (i.e.
    Page 3, “Two strategies, two discriminative models”
  3. (2001) for sequential labelling.
    Page 3, “Two strategies, two discriminative models”
  4. In order to make these models comparable, we use two comparable sets of feature templates: one adapted to sequence labelling (CRF-based MWER) and the other one adapted to reranking (MaxEnt-based reranker).
    Page 5, “MWE-dedicated Features”

significant improvement

Appears in 4 sentences as: significant improvement (2) significantly improve (1) significantly improves (1)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. Although experiments always relied on a corpus where the MWEs were perfectly pre-identified, they showed that pre-grouping such expressions could significantly improve parsing accuracy.
    Page 1, “Introduction”
  2. Charniak and Johnson (2005) introduced different features that showed significant improvement in general parsing accuracy (e.g.
    Page 3, “Two strategies, two discriminative models”
  3. Furthermore, pre-grouping has no statistically significant impact on the F-score, whereas reranking leads to a statistically significant improvement (except for collocations).
    Page 8, “Evaluation”
  4. We showed that MWE pre-grouping significantly improves compound recognition and unlabeled dependency annotation, which implies that this strategy could be useful for dependency parsing.
    Page 8, “Conclusions and Future Work”

constituency parsing

Appears in 3 sentences as: constituency parsing (3)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested.
    Page 1, “Abstract”
  2. view, their incorporation has also been considered such as in (Nivre and Nilsson, 2004) for dependency parsing and in (Arun and Keller, 2005) in constituency parsing.
    Page 1, “Introduction”
  3. Our proposal is to evaluate two discriminative strategies in a real constituency parsing context: (a) pre-grouping MWEs before parsing, which is done with a state-of-the-art recognizer based on Conditional Random Fields; (b) parsing with a grammar including MWE identification and then reranking the output parses with a Maximum Entropy model integrating MWE-dedicated features.
    Page 1, “Introduction”

feature templates

Appears in 3 sentences as: Feature templates (1) feature templates (2)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. In order to make these models comparable, we use two comparable sets of feature templates: one adapted to sequence labelling (CRF-based MWER) and the other one adapted to reranking (MaxEnt-based reranker).
    Page 5, “MWE-dedicated Features”
  2. All feature templates are given in table 2.
    Page 6, “MWE-dedicated Features”
  3. Table 2: Feature templates (f) used both in the MWER and the reranker models: n is the current position in the sentence, w(i) is the word at position i and t(i) is the part-of-speech tag of w(i); if the word at absolute position i is part of a compound in the Shortest Path Segmentation, mwt(i) and mws(i) are respectively the part-of-speech tag and the internal structure of the compound, and mwpos(i) indicates its relative position in the compound (B or I).
    Page 6, “MWE-dedicated Features”

lexicalized

Appears in 3 sentences as: leXicalized (1) lexicalized (2)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. They are often divided into two main classes: multiword expressions defined through linguistic idiomaticity criteria (lexicalized phrases in the terminology of Sag et al.
    Page 2, “Multiword expressions”
  2. They used a Tree Substitution Grammar instead of a Probabilistic Context-free Grammar (PCFG) with latent annotations in order to capture lexicalized rules as well as general rules.
    Page 3, “Multiword expressions”
  3. Nevertheless, as it does not have a lexicalized strategy, it is not able to filter out incorrect candidates; the precision is therefore very low (the worst).
    Page 7, “Evaluation”

n-grams

Appears in 3 sentences as: n-grams (3)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. Word n-grams.
    Page 5, “MWE-dedicated Features”
  2. POS n-grams.
    Page 5, “MWE-dedicated Features”
  3. Table 3: MWE identification with CRF: base are the features corresponding to token properties and word n-grams.
    Page 7, “Evaluation”

parse tree

Appears in 3 sentences as: parse tree (2) parse trees (1)
In Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
  1. The reranker templates are instantiated only for the nodes of the candidate parse tree, which are leaves dominated by a MWE node (i.e.
    Page 5, “MWE-dedicated Features”
  2. dominated by a MWE node m in the current parse tree p,
    Page 5, “MWE-dedicated Features”
  3. In order to compare both approaches, parse trees generated by BKYc were automatically transformed into trees with the same MWE annotation scheme as the trees generated by BKY (a tree-walking sketch follows this list).
    Page 7, “Evaluation”
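
A small tree-walking sketch of how the leaves dominated by an MWE node (nonterminals of the form MWX) can be collected from a candidate parse, using nltk.Tree on a bracketed string; the bracket format shown is an assumption about the parser output, not the paper's actual data format.

```python
# Collect the leaves dominated by MWE nodes (labels of the form "MWX") in a
# candidate parse; these are the positions where the reranker templates are
# instantiated.  The bracketed string below is an illustrative format only.
from nltk import Tree

parse = Tree.fromstring(
    "(SENT (NP (DET le) (MWN (N coup) (P de) (N pied))) (VN (V part)))")

mwe_leaves = [leaf
              for subtree in parse.subtrees()
              if subtree.label().startswith("MW")   # MWN, MWADV, MWP, ...
              for leaf in subtree.leaves()]
print(mwe_leaves)   # ['coup', 'de', 'pied']
```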
