Hybrid Simplification using Deep Semantics and Machine Translation
Narayan, Shashi and Gardent, Claire

Article Structure

Abstract

We present a hybrid approach to sentence simplification which combines deep semantics and monolingual machine translation to derive simple sentences from complex ones.

Introduction

Sentence simplification maps a sentence to a simpler, more readable one approximating its content.

Related Work

Earlier work on sentence simplification relied on handcrafted rules to capture syntactic simplification e.g., to split coordinated and subordinated sentences into several, simpler clauses or to model active/passive transformations (Siddharthan, 2002; Chandrasekar and Srinivas, 1997; Bott et al., 2012; Canning, 2002; Siddharthan, 2011; Siddharthan, 2010).

Simplification Framework

We start by motivating our approach and explaining how it relates to previous proposals w.r.t. the four main operations involved in simplification, namely splitting, deletion, substitution and reordering.

Experiments

We trained our simplification and translation models on the PWKP corpus.

Conclusion

A key feature of our approach is that it is semantically based.

Topics

machine translation

Appears in 14 sentences as: Machine Translation (1) machine translation (13)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. We present a hybrid approach to sentence simplification which combines deep semantics and monolingual machine translation to derive simple sentences from complex ones.
    Page 1, “Abstract”
  2. It is useful as a preprocessing step for a variety of NLP systems such as parsers and machine translation systems (Chandrasekar et al., 1996), summarisation (Knight and Marcu, 2000), sentence fusion (Filippova and Strube, 2008) and semantic
    Page 1, “Introduction”
  3. Machine Translation systems have been adapted to translate complex sentences into simpler ones (Zhu et al., 2010; Wubben et al., 2012; Coster and Kauchak, 2011).
    Page 1, “Introduction”
  4. First, it combines a model encoding probabilities for splitting and deletion with a monolingual machine translation module which handles reordering and substitution.
    Page 1, “Introduction”
  5. In this way, we exploit the ability of statistical machine translation (SMT) systems to capture phrasal/lexical substitution and reordering while relying on a dedicated probabilistic module to capture the splitting and deletion operations which are less well (deletion) or not at all (splitting) captured by SMT approaches.
    Page 1, “Introduction”
  6. (2010) constructed a parallel corpus (PWKP) of 108,016/114,924 complex/simple sentences by aligning sentences from EWKP and SWKP and used the resulting bitext to train a simplification model inspired by syntax-based machine translation (Yamada and Knight, 2001).
    Page 2, “Related Work”
  7. To account for deletions, reordering and substitution, Coster and Kauchak (2011) trained a phrase based machine translation system on the PWKP corpus while modifying the word alignment output by GIZA++ in Moses to allow for null phrasal alignments.
    Page 2, “Related Work”
  8. (2012) use Moses and the PWKP data to train a phrase based machine translation system augmented with a post-hoc reranking procedure designed to rank the outputs based on their dissimilarity from the source.
    Page 2, “Related Work”
  9. We also depart from Coster and Kauchak (2011) who rely on null phrasal alignments for deletion during phrase based machine translation .
    Page 3, “Simplification Framework”
  10. Second, the simplified sentence(s) s’ is further simplified to s using a phrase based machine translation system (PBMT+LM).
    Page 5, “Simplification Framework”
  11. where the probabilities p(s'|Dc), p(s|s') and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively.
    Page 5, “Simplification Framework”
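Read together, the quoted sentences describe scoring as a product of three models. The following writes that combination out as a sketch (the argmax formulation and subscripts are our notation, inferred from item 11; Dc is the DRS of the complex input, s' the intermediate split/shortened sentence, s the final output):

```latex
s^{*} \;=\; \operatorname*{arg\,max}_{s}\;
  \underbrace{p(s' \mid D_c)}_{\text{DRS simplification model}}
  \,\cdot\,
  \underbrace{p(s \mid s')}_{\text{phrase-based MT model}}
  \,\cdot\,
  \underbrace{p(s)}_{\text{language model}}
```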

See all papers in Proc. ACL 2014 that mention machine translation.

See all papers in Proc. ACL that mention machine translation.

BLEU

Appears in 10 sentences as: BLEU (11)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. (2010) namely, an aligned corpus of 100/131 EWKP/SWKP sentences and show that they achieve a better BLEU score.
    Page 2, “Related Work”
  2. To assess and compare simplification systems, two main automatic metrics have been used in previous work, namely BLEU and the Flesch-Kincaid Grade Level Index (FKG).
    Page 7, “Experiments”
  3. BLEU gives a measure of how close a system’s output is to the gold standard simple sentence.
    Page 7, “Experiments”
  4. Because there are many possible ways of simplifying a sentence, BLEU alone fails to correctly assess the appropriateness of a simplification.
    Page 7, “Experiments”
  5. Moreover BLEU does not capture the degree to which the system’s output differs from the complex sentence input.
    Page 7, “Experiments”
  6. We therefore use BLEU as a means to evaluate how close the system outputs are to the reference corpus but complement it with further manual metrics capturing other important factors when
    Page 7, “Experiments”
  7. System     BLEU   LD (complex to system)   No edit   LD (system to simple)   No edit
     GOLD       100    12.24                    3         0                        100
     Zhu        37.4   7.87                     2         14.64                    0
     Woodsend   42     8.63                     24        16.03                    2
     Wubben     41.4   3.33                     6         13.57                    2
     Hybrid     53.6   6.32                     4         11.53                    3
    Page 8, “Experiments”
  8. BLEU score We used Moses support tools: multi-bleu to calculate BLEU scores.
    Page 8, “Experiments”
  9. The BLEU scores in Table 4 show that our system produces simplifications that are closest to the reference.
    Page 8, “Experiments”
  10. In sum, the automatic metrics indicate that our system produces simplifications that are consistently closest to the reference in terms of edit distance, number of splits and BLEU score.
    Page 8, “Experiments”
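The quoted passages use BLEU as the closeness metric. As a rough illustration of what the multi-bleu script computes, here is a minimal single-reference, sentence-level sketch (our own function names; the real script works at corpus level and supports multiple references):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as a multiset (Counter)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU-4 with brevity penalty.

    Geometric mean of clipped n-gram precisions for n = 1..4 against a
    single reference, scaled by a brevity penalty for short hypotheses.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_prec += math.log(overlap / total) / max_n
    # Brevity penalty: penalise hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec)
```

An identical hypothesis and reference score 1.0; a hypothesis sharing no unigram with the reference scores 0.0, which is why the quoted sentences stress that BLEU alone cannot assess simplification quality.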

translation system

Appears in 8 sentences as: translation system (6) Translation systems (1) translation systems (1)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. It is useful as a preprocessing step for a variety of NLP systems such as parsers and machine translation systems (Chandrasekar et al., 1996), summarisation (Knight and Marcu, 2000), sentence fusion (Filippova and Strube, 2008) and semantic
    Page 1, “Introduction”
  2. Machine Translation systems have been adapted to translate complex sentences into simpler ones (Zhu et al., 2010; Wubben et al., 2012; Coster and Kauchak, 2011).
    Page 1, “Introduction”
  3. To account for deletions, reordering and substitution, Coster and Kauchak (2011) trained a phrase based machine translation system on the PWKP corpus while modifying the word alignment output by GIZA++ in Moses to allow for null phrasal alignments.
    Page 2, “Related Work”
  4. (2012) use Moses and the PWKP data to train a phrase based machine translation system augmented with a post-hoc reranking procedure designed to rank the outputs based on their dissimilarity from the source.
    Page 2, “Related Work”
  5. Second, the simplified sentence(s) s’ is further simplified to s using a phrase based machine translation system (PBMT+LM).
    Page 5, “Simplification Framework”
  6. The DRS associated with the final M-node D_fin is then mapped to a simplified sentence s'_fin which is further simplified using the phrase-based machine translation system to produce the final simplified sentence s_simple.
    Page 7, “Simplification Framework”
  7. These inputs are simplified using our simplification system, namely the DRS-SM model and the phrase-based machine translation system (Section 3.2).
    Page 7, “Experiments”
  8. These four sentences are directly sent to the phrase-based machine translation system to produce simplified sentences.
    Page 7, “Experiments”

language model

Appears in 6 sentences as: language model (6)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. It is combined with a language model to improve grammaticality and the decoder translates sentences into sim-
    Page 2, “Related Work”
  2. In addition, the language model we integrate in the SMT module helps ensure better fluency and grammaticality.
    Page 3, “Simplification Framework”
  3. Finally the translation and language model ensures that published, describing and boson are simplified to wrote, explaining and elementary particle respectively; and that the phrase “In 1964” is moved from the beginning of the sentence to its end.
    Page 5, “Simplification Framework”
  4. Our simplification framework consists of a probabilistic model for splitting and dropping which we call DRS simplification model (DRS-SM); a phrase based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.
    Page 5, “Simplification Framework”
  5. where the probabilities p(s'|Dc), p(s|s') and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively.
    Page 5, “Simplification Framework”
  6. Our trigram language model is trained using the SRILM toolkit on the SWKP corpus.
    Page 7, “Simplification Framework”
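Item 6 mentions a trigram language model trained with SRILM. As a toy stand-in for what such a model estimates, here is an unsmoothed maximum-likelihood trigram sketch (our own function names; SRILM additionally applies smoothing and backoff so unseen trigrams get nonzero probability):

```python
from collections import Counter, defaultdict

def train_trigram_lm(sentences):
    """Unsmoothed MLE trigram model from whitespace-tokenised sentences.

    Counts each word given its two-word history, with <s> padding and an
    end marker, and returns a conditional-probability function.
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            history = (tokens[i], tokens[i + 1])
            counts[history][tokens[i + 2]] += 1

    def prob(word, history):
        total = sum(counts[history].values())
        return counts[history][word] / total if total else 0.0

    return prob

p = train_trigram_lm(["he wrote a paper", "he wrote a book"])
print(p("a", ("he", "wrote")))  # → 1.0: "he wrote" is always followed by "a"
```

In the quoted framework such probabilities supply the p(s) factor that rewards fluent, grammatical simplified output.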

translation model

Appears in 5 sentences as: translation model (4) translation models (1)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. Second, it combines a simplification model for splitting and deletion with a monolingual translation model for phrase substitution and reordering.
    Page 1, “Abstract”
  2. Our simplification framework consists of a probabilistic model for splitting and dropping which we call DRS simplification model (DRS-SM); a phrase based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.
    Page 5, “Simplification Framework”
  3. where the probabilities p(s'|Dc), p(s|s') and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively.
    Page 5, “Simplification Framework”
  4. Our phrase based translation model is trained using the Moses toolkit with its default command-line options on the PWKP corpus (except the sentences from the test set), considering the complex sentence as the source and the simpler one as the target.
    Page 7, “Simplification Framework”
  5. We trained our simplification and translation models on the PWKP corpus.
    Page 7, “Experiments”

BLEU score

Appears in 4 sentences as: BLEU score (3) BLEU scores (2)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. (2010) namely, an aligned corpus of 100/131 EWKP/SWKP sentences and show that they achieve a better BLEU score.
    Page 2, “Related Work”
  2. BLEU score We used Moses support tools: multi-bleu to calculate BLEU scores.
    Page 8, “Experiments”
  3. The BLEU scores in Table 4 show that our system produces simplifications that are closest to the reference.
    Page 8, “Experiments”
  4. In sum, the automatic metrics indicate that our system produces simplifications that are consistently closest to the reference in terms of edit distance, number of splits and BLEU score.
    Page 8, “Experiments”

parse tree

Appears in 4 sentences as: parse tree (3) parse trees (1)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. First, it is semantic based in that it takes as input a deep semantic representation rather than e.g., a sentence or a parse tree.
    Page 1, “Abstract”
  2. While previous simplification approaches start from either the input sentence or its parse tree, our model takes as input a deep semantic representation, namely the Discourse Representation Structure (DRS, (Kamp, 1981)) assigned by Boxer (Curran et al., 2007) to the input complex sentence.
    Page 1, “Introduction”
  3. Their simplification model encodes the probabilities for four rewriting operations on the parse tree of an input sentence, namely substitution, reordering, splitting and deletion.
    Page 2, “Related Work”
  4. (2010) and the edit history of Simple Wikipedia, Woodsend and Lapata (2011) learn a quasi synchronous grammar (Smith and Eisner, 2006) describing a loose alignment between parse trees of complex and of simple sentences.
    Page 2, “Related Work”

edit distance

Appears in 3 sentences as: edit distance (3)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. The governing criterion for the construction of the training graph is that, at each step, it tries to minimize the Levenshtein edit distance between the complex and the simple sentences.
    Page 6, “Simplification Framework”
  2. Number of Edits Table 4 indicates the edit distance of the output sentences w.r.t.
    Page 8, “Experiments”
  3. In sum, the automatic metrics indicate that our system produces simplifications that are consistently closest to the reference in terms of edit distance, number of splits and BLEU score.
    Page 8, “Experiments”
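The quoted sentences use Levenshtein distance (LD) both to guide training-graph construction and to score outputs. A compact dynamic-programming implementation of that distance (a generic sketch, not the paper's code):

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (strings or token
    lists): minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```

Because it accepts any sequence, the same function works over word tokens, which is the granularity relevant when comparing a system output against the complex input or the simple reference.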

phrase-based

Appears in 3 sentences as: phrase-based (3)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. The DRS associated with the final M-node D_fin is then mapped to a simplified sentence s'_fin which is further simplified using the phrase-based machine translation system to produce the final simplified sentence s_simple.
    Page 7, “Simplification Framework”
  2. These inputs are simplified using our simplification system, namely the DRS-SM model and the phrase-based machine translation system (Section 3.2).
    Page 7, “Experiments”
  3. These four sentences are directly sent to the phrase-based machine translation system to produce simplified sentences.
    Page 7, “Experiments”

semantic representation

Appears in 3 sentences as: semantic representation (2) semantic representations (1)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. First, it is semantic based in that it takes as input a deep semantic representation rather than e.g., a sentence or a parse tree.
    Page 1, “Abstract”
  2. While previous simplification approaches start from either the input sentence or its parse tree, our model takes as input a deep semantic representation, namely the Discourse Representation Structure (DRS, (Kamp, 1981)) assigned by Boxer (Curran et al., 2007) to the input complex sentence.
    Page 1, “Introduction”
  3. By handling deletion using a probabilistic model trained on semantic representations, we can avoid deleting obligatory arguments.
    Page 3, “Simplification Framework”

sentence pairs

Appears in 3 sentences as: sentence pair (1) sentence pairs (2)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. (2010); and build training graphs (Figure 2) from the complex-simple sentence pairs in the training data.
    Page 6, “Simplification Framework”
  2. Each training graph represents a complex-simple sentence pair and consists of two types of nodes: major nodes (M-nodes) and operation nodes (O-nodes).
    Page 6, “Simplification Framework”
  3. PWKP contains 108,016/114,924 complex/simple sentence pairs.
    Page 7, “Experiments”

state of the art

Appears in 3 sentences as: state of the art (3)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. When compared against current state of the art methods, our model yields significantly simpler output that is both grammatical and meaning preserving.
    Page 1, “Abstract”
  2. When compared against current state of the art methods (Zhu et al., 2010; Woodsend and Lapata, 2011; Wubben et al., 2012), our model yields significantly simpler output that is both grammatical and meaning preserving.
    Page 2, “Introduction”
  3. To evaluate performance, we compare our approach with three other state of the art systems using the test set provided by Zhu et al.
    Page 7, “Experiments”

statistically significant

Appears in 3 sentences as: statistical significance (1) statistically significant (2)
In Hybrid Simplification using Deep Semantics and Machine Translation
  1. No human evaluation is provided but the approach is shown to result in statistically significant improvements over a traditional phrase based approach.
    Page 2, “Related Work”
  2. Pairwise comparisons between all models and their statistical significance were carried out using a one-way ANOVA with post-hoc Tukey HSD tests and are shown in Table 6.
    Page 9, “Experiments”
  3. With regard to simplification, our system ranks first and is very close to the manually simplified input (the difference is not statistically significant).
    Page 9, “Experiments”
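Item 2 above compares models with a one-way ANOVA followed by post-hoc Tukey HSD tests. As a generic illustration of the F statistic a one-way ANOVA is built on (not the paper's evaluation code; the Tukey HSD step is not sketched), computed from per-system rating groups:

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of sample groups:
    between-group mean square divided by within-group mean square."""
    k = len(groups)                       # number of groups (systems)
    n = sum(len(g) for g in groups)       # total number of observations
    grand = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F indicates that the rating differences between systems are large relative to the rating spread within each system, which is the precondition for the pairwise post-hoc comparisons the quoted sentence reports.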
