Event Causality Extraction Method | An event causality candidate is given a causality score, which is the SVM score (the distance from the hyperplane) normalized to [0, 1] by the sigmoid function. Each event causality candidate may be given multiple original sentences, since a phrase pair can appear in multiple sentences, in which case it is given more than one SVM score.
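As a sketch, the normalization above maps the raw SVM decision value to [0, 1]; a plain logistic sigmoid is assumed here, though the original method may use a scaled or shifted variant:

```python
import math

def causality_score(svm_decision_value):
    """Normalize a raw SVM score (signed distance from the
    separating hyperplane) to a causality score in [0, 1].
    Assumption: a plain logistic sigmoid; the paper's exact
    normalization parameters are not reproduced here."""
    return 1.0 / (1.0 + math.exp(-svm_decision_value))
```

A candidate with multiple original sentences would get one such score per SVM decision value.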
Experiments | These three datasets have no overlap in terms of phrase pairs.
Experiments | We observed that CEAsup and CEAWS performed poorly and tended to favor event causality candidates whose phrase pairs were highly relevant to each other but described contrasts of events rather than event causality (e.g., build a slow muscle and build a fast muscle), probably because, in the highly ranked event causality candidates of CEAsup and CEAWS, the phrase pairs described two events that often happen in parallel but are not event causality (e.g., reduce the intake of energy and increase the energy consumption).
Future Scenario Generation Method | A naive approach chains two phrase pairs by exact matching. |
Future Scenario Generation Method | Scenarios (scs) generated by chaining causally-compatible phrase pairs are scored by Score(sc), which embodies our assumption that an acceptable scenario consists of plausible event causality pairs: |
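The assumption above can be sketched as follows; the product form (each pair's causality score lies in [0, 1], so a chain containing implausible pairs scores low) is an illustrative assumption, not necessarily the paper's exact Score(sc):

```python
def score_scenario(pair_scores):
    """Score a scenario sc built by chaining causally-compatible
    phrase pairs. Illustrative assumption: Score(sc) is the product
    of the causality scores of the constituent pairs, so one weak
    link lowers the whole scenario's score."""
    score = 1.0
    for s in pair_scores:
        score *= s
    return score
```

Under this sketch, a scenario of two strong pairs outranks one containing a weak pair of the same length.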
Introduction | Annotators regarded as event causality only phrase pairs that were interpretable as event causality without contexts (i.e., self-contained). |
Abstract | A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. |
Introduction | (2013) use recursive auto-encoders to make full use of the entire merged phrase pairs, going beyond the boundary words used by a maximum entropy classifier (Xiong et al., 2006).
Introduction | To model the translation confidence of a translation phrase pair, we initialize the phrase pair embedding by leveraging sparse features and a recurrent neural network.
Introduction | The sparse features are the phrase pairs in the translation table, and the recurrent neural network is utilized to learn a smoothed translation score from the source- and target-side information.
Our Model | We then check whether translation candidates can be found in the translation table for each span, together with the phrase pair embedding and recurrent input vector (global features). |
Our Model | We extract phrase pairs using the conventional method (Och and Ney, 2004). |
Our Model | Representations of phrase pairs are automatically learnt to optimize translation performance, while the features used in the conventional model are handcrafted.
Related Work | Given the representations of the smaller phrase pairs, recursive auto-encoder can generate the representation of the parent phrase pair with a reordering confidence score. |
Input Features for DNN Feature Learning | Following (Maskey and Zhou, 2012), we use the following 4 phrase features of each phrase pair (Koehn et al., 2003) in the phrase table as the first type of input features: bidirectional phrase translation probability (P(e|f) and P(f|e)) and bidirectional lexical weighting (Lex(e|f) and Lex(f|e)).
Input Features for DNN Feature Learning | 3.2 Phrase pair similarity |
Input Features for DNN Feature Learning | (2004) proposed a way of using term-weight-based models in a vector space as additional evidence for phrase pair translation quality.
Introduction | First, the original input features for DBN feature learning are too simple: only the 4 phrase features of each phrase pair, such as bidirectional phrase translation probability and bidirectional lexical weighting (Koehn et al., 2003), which is a bottleneck for learning effective feature representations.
Introduction | To address the first shortcoming, we adapt and extend some simple but effective phrase features as input features for the new DNN feature learning. These features have been shown to yield significant improvements for SMT, such as phrase pair similarity (Zhao et al., 2004), phrase frequency, phrase length (Hopkins and May, 2011), and phrase generative probability (Foster et al., 2010), and they also yield further improvements for new phrase feature learning in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To speed up the pre-training, we subdivide all the phrase pairs (with features X) in the phrase table into small mini-batches, each containing 100 cases, and update the weights after each mini-batch.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each layer is greedily pre-trained for 50 epochs over all the phrase pairs.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | After the pre-training, for each phrase pair in the phrase table, we generate the DBN features (Maskey and Zhou, 2012) by passing the original phrase features X through the DBN using forward computation. |
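The forward computation mentioned above can be sketched as follows; the layer shapes and logistic activation are assumptions in line with standard DBNs, not details taken from the paper:

```python
import math

def dbn_forward(x, layers):
    """Generate DBN features for one phrase pair by passing its
    original feature vector x through each pre-trained layer.
    Each layer is (W, b), where W is a list of per-unit weight
    vectors over the layer's inputs; logistic (sigmoid) hidden
    units are assumed, as in a standard DBN."""
    h = list(x)
    for W, b in layers:
        h = [1.0 / (1.0 + math.exp(-(sum(wi * hi for wi, hi in zip(unit, h)) + bj)))
             for unit, bj in zip(W, b)]
    return h
```

The final hidden vector would then be appended to the phrase pair's feature set.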
Bilingually-constrained Recursive Auto-encoders | We can infer from this fact that if a model can learn the same embedding for any phrase pair sharing the same meaning, the learned embedding must encode the semantics of the phrases, and the corresponding model is the one we desire.
Bilingually-constrained Recursive Auto-encoders | For a phrase pair (s, t), two kinds of errors are involved:
Bilingually-constrained Recursive Auto-encoders | For the phrase pair (s, t), the joint error is:
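A plausible reconstruction of that joint error, assuming the standard bilingually-constrained recursive auto-encoder formulation with a tunable interpolation weight \(\alpha\), is:

```latex
E(s, t; \theta) = \alpha\, E_{\mathrm{rec}}(s, t; \theta) + (1 - \alpha)\, E_{\mathrm{sem}}(s, t; \theta)
```

where \(E_{\mathrm{rec}}\) is the reconstruction error of the two monolingual auto-encoders and \(E_{\mathrm{sem}}\) penalizes the distance between the source and target phrase embeddings.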
Experiments | To obtain high-quality bilingual phrase pairs to train our BRAE model, we perform forced decoding for the bilingual training sentences and collect the phrase pairs used. |
Experiments | After removing the duplicates, the remaining 1.12M bilingual phrase pairs (length ranging from 1 to 7) are obtained. |
Experiments | These algorithms are based on corpus statistics including co-occurrence statistics, phrase pair usage and composition information. |
Introduction | In decoding with phrasal semantic similarities, we apply the semantic similarities of the phrase pairs as new features during decoding to guide translation candidates.
Generation & Propagation | In order to utilize these newly acquired phrase pairs, we need to compute their relevant features.
Generation & Propagation | The phrase pairs have four log-probability features: two likelihood features and two lexical weighting features.
Generation & Propagation | In addition, we use a sophisticated lexicalized hierarchical reordering model (HRM) (Galley and Manning, 2008) with five features for each phrase pair.
Related Work | The operational scope of their approach is limited in that they assume a scenario where unknown phrase pairs are provided (thereby sidestepping the issue of translation candidate generation for completely unknown phrases), and what remains is the estimation of phrasal probabilities. |
Related Work | In our case, we obtain the phrase pairs from the graph structure (and therefore indirectly from the monolingual data) and a separate generation step, which plays an important role in good performance of the method. |
Introduction | We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them. |
Paraphrasing | We define associations in cc and c primarily by looking up phrase pairs in a phrase table constructed using the PARALEX corpus (Fader et al., 2013). |
Paraphrasing | We use the word alignments to construct a phrase table by applying the consistent phrase pair heuristic (Och and Ney, 2004) to all 5-grams. |
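A minimal sketch of that consistency heuristic follows; it is simplified (half-open spans, and the full method's expansion over unaligned boundary words is omitted):

```python
def extract_consistent_phrase_pairs(src_len, alignment, max_len=5):
    """Sketch of the consistent-phrase-pair heuristic (Och and Ney,
    2004): a source span and target span form a phrase pair iff no
    alignment link crosses the span boundary. `alignment` is a set
    of (i, j) word links; spans are half-open [start, end).
    Simplified: unaligned boundary words are not expanded over."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1 + 1, min(i1 + max_len, src_len) + 1):
            # target positions linked to this source span
            tgt = [j for (i, j) in alignment if i1 <= i < i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt) + 1
            if j2 - j1 > max_len:
                continue
            # consistency: no link from outside the source span may
            # land inside the candidate target span
            if all(i1 <= i < i2 for (i, j) in alignment if j1 <= j < j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs
```

For the 5-gram restriction in the text, `max_len=5` bounds both span lengths.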
Paraphrasing | This results in a phrase table with approximately 1.3 million phrase pairs.
Topic Models for Machine Translation | Lexical Weighting In phrase-based SMT, lexical weighting features estimate the phrase pair quality by combining lexical translation probabilities of words in a phrase (Koehn et al., 2003). |
Topic Models for Machine Translation | The phrase pair probabilities p_w(e|f) are the normalized product of the lexical probabilities of the aligned word pairs within that phrase pair (Koehn et al., 2003).
Topic Models for Machine Translation | from which we can compute the phrase pair probabilities p_w(e|f; k) by multiplying the lexical probabilities and normalizing as in Koehn et al.
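A sketch of that computation, assuming `w[e][f]` holds the word-level lexical translation probabilities and `links` maps each target position to its aligned source positions (the NULL-alignment case for unaligned words in the full formulation is simplified away here):

```python
def lex_weight(tgt_words, src_words, links, w):
    """Lexical weighting p_w(e|f) in the style of Koehn et al.
    (2003): for each target word, average the lexical translation
    probabilities w[e][f] over its aligned source words, then take
    the product over all target words."""
    p = 1.0
    for j, e in enumerate(tgt_words):
        aligned = links.get(j, [])
        if not aligned:
            continue  # simplified: the full model aligns to NULL
        p *= sum(w[e][src_words[i]] for i in aligned) / len(aligned)
    return p
```

The topic-conditioned variant p_w(e|f; k) would use a per-topic table w in place of the global one.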
Experiments | Table 2: Alignment accuracy and phrase pair extraction accuracy for directional and bidirectional models. |
Experiments | AER is the alignment error rate and F1 is the phrase pair extraction F1 score.
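These metrics follow standard definitions; AER, for example (with sure links S a subset of possible links P), can be sketched as:

```python
def aer(candidate, sure, possible):
    """Alignment error rate over word-alignment link sets:
    AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|),
    where A is the candidate alignment, S the sure links, and
    P the possible links (S ⊆ P). Lower is better; a perfect
    alignment has AER = 0."""
    a, s, p = set(candidate), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

The phrase pair extraction F1 in Table 2 would be the usual harmonic mean of precision and recall over extracted phrase pairs.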
Training | Second, we extract phoneme phrase pairs consistent with these alignments. |
Training | From the example above, we pull out phrase pairs like: |
Training | We add these phrase pairs to FST B, and call this the phoneme-phrase-based model. |