Abstract | Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of its original size.
Conclusion | Machine translation systems using phrase tables learned directly by the proposed model were able to achieve accuracy competitive with the traditional pipeline of word alignment and heuristic phrase extraction, the first such result for an unsupervised model. |
Conclusion | In addition, we will test probabilities learned using the proposed model with an ITG-based decoder.
Conclusion | We will also examine the applicability of the proposed model in the context of hierarchical phrases (Chiang, 2007), or in alignment using syntactic structure (Galley et al., 2006). |
Experimental Evaluation | For the proposed models, we train for 100 iterations and use the single final sample acquired at the end of the training process for our experiments.
Hierarchical ITG Model | All of these techniques are applicable to the proposed model, but we choose to apply the sentence-based blocked sampling of Blunsom and Cohn (2010), which has desirable convergence properties compared to sampling single alignments.
Introduction | In the proposed model, at each branch in the tree, we first attempt to generate a phrase pair from the phrase pair distribution, falling back to an ITG-based divide-and-conquer strategy to generate phrase pairs that do not exist (or are given low probability) in the phrase distribution.
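As a rough illustration of this fallback generative story, the top-down process can be sketched as follows. This is not the paper's implementation: the function names, the fixed fallback probability, the depth cap, and the uniform straight/inverted choice are all illustrative assumptions.

```python
import random

def generate_pair(phrase_dist, fallback_prob=0.5, depth=0, max_depth=5):
    """Generate one (source, target) phrase pair top-down.

    phrase_dist maps (src_phrase, trg_phrase) pairs to probabilities.
    """
    # Base case: draw a phrase pair directly from the learned distribution.
    if depth >= max_depth or random.random() > fallback_prob:
        pairs, probs = zip(*phrase_dist.items())
        src, trg = random.choices(pairs, weights=probs, k=1)[0]
        return src, trg
    # Fallback: ITG-style divide and conquer, combining two smaller pairs
    # either monotonically (straight rule) or with target sides swapped
    # (inverted rule).
    s1, t1 = generate_pair(phrase_dist, fallback_prob, depth + 1, max_depth)
    s2, t2 = generate_pair(phrase_dist, fallback_prob, depth + 1, max_depth)
    if random.random() < 0.5:                       # straight rule
        return s1 + " " + s2, t1 + " " + t2
    return s1 + " " + s2, t2 + " " + t1             # inverted rule
```

The depth cap stands in for the model's actual stopping behavior, which in the paper is governed by the learned distributions rather than a hard limit.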
Phrase Extraction | However, as the proposed models tend to align relatively large phrases, we also use two other techniques to create smaller alignment chunks that prevent sparsity. |
Related Work | We plan to examine variational inference for the proposed models in future work. |
Abstract | Unlike previous multistage pipeline approaches, which directly merge the TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase during SMT decoding.
Abstract | Moreover, the proposed models significantly outperform previous approaches.
Conclusion and Future Work | In addition, all possible TM target phrases are kept, and the proposed models select the best one during decoding by referring to SMT information.
Experiments | To estimate the probabilities of the proposed models, the corresponding phrase segmentations for bilingual sentences are required.
Experiments | In order to compare our proposed models with previous work, we re-implement two XML-Markup approaches, (Koehn and Senellart, 2010) and (Ma et al., 2011), which are denoted as Koehn-10 and Ma-11, respectively.
Experiments | More importantly, the proposed models achieve a much better TER score than the TM system at interval [0.9, 1.0), whereas Koehn-10 does not even exceed the TM system at this interval.
Introduction | Furthermore, the proposed models significantly outperform previous pipeline approaches. |
Bilingual Infinite Tree Model | Specifically, the proposed model introduces bilingual observations by embedding the aligned target words in the source-side dependency trees. |
Bilingual Infinite Tree Model | Note that the POS tags of target words are assigned by a POS tagger in the target language and are not inferred by the proposed model.
Discussion | Table 2 shows the number of the IPA POS tags used in the experiments and the POS tags induced by the proposed models.
Discussion | These examples show that the proposed models can disambiguate POS tags that have different functions in English, whereas the IPA POS tagset treats them jointly. |
Experiment | We tested our proposed models under the NTCIR-9 Japanese-to-English patent translation task (Goto et al., 2011), consisting of approximately 3.2 million bilingual sentences. |
Experiment | The results show that the proposed models can generate more favorable POS tagsets for SMT than an existing POS tagset. |
Related Work | In the following, we overview the infinite tree model, which is the basis of our proposed model . |
Related Work | model (Finkel et al., 2007), where children are dependent only on their parents, which is used in our proposed model.
Abstract | Our experimental results show that the two proposed models are indeed able to perform the task effectively. |
Experiments | This section evaluates the proposed models.
Experiments | Even with this noisy automatically-labeled data, the proposed models can produce good results. |
Experiments | However, it is important to note that the proposed models are flexible and do not need to have seeds for every aspect/topic. |
Introduction | The proposed models are evaluated using a large number of hotel reviews. |
Introduction | Experimental results show that the proposed models outperform the two baselines by large margins. |
Related Work | We will show in Section 4 that the proposed models outperform it by a large margin. |
RNN-based Alignment Model | Under this recurrence, the proposed model compactly encodes the entire history of previous alignments in its hidden-layer configuration.
RNN-based Alignment Model | Therefore, the proposed model can find alignments by taking advantage of the long alignment history, while the FFNN-based model considers only the last alignment. |
Training | We evaluated the alignment performance of the proposed models with two tasks: Japanese-English word alignment with the Basic Travel Expression Corpus (BTEC) (Takezawa et al., 2002) and French-English word alignment with the Hansard dataset (Hansards) from the 2003 NAACL shared task (Mihalcea and Pedersen, 2003).
Training | In addition, Table 3 shows that the proposed models are comparable to IBM4au on NTCIR and FBIS, even though the proposed models are trained on only a small part of the training data.
Training | Our experiments have shown that the proposed model outperforms the FFNN-based model (Yang et al., 2013) for word alignment and machine translation, and that the agreement constraint improves alignment performance. |
Conclusion | The experimental results show that significant improvements are achieved on various test data; moreover, the translations are more cohesive and smooth, which together demonstrate the effectiveness of our proposed models.
Experiments | To test the effectiveness of the proposed models , we have compared the translation quality of different integration strategies. |
Experiments | To further evaluate the effectiveness of the proposed models , we also conducted an experiment on a larger set of bilingual training data from the LDC corpus7 for translation model and transfer model. |
Experiments | The results in Table 4 further verify the effectiveness of our proposed models . |
Related Work | To the best of our knowledge, our work is the first attempt to exploit the source functional relationship to generate the target transitional expressions for grammatical cohesion, and we have successfully incorporated the proposed models into an SMT system with significant improvements in BLEU.
Abstract | The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. |
Conclusion | We test the proposed models on three tasks. |
Experiments | Table 5 shows the results of our proposed model compared with state-of-the-art systems. |
The Sense Disambiguation Model | To overcome this problem, we propose Model II, which indirectly maximizes the sense-context probability by maximizing the cosine value of two document vectors that encode the document-topic frequencies from sampling, v(z|dc) and v(z|ds).
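The cosine term in this objective is ordinary cosine similarity between the two topic-frequency vectors; a minimal sketch, with illustrative function and variable names (the vectors would hold the sampled topic counts for v(z|dc) and v(z|ds)):

```python
import math

def cosine(v_c, v_s):
    """Cosine similarity between two topic-frequency vectors."""
    dot = sum(a * b for a, b in zip(v_c, v_s))
    norm = (math.sqrt(sum(a * a for a in v_c)) *
            math.sqrt(sum(b * b for b in v_s)))
    # Define similarity as 0 when either vector is all zeros.
    return dot / norm if norm else 0.0
```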
The Sense Disambiguation Model | We propose Model III: |
Abstract | By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. |
Conclusion and Future Work | First, the proposed model can learn previously unseen sentiment words from large unlabeled data, which are not covered by the limited vocabulary in machine translation of the labeled data. |
Experiment | Table 2 shows the accuracy of the baseline systems as well as the proposed model (CLMM). |
Introduction | By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; (2) transfer polarity label information between the source language and target language using a parallel corpus. |
Introduction | This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language. |
Conclusion | By adding an unaligned word tag, the unaligned-word phenomenon is automatically built into the proposed model.
Conclusion | We also show that the proposed model is able to improve a very strong baseline system. |
Experiments | For the proposed model, significance testing results on both BLEU and TER are reported (B2 and B3 compared to B1, T2 and T3 compared to T1).
Experiments | Our proposed model ranks second.
Introduction | Section 3 describes the proposed model . |
Conclusions and Future Work | The experimental results on the NIST MT-2005 Chinese-English translation task demonstrate the effectiveness of the proposed model . |
Experiments | For the SCFG/STSG and our proposed model, we used the same settings except for the parameters d and h (d = 1 and h = 2 for the SCFG; d = 1 and h = 6 for the STSG; d = 4 and h = 6 for our model).
Introduction | The proposed model adopts the tree sequence as the basic translation unit and utilizes tree sequence alignments to model the translation process.
Tree Sequence Alignment Model | Figures 1 and 3 show how the proposed model works.
Abstract | Empirical results on the Chinese Treebank (CTB-7) and Microsoft Research (MSR) corpora reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.
Introduction | Experiments on the data from the Chinese Treebank (CTB-7) and Microsoft Research (MSR) show that the proposed model results in significant improvement over other comparative candidates in terms of F-score and out-of-vocabulary (OOV) recall.
Method | 5.2 Baseline and Proposed Models |
Method | The proposed model will also be compared with the semi-supervised pipeline S&T model described in (Wang et al., 2011). |
Conclusion | Focusing on the three financial crisis related datasets, the proposed model significantly outperforms the standard linear regression method from statistics and strong discriminative support vector regression baselines.
Conclusion | By varying the size of the training data and the dimensionality of the covariates, we have demonstrated that our proposed model is relatively robust across different parameter settings. |
Copula Models for Text Regression | Christensen (2005) shows that sorting and balanced binary trees can be used to calculate the correlation coefficients with a complexity of O(n log n). Therefore, the computational complexity of MLE for the proposed model is O(n log n).
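One correlation coefficient computable this way is Kendall's tau, whose discordant-pair count reduces to counting inversions with merge sort. The sketch below is illustrative (it assumes no ties and is not Christensen's algorithm verbatim):

```python
def count_inversions(a):
    """Sort `a` and count its inversions via merge sort: O(n log n)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = count_inversions(a[:mid])
    right, inv_r = count_inversions(a[mid:])
    merged, inv = [], inv_l + inv_r
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            inv += len(left) - i      # every remaining left element exceeds right[j]
    merged += left[i:] + right[j:]
    return merged, inv

def kendall_tau(x, y):
    """Kendall's tau for samples without ties, in O(n log n)."""
    n = len(x)
    # Sort y by x; discordant pairs are then inversions in the y sequence.
    y_by_x = [yy for _, yy in sorted(zip(x, y))]
    _, discordant = count_inversions(y_by_x)
    total = n * (n - 1) // 2
    return (total - 2 * discordant) / total
```

A naive pairwise computation would be O(n^2); the sort-based reduction is what keeps the overall MLE at O(n log n).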
Discussions | The main questions we ask are: how is the proposed model different from standard text regression/classification models?
Model | 4.2 Baseline and Proposed Models |
Model | We use the following baseline and proposed models for evaluation. |
Model | Figure 2 shows the F1 scores of the proposed model (SegTagDep) on CTB-Sc-l with respect to the training epoch and different parsing feature weights, where “Seg”, “Tag”, and “Dep” respectively denote the F1 scores of word segmentation, POS tagging, and dependency parsing. |
Abstract | The proposed model leverages the strengths of both tree sequence-based and forest-based translation models.
Experiment | This clearly demonstrates the effectiveness of our proposed model for syntax-based SMT. |
Experiment | This again demonstrates the effectiveness of our proposed model . |
Forest-based tree sequence to string model | In this section, we first explain what a packed forest is and then define the concept of the tree sequence in the context of a forest, followed by a discussion of our proposed model.
Abstract | Compared with the contiguous tree sequence-based model, the proposed model can handle noncontiguous phrases with arbitrarily large gaps by means of noncontiguous tree sequence alignment.
Abstract | Experimental results on the NIST MT-05 Chinese-English translation task show that the proposed model statistically significantly outperforms the baseline systems.
Introduction | With the help of noncontiguous tree sequences, the proposed model can capture noncontiguous phrases without the constraints imposed by requiring large applicable contexts, and it enhances the modeling of noncontiguous constituents.
Introduction | As for the above example, the proposed model enables the noncontiguous tree sequence pair indexed as TSPS in Fig. |
Abstract | Experiments revealed that the proposed model worked robustly, and outperformed five out of six state-of-the-art abbreviation recognizers. |
Introduction | Experimental results indicate that the proposed models significantly outperform previous abbreviation generation studies. |
Introduction | In addition, we apply the proposed models to the task of abbreviation recognition, in which a model extracts the abbreviation definitions in a given text. |
Recognition as a Generation Task | Note that all six systems were specifically designed and optimized for this recognition task, whereas the proposed model is carried over directly from the generation task.
Abstract | The experiments on a Chinese-to-English machine translation task reveal that the proposed model can bring positive segmentation effects to translation quality. |
Conclusion | The empirical results indicate that the proposed model can yield better segmentations for SMT. |
Introduction | Section 4 reports the experimental results of the proposed model for a Chinese-to-English MT task. |
Abstract | Experimental results show that the proposed model achieves 83% in F-measure, and outperforms the state-of-the-art baseline by over 7%. |
Conclusions and Future Work | Instead of employing labeled corpora for training, the proposed model only requires the identification of named entities, locations and time expressions. |
Conclusions and Future Work | Our proposed model has been evaluated on the FSD corpus. |
Evaluation | “CharPos” stands for our proposed model which has been described in section 3. |
Evaluation | The results show that, while the differences between the baseline model and the proposed model in word segmentation accuracies are small, the proposed model achieves a significant improvement in the experiment on joint segmentation and POS tagging.
Evaluation | As the results show, despite the fact that the performance of our baseline model is relatively weak in the joint segmentation and POS tagging task, our proposed model achieves the second-best performance in both segmentation and joint tasks. |
Experiments and Results | We train our proposed model from the results of the classic HMM and IBM Model 4 separately.
Experiments and Results | As can be seen from Table 1, the proposed model consistently outperforms its corresponding baseline, whether it is trained from the alignments of the classic HMM or of IBM Model 4.
Experiments and Results | The second and fourth rows show the results of the proposed model trained from the HMM and IBM Model 4, respectively.
Abstract | The experiments on Japanese and Chinese WS have shown that the proposed models achieve significant improvements over the state of the art, reducing errors by 16% in Japanese.
Experiments | Table 3 shows the result of the proposed models and major open-source Japanese WS systems, namely, MeCab 0.98 (Kudo et al., 2004), JUMAN 7.0 (Kurohashi and Nagao, 1994), |
Experiments | Here, MeCab+UniDic achieved slightly better Katakana WS than the proposed models.
Experiment | 4.2 Training for the Proposed Models |
Introduction | The proposed models are the pair model and the sequence model. |
Proposed Method | Then, we describe two proposed models : the pair model and the sequence model that is the further improved model. |
The Proposed Approaches 3.1 The psycholinguistic experiments | However, the proposed model still fails to predict the processing of around 32% of the words.
The Proposed Approaches 3.1 The psycholinguistic experiments | The evaluation of the proposed model yields an accuracy of 76%, which is 8% better than the preceding models.
The Proposed Approaches 3.1 The psycholinguistic experiments | We believe that much more rigorous experiments need to be performed in order to validate our proposed models.