Abstract | We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. |
Abstract | The results of this manual evaluation showed that 60-83% of the generated term pairs are exact translations, and over 90% are exact or partial translations. |
Conclusion | We measured the performance of our classifier using Information Retrieval (IR) metrics and a manual evaluation. |
Conclusion | In the manual evaluation we had our algorithm extract pairs of terms from Wikipedia articles forming comparable corpora in the IT and automotive domains, and asked native speakers to sort a selection of the term pairs into categories reflecting how closely the paired terms translate each other. |
Conclusion | In the manual evaluation we used the English-German language pair and showed that over 80% of the extracted term pairs were exact translations in the IT domain and over 60% in the automotive domain. |
Experiments 5.1 Data Sources | 5.3 Manual evaluation |
Experiments 5.1 Data Sources | 5.4.2 Manual evaluation |
Experiments 5.1 Data Sources | The results of the manual evaluation are shown in Table 4. |
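To make the reported figures concrete: percentages like "over 80% exact translations" fall out of a simple tally over the annotators' category labels. The sketch below shows this arithmetic in Python; the category names ("exact", "partial", "unrelated") and the judgment list are invented for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical annotator judgments, one label per extracted term pair.
# The label set ("exact", "partial", "unrelated") is an assumption.
judgments = ["exact", "exact", "partial", "exact", "unrelated",
             "exact", "partial", "exact", "exact", "partial"]

counts = Counter(judgments)
total = len(judgments)

exact_pct = 100.0 * counts["exact"] / total
exact_or_partial_pct = 100.0 * (counts["exact"] + counts["partial"]) / total

print(f"exact: {exact_pct:.1f}%")                    # share judged exact translations
print(f"exact or partial: {exact_or_partial_pct:.1f}%")
```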
Abstract | A manual evaluation of an English-to-German translation task shows that the subcategorization information has a positive impact on translation quality through better prediction of case. |
Conclusion | We showed in a manual evaluation that the proposed features have a positive impact on translation quality. |
Experiments and evaluation | We also present a manual evaluation of our best system, which shows that the new features improve translation quality. |
Experiments and evaluation | We present three types of evaluation: BLEU scores (Papineni et al., 2001), prediction accuracy on clean data, and a manual evaluation of the best system in section 5.3. |
Experiments and evaluation | While the inflection prediction systems (1-4) are significantly better than the surface-form system (0), the different versions of the inflection systems are not distinguishable in terms of BLEU; however, our manual evaluation shows that the new features have a positive impact on translation quality. |
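For the automatic side of such an evaluation, corpus-level BLEU can be computed with the sacrebleu package. This is a generic sketch, not the authors' actual pipeline; the hypothesis and reference sentences are made up.

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical German system outputs and one reference stream
# (references[0][i] is the reference for hypotheses[i]).
hypotheses = ["der Hund läuft im Park", "sie liest ein Buch"]
references = [["der Hund rennt im Park", "sie liest ein Buch"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```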
Abstract | HEADY improves over a state-of-the-art open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using human-generated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE. |
Experiment settings | Table 3: Results from the manual evaluation. |
Results | Table 3 lists the results of the manual evaluation of readability and informativeness of the generated headlines. |
Results | In fact, in the DUC competitions, the gap between human summaries and automatic summaries was also more apparent in the manual evaluations than in the ROUGE scores. |
Results | The manual evaluation asks raters to judge whether real, human-written titles that were actually used for these news items are grammatical and informative. |
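Where generated headlines are scored against human-written ones with ROUGE, Google's rouge-score package offers a straightforward way to compute the per-pair metrics. The headline strings below are invented, and nothing here implies this is the toolkit the authors used.

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "Central bank raises interest rates again"   # human headline (invented)
hypothesis = "Bank raises rates for second time"         # generated headline (invented)

scores = scorer.score(reference, hypothesis)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```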
Corpus preparation | For manual evaluation, we randomly selected 330 sentences out of the 947 used for automatic evaluation, specifically 190 from the ‘news’ part and 140 from the ‘regulations’ part. |
Evaluation methodology | The main idea of manual evaluation was (1) to make the assessment as simple as possible for a human judge and (2) to make the results of evaluation unambiguous. |
Results | Automatic evaluation measures were calculated for 11 runs; eight runs underwent manual evaluation (four online systems plus four participants’ runs; by agreement with the participants, runs P3, P6, and P7 were excluded from manual evaluation to reduce the workload). |
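The stratified selection described above (190 ‘news’ plus 140 ‘regulations’ sentences out of 947) can be reproduced with random.sample, as in this sketch; the variable names, the in-memory layout of the sentences, and the fixed seed are all assumptions.

```python
import random

random.seed(42)  # fixed seed so the selection is reproducible (an assumption)

# Assumed layout: two lists holding the sentences used for automatic
# evaluation, split by corpus part (placeholder data below).
news_sentences = [f"news sentence {i}" for i in range(500)]
regulation_sentences = [f"regulation sentence {i}" for i in range(447)]

# Stratified sample: 190 from 'news', 140 from 'regulations' (330 of 947 total).
manual_eval_set = (random.sample(news_sentences, 190)
                   + random.sample(regulation_sentences, 140))
print(len(manual_eval_set))  # 330
```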
Abstract | Table 2: Manual evaluation of precision (by sentence pair) on the extracted parallel data for Spanish, French, and German (paired with English). |
Abstract | In addition to the manual evaluation of precision, we applied language identification to our extracted parallel data for several additional languages. |
Abstract | Comparing against our manual evaluation from Table 2, it appears that many sentence pairs are being incorrectly judged as nonparallel. |
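One common way to apply language identification to extracted sentence pairs, as described above, is the langid.py package: keep a pair only when each side is identified as its expected language. Whether the authors used this particular tool, and the filtering rule shown, are assumptions; the example pairs are invented.

```python
import langid  # pip install langid

# Hypothetical extracted pairs: (English side, German side).
pairs = [
    ("The engine starts automatically.", "Der Motor startet automatisch."),
    ("Press the red button.", "Press the red button."),  # misaligned copy, not German
]

def looks_parallel(en_text, de_text):
    """Keep a pair only if each side is identified as the expected language."""
    return langid.classify(en_text)[0] == "en" and langid.classify(de_text)[0] == "de"

kept = [p for p in pairs if looks_parallel(*p)]
print(f"kept {len(kept)} of {len(pairs)} pairs")
```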
Conclusion | uniformly outperform the state-of-the-art supervised extraction-based systems in both automatic and manual evaluation. |
Surface Realization | We tune the parameter on a small held-out development set by manually evaluating the induced templates. |
Surface Realization | Note that we do not explicitly evaluate the quality of the learned templates, which would require a significant amount of manual evaluation . |