Introduction | It is entirely based on models learned from an SMS corpus and its transcription, aligned at the character level in order to obtain parallel corpora. |
The normalization models | Together, the SMS corpus and its transcription constitute parallel corpora aligned at the message level. |
The normalization models | On our parallel corpora, it converged after 7 iterations and provided us with a result from which the learning could start. |
The normalization models | After examining our parallel corpora aligned at the character level, we decided to consider as a word “the longest sequence of characters parsed without meeting the same separator on both sides of the alignment”. |
Abstract | from parallel corpora. |
Conclusions and Future Work | similarity for terms from parallel corpora and applied it to statistical machine translation. |
Conclusions and Future Work | We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation. |
Introduction | However, there is no previous work that uses the VSM to compute sense similarity for terms from parallel corpora. |
Introduction | the translation probabilities in a translation model for units from parallel corpora are mainly based on the co-occurrence counts of the two units. |
Introduction | Therefore, questions emerge: how good is the sense similarity computed via VSM for two units from parallel corpora? |
Abstract | Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. |
Introduction | Traditionally, when bilingual lexicons are not compiled manually, they are extracted from parallel corpora . |
Lexicon Generation Experiments | We chose a language pair for which essentially no parallel corpora exist, and whose languages do not share ancestry or a writing system in a way that could provide cues for alignment. |
Previous Work | 2.1 Parallel Corpora |
Previous Work | Parallel corpora are often used to infer word-oriented machine-readable bilingual lexicons. |
Previous Work | The limited availability of parallel corpora of sufficient size for most language pairs restricts the usefulness of these methods. |