Beyond lexical CLTE | A motivation for this augmentation is that semantic tags make it possible to match tokens that do not occur in the original bilingual parallel corpora used for phrase table extraction. |
Beyond lexical CLTE | Like lexical phrase tables, SPTs are extracted from parallel corpora. |
Beyond lexical CLTE | As a first step we annotate the parallel corpora with named-entity taggers for the source and target languages, replacing named entities with general semantic labels chosen from a coarse-grained taxonomy (person, location, organization, date and numeric expression). |
CLTE-based content synchronization | CLTE has been previously modeled as a phrase matching problem that exploits dictionaries and phrase tables extracted from bilingual parallel corpora to determine the number of word sequences in H that can be mapped to word sequences in T. In this way a semantic judgement about entailment is made exclusively on the basis of lexical evidence. |
Experiments and results | To build the English-German phrase tables we combined the Europarl, News Commentary and “de-news” parallel corpora. |
Experiments and results | The dictionary created during the alignment of the parallel corpora provided the lexical knowledge to perform matches when the connected words are different, but semantically equivalent in the two languages. |
Experiments and results | In order to build the English-Spanish lexical phrase table (PT), we used the Europarl, News Commentary and United Nations parallel corpora. |
Word Sense Disambiguation | We construct our supervised WSD system directly from parallel corpora. |
Word Sense Disambiguation | To generate the WSD training data, 7 parallel corpora were used, including Chinese Treebank, FBIS Corpus, Hong Kong Hansards, Hong Kong Laws, Hong Kong News, Sinorama News Magazine, and Xinhua Newswire. |
Word Sense Disambiguation | Then, word alignment was performed on the parallel corpora with the GIZA++ software (Och and Ney, 2003). |
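
The phrase-matching view of CLTE quoted under "CLTE-based content synchronization" above (counting the word sequences in H that can be mapped to word sequences in T) can be illustrated with a small sketch. This is not the cited authors' implementation: the function name `count_matches`, the in-memory `phrase_table` dictionary, the `max_n` limit, and the toy Spanish-English data are all assumptions made for illustration. A real phrase table would be extracted from parallel corpora and would carry translation probabilities.

```python
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Phrase]:
    """Return all contiguous n-token sequences of `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_matches(
    h_tokens: List[str],
    t_tokens: List[str],
    phrase_table: Dict[Phrase, Set[Phrase]],
    max_n: int = 3,
) -> Tuple[int, int]:
    """Count the n-grams of H that have a translation occurring in T.

    `phrase_table` (hypothetical format) maps a phrase in the language of H
    to the set of its known translations in the language of T.
    Returns (matched, total) over all n-grams of H with 1 <= n <= max_n.
    """
    # Longest translation in the table decides which T n-grams we need.
    longest = max(
        (len(p) for targets in phrase_table.values() for p in targets),
        default=0,
    )
    t_phrases = {p for n in range(1, longest + 1) for p in ngrams(t_tokens, n)}

    matched = total = 0
    for n in range(1, max_n + 1):
        for phrase in ngrams(h_tokens, n):
            total += 1
            # A match means at least one translation appears verbatim in T.
            if phrase_table.get(phrase, set()) & t_phrases:
                matched += 1
    return matched, total


# Toy data, invented for this example.
toy_table: Dict[Phrase, Set[Phrase]] = {
    ("la",): {("the",)},
    ("casa",): {("house",)},
    ("casa", "blanca"): {("white", "house")},
}
h = ["la", "casa", "blanca"]
t = ["the", "white", "house"]

matched, total = count_matches(h, t, toy_table, max_n=2)
print(f"{matched} of {total} H n-grams matched in T")
```

The ratio of matched to total n-grams is purely lexical evidence, which is the limitation the "Beyond lexical CLTE" entries address by adding semantic labels for named entities.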