Cross Language Dependency Parsing using a Bilingual Lexicon
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong

Article Structure

Abstract

This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language.

Introduction

Although supervised learning methods achieve state-of-the-art results in dependency parsing (McDonald et al., 2005; Hall et al., 2007), methods of this type often require a sufficiently large data set to reach a given parsing accuracy.

The Related Work

As this work is about exploiting extra resources to enhance an existing parser, it is related to domain adaptation for parsing, which has drawn some interest in recent years.

Treebank Translation and Dependency Transformation

3.1 Data

Dependency Parsing: Baseline

4.1 Learning Model and Features

Exploiting the Translated Treebank

As we cannot expect too much from a word-by-word translation, only word pairs with a dependency relation in the translated text are extracted as useful and reliable information.

Evaluation Results

The quality of the parser is measured by the parsing accuracy, or unlabeled attachment score (UAS), i.e., the percentage of tokens with the correct head.

Discussion

If a treebank in the source language can help improve parsing in the target language, then there must be something in common between these two languages, or more precisely, between the two corresponding treebanks.

Conclusion and Future Work

We propose a method to enhance dependency parsing in one language by using a translated treebank from another language.

Topics

treebank

Appears in 39 sentences as: Treebank (2) treebank (23) treebanks (16)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language.
    Page 1, “Abstract”
  2. A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation.
    Page 1, “Abstract”
  3. The proposed method is evaluated on English and Chinese treebanks.
    Page 1, “Abstract”
  4. It is shown that a translated English treebank helps a Chinese parser obtain a state-of-the-art result.
    Page 1, “Abstract”
  5. But, this is not the case as we observe all treebanks in different languages as a whole.
    Page 1, “Introduction”
  6. For example, of ten treebanks for CoNLL-2007 shared task, none includes more than 500K
    Page 1, “Introduction”
  7. (Footnote 1) It is a tradition in the parsing community to call an annotated syntactic corpus a treebank.
    Page 1, “Introduction”
  8. tokens, while the sum of tokens from all treebanks is about two million (Nivre et al., 2007).
    Page 1, “Introduction”
  9. As different human languages or treebanks should share something in common, dependency parsing in multiple languages can be made mutually beneficial.
    Page 1, “Introduction”
  10. As a case study, we consider how to enhance a Chinese dependency parser by using a translated English treebank.
    Page 1, “Introduction”
  11. What our method relies on is not the close relation of the chosen language pair but the similarity of two treebanks; this is the main difference from previous work.
    Page 1, “Introduction”

See all papers in Proc. ACL 2009 that mention treebank.

See all papers in Proc. ACL that mention treebank.

word pair

Appears in 19 sentences as: word pair (13) word pairs (9)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. Using an ensemble method, the key information extracted from word pairs with dependency relations in the translated text is effectively integrated into the parser for the target language.
    Page 1, “Abstract”
  2. However, dependency parsing focuses on the relations of word pairs; this allows us to use a dictionary-based translation without assuming that a parallel corpus is available, so the training stage of translation may be skipped and the decoding will be quite fast in this case.
    Page 1, “Introduction”
  3. In each step, the classifier checks a word pair, namely, s, the top of a stack that consists of the processed words, and i, the first word in the (input) unprocessed sequence, to determine if a dependency relation should be established between them.
    Page 4, “Dependency Parsing: Baseline”
  4. As we cannot expect too much from a word-by-word translation, only word pairs with a dependency relation in the translated text are extracted as useful and reliable information.
    Page 5, “Exploiting the Translated Treebank”
  5. Then some features based on a query in these word pairs according to the current parsing state (namely, words in the current stack and input) will be derived to enhance the Chinese parser.
    Page 5, “Exploiting the Translated Treebank”
  6. As all concerned feature values here are calculated from the searching result in the translated word pair list according to the current parsing state, and a complete and exact match cannot be always expected, our solution to the above segmentation issue is using a partial matching strategy based on characters that the words include.
    Page 6, “Exploiting the Translated Treebank”
  7. Above all, a translated word pair list, L, is extracted from the translated treebank.
    Page 6, “Exploiting the Translated Treebank”
  8. There are two basic strategies to organize the features derived from the translated word pair list.
    Page 6, “Exploiting the Translated Treebank”
  9. The first is to find the best-matching word pair in the list and extract some properties from it, such as the matched length, part-of-speech tags and so on, to generate features.
    Page 6, “Exploiting the Translated Treebank”
  10. The second is to check every matching model between the current parsing state and the partially matched word pair.
    Page 6, “Exploiting the Translated Treebank”
  11. There are four input parameters required by the function. Two of them specify which part of the stack (input) words is chosen, and the other two specify which part of each item in the translated word pair is chosen.
    Page 6, “Exploiting the Translated Treebank”

dependency parsing

Appears in 15 sentences as: dependency parser (5) dependency parsing (10)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language.
    Page 1, “Abstract”
  2. Although supervised learning methods achieve state-of-the-art results in dependency parsing (McDonald et al., 2005; Hall et al., 2007), methods of this type often require a sufficiently large data set to reach a given parsing accuracy.
    Page 1, “Introduction”
  3. As different human languages or treebanks should share something in common, dependency parsing in multiple languages can be made mutually beneficial.
    Page 1, “Introduction”
  4. In this paper, we study how to improve dependency parsing by using (automatically) translated texts attached with transformed dependency information.
    Page 1, “Introduction”
  5. As a case study, we consider how to enhance a Chinese dependency parser by using a translated English treebank.
    Page 1, “Introduction”
  6. Two main obstacles are expected in a cross-language dependency parsing task.
    Page 1, “Introduction”
  7. However, dependency parsing focuses on the relations of word pairs; this allows us to use a dictionary-based translation without assuming that a parallel corpus is available, so the training stage of translation may be skipped and the decoding will be quite fast in this case.
    Page 1, “Introduction”
  8. Section 4 describes a dependency parser for Chinese as a baseline.
    Page 2, “Introduction”
  9. Even the translation outputs are not so good as the expected, a dependency parser for the
    Page 2, “The Related Work”
  10. However, although it is not essentially different, we only focus on dependency parsing itself, while the parsing scheme in (Burkett and Klein, 2008) is based on a constituent representation.
    Page 2, “The Related Work”
  11. how a translated English treebank enhances a Chinese dependency parser.
    Page 3, “Treebank Translation and Dependency Transformation”

dependency relation

Appears in 9 sentences as: dependency relation (3) dependency relations (3) dependency relationship (1) dependent relation (1) dependent relationship (1)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. Using an ensemble method, the key information extracted from word pairs with dependency relations in the translated text is effectively integrated into the parser for the target language.
    Page 1, “Abstract”
  2. Bind the POS tag and dependency relation of a word with the word itself;
    Page 3, “Treebank Translation and Dependency Transformation”
  3. As word order is often changed after translation, the pointer of each dependency relationship, represented by a serial number, should be recalculated.
    Page 3, “Treebank Translation and Dependency Transformation”
  4. In each step, the classifier checks a word pair, namely, s, the top of a stack that consists of the processed words, and i, the first word in the (input) unprocessed sequence, to determine if a dependent relation should be established between them.
    Page 4, “Dependency Parsing: Baseline”
  5. As we cannot expect too much from a word-by-word translation, only word pairs with a dependency relation in the translated text are extracted as useful and reliable information.
    Page 5, “Exploiting the Translated Treebank”
  6. Chinese words should be strictly segmented according to the guideline before POS tags and dependency relations are annotated.
    Page 5, “Exploiting the Translated Treebank”
  7. The difference is that rootscore counts how often the given POS tag occurs as ROOT, while pairscore counts how often a combination of two POS tags occurs in a dependent relationship.
    Page 6, “Exploiting the Translated Treebank”
  8. The experimental results in (McDonald and Nivre, 2007) show a negative impact on the parsing accuracy from an overly long dependency relation.
    Page 7, “Evaluation Results”
  9. As dependency parsing is concerned with the relations of word pairs, only those word pairs with dependency relations in the translated treebank are
    Page 8, “Conclusion and Future Work”
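
The head-pointer recalculation mentioned in item 3 above can be sketched as follows. This is only an illustration under stated assumptions: heads are stored as 1-based serial numbers with 0 standing for ROOT, and the reordering is given as a list of old positions in their new order. The function name `remap_heads` and both data layouts are hypothetical, not taken from the paper.

```python
from typing import List


def remap_heads(heads: List[int], new_order: List[int]) -> List[int]:
    """Recompute head serial numbers after the words have been reordered.

    heads[k - 1] is the head of the word at old position k (0 means ROOT).
    new_order[j] is the old position of the word now placed at position j + 1.
    Both conventions are assumptions made for this sketch.
    """
    # Map every old serial number to its new serial number; ROOT stays 0.
    old_to_new = {old: new for new, old in enumerate(new_order, start=1)}
    old_to_new[0] = 0
    # Walk the words in their new order and translate each head pointer.
    return [old_to_new[heads[old - 1]] for old in new_order]
```

For example, with heads `[2, 0, 2]` and the third word moved to the front (`new_order = [3, 1, 2]`), the root word ends up in position 3 and both other words point to it.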

machine translation

Appears in 8 sentences as: Machine translation (1) machine translation (7)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation.
    Page 1, “Abstract”
  2. Machine translation has been shown to be one of the most expensive language processing tasks, as a great deal of time and space is required to perform this task.
    Page 1, “Introduction”
  3. In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if it is not able to find a parallel corpus that right covers source and target treebanks.
    Page 1, “Introduction”
  4. In our method, a machine translation method is applied to a gold-standard treebank, while all previous work focuses on unlabeled data.
    Page 2, “The Related Work”
  5. The proposed parser using features from monolingual and mutual constraints helped its log-linear model to achieve better performance for both monolingual parsers and machine translation system.
    Page 2, “The Related Work”
  6. The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach holds a merit of simplicity as only a bilingual lexicon is required.
    Page 2, “The Related Work”
  7. A word-by-word statistical machine translation strategy is adopted to translate words attached with the respective dependency information from the source language to the target one.
    Page 3, “Treebank Translation and Dependency Transformation”
  8. A simple statistical machine translation technique, word-by-word decoding, where only a bilingual lexicon is necessary, is used to translate the source treebank.
    Page 8, “Conclusion and Future Work”

Chinese word

Appears in 7 sentences as: Chinese word (7) Chinese words (1)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. Translate the PTB text into Chinese word by word.
    Page 3, “Treebank Translation and Dependency Transformation”
  2. After the target sentence is generated, the attached POS tags and dependency information of each English word will also be transferred to each corresponding Chinese word.
    Page 3, “Treebank Translation and Dependency Transformation”
  3. Although we try to perform an exact word-by-word translation, this aim cannot be fully reached in practice, as the following case is frequently encountered: multiple English words have to be translated into one Chinese word.
    Page 3, “Treebank Translation and Dependency Transformation”
  4. To solve this problem, we use a policy that lets the output Chinese word inherit only the attached information of the highest syntactic head among the original multiple English words.
    Page 3, “Treebank Translation and Dependency Transformation”
  5. In addition, there is often a close connection between the meaning of a Chinese word and its first or last character.
    Page 4, “Dependency Parsing: Baseline”
  6. Chinese words should be strictly segmented according to the guideline before POS tags and dependency relations are annotated.
    Page 5, “Exploiting the Translated Treebank”
  7. English treebank is translated into Chinese word by word, Chinese words in the translated text are exactly some entries from the bilingual lexicon; they are actually irregular phrases, short sentences or something else rather than words that follow any existing word segmentation convention.
    Page 6, “Exploiting the Translated Treebank”
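
The many-to-one policy in item 4 above (the Chinese word inherits the information of the highest syntactic head) can be illustrated with a short sketch. It assumes the English dependency tree is stored as a dictionary from token index to head index with 0 as ROOT, and it interprets "highest" as "closest to ROOT"; the names `depth` and `highest_head` are hypothetical.

```python
from typing import Dict, List


def depth(token: int, heads: Dict[int, int]) -> int:
    """Number of arcs between a token and ROOT (ROOT is index 0)."""
    steps = 0
    while token != 0:
        token = heads[token]
        steps += 1
    return steps


def highest_head(group: List[int], heads: Dict[int, int]) -> int:
    """Pick the token in `group` that sits closest to ROOT.

    `group` holds the English tokens that were merged into one Chinese word.
    The merged word would then inherit the POS tag and dependency
    information of the returned token. Assumes `heads` is a valid tree.
    """
    return min(group, key=lambda tok: depth(tok, heads))
```

With `heads = {1: 2, 2: 0, 3: 2, 4: 3}`, merging tokens 3 and 4 selects token 3, because token 4 depends on it.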

POS tag

Appears in 7 sentences as: POS tag (5) POS tags (3)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. Bind the POS tag and dependency relation of a word with the word itself;
    Page 3, “Treebank Translation and Dependency Transformation”
  2. After the target sentence is generated, the attached POS tags and dependency information of each English word will also be transferred to each corresponding Chinese word.
    Page 3, “Treebank Translation and Dependency Transformation”
  3. pos POS tag of word
    Page 4, “Dependency Parsing: Baseline”
  4. cpos1 coarse POS: the first letter of POS tag of word
    Page 4, “Dependency Parsing: Baseline”
  5. cpos2 coarse POS: the first two POS tags of word
    Page 4, “Dependency Parsing: Baseline”
  6. Chinese words should be strictly segmented according to the guideline before POS tags and dependency relations are annotated.
    Page 5, “Exploiting the Translated Treebank”
  7. The difference is that rootscore counts how often the given POS tag occurs as ROOT, while pairscore counts how often a combination of two POS tags occurs in a dependent relationship.
    Page 6, “Exploiting the Translated Treebank”
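
The two counts described in item 7 above can be sketched as plain frequency tables. The sketch assumes each sentence of the translated treebank is a list of `(pos_tag, head)` pairs with 1-based heads and 0 for ROOT; that layout and the function name `count_scores` are illustrative assumptions, and the way the paper turns these counts into feature values is not reproduced here.

```python
from collections import Counter
from typing import List, Tuple

Sentence = List[Tuple[str, int]]  # (POS tag, 1-based head index; 0 = ROOT)


def count_scores(sentences: List[Sentence]) -> Tuple[Counter, Counter]:
    """Collect raw rootscore and pairscore counts from a treebank."""
    rootscore: Counter = Counter()  # POS tag -> times it occurs as ROOT
    pairscore: Counter = Counter()  # (dependent POS, head POS) -> times linked
    for sent in sentences:
        for pos, head in sent:
            if head == 0:
                rootscore[pos] += 1
            else:
                head_pos = sent[head - 1][0]
                pairscore[(pos, head_pos)] += 1
    return rootscore, pairscore
```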

parallel corpus

Appears in 5 sentences as: parallel corpus (6)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation.
    Page 1, “Abstract”
  2. In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if it is not able to find a parallel corpus that right covers source and target treebanks.
    Page 1, “Introduction”
  3. However, dependency parsing focuses on the relations of word pairs; this allows us to use a dictionary-based translation without assuming that a parallel corpus is available, so the training stage of translation may be skipped and the decoding will be quite fast in this case.
    Page 1, “Introduction”
  4. The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach holds a merit of simplicity as only a bilingual lexicon is required.
    Page 2, “The Related Work”
  5. Since we use a lexicon rather than a parallel corpus to estimate the translation probabilities, we simply assign uniform probabilities to all translation options.
    Page 3, “Treebank Translation and Dependency Transformation”

statistical machine translation

Appears in 5 sentences as: statistical machine translation (5)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation.
    Page 1, “Abstract”
  2. In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if it is not able to find a parallel corpus that right covers source and target treebanks.
    Page 1, “Introduction”
  3. The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach holds a merit of simplicity as only a bilingual lexicon is required.
    Page 2, “The Related Work”
  4. A word-by-word statistical machine translation strategy is adopted to translate words attached with the respective dependency information from the source language to the target one.
    Page 3, “Treebank Translation and Dependency Transformation”
  5. A simple statistical machine translation technique, word-by-word decoding, where only a bilingual lexicon is necessary, is used to translate the source treebank.
    Page 8, “Conclusion and Future Work”

beam search

Appears in 5 sentences as: Beam Search (1) beam search (4)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. A beam search algorithm is used for this process to find the best path from all the translation options. As the training stage, especially the most time-consuming alignment substage, is skipped, the translation only includes a decoding procedure that takes about 4.5 hours for about one million words of the PTB on a 2.8GHz PC.
    Page 3, “Treebank Translation and Dependency Transformation”
  2. 4.2 Parsing using a Beam Search Algorithm
    Page 4, “Dependency Parsing: Baseline”
  3. We use a beam search algorithm to find the object parsing action sequence.
    Page 5, “Dependency Parsing: Baseline”
  4. beam search algorithm with width 5 is used for parsing, otherwise, a simple shift-reduce decoding is used.
    Page 7, “Evaluation Results”
  5. +d: 0.861 (with p), 0.870 (without p). Note on +d: using three Markovian features preact and beam search decoding.
    Page 7, “Evaluation Results”

shift-reduce

Appears in 4 sentences as: shift-reduce (4)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. In detail, a shift-reduce method is adopted as in (Nivre, 2003), where a classifier is used to make a parsing decision step by step.
    Page 4, “Dependency Parsing: Baseline”
  2. While memory-based and margin-based learning approaches such as support vector machines are popularly applied to shift-reduce parsing, we apply a maximum entropy model as the learning model for efficient training and for adopting overlapped features as in our work (Zhao and Kit, 2008), especially the character-level ones for Chinese parsing.
    Page 4, “Dependency Parsing: Baseline”
  3. Without this type of features, a shift-reduce parser may directly scan through an input sequence in linear time.
    Page 4, “Dependency Parsing: Baseline”
  4. beam search algorithm with width 5 is used for parsing, otherwise, a simple shift-reduce decoding is used.
    Page 7, “Evaluation Results”
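
The step-by-step decision process described in items 1 and 3 above can be sketched as a small transition loop. This is a simplified arc-eager-style illustration, not the authors' implementation: the classifier is replaced by a caller-supplied `choose_action` callback (hypothetical), the four action names are assumptions, and any action that is not applicable in the current state falls back to a shift.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

SHIFT, LEFT_ARC, RIGHT_ARC, REDUCE = "shift", "left-arc", "right-arc", "reduce"

# choose_action sees the stack, the unprocessed input and the arcs built so far.
Chooser = Callable[[List[int], Deque[int], Dict[int, int]], str]


def shift_reduce_parse(n_tokens: int, choose_action: Chooser) -> Dict[int, int]:
    """Parse tokens 1..n_tokens and return a token -> head map (0 = ROOT)."""
    stack: List[int] = []
    buffer: Deque[int] = deque(range(1, n_tokens + 1))
    heads: Dict[int, int] = {}
    while buffer:
        action = choose_action(stack, buffer, heads)
        s = stack[-1] if stack else None  # top of the stack
        i = buffer[0]                     # first unprocessed word
        if action == LEFT_ARC and s is not None and s not in heads:
            heads[s] = i                  # s depends on i
            stack.pop()
        elif action == RIGHT_ARC and s is not None:
            heads[i] = s                  # i depends on s
            stack.append(buffer.popleft())
        elif action == REDUCE and s is not None and s in heads:
            stack.pop()
        else:                             # SHIFT, or fallback for invalid actions
            stack.append(buffer.popleft())
    for token in range(1, n_tokens + 1):  # unattached tokens hang off ROOT
        heads.setdefault(token, 0)
    return heads
```

Every step either consumes one input word or pops the stack, so the loop runs at most 2n times for n tokens, which matches the linear-time remark in item 3.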

language model

Appears in 4 sentences as: language model (3) language model, (1)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. In detail, a word-based decoding is used, which adopts a log-linear framework as in (Och and Ney, 2002) with only two features, translation model and language model,
    Page 3, “Treebank Translation and Dependency Transformation”
  2. is the language model, a word trigram model trained from the CTB.
    Page 3, “Treebank Translation and Dependency Transformation”
  3. Thus the decoding process is actually only determined by the language model.
    Page 3, “Treebank Translation and Dependency Transformation”
  4. Similar to the “bag translation” experiment in (Brown et al., 1990), the candidate target sentences made up by a sequence of the optional target words are ranked by the trigram language model.
    Page 3, “Treebank Translation and Dependency Transformation”
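
Because the translation options carry uniform probabilities, the decoding described above reduces to ranking candidate word sequences with the language model. The sketch below illustrates that idea with a beam search. It makes several simplifying assumptions that are not from the paper: word order is kept monotone, unknown source words are copied through unchanged, and the trigram model is passed in as a hypothetical `lm_logprob(word, context)` callback that receives up to two preceding target words.

```python
from typing import Callable, Dict, List, Sequence, Tuple

LmScorer = Callable[[str, Tuple[str, ...]], float]


def word_by_word_decode(
    source: Sequence[str],
    lexicon: Dict[str, List[str]],
    lm_logprob: LmScorer,
    beam_width: int = 5,
) -> List[str]:
    """Translate word by word, keeping the best `beam_width` partial outputs."""
    beam: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for word in source:
        # Uniform translation probabilities: every option costs the same,
        # so only the language model score separates the candidates.
        options = lexicon.get(word, [word])
        candidates = []
        for score, hyp in beam:
            for option in options:
                context = hyp[-2:]  # trigram history
                candidates.append((score + lm_logprob(option, context), hyp + (option,)))
        candidates.sort(key=lambda cand: cand[0], reverse=True)
        beam = candidates[:beam_width]
    return list(beam[0][1])
```

A real system would also have to handle reordering, which this sketch deliberately leaves out.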

UAS

Appears in 4 sentences as: UAS (4) “UAS (2)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. The quality of the parser is measured by the parsing accuracy, or unlabeled attachment score (UAS), i.e., the percentage of tokens with the correct head.
    Page 6, “Evaluation Results”
  2. Two types of scores are reported for comparison: “UAS without p” is the UAS score without all punctuation tokens and “UAS with p” is the one with all punctuation tokens.
    Page 6, “Evaluation Results”
  3. Table 5 shows the results achieved by other researchers and ours (UAS with p), which indicates that our parser outperforms all the others (see footnote 4).
    Page 7, “Evaluation Results”
  4. (Footnote 4) There is a slight exception: using the same data splitting, (Yu et al., 2008) reported UAS without p as 0.873 versus ours, 0.870.
    Page 7, “Evaluation Results”
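
The two scores defined above, "UAS with p" and "UAS without p", can be computed as in the following sketch. The function name `uas` and the boolean punctuation mask are assumptions for illustration; how punctuation tokens are identified in the actual evaluation is not specified here.

```python
from typing import Optional, Sequence


def uas(
    gold_heads: Sequence[int],
    pred_heads: Sequence[int],
    is_punct: Optional[Sequence[bool]] = None,
    include_punct: bool = True,
) -> float:
    """Fraction of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists differ in length")
    total = correct = 0
    for idx, (gold, pred) in enumerate(zip(gold_heads, pred_heads)):
        # "UAS without p": skip tokens flagged as punctuation.
        if not include_punct and is_punct is not None and is_punct[idx]:
            continue
        total += 1
        correct += int(gold == pred)
    return correct / total if total else 0.0
```

Calling it once with `include_punct=True` and once with `include_punct=False` yields the two figures reported side by side.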

unlabeled data

Appears in 4 sentences as: unlabeled data (5)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. Typical domain adaptation tasks often assume that annotated data in the new domain are absent or insufficient and that large-scale unlabeled data are available.
    Page 2, “The Related Work”
  2. As unlabeled data are concerned, semi-supervised or unsupervised methods will be naturally adopted.
    Page 2, “The Related Work”
  3. The first usually focuses on exploiting automatically generated labeled data from the unlabeled data (Steedman et al., 2003; McClosky et al., 2006; Reichart and Rappoport, 2007; Sagae and Tsujii, 2007; Chen et al., 2008); the second combines supervised and unsupervised methods, and only unlabeled data are considered (Smith and Eisner, 2006; Wang and Schuurmans, 2008; Koo et al., 2008).
    Page 2, “The Related Work”
  4. In our method, a machine translation method is applied to a gold-standard treebank, while all previous work focuses on unlabeled data.
    Page 2, “The Related Work”

feature set

Appears in 4 sentences as: feature set (2) feature sets (2)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. With notations defined in Table 1, a feature set as shown in Table 2 is adopted.
    Page 4, “Dependency Parsing: Baseline”
  2. We used a large scale feature selection approach as in (Zhao et al., 2009) to obtain the feature set in Table 2.
    Page 4, “Dependency Parsing: Baseline”
  3. The results with different feature sets are in Table 4.
    Page 6, “Evaluation Results”
  4. Table 4: The results with different feature sets (columns: features, with p, without p)
    Page 7, “Evaluation Results”

word segmentation

Appears in 4 sentences as: word segmentation (4)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. However, Chinese has a special primary processing task, i.e., word segmentation.
    Page 5, “Exploiting the Translated Treebank”
  2. Note that CTB or any other Chinese treebank has its own word segmentation guideline.
    Page 5, “Exploiting the Translated Treebank”
  3. English treebank is translated into Chinese word by word, Chinese words in the translated text are exactly some entries from the bilingual lexicon; they are actually irregular phrases, short sentences or something else rather than words that follow any existing word segmentation convention.
    Page 6, “Exploiting the Translated Treebank”
  4. If the bilingual lexicon is not carefully selected or refined according to the treebank on which the Chinese parser is trained, then there will be a serious inconsistency in word segmentation conventions between the translated and the target treebanks.
    Page 6, “Exploiting the Translated Treebank”

named entities

Appears in 3 sentences as: named entities (2) named entity (1)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. As a bilingual lexicon is required for our task and none of the existing lexicons is suitable for translating the PTB, two lexicons, LDC Chinese-English Translation Lexicon Version 2.0 (LDC2002L27) and an English-to-Chinese lexicon in StarDict, are conflated, with some necessary manual extensions, to cover 99% of the words appearing in the PTB (most of the untranslated words are named entities).
    Page 3, “Treebank Translation and Dependency Transformation”
  2. We once found many words, mostly named entities, were outside the lexicon.
    Page 8, “Discussion”
  3. Thus we managed to collect a named entity translation dictionary to enhance the original one.
    Page 8, “Discussion”

language pair

Appears in 3 sentences as: language pair (3) language pairs (1)
In Cross Language Dependency Parsing using a Bilingual Lexicon
  1. What our method relies on is not the close relation of the chosen language pair but the similarity of two treebanks; this is the main difference from previous work.
    Page 1, “Introduction”
  2. As fewer language properties are concerned, our approach is more likely than theirs to extend to other language pairs.
    Page 2, “The Related Work”
  3. (Zeman and Resnik, 2008) assumed that the morphology and syntax in the language pair should be very similar, and that is so for the language pair that they considered, Danish and Swedish, two very close north European languages.
    Page 7, “Discussion”
