Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
Ma, Xuezhe and Xia, Fei

Article Structure

Abstract

We present a novel approach for inducing unsupervised dependency parsers for languages that have no labeled training data, but have translated text in a resource-rich language.

Introduction

In recent years, dependency parsing has gained universal interest due to its usefulness in a wide range of applications such as synonym generation (Shinyama et al., 2002), relation extraction (Nguyen et al., 2009) and machine translation (Katz-Brown et al., 2011; Xie et al., 2011).

Our Approach

Dependency trees represent syntactic relationships through labeled directed edges between heads and their dependents.

Data and Tools

In this section, we illustrate the data sets used in our experiments and the tools for data preparation.

Experiments

In this section, we will describe the details of our experiments and compare our results with previous methods.

Conclusion

In this paper, we propose an unsupervised projective dependency parsing approach for resource-poor languages, using existing resources from a resource-rich source language.

Topics

treebanks

Appears in 28 sentences as: treebank (3) Treebanks (17) treebanks (18)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We perform experiments on three data sets — version 1.0 and version 2.0 of Google Universal Dependency Treebanks and treebanks from CoNLL shared-tasks, across ten languages.
    Page 1, “Abstract”
  2. Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).
    Page 1, “Introduction”
  3. However, the manually annotated treebanks that these parsers rely on are highly expensive to create, in particular when we want to build treebanks for resource-poor languages.
    Page 1, “Introduction”
  4. However, most bilingual text parsing approaches require bilingual treebanks — treebanks that have manually annotated tree structures on both sides of source and target languages (Smith and Smith, 2004; Burkett and Klein, 2008), or have tree structures on the source side and translated sentences in the target languages (Huang et
    Page 1, “Introduction”
  5. Obviously, bilingual treebanks are much more difficult to acquire than the resources required in our scenario, since the labeled training data and the parallel text in our case are completely separated.
    Page 2, “Introduction”
  6. Table 1: Data statistics of two versions of Google Universal Treebanks for the target languages.
    Page 5, “Our Approach”
  7. Our experiments rely on two kinds of data sets: (i) monolingual treebanks with a consistent annotation schema — the English treebank is used to train the English parsing model, and the treebanks for target languages are used to evaluate the parsing performance of our approach.
    Page 5, “Data and Tools”
  8. The monolingual treebanks in our experiments are from the Google Universal Dependency Treebanks (McDonald et al., 2013), for the reason that the treebanks of different languages in Google Universal Dependency Treebanks have consistent syntactic representations.
    Page 5, “Data and Tools”
  9. The treebanks from CoNLL shared-tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice.
    Page 5, “Data and Tools”
  10. However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and the heterogeneous representations used in CoNLL shared-tasks treebanks weaken any conclusion that can be drawn.
    Page 5, “Data and Tools”
  11. Table 3: UAS for two versions of our approach, together with baseline and oracle systems on Google Universal Treebanks version 1.0.
    Page 6, “Data and Tools”

See all papers in Proc. ACL 2014 that mention treebanks.

See all papers in Proc. ACL that mention treebanks.

parallel data

Appears in 22 sentences as: parallel data (23)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. In this paper, we consider a practically motivated scenario, in which we want to build statistical parsers for resource-poor target languages, using existing resources from a resource-rich source language (like English).1 We assume that there are absolutely no labeled training data for the target language, but we have access to parallel data with a resource-rich language and a sufficient amount of labeled training data to build an accurate parser for the resource-rich language.
    Page 1, “Introduction”
  2. (2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used.
    Page 2, “Introduction”
  3. We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
    Page 2, “Introduction”
  4. Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach.
    Page 2, “Our Approach”
  5. In our scenario, we have a set of aligned parallel data P = {(x_i, x_i^E, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
    Page 3, “Our Approach”
  6. We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_{λE} via parallel data with word alignments:
    Page 3, “Our Approach”
  7. Table 2: The number of tokens in parallel data used in our experiments.
    Page 5, “Our Approach”
  8. The parallel data come from the Europarl corpus version 7 (Koehn, 2005) and Kaist Corpus4.
    Page 5, “Data and Tools”
  9. The parallel data for these three languages are also from the Europarl corpus version 7.
    Page 6, “Data and Tools”
  10. POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to pro-
    Page 6, “Data and Tools”
  11. 6Japanese and Indonesian are excluded as no practicable parallel data are available.
    Page 6, “Experiments”

dependency parsing

Appears in 21 sentences as: dependency parser (6) dependency parsers (3) dependency parsing (13)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We present a novel approach for inducing unsupervised dependency parsers for languages that have no labeled training data, but have translated text in a resource-rich language.
    Page 1, “Abstract”
  2. Our method can be used as a purely monolingual dependency parser, requiring no human translations for the test data, thus making it applicable to a wide range of resource-poor languages.
    Page 1, “Abstract”
  3. In recent years, dependency parsing has gained universal interest due to its usefulness in a wide range of applications such as synonym generation (Shinyama et al., 2002), relation extraction (Nguyen et al., 2009) and machine translation (Katz-Brown et al., 2011; Xie et al., 2011).
    Page 1, “Introduction”
  4. Several supervised dependency parsing algorithms (Nivre and Scholz, 2004; McDonald et al., 2005a; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012; Zhang et al., 2013) have been proposed and achieved high parsing accuracies on several treebanks, due in large part to the availability of dependency treebanks in a number of languages (McDonald et al., 2013).
    Page 1, “Introduction”
  5. (2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used.
    Page 2, “Introduction”
  6. gap to fully supervised dependency parsing performance.
    Page 2, “Introduction”
  7. The focus of this work is on building dependency parsers for target languages, assuming that an accurate English dependency parser and some parallel text between the two languages are available.
    Page 2, “Our Approach”
  8. The probabilistic model for dependency parsing defines a family of conditional probabilities p(y|x) over all y given sentence x, with a log-linear form:
    Page 2, “Our Approach”
  9. One of the most common model training methods for supervised dependency parsers is maximum conditional likelihood estimation.
    Page 3, “Our Approach”
  10. For a supervised dependency parser with a set of training data {(x_i, y_i)}, the logarithm of the likelihood (aka.
    Page 3, “Our Approach”
  11. However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser.
    Page 3, “Our Approach”

parsing model

Appears in 21 sentences as: Parsing Model (1) parsing model (17) parsing models (3)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from a resource-rich language with entropy regularization.
    Page 1, “Abstract”
  2. We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
    Page 2, “Introduction”
  3. Central to our approach is a maximizing likelihood learning framework, in which we use an English parser and parallel text to estimate the “transferring distribution” of the target language parsing model (See Section 2.2 for more details).
    Page 2, “Our Approach”
  4. 2.1 Edge-Factored Parsing Model
    Page 2, “Our Approach”
  5. A common strategy to make this parsing model efficiently computable is to factor dependency trees into sets of edges:
    Page 3, “Our Approach”
  6. In our scenario, we have a set of aligned parallel data P = {(x_i, x_i^E, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
    Page 3, “Our Approach”
  7. We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_{λE} via parallel data with word alignments:
    Page 3, “Our Approach”
  8. where w_E(·, ·) is the weight function of the English parsing model p_{λE} and e_t^{delex} is the delexicalized form2 of the edge e_t.
    Page 3, “Our Approach”
  9. From the definition of the transferring weight, we can see that, if an edge e_t of the target language sentence is aligned to an edge e_s of the English sentence x^E, we transfer the weight of edge e_t to the corresponding weight of edge e_s in the English parsing model p_{λE}. If the edge e_t is not aligned to any edges of the English sentence, we reduce the edge e_t to the delexicalized form and calculate the transferring weight in
    Page 3, “Our Approach”
  10. the English parsing model.
    Page 3, “Our Approach”
  11. First, by transferring the weight function to the corresponding weight in the well-developed English parsing model , we can project syntactic information across language boundaries.
    Page 4, “Our Approach”
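The transferring-weight rule described in snippets 7–9 can be sketched in code. This is a minimal illustrative reading, not the paper's implementation; the function names, the set/dict data layout, and the stand-in delexicalized weight are all assumptions made for the sketch:

```python
def transfer_weight(edge_t, alignment, english_edges, english_weights, delex_weight):
    """One possible reading of the transferring weight: if both endpoints of a
    target edge e_t align to an English edge e_s, borrow the English model's
    weight for e_s; otherwise fall back to the weight of the delexicalized
    form of e_t (e.g. its POS-tag pattern)."""
    head_t, dep_t = edge_t
    if head_t in alignment and dep_t in alignment:
        e_s = (alignment[head_t], alignment[dep_t])
        if e_s in english_edges:
            return english_weights[e_s]
    return delex_weight(edge_t)

alignment = {1: 0, 2: 2}          # target word position -> aligned English position
english_edges = {(0, 2)}          # edges of the parsed English sentence
english_weights = {(0, 2): 3.0}   # their weights under the English model
delex = lambda e: 0.5             # stand-in delexicalized weight function
print(transfer_weight((1, 2), alignment, english_edges, english_weights, delex))  # 3.0
print(transfer_weight((1, 3), alignment, english_edges, english_weights, delex))  # 0.5
```

The second call falls back to the delexicalized weight because target position 3 has no aligned English word, mirroring the unaligned-edge case in snippet 9.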

unlabeled data

Appears in 16 sentences as: unlabeled data (18)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
    Page 2, “Introduction”
  2. Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach.
    Page 2, “Our Approach”
  3. However, in our scenario we have no labeled training data for target languages but we have some parallel and unlabeled data plus an English dependency parser.
    Page 3, “Our Approach”
  4. In our scenario, we have a set of aligned parallel data P = {(x_i, x_i^E, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
    Page 3, “Our Approach”
  5. Prepare parallel text by running a word alignment method to obtain word alignments,3 and prepare the unlabeled data.
    Page 5, “Our Approach”
  6. Train a parsing model for the target language by minimizing the objective function K′, which is the combination of expected negative log-likelihood on parallel and unlabeled data.
    Page 5, “Our Approach”
  7. the dependency annotations off the training portion of each treebank, and use that as the unlabeled data for that target language.
    Page 7, “Experiments”
  8. -U: Our approach training on only parallel data without unlabeled data for the target language.
    Page 7, “Experiments”
  9. +U: Our approach training on both parallel and unlabeled data.
    Page 7, “Experiments”
  10. By adding entropy regularization from unlabeled data, our full model achieves an average improvement of 0.29% over the “-U” setting.
    Page 8, “Experiments”
  11. We run two versions of our approach for each of the parallel data sets, one with unlabeled data (+U) and the other without them (-U).
    Page 8, “Experiments”

word alignments

Appears in 11 sentences as: word alignment (3) Word Alignments (1) word alignments (6) word alignments: (1)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. In our scenario, we have a set of aligned parallel data P = {(x_i, x_i^E, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_{λE}. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
    Page 3, “Our Approach”
  2. We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_{λE} via parallel data with word alignments:
    Page 3, “Our Approach”
  3. By reducing unaligned edges to their delexicalized forms, we can still use those delexicalized features, such as part-of-speech tags, for those unaligned edges, and can address the problem that automatically generated word alignments include errors.
    Page 4, “Our Approach”
  4. Prepare parallel text by running a word alignment method to obtain word alignments,3 and prepare the unlabeled data.
    Page 5, “Our Approach”
  5. 3The word alignment methods do not require additional resources besides parallel text.
    Page 5, “Our Approach”
  6. 3.2 Word Alignments
    Page 6, “Data and Tools”
  7. In our approach, word alignments for the parallel text are required.
    Page 6, “Data and Tools”
  8. We perform word alignments with the open source GIZA++ toolkit5.
    Page 6, “Data and Tools”
  9. Then we run GIZA++ with the default setting to generate word alignments in both directions.
    Page 6, “Data and Tools”
  10. We then make the intersection of the word alignments of two directions to generate one-to-one alignments.
    Page 6, “Data and Tools”
  11. By using IGT data, not only can we obtain more accurate word alignments, but we can also extract useful cross-lingual information for the resource-poor language.
    Page 9, “Experiments”
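The intersection step described in snippets 9 and 10 — running GIZA++ in both directions and keeping only the links both directions agree on — can be sketched as follows. The list-of-pairs representation is an assumption for illustration; GIZA++'s actual output format differs:

```python
def intersect_alignments(src2tgt, tgt2src):
    """Intersect the two directional alignments: keep a link only when it is
    proposed in both directions, which yields one-to-one alignments."""
    forward = set(src2tgt)                     # links as (src_i, tgt_j)
    backward = {(i, j) for (j, i) in tgt2src}  # reorient (tgt_j, src_i) links
    return sorted(forward & backward)

src2tgt = [(0, 0), (1, 2), (2, 1), (3, 3)]  # source position -> target position
tgt2src = [(0, 0), (2, 1), (3, 3), (3, 4)]  # target position -> source position
print(intersect_alignments(src2tgt, tgt2src))  # [(0, 0), (1, 2), (3, 3)]
```

Because each direction aligns a word to at most one counterpart, the intersection can contain each source and each target position at most once, giving the one-to-one property the approach relies on.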

POS tags

Appears in 9 sentences as: POS tag (1) POS Tagger (1) POS tagger (2) POS taggers (3) POS tags (5)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. The set of POS tags needs to be consistent across languages and treebanks.
    Page 6, “Data and Tools”
  2. For this reason we use the universal POS tag set of Petrov et al.
    Page 6, “Data and Tools”
  3. POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to pro-
    Page 6, “Data and Tools”
  4. vide the POS tags for these data.
    Page 6, “Data and Tools”
  5. In our experiments, we train a Stanford POS Tagger (Toutanova et al., 2003) for each language.
    Page 6, “Data and Tools”
  6. The labeled training data for each POS tagger are extracted from the training portion of each treebank.
    Page 6, “Data and Tools”
  7. For the purpose of evaluation of our approach and comparison with previous work, we need to exploit the gold POS tags to train the POS taggers.
    Page 6, “Data and Tools”
  8. Fortunately, some recently proposed POS taggers, such as the POS tagger of Das and Petrov (2011), rely only on labeled training data for English and the same kind of parallel text in our approach.
    Page 6, “Data and Tools”
  9. In practice we can use this kind of POS tagger to predict POS tags, whose tagging accuracy is around 85%.
    Page 6, “Data and Tools”

CoNLL

Appears in 8 sentences as: CoNLL (8)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We perform experiments on three data sets — version 1.0 and version 2.0 of Google Universal Dependency Treebanks and treebanks from CoNLL shared-tasks, across ten languages.
    Page 1, “Abstract”
  2. The treebanks from CoNLL shared-tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007) appear to be another reasonable choice.
    Page 5, “Data and Tools”
  3. However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and the heterogeneous representations used in CoNLL shared-tasks treebanks weaken any conclusion that can be drawn.
    Page 5, “Data and Tools”
  4. For comparison with previous studies, nevertheless, we also run experiments on CoNLL treebanks (see Section 4.4 for more details).
    Page 6, “Data and Tools”
  5. We evaluate our approach on three target languages from CoNLL shared task treebanks, which do not appear in Google Universal Treebanks.
    Page 6, “Data and Tools”
  6. 4.4 Experiments on CoNLL Treebanks
    Page 8, “Experiments”
  7. To make a thorough empirical comparison with previous studies, we also evaluate our system without unlabeled data (-U) on treebanks from CoNLL shared task on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
    Page 8, “Experiments”
  8. Table 6: Parsing results on treebanks from CoNLL shared tasks for eight target languages.
    Page 8, “Experiments”

dependency tree

Appears in 7 sentences as: dependency tree (4) Dependency trees (1) dependency trees (2)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. Figure 1: An example dependency tree.
    Page 2, “Introduction”
  2. Dependency trees represent syntactic relationships through labeled directed edges between heads and their dependents.
    Page 2, “Our Approach”
  3. For example, Figure 1 shows a dependency tree for the sentence, Economic news had little effect on financial markets, with the sentence’s root-symbol as its root.
    Page 2, “Our Approach”
  4. In this paper, we will use the following notation: x represents a generic input sentence, and y represents a generic dependency tree.
    Page 2, “Our Approach”
  5. T(x) is used to denote the set of possible dependency trees for sentence x.
    Page 2, “Our Approach”
  6. A common strategy to make this parsing model efficiently computable is to factor dependency trees into sets of edges:
    Page 3, “Our Approach”
  7. That is, dependency tree y is treated as a set of edges e and each feature function F_j(y, x) is equal to the sum of all the features f_j(e,
    Page 3, “Our Approach”
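The edge factorization in snippets 6 and 7 — a tree's score decomposes into per-edge feature sums — can be sketched as a minimal example. The feature templates, weights, and toy sentence below are hypothetical, chosen only to illustrate the decomposition, not the paper's actual feature set:

```python
def edge_features(edge, sentence):
    """Hypothetical binary features for a single (head, dependent) edge."""
    head, dep = edge
    return {
        "head_pos=" + sentence[head]["pos"]: 1.0,
        "dep_pos=" + sentence[dep]["pos"]: 1.0,
        "dir=" + ("R" if dep > head else "L"): 1.0,
    }

def tree_score(tree_edges, sentence, weights):
    """Edge-factored log-linear score: sum_j lambda_j * F_j(y, x), where each
    F_j(y, x) decomposes into a sum of f_j(e, x) over the edges e of y."""
    score = 0.0
    for edge in tree_edges:
        for feat, value in edge_features(edge, sentence).items():
            score += weights.get(feat, 0.0) * value
    return score

# "Economic news had ..." truncated to three tokens; position 0 is the root symbol
sentence = [{"pos": "ROOT"}, {"pos": "ADJ"}, {"pos": "NOUN"}, {"pos": "VERB"}]
tree = [(0, 3), (3, 2), (2, 1)]  # root -> had, had -> news, news -> Economic
weights = {"head_pos=VERB": 1.5, "dir=L": 0.2}
print(tree_score(tree, sentence, weights))  # about 1.9
```

Because the score is a plain sum over edges, the partition function and marginals of p(y|x) can be computed with standard dynamic programming over edges rather than by enumerating trees.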

model training

Appears in 6 sentences as: Model Training (2) model training (4)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. 2.2 Model Training
    Page 3, “Our Approach”
  2. One of the most common model training methods for supervised dependency parsers is maximum conditional likelihood estimation.
    Page 3, “Our Approach”
  3. For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of K function (Abney, 2004), and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007).
    Page 3, “Our Approach”
  4. 2.3 Algorithms and Complexity for Model Training
    Page 4, “Our Approach”
  5. For projective parsing, several algorithms (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012) have been proposed to solve the model training problems (calculation of objective function and gradient) for different factorizations.
    Page 9, “Experiments”
  6. By presenting a model training framework, our approach can utilize parallel text to estimate transferring distribution with the help of a well-developed resource-rich language dependency parser, and use unlabeled data as entropy regularization.
    Page 9, “Conclusion”

UAS

Appears in 5 sentences as: UAS (5)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. Table 3: UAS for two versions of our approach, together with baseline and oracle systems on Google Universal Treebanks version 1.0.
    Page 6, “Data and Tools”
  2. Table 4: UAS for two versions of our approach, together with baseline and oracle systems on Google Universal Treebanks version 2.0.
    Page 6, “Data and Tools”
  3. Parsing accuracy is measured with unlabeled attachment score (UAS): the percentage of words with the correct head.
    Page 7, “Experiments”
  4. Moreover, our approach considerably bridges the gap to fully supervised dependency parsers, whose average UAS is 84.67%.
    Page 8, “Experiments”
  5. Table 5 illustrates the UAS of our approach trained on different amounts of parallel data, together with the results of the projected transfer parser re-implemented by us (PTPT).
    Page 8, “Experiments”
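The UAS definition in snippet 3 is simple enough to state directly in code (head indices per word, compared against gold):

```python
def uas(predicted_heads, gold_heads):
    """Unlabeled attachment score: the percentage of words whose
    predicted head index matches the gold head index."""
    assert len(predicted_heads) == len(gold_heads)
    correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
    return 100.0 * correct / len(gold_heads)

print(uas([0, 3, 3, 0], [0, 2, 3, 0]))  # 75.0 (3 of 4 heads correct)
```

Being "unlabeled", the score ignores dependency labels; a labeled attachment score would additionally require the edge label to match.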

cross-lingual

Appears in 5 sentences as: cross-lingual (5)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from a resource-rich language with entropy regularization.
    Page 1, “Abstract”
  2. We extend this learning framework so that it can be used to transfer cross-lingual knowledge between different languages.
    Page 2, “Introduction”
  3. For the purpose of transferring cross-lingual information from the English parser via parallel text, we explore the model training method proposed by Smith and Eisner (2007), which presented a generalization of K function (Abney, 2004), and related it to another semi-supervised learning technique, entropy regularization (Jiao et al., 2006; Mann and McCallum, 2007).
    Page 3, “Our Approach”
  4. However, previous studies (McDonald et al., 2011; McDonald et al., 2013) have demonstrated that a homogeneous representation is critical for multilingual language technologies that require consistent cross-lingual analysis for downstream components, and the heterogeneous representations used in CoNLL shared-tasks treebanks weaken any conclusion that can be drawn.
    Page 5, “Data and Tools”
  5. By using IGT data, not only can we obtain more accurate word alignments, but we can also extract useful cross-lingual information for the resource-poor language.
    Page 9, “Experiments”

baseline systems

Appears in 5 sentences as: baseline systems (5)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. Table 3 and Table 4 show the parsing results of our approach, together with the results of the baseline systems and the oracle, on version 1.0 and version 2.0 of Google Universal Treebanks, respectively.
    Page 7, “Experiments”
  2. Our approaches significantly outperform all the baseline systems across all the seven target languages.
    Page 7, “Experiments”
  3. to those five baseline systems and the oracle (OR).
    Page 8, “Experiments”
  4. Our approach outperforms all these baseline systems and achieves state-of-the-art performance on all the eight languages.
    Page 8, “Experiments”
  5. Table 7 shows the results of our system and the results of baseline systems.
    Page 8, “Experiments”

objective function

Appears in 4 sentences as: objective function (6)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We introduce a multiplier γ as a tradeoff between the two contributions (parallel and unsupervised) of the objective function K, and the final objective function K′ has the following form:
    Page 4, “Our Approach”
  2. To train our parsing model, we need to find out the parameters λ that minimize the objective function K′ in equation (11).
    Page 4, “Our Approach”
  3. objective function and the gradient of the objective function.
    Page 4, “Our Approach”
  4. For projective parsing, several algorithms (McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010; Ma and Zhao, 2012) have been proposed to solve the model training problems (calculation of objective function and gradient) for different factorizations.
    Page 9, “Experiments”
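The combination in snippet 1 — a multiplier γ trading off the parallel-data term against the unsupervised term — can be sketched as below. This is an assumed reading of the combined form; the paper's equation (11) defines the exact terms, which are not reproduced in these snippets:

```python
def k_prime(k_parallel, k_unlabeled, gamma):
    """Sketch of the final objective K' = K_parallel + gamma * K_unlabeled,
    where gamma weights the entropy-regularization term on unlabeled data
    against the likelihood term on parallel data."""
    return k_parallel + gamma * k_unlabeled

def k_prime_gradient(grad_parallel, grad_unlabeled, gamma):
    """The gradient the optimizer needs combines componentwise
    with the same multiplier."""
    return [gp + gamma * gu for gp, gu in zip(grad_parallel, grad_unlabeled)]

print(k_prime(2.0, 4.0, 0.5))                      # 4.0
print(k_prime_gradient([1.0, -2.0], [2.0, 2.0], 0.5))  # [2.0, -1.0]
```

Setting γ = 0 recovers the parallel-only "-U" setting from the experiments; γ > 0 adds the unlabeled-data confidence term of the "+U" setting.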

parallel sentences

Appears in 4 sentences as: parallel sentences (4)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We train our parsing model with different numbers of parallel sentences to analyze the influence of the amount of parallel data on the parsing performance of our approach.
    Page 7, “Experiments”
  2. The parallel data sets contain 500, 1000, 2000, 5000, 10000 and 20000 parallel sentences, respectively.
    Page 7, “Experiments”
  3. We randomly extract parallel sentences from each corpus, and smaller data sets are subsets of larger ones.
    Page 7, “Experiments”
  4. First, even the parsers trained with only 500 parallel sentences achieve considerably high parsing accuracies (average 70.10% for version 1.0 and 71.59% for version 2.0).
    Page 8, “Experiments”

part-of-speech

Appears in 4 sentences as: Part-of-Speech (1) part-of-speech (3)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. By reducing unaligned edges to their delexicalized forms, we can still use those delexicalized features, such as part-of-speech tags, for those unaligned edges, and can address the problem that automatically generated word alignments include errors.
    Page 4, “Our Approach”
  2. 3.3 Part-of-Speech Tagging
    Page 6, “Data and Tools”
  3. Several features in our parsing model involve part-of-speech (POS) tags of the input sentences.
    Page 6, “Data and Tools”
  4. As part-of-speech tags are also a form of syntactic analysis, this assumption weakens the applicability of our approach.
    Page 6, “Data and Tools”

part-of-speech tags

Appears in 3 sentences as: Part-of-Speech Tagging (1) part-of-speech tags (2)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. By reducing unaligned edges to their delexicalized forms, we can still use those delexicalized features, such as part-of-speech tags, for those unaligned edges, and can address the problem that automatically generated word alignments include errors.
    Page 4, “Our Approach”
  2. 3.3 Part-of-Speech Tagging
    Page 6, “Data and Tools”
  3. As part-of-speech tags are also a form of syntactic analysis, this assumption weakens the applicability of our approach.
    Page 6, “Data and Tools”

grammar induction

Appears in 3 sentences as: grammar induction (3)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. This led to a vast amount of research on unsupervised grammar induction (Carroll and Charniak, 1992; Klein and Manning, 2004; Smith and Eisner, 2005; Cohen and Smith, 2009; Spitkovsky et al., 2010; Blunsom and Cohn, 2010; Marecek and Straka, 2013; Spitkovsky et al., 2013), which appears to be a natural solution to this problem, as unsupervised methods require only unannotated text for training parsers.
    Page 1, “Introduction”
  2. Unfortunately, the unsupervised grammar induction systems’ parsing accuracies often significantly fall behind those of supervised systems (McDonald et al., 2011).
    Page 1, “Introduction”
  3. “PGI” is the phylogenetic grammar induction model of Berg-Kirkpatrick and Klein (2010).
    Page 8, “Experiments”

shared task

Appears in 3 sentences as: shared task (2) shared tasks (1)
In Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
  1. We evaluate our approach on three target languages from CoNLL shared task treebanks, which do not appear in Google Universal Treebanks.
    Page 6, “Data and Tools”
  2. To make a thorough empirical comparison with previous studies, we also evaluate our system without unlabeled data (-U) on treebanks from CoNLL shared task on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
    Page 8, “Experiments”
  3. Table 6: Parsing results on treebanks from CoNLL shared tasks for eight target languages.
    Page 8, “Experiments”
