Parsing with Compositional Vector Grammars
Socher, Richard and Bauer, John and Manning, Christopher D. and Ng, Andrew Y.

Article Structure

Abstract

Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness.

Introduction

Syntactic parsing is a central task in natural language processing because of its importance in mediating between linguistic expression and meaning.

Topics

neural network

Appears in 18 sentences as: Neural Network (3) neural network (10) neural networks (6)
In Parsing with Compositional Vector Grammars
  1. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
    Page 1, “Abstract”
  2. The vectors for nonterminals are computed via a new type of recursive neural network which is conditioned on syntactic categories from a PCFG.
    Page 1, “Introduction”
  3. CVGs combine the advantages of standard probabilistic context free grammars (PCFG) with those of recursive neural networks (RNNs).
    Page 2, “Introduction”
  4. This requires the composition function to be extremely powerful, since it has to combine phrases with different syntactic head words, and it is hard to optimize since the parameters form a very deep neural network .
    Page 2, “Introduction”
  5. Deep Learning and Recursive Deep Learning: Early attempts at using neural networks to describe phrases include Elman (1991), who used recurrent neural networks to create representations of sentences from a simple toy grammar and to analyze the linguistic expressiveness of the resulting representations.
    Page 2, “Introduction”
  6. Collobert and Weston (2008) showed that neural networks can perform well on sequence labeling language processing tasks while also learning appropriate features.
    Page 2, “Introduction”
  7. Henderson (2003) was the first to show that neural networks can be successfully used for large scale parsing.
    Page 3, “Introduction”
  8. Costa et al. (2003) apply recursive neural networks to re-rank possible phrase attachments in an incremental parser.
    Page 3, “Introduction”
  9. The idea is to construct a neural network that outputs high scores for windows that occur in a large unlabeled corpus and low scores for windows where one word is replaced by a random word.
    Page 3, “Introduction”
  10. The standard RNN essentially ignores all POS tags and syntactic categories and each nonterminal node is associated with the same neural network (i.e., the weights across nodes are fully tied).
    Page 4, “Introduction”
  11. Note that in order to replicate the neural network and compute node representations in a bottom up fashion, the parent must have the same dimensionality as the children: p ∈ ℝⁿ.
    Page 4, “Introduction”
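
Sentences 10 and 11 above describe the standard RNN: every nonterminal uses the same (fully tied) weight matrix, and the parent vector must live in the same space as its children, p ∈ ℝⁿ, so that it can itself be composed further up the tree. A minimal sketch of this tied composition, assuming a tanh nonlinearity and a single weight matrix W ∈ ℝ^(n x 2n) in the style of Socher et al. (2011b); the function and variable names are illustrative, not the paper's code.

    import numpy as np

    def compose_tied(a, b, W, bias):
        """Standard (tied) RNN composition: the same W is reused at every node.

        a, b : n-dimensional child vectors (word or phrase representations)
        W    : (n, 2n) weight matrix shared across all nonterminals
        bias : (n,) bias vector
        Returns the parent vector p, with the same dimensionality n as the children.
        """
        children = np.concatenate([a, b])    # stack the two children: shape (2n,)
        return np.tanh(W @ children + bias)  # p lies in R^n, so it can be composed again

    # Bottom-up evaluation of a tiny binary tree ((the cat) sat):
    n = 4
    rng = np.random.default_rng(0)
    W, bias = rng.normal(size=(n, 2 * n)), np.zeros(n)
    the, cat, sat = (rng.normal(size=n) for _ in range(3))
    np_vec = compose_tied(the, cat, W, bias)      # NP vector
    s_vec = compose_tied(np_vec, sat, W, bias)    # sentence vector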

recursive

Appears in 11 sentences as: Recursive (4) recursive (7)
In Parsing with Compositional Vector Grammars
  1. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
    Page 1, “Abstract”
  2. The vectors for nonterminals are computed via a new type of recursive neural network which is conditioned on syntactic categories from a PCFG.
    Page 1, “Introduction”
  3. CVGs combine the advantages of standard probabilistic context free grammars (PCFG) with those of recursive neural networks (RNNs).
    Page 2, “Introduction”
  4. sets of discrete states and recursive deep learning models that jointly learn classifiers and continuous feature representations for variable-sized inputs.
    Page 2, “Introduction”
  5. Deep Learning and Recursive Deep Learning: Early attempts at using neural networks to describe phrases include Elman (1991), who used recurrent neural networks to create representations of sentences from a simple toy grammar and to analyze the linguistic expressiveness of the resulting representations.
    Page 2, “Introduction”
  6. However, their model is lacking in that it cannot represent the recursive structure inherent in natural language.
    Page 2, “Introduction”
  7. Costa et al. (2003) apply recursive neural networks to re-rank possible phrase attachments in an incremental parser.
    Page 3, “Introduction”
  8. Standard Recursive Neural Network
    Page 5, “Introduction”
  9. Figure 2: An example tree with a simple Recursive Neural Network: The same weight matrix is replicated and used to compute all nonterminal node representations.
    Page 5, “Introduction”
  10. Syntactically Untied Recursive Neural Network
    Page 5, “Introduction”
  11. The compositional vectors are learned with a new syntactically untied recursive neural network.
    Page 9, “Introduction”
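
Sentences 8–11 above name the syntactically untied variant (SU-RNN). One way to read "syntactically untied" is that the composition matrix is selected by the syntactic categories of the children instead of being shared everywhere. A minimal sketch under that reading; keying the weights on an ordered pair of child categories, and the category labels themselves, are assumptions made for illustration.

    import numpy as np

    def compose_untied(a, cat_a, b, cat_b, weights, bias):
        """Syntactically untied composition: the weight matrix depends on the
        syntactic categories of the two children rather than being fully tied."""
        W = weights[(cat_a, cat_b)]          # pick the matrix for this child-category pair
        children = np.concatenate([a, b])
        return np.tanh(W @ children + bias)

    # A determiner-noun composition uses a different matrix than an
    # adjective-noun composition would.
    n = 4
    rng = np.random.default_rng(1)
    weights = {
        ("DT", "NN"): rng.normal(size=(n, 2 * n)),
        ("JJ", "NN"): rng.normal(size=(n, 2 * n)),
    }
    bias = np.zeros(n)
    the, cat = rng.normal(size=n), rng.normal(size=n)
    np_vec = compose_untied(the, "DT", cat, "NN", weights, bias)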

vector representations

Appears in 9 sentences as: vector representation (2) Vector Representations (1) vector representations (5) vector representing (1)
In Parsing with Compositional Vector Grammars
  1. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
    Page 1, “Abstract”
  2. Previous RNN-based parsers used the same (tied) weights at all nodes to compute the vector representing a constituent (Socher et al., 2011b).
    Page 2, “Introduction”
  3. Therefore we combine syntactic and semantic information by giving the parser access to rich syntactico-semantic information in the form of distributional word vectors and compute compositional semantic vector representations for longer phrases (Costa et al., 2003; Menchetti et al., 2005; Socher et al., 2011b).
    Page 3, “Introduction”
  4. We will first briefly introduce single word vector representations and then describe the CVG objective function, tree scoring and inference.
    Page 3, “Introduction”
  5. 3.1 Word Vector Representations
    Page 3, “Introduction”
  6. In most systems that use a vector representation for words, such vectors are based on co-occurrence statistics of each word and its context (Turney and Pantel, 2010).
    Page 3, “Introduction”
  7. These vector representations capture interesting linear relationships (up to some accuracy), such as king − man + woman ≈ queen (Mikolov et al., 2013).
    Page 3, “Introduction”
  8. This index is used to retrieve the word’s vector representation a_w using a simple multiplication with a binary vector e, which is zero everywhere, except
    Page 3, “Introduction”
  9. Leaf nodes are n-dimensional vector representations of words.
    Page 5, “Introduction”
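
Sentence 8 above describes retrieving a word's vector a_w by multiplying an embedding matrix with a binary (one-hot) vector e. A minimal sketch of that lookup, assuming an embedding matrix L with one column per vocabulary word; in practice the multiplication collapses to plain column indexing. The vocabulary and matrix below are illustrative.

    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}
    n = 4                                                      # embedding dimensionality
    L = np.random.default_rng(2).normal(size=(n, len(vocab)))  # one column per word

    def word_vector(word):
        """Retrieve a_w = L e, where e is one-hot at the word's index."""
        e = np.zeros(len(vocab))
        e[vocab[word]] = 1.0          # zero everywhere except at the word's index
        return L @ e                  # equivalent to L[:, vocab[word]]

    a_w = word_vector("cat")
    assert np.allclose(a_w, L[:, vocab["cat"]])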

recursive neural

Appears in 8 sentences as: Recursive Neural (3) recursive neural (5)
In Parsing with Compositional Vector Grammars
  1. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
    Page 1, “Abstract”
  2. The vectors for nonterminals are computed via a new type of recursive neural network which is conditioned on syntactic categories from a PCFG.
    Page 1, “Introduction”
  3. CVGs combine the advantages of standard probabilistic context free grammars (PCFG) with those of recursive neural networks (RNNs).
    Page 2, “Introduction”
  4. Costa et al. (2003) apply recursive neural networks to re-rank possible phrase attachments in an incremental parser.
    Page 3, “Introduction”
  5. Standard Recursive Neural Network
    Page 5, “Introduction”
  6. Figure 2: An example tree with a simple Recursive Neural Network: The same weight matrix is replicated and used to compute all nonterminal node representations.
    Page 5, “Introduction”
  7. Syntactically Untied Recursive Neural Network
    Page 5, “Introduction”
  8. The compositional vectors are learned with a new syntactically untied recursive neural network.
    Page 9, “Introduction”

Recursive Neural Network

Appears in 8 sentences as: Recursive Neural Network (3) recursive neural network (3) recursive neural networks (2)
In Parsing with Compositional Vector Grammars
  1. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations.
    Page 1, “Abstract”
  2. The vectors for nonterminals are computed via a new type of recursive neural network which is conditioned on syntactic categories from a PCFG.
    Page 1, “Introduction”
  3. CVGs combine the advantages of standard probabilistic context free grammars (PCFG) with those of recursive neural networks (RNNs).
    Page 2, “Introduction”
  4. Costa et al. (2003) apply recursive neural networks to re-rank possible phrase attachments in an incremental parser.
    Page 3, “Introduction”
  5. Standard Recursive Neural Network
    Page 5, “Introduction”
  6. Figure 2: An example tree with a simple Recursive Neural Network: The same weight matrix is replicated and used to compute all nonterminal node representations.
    Page 5, “Introduction”
  7. Syntactically Untied Recursive Neural Network
    Page 5, “Introduction”
  8. The compositional vectors are learned with a new syntactically untied recursive neural network.
    Page 9, “Introduction”
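
The sentences above show each nonterminal node receiving a composed vector; to compare candidate parses (see the "beam search" and "highest scoring" topics below), each node's vector must also be turned into a scalar score and the node scores summed over the tree. A minimal sketch of one common recipe, assuming a learned scoring vector v and an additive base-PCFG rule log-probability; this parameterization is an illustration, not the paper's exact definition.

    import numpy as np

    def node_score(p, v, rule_log_prob):
        """Score one nonterminal: inner product of a scoring vector v with the
        node's composed vector p, plus the log-probability the base PCFG
        assigns to the rule that built this node (illustrative combination)."""
        return float(v @ p) + rule_log_prob

    def tree_score(per_node_scores):
        """A candidate tree's score is the sum of its node scores."""
        return sum(per_node_scores)

    # Example with two nodes of a small tree:
    rng = np.random.default_rng(3)
    p1, p2, v = (rng.normal(size=4) for _ in range(3))
    total = tree_score([node_score(p1, v, -1.2), node_score(p2, v, -0.7)])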

beam search

Appears in 5 sentences as: beam search (5)
In Parsing with Compositional Vector Grammars
  1. The subsequent section will then describe a bottom-up beam search and its approximation for finding the optimal tree.
    Page 4, “Introduction”
  2. One could use a bottom-up beam search, keeping a k-best list at every cell of the chart, possibly for each syntactic category.
    Page 6, “Introduction”
  3. This beam search inference procedure is still considerably slower than using only the simplified base PCFG, especially since it has a small state space (see next section for
    Page 6, “Introduction”
  4. Then, the second pass is a beam search with the full CVG model (including the more expensive matrix multiplications of the SU-RNN).
    Page 6, “Introduction”
  5. This beam search only considers phrases that appear in the top 200 parses.
    Page 6, “Introduction”
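
Sentences 3–5 above describe a two-pass procedure: a fast, simplified base PCFG proposes candidates, and the full CVG (with the more expensive SU-RNN matrix multiplications) is only run over phrases appearing in the top 200 base parses. The sketch below shows just the coarse-to-fine control flow of such a setup; pcfg_kbest and cvg_score are hypothetical helpers, and the chart-level pruning of the real beam search is not modeled.

    def parse_two_pass(sentence, pcfg_kbest, cvg_score, k=200):
        """Coarse-to-fine parsing sketch.

        Pass 1: a cheap base PCFG proposes the k best candidate trees.
        Pass 2: the expensive CVG model scores the surviving candidates and
                the highest scoring one is returned.

        pcfg_kbest(sentence, k) -> list of candidate trees   (hypothetical helper)
        cvg_score(sentence, tree) -> float                   (hypothetical helper)
        """
        candidates = pcfg_kbest(sentence, k)                          # fast first pass
        return max(candidates, key=lambda t: cvg_score(sentence, t))  # slow second pass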

highest scoring

Appears in 5 sentences as: highest score (1) highest scoring (4)
In Parsing with Compositional Vector Grammars
  1. This max-margin, structure-prediction objective (Taskar et al., 2004; Ratliff et al., 2007; Socher et al., 2011b) trains the CVG so that the highest scoring tree will be the correct tree, g(x_i) = y_i, and its score will be larger, up to a margin, than that of other possible trees ŷ ∈ Y(x_i):
    Page 4, “Introduction”
  2. Intuitively, to minimize this objective, the score of the correct tree y_i is increased and the score of the highest scoring incorrect tree ŷ is decreased.
    Page 4, “Introduction”
  3. This score will be used to find the highest scoring tree.
    Page 5, “Introduction”
  4. To minimize the objective we want to increase the scores of the correct tree’s constituents and decrease the score of those in the highest scoring incorrect tree.
    Page 6, “Introduction”
  5. where ŷ_max is the tree with the highest score.
    Page 7, “Introduction”
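
Sentences 1, 2 and 4 above describe a max-margin objective over trees. A hedged reconstruction in standard max-margin form (Taskar et al., 2004), writing s(x, y) for the model's score of tree y on sentence x and Δ(y_i, ŷ) for a structured loss that is zero on the correct tree (any regularization term is omitted):

    r_i(\theta) = \max_{\hat{y} \in Y(x_i)} \big[\, s(x_i, \hat{y}) + \Delta(y_i, \hat{y}) \,\big] - s(x_i, y_i),
    \qquad J(\theta) = \sum_i r_i(\theta)

Because Δ(y_i, y_i) = 0, each r_i is non-negative, and driving it toward zero forces the correct tree to outscore every incorrect tree by at least its margin Δ.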

fine-grained

Appears in 4 sentences as: fine-grained (4)
In Parsing with Compositional Vector Grammars
  1. However, recent work has shown that parsing results can be greatly improved by defining more fine-grained syntactic
    Page 1, “Introduction”
  2. This gives a fine-grained notion of semantic similarity, which is useful for tackling problems like ambiguous attachment decisions.
    Page 1, “Introduction”
  3. The former can capture the discrete categorization of phrases into NP or PP while the latter can capture fine-grained syntactic and compositional-semantic information on phrases and words.
    Page 2, “Introduction”
  4. However, many parsing decisions show fine-grained semantic factors at work.
    Page 3, “Introduction”

lexicalized

Appears in 4 sentences as: lexicalized (3) lexicalizing (1)
In Parsing with Compositional Vector Grammars
  1. Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness.
    Page 1, “Abstract”
  2. Second, lexicalized parsers (Collins, 2003; Charniak, 2000) associate each category with a lexical item.
    Page 1, “Introduction”
  3. However, this approach necessitates complex shrinkage estimation schemes to deal with the sparsity of observations of the lexicalized categories.
    Page 1, “Introduction”
  4. Another approach is lexicalized parsers (Collins, 2003; Charniak, 2000) that describe each category with a lexical item, usually the head word.
    Page 2, “Introduction”

natural language

Appears in 4 sentences as: Natural language (1) natural language (3)
In Parsing with Compositional Vector Grammars
  1. Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness.
    Page 1, “Abstract”
  2. Syntactic parsing is a central task in natural language processing because of its importance in mediating between linguistic expression and meaning.
    Page 1, “Introduction”
  3. In many natural language systems, single words and n-grams are usefully described by their distributional similarities (Brown et al., 1992), among many others.
    Page 1, “Introduction”
  4. However, their model is lacking in that it cannot represent the recursive structure inherent in natural language.
    Page 2, “Introduction”

POS tags

Appears in 4 sentences as: POS tags (4)
In Parsing with Compositional Vector Grammars
  1. 3.1 and the POS tags come from a PCFG.
    Page 4, “Introduction”
  2. The standard RNN essentially ignores all POS tags and syntactic categories and each nonterminal node is associated with the same neural network (i.e., the weights across nodes are fully tied).
    Page 4, “Introduction”
  3. While this results in a powerful composition function that essentially depends on the words being combined, the number of model parameters explodes and the composition functions do not capture the syntactic commonalities between similar POS tags or syntactic categories.
    Page 5, “Introduction”
  4. This reduces the number of states from 15,276 to 12,061 states and 602 POS tags.
    Page 7, “Introduction”

Deep Learning

Appears in 3 sentences as: Deep Learning (2) deep learning (2)
In Parsing with Compositional Vector Grammars
  1. sets of discrete states and recursive deep learning models that jointly learn classifiers and continuous feature representations for variable-sized inputs.
    Page 2, “Introduction”
  2. Deep Learning and Recursive Deep Learning: Early attempts at using neural networks to describe phrases include Elman (1991), who used recurrent neural networks to create representations of sentences from a simple toy grammar and to analyze the linguistic expressiveness of the resulting representations.
    Page 2, “Introduction”
  3. The idea of untying has also been successfully used in deep learning applied to vision (Le et al., 2010).
    Page 3, “Introduction”

n-grams

Appears in 3 sentences as: n-grams (3)
In Parsing with Compositional Vector Grammars
  1. In many natural language systems, single words and n-grams are usefully described by their distributional similarities (Brown et al., 1992), among many others.
    Page 1, “Introduction”
  2. n-grams will never be seen during training, especially when n is large.
    Page 2, “Introduction”
  3. In this work, we present a new solution to learn features and phrase representations even for very long, unseen n-grams.
    Page 2, “Introduction”

objective function

Appears in 3 sentences as: objective function (3)
In Parsing with Compositional Vector Grammars
  1. We will first briefly introduce single word vector representations and then describe the CVG objective function, tree scoring and inference.
    Page 3, “Introduction”
  2. The main objective function in Eq.
    Page 6, “Introduction”
  3. The objective function is not differentiable due to the hinge loss.
    Page 7, “Introduction”
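
Sentence 3 above notes that the hinge loss makes the objective non-differentiable; the usual remedy is to optimize with a subgradient, which for each training example involves only the correct tree and the current margin-augmented highest scoring tree (the ŷ_max of the "highest scoring" topic above). In the same notation as the reconstruction there, one valid subgradient is, as an assumed but standard form:

    \frac{\partial J}{\partial \theta} \ni \sum_i \left( \frac{\partial s(x_i, \hat{y}_{\max})}{\partial \theta} - \frac{\partial s(x_i, y_i)}{\partial \theta} \right)

so each update pushes the correct tree's score up and the offending tree's score down, matching the intuition in sentence 2 of the "highest scoring" topic.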

parse tree

Appears in 3 sentences as: parse tree (2) parse trees (1)
In Parsing with Compositional Vector Grammars
  1. The goal of supervised parsing is to learn a function g : X → Y, where X is the set of sentences and Y is the set of all possible labeled binary parse trees.
    Page 4, “Introduction”
  2. The loss increases the more incorrect the proposed parse tree is (Goodman, 1998).
    Page 4, “Introduction”
  3. Assume, for now, we are given a labeled parse tree as shown in Fig.
    Page 4, “Introduction”
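
Sentence 2 above says the loss grows with how incorrect the proposed tree is (Goodman, 1998). One standard instantiation, and a hedged guess at the kind of loss meant here, counts the labeled spans of the proposed tree that do not occur in the gold tree, scaled by a constant (the symbol κ and the exact form are assumptions):

    \Delta(y_i, \hat{y}) = \kappa \sum_{d \in N(\hat{y})} \mathbf{1}\{\, d \notin N(y_i) \,\}

where N(y) is the set of labeled spans (nodes) of tree y, so every wrong span adds a fixed penalty to the margin the correct tree must achieve.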
