Index of papers in Proc. ACL that mention

**neural network**

Abstract | This model learns the syntax and semantics of the negator’s argument with a recursive neural network. |

Experimental results | Furthermore, modeling the syntax and semantics with the state-of-the-art recursive neural network (model 7 and 8) can dramatically improve the performance over model 6. |

Experimental results | Note that the two neural network based models incorporate the syntax and semantics by representing each node with a vector. |

Experimental results | Note that this is a special case of what the neural network based models can model. |

Introduction | This model learns the syntax and semantics of the negator’s argument with a recursive neural network. |

Related work | The more recent work of (Socher et al., 2012; Socher et al., 2013) proposed models based on recursive neural networks that do not rely on any heuristic rules. |

Related work | In principle neural network is able to fit very complicated functions (Mitchell, 1997), and in this paper, we adapt the state-of-the-art approach described in (Socher et al., 2013) to help understand the behavior of negators specifically. |

Semantics-enriched modeling | A recursive neural tensor network (RNTN) is a specific form of feed-forward neural network based on syntactic (phrasal-structure) parse tree to conduct compositional sentiment analysis. |

Semantics-enriched modeling | A major difference of RNTN from the conventional recursive neural network (RNN) (Socher et al., 2012) is the use of the tensor V in order to directly capture the multiplicative interaction of two input vectors, although the matrix W implicitly captures the nonlinear interaction between the input vectors. |

Semantics-enriched modeling | This is actually an interesting place to extend the current recursive neural network to consider extrinsic knowledge. |

neural network is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

- neural network (12)
- treebank (12)
- recursive (10)
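
The tensor composition quoted in the Semantics-enriched modeling entries above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the standard RNTN form of Socher et al. (2013), p = tanh(xᵀ V x + W x) with x the concatenation of the two child vectors, omits bias terms, and uses a hypothetical function name `rntn_compose` and toy weights.

```python
import math

def rntn_compose(a, b, V, W):
    """Compose two child vectors a and b (each of length d) into a parent vector.

    V: tensor of shape d x 2d x 2d (a list of d matrices), the multiplicative term.
    W: matrix of shape d x 2d, the term of a conventional recursive network.
    Both are hypothetical toy parameters, not values from the paper.
    """
    x = a + b  # list concatenation [a; b], length 2d
    d = len(a)
    parent = []
    for k in range(d):
        # Bilinear form x^T V[k] x: direct multiplicative interaction of the inputs.
        bilinear = sum(x[i] * V[k][i][j] * x[j]
                       for i in range(2 * d) for j in range(2 * d))
        # Linear term W[k] . x, as in a recursive network without the tensor.
        linear = sum(W[k][j] * x[j] for j in range(2 * d))
        parent.append(math.tanh(bilinear + linear))
    return parent

# Toy example with d = 1: the single tensor slice multiplies a by b.
V = [[[0.0, 1.0], [0.0, 0.0]]]
W = [[0.5, 0.5]]
p = rntn_compose([1.0], [2.0], V, W)
```

Setting every entry of V to zero reduces the function to the plain recursive composition tanh(W x), which is the contrast the quoted sentence draws.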

Abstract | Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. |

Introduction | To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. |

Introduction | We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations; |

Related Work | propose Recursive Neural Network (RNN) (2011b), matrix-vector RNN (2012) and Recursive Neural Tensor Network (RNTN) (2013b) to learn the compositionality of phrases of any length based on the representation of each pair of children recursively. |

Related Work | (2011) that follow the probabilistic document model (Blei et al., 2003) and give an sentiment predictor function to each word, we develop neural networks and map each ngram to the sentiment polarity of sentence. |

Related Work | We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE. |

neural network is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

- sentiment classification (52)
- word embedding (46)
- neural networks (16)

Abstract | We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. |

Experiments | We use 10-word windows of text as the local context, 100 hidden units, and no weight regularization for both neural networks. |

Global Context-Aware Neural Language Model | In this section, we describe the training objective of our model, followed by a description of the neural network architecture, ending with a brief description of our model’s training method. |

Global Context-Aware Neural Language Model | We compute scores $g(s, d)$ and $g(s^w, d)$ where $s^w$ is $s$ with the last word replaced by word $w$, and $g(\cdot, \cdot)$ is the scoring function that represents the neural networks used. |

Global Context-Aware Neural Language Model | 2.2 Neural Network Architecture |

neural network is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

- embeddings (19)
- neural network (13)
- word embeddings (13)

Abstract | Recently, neural network models for natural language processing tasks have been increasingly focused on for their ability to alleviate the burden of manual feature engineering. |

Abstract | In this paper, we propose a novel neural network model for Chinese word segmentation called Max-Margin Tensor Neural Network (MMTNN). |

Abstract | Experiments on the benchmark dataset show that our model achieves better performances than previous neural network models and that our model can achieve a competitive performance with minimal feature engineering. |

Introduction | Recently, neural network models have been increasingly focused on for their ability to minimize the effort in feature engineering. |

Introduction | Workable as previous neural network models seem, a limitation of them to be pointed out is that the tag-tag interaction, tag-character interaction and character-character interaction are not well modeled. |

Introduction | In previous neural network models, however, hardly can such interactional effects be fully captured relying only on the simple transition score and the single nonlinear transformation (See section 2). |

neural network is mentioned in 39 sentences in this paper.

Topics mentioned in this paper:

- neural network (39)
- embeddings (33)
- word segmentation (19)

Abstract | In this paper we present an alternative method to Tiered Tagging, based on local optimizations with Neural Networks and we show how, by properly encoding the input sequence in a general Neural Network architecture, we achieve results similar to the Tiered Tagging methodology, significantly faster and without requiring extensive linguistic knowledge as implied by the previously mentioned method. |

Abstract | In this article, we propose an alternative solution based on local optimizations with feed-forward neural networks. |

Abstract | 2 Large tagset part-of-speech tagging with feed-forward neural networks |

neural network is mentioned in 18 sentences in this paper.

Topics mentioned in this paper:

- hidden layer (20)
- neural network (18)
- data sparseness (3)

Abstract | The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger. |

Abstract | Parameters of the neural network are trained using guided learning in the second phase. |

Easy-first POS tagging with Neural Network | The neural network proposed in Section 3 is used for POS disambiguation by the easy-first POS tagger. |

Easy-first POS tagging with Neural Network | At each step, the algorithm adopts a scorer, the neural network in our case, to assign a score to each possible word-tag pair $(w, t)$, and then selects the highest score one $(\hat{w}, \hat{t})$ to tag (i.e., tag $\hat{w}$ with $\hat{t}$). |

Easy-first POS tagging with Neural Network | While previous work (Shen et al., 2007; Zhang and Clark, 2011; Goldberg and Elhadad, 2010) apply guided learning to train a linear classifier by using variants of the perceptron algorithm, we are the first to combine guided learning with a neural network, by using a margin loss and a modified back-propagation algorithm. |

Introduction | We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger. |

Introduction | To our knowledge, we are the first to investigate guided learning for neural networks. |

Neural Network for POS Disambiguation | We integrate the learned WRRBM into a neural network, which serves as a scorer for POS disambiguation. |

Neural Network for POS Disambiguation | The main challenge to designing the neural network structure is: on the one hand, we hope that the model can take the advantage of information provided by the learned WRRBM, which reflects general properties of web texts, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model’s discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996). |

Neural Network for POS Disambiguation | Our approach is to leverage the two sources of information in one neural network by combining them through a shared output layer, as shown in Figure 1. |

neural network is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

- POS tagging (20)
- neural network (15)
- unlabelled data (11)

Abstract | A neural network is a reasonable method to address these pitfalls. |

Abstract | However, modeling SMT with a neural network is not trivial, especially when taking the decoding efficiency into consideration. |

Abstract | In this paper, we propose a variant of a neural network, i.e. |

Introduction | A neural network (Bishop, 1995) is a reasonable method to overcome the above shortcomings. |

Introduction | In the search procedure, frequent computation of the model score is needed for the search heuristic function, which will be challenged by the decoding efficiency for the neural network based translation model. |

Introduction | In this paper, we propose a variant of neural networks, i.e. |

neural network is mentioned in 34 sentences in this paper.

Topics mentioned in this paper:

- log-linear (34)
- neural network (34)
- log-linear model (27)

Abstract | Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. |

Introduction | The vectors for nonterminals are computed via a new type of recursive neural network which is conditioned on syntactic categories from a PCFG. |

Introduction | 1. CVGs combine the advantages of standard probabilistic context free grammars (PCFG) with those of recursive neural networks (RNNs). |

Introduction | This requires the composition function to be extremely powerful, since it has to combine phrases with different syntactic head words, and it is hard to optimize since the parameters form a very deep neural network. |

neural network is mentioned in 18 sentences in this paper.

Topics mentioned in this paper:

- neural network (18)
- recursive (11)
- vector representations (9)

Abstract | In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). |

DNN structures for NLP | The lookup process is called a lookup layer LT, which is usually the first layer after the input layer in neural network. |

DNN structures for NLP | Multilayer neural networks are trained with the standard back propagation algorithm (LeCun, 1985). |

DNN structures for NLP | Techniques such as layerwise pre-training (Bengio et al., 2007) and many tricks (LeCun et al., 1998) have been developed to train better neural networks. |

Introduction | Recent years research communities have seen a strong resurgent interest in modeling with deep (multilayer) neural networks. |

Introduction | For speech recognition, (Dahl et al., 2012) proposed context-dependent neural network with large vocabulary, which achieved 16.0% relative error reduction. |

Introduction | (Collobert et al., 2011) and (Socher et al., 2011) further apply Recursive Neural Networks to address the structural prediction tasks such as tagging and parsing, and (Socher et al., 2012) explores the compositional aspect of word representations. |

Related Work | (Seide et al., 2011) and (Dahl et al., 2012) apply Context-Dependent Deep Neural Network with HMM (CD-DNN-HMM) to speech recognition task, which significantly outperforms traditional models. |

Related Work | (Bengio et al., 2006) proposed to use multilayer neural network for language modeling task. |

neural network is mentioned in 33 sentences in this paper.

Topics mentioned in this paper:

- neural network (33)
- word alignment (28)
- word embeddings (23)

Abstract | Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task specific metric, such as BLEU in machine translation. |

Abstract | We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion. |

Expected BLEU Training | We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003). |

Expected BLEU Training | We summarize the weights of the recurrent neural network language model as $\theta = \{U, W, V\}$ and add the model as an additional feature to the log-linear translation model using the simplified notation $s_\theta(w_t) = s(w_t \mid w_1 \ldots w_{t-1}, h_{t-1})$: |

Introduction | In this paper we focus on recurrent neural network architectures which have recently advanced the state of the art in language modeling (Mikolov et al., 2010; Mikolov et al., 2011; Sundermeyer et al., 2013) with several subsequent applications in machine translation (Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Hu et al., 2014). |

Introduction | In practice, neural network models for machine translation are usually trained by maximizing the likelihood of the training data, either via a cross-entropy objective (Mikolov et al., 2010; Schwenk |

Introduction | Most previous work on neural networks for machine translation is based on a rescoring setup (Arisoy et al., 2012; Mikolov, 2012; Le et al., 2012a; Auli et al., 2013), thereby side stepping the algorithmic and engineering challenges of direct decoder-integration. |

Recurrent Neural Network LMs | Our model has a similar structure to the recurrent neural network language model of Mikolov et al. |

neural network is mentioned in 23 sentences in this paper.

Topics mentioned in this paper:

- BLEU (25)
- neural network (23)
- language model (16)

Abstract | In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data. |

Background: Deep Learning | This technique began raising public awareness in the mid-2000s after researchers showed how a multilayer feed-forward neural network can be effectively trained. |

Introduction | These topic-related documents are utilized to learn a specific topic representation for each sentence using a neural network based approach. |

Introduction | Neural network is an effective technique for learning different levels of data representations. |

Introduction | The levels inferred from neural network correspond to distinct levels of concepts, where high-level representations are obtained from low-level bag-of-words input. |

Topic Similarity Model with Neural Network | In this section, we explain our neural network based topic similarity model in detail, as well as how to incorporate the topic similarity features into SMT decoding procedure. |

Topic Similarity Model with Neural Network | Neural Network Training |

Topic Similarity Model with Neural Network | Figure 1: Overview of neural network based topic similarity model. |

neural network is mentioned in 36 sentences in this paper.

Topics mentioned in this paper:

- neural network (36)
- parallel data (15)
- sentence pair (13)

Abstract | Recent work has shown success in using neural network language models (NNLMs) as features in MT systems. |

Abstract | Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. |

Introduction | In recent years, neural network models have become increasingly popular in NLP. |

Introduction | Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010). |

Introduction | In this paper we use a basic neural network architecture and a lexicalized probability model to create a powerful MT decoding feature. |

Neural Network Joint Model (NNJM) | Fortunately, neural network language models are able to elegantly scale up and take advantage of arbitrarily large context sizes. |

Neural Network Joint Model (NNJM) | 2.1 Neural Network Architecture |

Neural Network Joint Model (NNJM) | Our neural network architecture is almost identical to the original feed-forward NNLM architecture described in Bengio et al. |

neural network is mentioned in 33 sentences in this paper.

Topics mentioned in this paper:

- BLEU (36)
- neural network (33)
- hidden layer (17)

Abstract | In this paper, we propose a novel recursive recurrent neural network (R²NN) to model the end-to-end decoding process for statistical machine translation. |

Abstract | R²NN is a combination of recursive neural network and recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that language model and translation model can be integrated naturally; (2) a tree structure can be built, as recursive neural networks, so as to generate the translation candidates in a bottom up manner. |

Introduction | Deep Neural Network (DNN), which essentially is a multilayer neural network, has regained more and more attentions these years. |

Introduction | Recurrent neural networks are leveraged to learn language model, and they keep the history information circularly inside the network for arbitrarily long time (Mikolov et al., 2010). |

Introduction | Recursive neural networks, which have the ability to generate a tree structured output, are applied to natural language parsing (Socher et al., 2011), and they are extended to recursive neural tensor networks to explore the compositional aspect of semantics (Socher et al., 2013). |

neural network is mentioned in 48 sentences in this paper.

Topics mentioned in this paper:

- neural network (48)
- recursive (37)
- phrase pair (36)

Abstract | We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. |

Background | Then we describe the operation of one-dimensional convolution and the classical Time-Delay Neural Network (TDNN) (Hinton, 1989; Waibel et al., 1990). |

Background | A model that adopts a more general structure provided by an external parse tree is the Recursive Neural Network (RecNN) (Pollack, 1990; Küchler and Goller, 1996; Socher et al., 2011; Hermann and Blunsom, 2013). |

Background | The Recurrent Neural Network (RNN) is a special case of the recursive network where the structure that is followed is a simple linear chain (Gers and Schmidhuber, 2001; Mikolov et al., 2011). |

Introduction | Figure 1: Subgraph of a feature graph induced over an input sentence in a Dynamic Convolutional Neural Network. |

Introduction | A central class of models are those based on neural networks. |

Introduction | These range from basic neural bag-of-words or bag-of-n-grams models to the more structured recursive neural networks and to time-delay neural networks based on convolutional operations (Collobert and Weston, 2008; Socher et al., 2011; Kalchbrenner and Blunsom, 2013b). |

neural network is mentioned in 17 sentences in this paper.

Topics mentioned in this paper:

- Neural Network (17)
- n-grams (11)
- unigram (9)

Abstract | We propose Adaptive Recursive Neural Network (AdaRNN) for target-dependent Twitter sentiment classification. |

Conclusion | We propose Adaptive Recursive Neural Network (AdaRNN) for the target-dependent Twitter sentiment classification. |

Introduction | In this paper, we mainly focus on integrating target information with Recursive Neural Network (RNN) to leverage the ability of deep learning models. |

Introduction | We employ a novel adaptive multi-compositionality layer in recursive neural network, which is named as AdaRNN (Dong et al., 2014). |

Our Approach | Adaptive Recursive Neural Network is proposed to propagate the sentiments of words to the target node. |

Our Approach | In Section 3.2, we propose Adaptive Recursive Neural Network and use it for target-dependent sentiment analysis. |

Our Approach | 3.2 AdaRNN: Adaptive Recursive Neural Network |

RNN: Recursive Neural Network | Figure 1: The composition process for “not very good” in Recursive Neural Network. |

neural network is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- Recursive (14)
- Recursive Neural (11)
- dependency tree (8)

Conclusion | The neural network architecture emerged as the best performing approach, and our qualitative analysis revealed that it induced a categorical organization of concepts. |

Conclusion | Given the success of NN, we plan to experiment in the future with more sophisticated neural network architectures inspired by recent work in machine translation (Gao et al., 2013) and multimodal deep learning (Srivastava and Salakhutdinov, 2012). |

Experimental Setup | Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer. |

Introduction | This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space. |

Related Work | (2013) learn to project unsupervised vector-based image representations onto a word-based semantic space using a neural network architecture. |

Related Work | (2013) rely on a supervised state-of-the-art method: They feed low-level features to a deep neural network trained on a supervised object recognition task (Krizhevsky et al., 2012). |

Results | For the neural network NN, we use prior knowledge |

neural network is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- semantic space (18)
- neural network (7)
- SVD (7)

Adding Linguistic Knowledge to the Monte-Carlo Framework | As shown in Figure 2, our model is a four layer neural network. |

Adding Linguistic Knowledge to the Monte-Carlo Framework | The third feature layer f of the neural network is deterministically computed given the active units $y_i$ and $z_j$ of the softmax layers, and the values of the input layer. |

Adding Linguistic Knowledge to the Monte-Carlo Framework | In the second layer of the neural network, the units $z_i$ represent a predicate labeling $e_i$ of every sentence $y_i \in d$. However, our intention is to incorporate, into action-value function Q, information from only the most relevant sentence. |

Introduction | We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions. |

neural network is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- neural network (7)
- Hidden layer (5)
- dependency parse (4)

Conclusion and Future Work | A semantic similarity graph is built to capture lexical semantic clue, and a convolutional neural network is used to encode contextual semantic clue. |

Experiments | FW-5 uses a traditional neural network with a fixed window size of 5 to replace the CNN in CONT, and the candidate term to be classified is placed in the center of the window. |

The Proposed Method | Then, a semantic similarity graph is created to capture lexical semantic clue, and a Convolutional Neural Network (CNN) (Collobert et al., 2011) is trained in each bootstrapping iteration to encode contextual semantic clue. |

The Proposed Method | 3.3 Encoding Contextual Semantic Clue Using Convolutional Neural Network |

The Proposed Method | 3.3.1 The architecture of the Convolutional Neural Network |

neural network is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- lexical semantic (17)
- semantic similarity (17)
- word embedding (10)

Autoencoders for Grounded Semantics | Autoencoders An autoencoder is an unsupervised neural network which is trained to reconstruct a given input from its latent representation (Bengio, 2009). |

Autoencoders for Grounded Semantics | Stacked Autoencoders Several (denoising) autoencoders can be used as building blocks to form a deep neural network (Bengio et al., 2007; Vincent et al., 2010). |

Conclusions | To the best of our knowledge, our model is novel in its use of attribute-based input in a deep neural network . |

Experimental Setup | Finally, we also compare to the word embeddings obtained using Mikolov et al.’s (2011) recurrent neural network based language model. |

Related Work | A large body of work has focused on projecting words and images into a common space using a variety of deep learning methods ranging from deep and restricted Boltzmann machines (Srivastava and Salakhutdinov, 2012; Feng et al., 2013), to autoencoders (Wu et al., 2013), and recursive neural networks (Socher et al., 2013b). |

Related Work | Secondly, our problem setting is different from the former studies, which usually deal with classification tasks and fine-tune the deep neural networks using training data with explicit class labels; in contrast we fine-tune our autoencoders using a semi-supervised criterion. |

neural network is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- meaning representations (11)
- SVD (8)
- semantic similarity (7)

Distributed representations | Word embeddings are typically induced using neural language models, which use neural networks as the underlying predictive model (Bengio, 2008). |

Distributed representations | We predict a score s(x) for x by passing e(x) through a single hidden layer neural network. |

Distributed representations | We minimize this loss stochastically over the n-grams in the corpus, doing gradient descent simultaneously over the neural network parameters and the embedding lookup table. |

neural network is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- embeddings (51)
- word representations (46)
- NER (24)
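
The scoring step quoted in the Distributed representations entries above can be sketched as follows. This is a minimal illustration under stated assumptions: the quoted sentences do not give the loss, so the pairwise hinge loss below (an observed n-gram against a corrupted one) is an assumed form, and the names `score` and `ranking_loss`, the tanh activation, and the toy weights are all hypothetical.

```python
import math

def score(e_x, W1, b1, w2, b2):
    """s(x): pass the concatenated embeddings e(x) through one tanh hidden layer."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, e_x)) + b)
              for row, b in zip(W1, b1)]
    # Linear output unit produces a single scalar score.
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def ranking_loss(s_true, s_corrupt, margin=1.0):
    """Assumed hinge loss: the observed n-gram should outscore the corrupted one by a margin."""
    return max(0.0, margin - s_true + s_corrupt)

# Toy parameters: two hidden units over a two-dimensional e(x).
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = 0.0
s = score([1.0, 0.0], W1, b1, w2, b2)
```

In a full training loop, the gradient of such a loss would be propagated to both the network parameters and the embedding lookup table, as the last quoted sentence describes.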

Abstract | Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. |

Conclusion | In this paper we apply recursive neural networks to political ideology detection, a problem where previous work relies heavily on bag-of-words models and hand-designed lexica. |

Introduction | Building from those insights, we introduce a recursive neural network (RNN) to detect ideological bias on the sentence level. |

Recursive Neural Networks | Recursive neural networks (RNNs) are machine learning models that capture syntactic and semantic composition. |

neural network is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- bag-of-words (7)
- sentence-level (6)
- embeddings (4)

Discussion | Secondly, for the Holmes data, we can state that LSA total similarity beats the recurrent neural network, which in turn is better than the baseline n-gram model. |

Discussion | It is an interesting research question if this could be done implicitly with a machine learning technique, for example recurrent or recursive neural networks. |

Experimental Results 5.1 Data Resources | In this case, the best combination is to blend LSA, the Good-Turing language model, and the recurrent neural network. |

Introduction | Also in the language modeling vein, but with potentially global context, we evaluate the use of a recurrent neural network language model. |

neural network is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- language model (27)
- development set (6)
- cosine similarity (5)

Previous work | (2004) used a Gaussian mixture model for acoustic-prosodic information and neural network based syntactic-prosodic model and achieved pitch accent detection accuracy of 84% and IPB detection accuracy of 90% at the word level. |

Previous work | The experiments of Ananthakrishnan and Narayanan (2008) with neural network based acoustic-prosodic model and a factored n-gram syntactic model reported 87% accuracy on accent and break index detection at the syllable level. |

Prosodic event detection method | Our previous supervised learning approach (Jeon and Liu, 2009) showed that a combined model using Neural Network (NN) classifier for acoustic-prosodic evidence and Support Vector Machine (SVM) classifier for syntactic-prosodic evidence performed better than other classifiers. |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- semi-supervised (10)
- unlabeled data (10)
- labeled data (5)

Introduction | In this paper, we strive to effectively address the above two shortcomings, and systematically explore the possibility of learning new features using deep (multilayer) neural networks (DNN, which is usually referred under the name Deep Learning) for SMT. |

Related Work | (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. |

Related Work | (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcome some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features. |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- phrase pair (13)
- semi-supervised (12)
- phrase table (11)

Abstract | We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). |

Introduction | Deep neural networks (Hinton et al., 2006; Bengio et al., 2007) are built in a hierarchical manner, and allow us to compare context and entity at some higher level abstraction; while at lower levels, general concepts are shared across entities, resulting in compact models. |

Learning Representation for Contextual Document | BTS is a variant of the general backpropagation algorithm for structured neural network. |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- similarity measure (7)
- hidden layer (4)
- knowledge base (3)

Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. |

Introduction | (2013) adapted the Context-Dependent Deep Neural Network for HMM (CD-DNN-HMM) (Dahl et al., 2012), a type of feed-forward neural network (FFNN)-based model, to |

Introduction | Recurrent neural network (RNN)-based models have recently demonstrated state-of-the-art performance that outperformed FFNN-based models for various tasks (Mikolov et al., 2010; Mikolov and Zweig, 2012; Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Sundermeyer et al., 2013). |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- word alignment (28)
- alignment model (25)
- hidden layer (25)