Index of papers in Proc. ACL 2014 that mention

**neural network**

Abstract | Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task specific metric, such as BLEU in machine translation. |

Abstract | We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion. |

Expected BLEU Training | We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003). |

Expected BLEU Training | We summarize the weights of the recurrent neural network language model as $\theta = \{U, W, V\}$ and add the model as an additional feature to the log-linear translation model using the simplified notation $s_\theta(w_t) = s(w_t \mid w_1 \ldots w_{t-1}, h_{t-1})$: |

Introduction | In this paper we focus on recurrent neural network architectures which have recently advanced the state of the art in language modeling (Mikolov et al., 2010; Mikolov et al., 2011; Sundermeyer et al., 2013) with several subsequent applications in machine translation (Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Hu et al., 2014). |

Introduction | In practice, neural network models for machine translation are usually trained by maximizing the likelihood of the training data, either via a cross-entropy objective (Mikolov et al., 2010; Schwenk |

Introduction | Most previous work on neural networks for machine translation is based on a rescoring setup (Arisoy et al., 2012; Mikolov, 2012; Le et al., 2012a; Auli et al., 2013), thereby side stepping the algorithmic and engineering challenges of direct decoder-integration. |

Recurrent Neural Network LMs | Our model has a similar structure to the recurrent neural network language model of Mikolov et al. |

neural network is mentioned in 23 sentences in this paper.

Topics mentioned in this paper:

- BLEU (25)
- neural network (23)
- language model (16)
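
The excerpts above describe adding a recurrent neural network language model as one more feature in a log-linear translation model and training towards an expected BLEU loss. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an n-best list in which each hypothesis already carries its feature values and a sentence-level BLEU score, and it assumes the posterior over hypotheses is a softmax of the log-linear scores. The names `loglinear_score`, `expected_bleu`, `rnnlm` and `tm` are hypothetical.

```python
import math


def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i(e)."""
    return sum(weights[name] * value for name, value in features.items())


def expected_bleu(nbest, weights):
    """Expected sentence-level BLEU under a softmax over log-linear scores.

    Assumption: each hypothesis is a dict with a "features" dict and a
    precomputed "bleu" value in [0, 1].
    """
    scores = [loglinear_score(h["features"], weights) for h in nbest]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum((e / total) * h["bleu"] for e, h in zip(exps, nbest))


# Toy n-best list: "rnnlm" stands in for the language model log-probability.
nbest = [
    {"features": {"tm": -1.0, "rnnlm": -4.0}, "bleu": 0.2},
    {"features": {"tm": -1.0, "rnnlm": -2.0}, "bleu": 0.4},
]
uniform = expected_bleu(nbest, {"tm": 1.0, "rnnlm": 0.0})
with_lm = expected_bleu(nbest, {"tm": 1.0, "rnnlm": 1.0})
```

In this toy case, giving the language model feature a positive weight shifts probability mass towards the second hypothesis, so the expected BLEU rises above the uniform average. The training objective would be the negative of this quantity.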

Abstract | In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data. |

Background: Deep Learning | This technique began raising public awareness in the mid-2000s after researchers showed how a multilayer feed-forward neural network can be effectively trained. |

Introduction | These topic-related documents are utilized to learn a specific topic representation for each sentence using a neural network based approach. |

Introduction | Neural network is an effective technique for learning different levels of data representations. |

Introduction | The levels inferred from neural network correspond to distinct levels of concepts, where high-level representations are obtained from low-level bag-of-words input. |

Topic Similarity Model with Neural Network | In this section, we explain our neural network based topic similarity model in detail, as well as how to incorporate the topic similarity features into SMT decoding procedure. |

Topic Similarity Model with Neural Network | Neural Network Training |

Topic Similarity Model with Neural Network | Figure 1: Overview of neural network based topic similarity model. |

neural network is mentioned in 36 sentences in this paper.

Topics mentioned in this paper:

- neural network (36)
- parallel data (15)
- sentence pair (13)

Abstract | Recent work has shown success in using neural network language models (NNLMs) as features in MT systems. |

Abstract | Here, we present a novel formulation for a neural network joint model (NNJM), which augments the NNLM with a source context window. |

Introduction | In recent years, neural network models have become increasingly popular in NLP. |

Introduction | Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010). |

Introduction | In this paper we use a basic neural network architecture and a lexicalized probability model to create a powerful MT decoding feature. |

Neural Network Joint Model (NNJM) | Fortunately, neural network language models are able to elegantly scale up and take advantage of arbitrarily large context sizes. |

Neural Network Joint Model (NNJM) | 2.1 Neural Network Architecture |

Neural Network Joint Model (NNJM) | Our neural network architecture is almost identical to the original feed-forward NNLM architecture described in Bengio et al. |

neural network is mentioned in 33 sentences in this paper.

Topics mentioned in this paper:

- BLEU (36)
- neural network (33)
- hidden layer (17)

Abstract | We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. |

Background | Then we describe the operation of one-dimensional convolution and the classical Time-Delay Neural Network (TDNN) (Hinton, 1989; Waibel et al., 1990). |

Background | A model that adopts a more general structure provided by an external parse tree is the Recursive Neural Network (RecNN) (Pollack, 1990; Küchler and Goller, 1996; Socher et al., 2011; Hermann and Blunsom, 2013). |

Background | The Recurrent Neural Network (RNN) is a special case of the recursive network where the structure that is followed is a simple linear chain (Gers and Schmidhuber, 2001; Mikolov et al., 2011). |

Introduction | Figure 1: Subgraph of a feature graph induced over an input sentence in a Dynamic Convolutional Neural Network. |

Introduction | A central class of models are those based on neural networks. |

Introduction | These range from basic neural bag-of-words or bag-of-n-grams models to the more structured recursive neural networks and to time-delay neural networks based on convolutional operations (Collobert and Weston, 2008; Socher et al., 2011; Kalchbrenner and Blunsom, 2013b). |

neural network is mentioned in 17 sentences in this paper.

Topics mentioned in this paper:

- Neural Network (17)
- n-grams (11)
- unigram (9)

Abstract | In this paper, we propose a novel recursive recurrent neural network (R²NN) to model the end-to-end decoding process for statistical machine translation. |

Abstract | R²NN is a combination of recursive neural network and recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that language model and translation model can be integrated naturally; (2) a tree structure can be built, as recursive neural networks, so as to generate the translation candidates in a bottom up manner. |

Introduction | Deep Neural Network (DNN), which essentially is a multilayer neural network, has regained more and more attentions these years. |

Introduction | Recurrent neural networks are leveraged to learn language model, and they keep the history information circularly inside the network for arbitrarily long time (Mikolov et al., 2010). |

Introduction | Recursive neural networks, which have the ability to generate a tree structured output, are applied to natural language parsing (Socher et al., 2011), and they are extended to recursive neural tensor networks to explore the compositional aspect of semantics (Socher et al., 2013). |

neural network is mentioned in 48 sentences in this paper.

Topics mentioned in this paper:

- neural network (48)
- recursive (37)
- phrase pair (36)

Abstract | The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger. |

Abstract | Parameters of the neural network are trained using guided learning in the second phase. |

Easy-first POS tagging with Neural Network | The neural network proposed in Section 3 is used for POS disambiguation by the easy-first POS tagger. |

Easy-first POS tagging with Neural Network | At each step, the algorithm adopts a scorer, the neural network in our case, to assign a score to each possible word-tag pair $(w, t)$, and then selects the highest score one $(\hat{w}, \hat{t})$ to tag (i.e., tag $\hat{w}$ with $\hat{t}$). |

Easy-first POS tagging with Neural Network | While previous work (Shen et al., 2007; Zhang and Clark, 2011; Goldberg and Elhadad, 2010) apply guided learning to train a linear classifier by using variants of the perceptron algorithm, we are the first to combine guided learning with a neural network, by using a margin loss and a modified back-propagation algorithm. |

Introduction | We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger. |

Introduction | To our knowledge, we are the first to investigate guided learning for neural networks. |

Neural Network for POS Disambiguation | We integrate the learned WRRBM into a neural network, which serves as a scorer for POS disambiguation. |

Neural Network for POS Disambiguation | The main challenge to designing the neural network structure is: on the one hand, we hope that the model can take the advantage of information provided by the learned WRRBM, which reflects general properties of web texts, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model’s discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996). |

Neural Network for POS Disambiguation | Our approach is to leverage the two sources of information in one neural network by combining them through a shared output layer, as shown in Figure 1. |

neural network is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

- POS tagging (20)
- neural network (15)
- unlabelled data (11)
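
The easy-first procedure quoted above (score every remaining word-tag pair, commit to the highest-scoring one, repeat) can be sketched as a short greedy loop. This is an illustrative sketch under stated assumptions, not the paper's tagger: the function name `easy_first_tag` and the `score` callback signature are hypothetical, and a toy lookup scorer stands in for the neural network.

```python
def easy_first_tag(words, tags, score):
    """Greedy easy-first tagging.

    `score(words, assigned, i, t)` is a hypothetical scorer returning a
    number for tagging position i with tag t, given the partial assignment.
    Returns the final tags and the order in which positions were tagged.
    """
    assigned = [None] * len(words)
    order = []
    while any(tag is None for tag in assigned):
        # Enumerate every (position, tag) pair that is still undecided.
        candidates = [
            (i, t) for i in range(len(words)) if assigned[i] is None for t in tags
        ]
        i, t = max(candidates, key=lambda pair: score(words, assigned, *pair))
        assigned[i] = t  # commit to the easiest decision first
        order.append(i)
    return assigned, order


# Toy scorer: fixed confidences standing in for the neural network output.
CONFIDENCE = {("the", "DT"): 0.9, ("dog", "NN"): 0.6, ("barks", "VBZ"): 0.8}


def toy_score(words, assigned, i, t):
    return CONFIDENCE.get((words[i], t), 0.0)


result, order = easy_first_tag(["the", "dog", "barks"], ["DT", "NN", "VBZ"], toy_score)
```

Because the scorer receives the partial assignment, a real model could condition later, harder decisions on tags that were already fixed, which is the motivation for tagging the easy positions first.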

Abstract | Recently, neural network models for natural language processing tasks have been increasingly focused on for their ability to alleviate the burden of manual feature engineering. |

Abstract | In this paper, we propose a novel neural network model for Chinese word segmentation called Max-Margin Tensor Neural Network (MMTNN). |

Abstract | Experiments on the benchmark dataset show that our model achieves better performances than previous neural network models and that our model can achieve a competitive performance with minimal feature engineering. |

Introduction | Recently, neural network models have been increasingly focused on for their ability to minimize the effort in feature engineering. |

Introduction | Workable as previous neural network models seem, a limitation of them to be pointed out is that the tag-tag interaction, tag-character interaction and character-character interaction are not well modeled. |

Introduction | In previous neural network models, however, hardly can such interactional effects be fully captured relying only on the simple transition score and the single nonlinear transformation (See section 2). |

neural network is mentioned in 39 sentences in this paper.

Topics mentioned in this paper:

- neural network (39)
- embeddings (33)
- word segmentation (19)

Abstract | Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. |

Introduction | To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. |

Introduction | We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations; |

Related Work | propose Recursive Neural Network (RNN) (2011b), matrix-vector RNN (2012) and Recursive Neural Tensor Network (RNTN) (2013b) to learn the compositionality of phrases of any length based on the representation of each pair of children recursively. |

Related Work | (2011) that follow the probabilistic document model (Blei et al., 2003) and give an sentiment predictor function to each word, we develop neural networks and map each ngram to the sentiment polarity of sentence. |

Related Work | We extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to learn SSWE. |

neural network is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

- sentiment classification (52)
- word embedding (46)
- neural networks (16)

Abstract | This model learns the syntax and semantics of the negator’s argument with a recursive neural network. |

Experimental results | Furthermore, modeling the syntax and semantics with the state-of-the-art recursive neural network (model 7 and 8) can dramatically improve the performance over model 6. |

Experimental results | Note that the two neural network based models incorporate the syntax and semantics by representing each node with a vector. |

Experimental results | Note that this is a special case of what the neural network based models can model. |

Introduction | This model learns the syntax and semantics of the negator’s argument with a recursive neural network. |

Related work | The more recent work of (Socher et al., 2012; Socher et al., 2013) proposed models based on recursive neural networks that do not rely on any heuristic rules. |

Related work | In principle neural network is able to fit very complicated functions (Mitchell, 1997), and in this paper, we adapt the state-of-the-art approach described in (Socher et al., 2013) to help understand the behavior of negators specifically. |

Semantics-enriched modeling | A recursive neural tensor network (RNTN) is a specific form of feed-forward neural network based on syntactic (phrasal-structure) parse tree to conduct compositional sentiment analysis. |

Semantics-enriched modeling | A major difference of RNTN from the conventional recursive neural network (RRN) (Socher et al., 2012) is the use of the tensor V in order to directly capture the multiplicative interaction of two input vectors, although the matrix W implicitly captures the nonlinear interaction between the input vectors. |

Semantics-enriched modeling | This is actually an interesting place to extend the current recursive neural network to consider extrinsic knowledge. |

neural network is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

- neural network (12)
- treebank (12)
- recursive (10)
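
The distinction drawn above between the tensor V (multiplicative interaction of the two input vectors) and the matrix W can be made concrete with a small sketch of one composition step. It assumes the commonly cited RNTN form, in which output unit k is tanh(xᵀ V[k] x + W[k] · x) with x the concatenation of the two child vectors, and it omits any bias term. The function name `rntn_compose` and the toy parameter values are hypothetical.

```python
import math


def rntn_compose(left, right, V, W):
    """Compose two d-dimensional child vectors into one parent vector.

    Assumed shapes: V is d x 2d x 2d (one bilinear slice per output unit)
    and W is d x 2d. No bias term is used in this sketch.
    """
    x = list(left) + list(right)  # concatenated children, length 2d
    n = len(x)
    parent = []
    for k in range(len(left)):
        # Tensor term: multiplicative interaction between input components.
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        # Matrix term: the standard recursive network part.
        linear = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent


# Toy example with d = 1: the tensor multiplies the two children together.
only_tensor = rntn_compose([1.0], [2.0], V=[[[0.0, 0.5], [0.0, 0.0]]], W=[[0.0, 0.0]])
only_matrix = rntn_compose([1.0], [2.0], V=[[[0.0, 0.0], [0.0, 0.0]]], W=[[1.0, 1.0]])
```

Setting V to zero recovers a plain recursive network step, which is one way to read the remark that the tensor is the major difference between the two models.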

Abstract | We propose Adaptive Recursive Neural Network (AdaRNN) for target-dependent Twitter sentiment classification. |

Conclusion | We propose Adaptive Recursive Neural Network (AdaRNN) for the target-dependent Twitter sentiment classification. |

Introduction | In this paper, we mainly focus on integrating target information with Recursive Neural Network (RNN) to leverage the ability of deep learning models. |

Introduction | We employ a novel adaptive multi-compositionality layer in recursive neural network, which is named as AdaRNN (Dong et al., 2014). |

Our Approach | Adaptive Recursive Neural Network is proposed to propagate the sentiments of words to the target node. |

Our Approach | In Section 3.2, we propose Adaptive Recursive Neural Network and use it for target-dependent sentiment analysis. |

Our Approach | 3.2 AdaRNN: Adaptive Recursive Neural Network |

RNN: Recursive Neural Network | Figure 1: The composition process for “not very good” in Recursive Neural Network. |

neural network is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- Recursive (14)
- Recursive Neural (11)
- dependency tree (8)

Conclusion | The neural network architecture emerged as the best performing approach, and our qualitative analysis revealed that it induced a categorical organization of concepts. |

Conclusion | Given the success of NN, we plan to experiment in the future with more sophisticated neural network architectures inspired by recent work in machine translation (Gao et al., 2013) and multimodal deep learning (Srivastava and Salakhutdinov, 2012). |

Experimental Setup | Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer. |

Introduction | This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space. |

Related Work | (2013) learn to project unsupervised vector-based image representations onto a word-based semantic space using a neural network architecture. |

Related Work | (2013) rely on a supervised state-of-the-art method: They feed low-level features to a deep neural network trained on a supervised object recognition task (Krizhevsky et al., 2012). |

Results | For the neural network NN, we use prior knowledge |

neural network is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- semantic space (18)
- neural network (7)
- SVD (7)

Conclusion and Future Work | A semantic similarity graph is built to capture lexical semantic clue, and a convolutional neural network is used to encode contextual semantic clue. |

Experiments | FW-5 uses a traditional neural network with a fixed window size of 5 to replace the CNN in CONT, and the candidate term to be classified is placed in the center of the window. |

The Proposed Method | Then, a semantic similarity graph is created to capture lexical semantic clue, and a Convolutional Neural Network (CNN) (Collobert et al., 2011) is trained in each bootstrapping iteration to encode contextual semantic clue. |

The Proposed Method | 3.3 Encoding Contextual Semantic Clue Using Convolutional Neural Network |

The Proposed Method | 3.3.1 The architecture of the Convolutional Neural Network |

neural network is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- lexical semantic (17)
- semantic similarity (17)
- word embedding (10)

Autoencoders for Grounded Semantics | Autoencoders An autoencoder is an unsupervised neural network which is trained to reconstruct a given input from its latent representation (Bengio, 2009). |

Autoencoders for Grounded Semantics | Stacked Autoencoders Several (denoising) autoencoders can be used as building blocks to form a deep neural network (Bengio et al., 2007; Vincent et al., 2010). |

Conclusions | To the best of our knowledge, our model is novel in its use of attribute-based input in a deep neural network. |

Experimental Setup | Finally, we also compare to the word embeddings obtained using Mikolov et al.’s (2011) recurrent neural network based language model. |

Related Work | A large body of work has focused on projecting words and images into a common space using a variety of deep learning methods ranging from deep and restricted Boltzmann machines (Srivastava and Salakhutdinov, 2012; Feng et al., 2013), to autoencoders (Wu et al., 2013), and recursive neural networks (Socher et al., 2013b). |

Related Work | Secondly, our problem setting is different from the former studies, which usually deal with classification tasks and fine-tune the deep neural networks using training data with explicit class labels; in contrast we fine-tune our autoencoders using a semi-supervised criterion. |

neural network is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- meaning representations (11)
- SVD (8)
- semantic similarity (7)

Abstract | Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. |

Conclusion | In this paper we apply recursive neural networks to political ideology detection, a problem where previous work relies heavily on bag-of-words models and hand-designed lexica. |

Introduction | Building from those insights, we introduce a recursive neural network (RNN) to detect ideological bias on the sentence level. |

Recursive Neural Networks | Recursive neural networks (RNNs) are machine learning models that capture syntactic and semantic composition. |

neural network is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- bag-of-words (7)
- sentence-level (6)
- embeddings (4)

Introduction | In this paper, we strive to effectively address the above two shortcomings, and systematically explore the possibility of learning new features using deep (multilayer) neural networks (DNN, which is usually referred under the name Deep Learning) for SMT. |

Related Work | (2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. |

Related Work | (2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcome some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features. |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- phrase pair (13)
- semi-supervised (12)
- phrase table (11)

Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. |

Introduction | (2013) adapted the Context-Dependent Deep Neural Network for HMM (CD-DNN-HMM) (Dahl et al., 2012), a type of feed-forward neural network (FFNN)-based model, to |

Introduction | Recurrent neural network (RNN)-based models have recently demonstrated state-of-the-art performance that outperformed FFNN-based models for various tasks (Mikolov et al., 2010; Mikolov and Zweig, 2012; Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Sundermeyer et al., 2013). |

neural network is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- word alignment (28)
- alignment model (25)
- hidden layer (25)