Index of papers in Proc. ACL 2014 that mention
  • hidden layer
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
When used in conjunction with a precomputed hidden layer, these techniques speed up NNJM computation by a factor of 10,000x, with only a small reduction in MT accuracy.
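A minimal numpy sketch of the precomputation idea behind that speedup, assuming the usual NNJM layout in which the first hidden layer's pre-activation is a sum of per-position word contributions; the sizes below are scaled-down placeholders, not the paper's configuration. Each (word, position) contribution is cached once, so computing the hidden layer at decode time reduces to a table lookup per context word, a vector sum, and a tanh.

```python
import numpy as np

# Scaled-down placeholder sizes; the paper uses e.g. 512-dimensional hidden layers.
V, D, H, positions = 1000, 64, 256, 7
rng = np.random.default_rng(0)
emb = rng.normal(size=(V, D))              # input word embeddings
W1 = rng.normal(size=(positions * D, H))   # first hidden layer weights
b1 = np.zeros(H)

# Precompute every (word, position) contribution to the hidden pre-activation.
precomp = np.stack(
    [emb @ W1[p * D:(p + 1) * D, :] for p in range(positions)], axis=1
)                                          # shape (V, positions, H)

def hidden_fast(context_word_ids):
    """Hidden layer via cached lookups: a vector sum plus tanh, no matrix product."""
    acc = b1 + sum(precomp[w, p] for p, w in enumerate(context_word_ids))
    return np.tanh(acc)

def hidden_slow(context_word_ids):
    """Reference computation: concatenate embeddings and multiply by W1."""
    x = np.concatenate([emb[w] for w in context_word_ids])
    return np.tanh(x @ W1 + b1)

ctx = rng.integers(0, V, size=positions)
assert np.allclose(hidden_fast(ctx), hidden_slow(ctx))
```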
Neural Network Joint Model (NNJM)
We use two 512-dimensional hidden layers with tanh activation functions.
Neural Network Joint Model (NNJM)
We chose these values for the hidden layer size, vocabulary size, and source window size because they seemed to work best on our data sets — larger sizes did not improve results, while smaller sizes degraded results.
Neural Network Joint Model (NNJM)
2.4 Pre-Computing the Hidden Layer
hidden layer is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Abstract
This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers.
Introduction
An RNN has a hidden layer with recurrent connections that propagates its own previous signals.
RNN-based Alignment Model
The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
RNN-based Alignment Model
Consecutive l hidden layers can be used: z_l = f(H_l × z_{l-1} + B_{H_l}).
RNN-based Alignment Model
For simplicity, this paper describes the model with one hidden layer.
Related Work
Figure 1 (hidden layer): z_1 = htanh(H × z_0 + B_H)
Related Work
Figure 1 shows the network structure with one hidden layer for computing a lexical translation probability t_lex(f_j, e_{a_j}).
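A small numpy sketch of the one-hidden-layer network in that figure (lookup layer z_0, hidden layer z_1 = htanh(H × z_0 + B_H), output layer); the lookup tables, dimensions, and scalar output are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def htanh(x):
    """Hard tanh used in the figure: clip pre-activations to [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
D, Hdim = 30, 100                       # illustrative embedding / hidden sizes
L_f = rng.normal(size=(500, D))         # lookup table for source words f
L_e = rng.normal(size=(500, D))         # lookup table for target words e
H = rng.normal(size=(Hdim, 2 * D))      # hidden layer weight matrix
B_H = np.zeros(Hdim)
O = rng.normal(size=(1, Hdim))          # output layer weight matrix
B_O = np.zeros(1)

def lexical_score(f_j, e_aj):
    """t_lex(f_j, e_{a_j}): lookup layer z_0 -> hidden layer z_1 -> scalar output."""
    z0 = np.concatenate([L_f[f_j], L_e[e_aj]])   # lookup layer output
    z1 = htanh(H @ z0 + B_H)                     # z_1 = htanh(H x z_0 + B_H)
    return (O @ z1 + B_O).item()

print(lexical_score(f_j=42, e_aj=7))
```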
Related Work
The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
hidden layer is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Decoder Integration
To solve this problem, we follow previous work on lattice rescoring with recurrent networks that maintained the usual n-gram context but kept a beam of hidden layer configurations at each state (Auli et al., 2013).
Decoder Integration
In fact, to make decoding as efficient as possible, we only keep the single best scoring hidden layer configuration.
Decoder Integration
As future cost estimate we score each phrase in isolation, resetting the hidden layer at the beginning of a phrase.
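A minimal sketch, with assumed types, of the bookkeeping the sentences above describe: hypotheses are recombined on their n-gram context, each surviving state stores only the single best-scoring hidden layer configuration, and phrases scored for future cost estimates start from a freshly reset (zero) hidden layer.

```python
import numpy as np

H = 100                       # hidden layer size used in the experiments
best = {}                     # n-gram context tuple -> (score, hidden layer state)

def recombine(context, score, hidden):
    """Keep only the single best-scoring hidden layer configuration per n-gram state."""
    if context not in best or score > best[context][0]:
        best[context] = (score, hidden)

def reset_hidden():
    """Fresh hidden layer for scoring a phrase in isolation (future cost estimation)."""
    return np.zeros(H)

recombine(("the", "cat"), -4.2, reset_hidden())
```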
Experiments
The hidden layer uses 100 neurons unless otherwise stated.
Experiments
To keep training times manageable, we reduce the hidden layer size to 30 neurons, thereby greatly increasing speed.
Recurrent Neural Network LMs
(2010) which is factored into an input layer, a hidden layer with recurrent connections, and an output layer (Figure 1).
Recurrent Neural Network LMs
The hidden layer state h_t encodes the history of all words observed in the sequence up to time step t. The state of the hidden layer is determined by the input layer and the hidden layer configuration of the previous time step h_{t-1}.
Recurrent Neural Network LMs
The weights of the connections between the layers are summarized in a number of matrices: U represents weights from the input layer to the hidden layer, and W represents connections from the previous hidden layer to the current hidden layer.
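Read as equations, that description corresponds to the standard recurrent update sketched below in numpy; the sizes are toy values, the sigmoid and softmax are assumed nonlinearities, and the output matrix name V is an assumption (the excerpt only names U and W).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

vocab, hidden = 10, 4                              # toy sizes
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden, vocab))    # input layer -> hidden layer
W = rng.normal(scale=0.1, size=(hidden, hidden))   # previous hidden -> current hidden
V = rng.normal(scale=0.1, size=(vocab, hidden))    # hidden layer -> output layer (assumed name)

def step(word_id, h_prev):
    """h_t is determined by the current input and the previous configuration h_{t-1}."""
    x = np.zeros(vocab)
    x[word_id] = 1.0
    h_t = sigmoid(U @ x + W @ h_prev)
    y_t = softmax(V @ h_t)                         # distribution over the next word
    return h_t, y_t

h = np.zeros(hidden)
for w in [3, 1, 4]:                                # h accumulates the history of the sequence
    h, y = step(w, h)
```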
hidden layer is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Abstract
Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition of multiple DAEs for learning features with large hidden layers.
Introduction
Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition for DAEs (HCDAE) that can create large hidden layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012), which shows further improvement over a single DAE in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
The connection weights W, hidden layer biases c, and visible layer biases b can be learned efficiently using contrastive divergence (Hinton, 2002; Carreira-Perpinan and Hinton, 2005).
Semi-Supervised Deep Auto-encoder Features Learning for SMT
When given a hidden layer h, the factorial conditional distribution of the visible layer v can be estimated by:
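Under the standard RBM assumption, with W, b, and c named as in the sentence above, each visible unit is an independent Bernoulli given h; a minimal numpy sketch (sizes are toy values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 16, 8                             # e.g. the 16 features in X; toy hidden size
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # connection weights
b = np.zeros(n_visible)                                 # visible layer biases
c = np.zeros(n_hidden)                                  # hidden layer biases

def p_v_given_h(h):
    """Factorial conditional: p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)."""
    return sigmoid(b + W @ h)

def p_h_given_v(v):
    """Symmetric conditional used in contrastive divergence updates."""
    return sigmoid(c + W.T @ v)

h = rng.integers(0, 2, size=n_hidden)
print(p_v_given_h(h))
```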
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Moreover, although we have introduced four additional types of phrase features (X_2, X_3, X_4, and X_5), the 16 features in X are still a bottleneck for learning a large hidden-layer feature representation: because they carry limited information, the performance of high-dimensional DAE features learned directly from a single DAE is not very satisfactory.
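A rough numpy illustration of the horizontal composition (HCDAE) idea, under the assumption that it amounts to training two (or more) DAEs on the same input and concatenating their hidden layer codes into one larger representation; the encoder weights and sizes are invented for the sketch.

```python
import numpy as np

def encode(x, W, b):
    """Hidden layer code of one (already trained) denoising auto-encoder."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(0)
n_in = 16                                               # e.g. the 16 phrase features in X
W1, b1 = rng.normal(size=(100, n_in)), np.zeros(100)    # DAE 1 encoder: 16 -> 100
W2, b2 = rng.normal(size=(100, n_in)), np.zeros(100)    # DAE 2 encoder: 16 -> 100

x = rng.random(n_in)
# Horizontal composition: concatenate the two hidden layer codes into one larger code.
h_large = np.concatenate([encode(x, W1, b1), encode(x, W2, b2)])   # 200-dimensional
```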
hidden layer is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
The vocabulary size for the input layer is 100,000, and we choose different lengths for the hidden layer as L = {100, 300, 600, 1000} in the experiments.
Experiments
4.3 Effect of retrieved documents and length of hidden layers
Experiments
We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets.
Topic Similarity Model with Neural Network
Assuming that the dimension of g(X) is L, the linear layer forms an L × V matrix W which projects the n-of-V vector to an L-dimensional hidden layer.
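A small numpy sketch of that projection: the n-of-V bag-of-words vector is multiplied by the L × V matrix W to give the L-dimensional hidden layer. The sizes follow the excerpts above (V = 100,000, L = 100); any nonlinearity applied after the linear projection is omitted here.

```python
import numpy as np

V, L = 100000, 100                        # vocabulary size and hidden layer length (from the excerpts)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(L, V))   # the linear layer: an L x V matrix

def hidden_layer(word_ids):
    """Project an n-of-V (bag-of-words) vector to the L-dimensional hidden layer."""
    x = np.zeros(V)
    x[list(word_ids)] = 1.0               # n-of-V encoding
    return W @ x                          # g(X); any following nonlinearity is omitted

print(hidden_layer({3, 17, 256}).shape)   # (100,)
```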
Topic Similarity Model with Neural Network
Training neural networks involves many factors such as the learning rate and the length of hidden layers.
hidden layer is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Silberer, Carina and Lapata, Mirella
Autoencoders for Grounded Semantics
To further optimize the parameters of the network, a supervised criterion can be imposed on top of the last hidden layer such as the minimization of a prediction error on a supervised task (Bengio, 2009).
Autoencoders for Grounded Semantics
We first train SAEs with two hidden layers (codings) for each modality separately.
Autoencoders for Grounded Semantics
Then, we join these two SAEs by feeding their respective second coding simultaneously to another autoencoder, whose hidden layer thus yields the fused meaning representation.
Experimental Setup
This model has the following architecture: the textual autoencoder (see Figure 1, left-hand side) consists of 700 hidden units which are then mapped to the second hidden layer with 500 units (the corruption parameter was set to v = 0.1); the visual autoencoder (see Figure 1, right-hand side) has 170 and 100 hidden units, in the first and second layer, respectively.
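A schematic numpy sketch of the architecture those sentences describe: two unimodal stacked autoencoders whose second codings feed one fusion autoencoder whose hidden layer is the fused representation. The unit counts 700/500 and 170/100 come from the excerpt; the input sizes, the sigmoid, the fused code size, and the random weights are placeholders rather than the trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer(x, out_dim, rng):
    """One encoder layer with random placeholder weights."""
    W = rng.normal(scale=0.1, size=(out_dim, x.size))
    return sigmoid(W @ x)

rng = np.random.default_rng(0)
textual = rng.random(2000)                # textual input vector (size is a placeholder)
visual = rng.random(400)                  # visual input vector (size is a placeholder)

# Unimodal stacked autoencoders (encoders only), unit counts as in the excerpt.
t2 = layer(layer(textual, 700, rng), 500, rng)   # textual codings: 700 then 500 units
v2 = layer(layer(visual, 170, rng), 100, rng)    # visual codings: 170 then 100 units

# Fusion autoencoder: both second codings feed one hidden layer, the fused representation.
fused = layer(np.concatenate([t2, v2]), 300, rng)   # 300 is an assumed size
```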
hidden layer is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Lazaridou, Angeliki and Bruni, Elia and Baroni, Marco
Experimental Setup
Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer .
Experimental Setup
where Θ consists of the model weights θ^(1) ∈ R^(d_v × d_h) and θ^(2) ∈ R^(d_h × d_w) that map the input image-based vectors V_s first to the hidden layer and then to the output layer in order to obtain text-based vectors, i.e., Ŵ_s = σ^(2)(σ^(1)(V_s θ^(1)) θ^(2)), where σ^(1) and σ^(2) are
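A minimal numpy sketch of that two-layer projection, Ŵ_s = σ^(2)(σ^(1)(V_s θ^(1)) θ^(2)); the dimensions and the choice of tanh for both nonlinearities are assumptions.

```python
import numpy as np

def sigma(x):
    """Assumed nonlinearity for both layers; the paper's sigma^(1), sigma^(2) may differ."""
    return np.tanh(x)

rng = np.random.default_rng(0)
d_v, d_h, d_w = 300, 120, 200                       # image, hidden, text dimensions (assumed)
theta1 = rng.normal(scale=0.05, size=(d_v, d_h))    # theta^(1): input -> hidden layer
theta2 = rng.normal(scale=0.05, size=(d_h, d_w))    # theta^(2): hidden -> output layer

V_s = rng.random((5, d_v))                          # image-based vectors for 5 concepts
W_s_hat = sigma(sigma(V_s @ theta1) @ theta2)       # predicted text-based vectors
print(W_s_hat.shape)                                # (5, 200)
```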
Introduction
This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space.
Results
In order to gain qualitative insights into the performance of the projection process of NN, we attempt to investigate the role and interpretability of the hidden layer.
Results
The hidden layer acts as a cross-modal concept categorization/organization system.
hidden layer is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Our Model
As shown in Figure 1, the network contains three layers: an input layer, a hidden layer, and an output layer.
Our Model
previous history h_{t-1} to generate the current hidden layer, which is a new history vector h_t.
Phrase Pair Embedding
The neural network is used to reduce the space dimension of sparse features, and the hidden layer of the network is used as the phrase pair embedding.
Phrase Pair Embedding
The length of the hidden layer is empirically set to 20.
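A toy numpy sketch of that second use of the network: the hidden layer over sparse phrase-pair features, with length 20 as stated above, is taken as the phrase pair embedding. The sparse feature space, the tanh, and the weights are invented for illustration.

```python
import numpy as np

n_sparse, emb_len = 5000, 20               # sparse feature space size (assumed); hidden layer length 20
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.05, size=(emb_len, n_sparse))
b_in = np.zeros(emb_len)

def phrase_pair_embedding(active_feature_ids):
    """Hidden layer activation over the sparse features, used as the phrase pair embedding."""
    x = np.zeros(n_sparse)
    x[list(active_feature_ids)] = 1.0
    return np.tanh(W_in @ x + b_in)        # 20-dimensional embedding

print(phrase_pair_embedding({12, 907, 4431}))
```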
hidden layer is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
For each data set, we investigate an extensive set of combinations of hyper-parameters: the n-gram window (l,r) in {(1, 1), (2,1), (1,2), (2,2)}; the hidden layer size in {200, 300, 400}; the learning rate in {0.1, 0.01, 0.001}.
Learning from Web Text
The affine form of E with respect to v and h implies that the visible variables are conditionally independent of each other given the hidden layer units, and vice versa.
Learning from Web Text
For each position j, there is a weight matrix W^(j) ∈ R^(H × D), which is used to model the interaction between the hidden layer and the word projection in position j.
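A numpy sketch of how such position-specific matrices can enter the hidden layer's pre-activation: each position j contributes W^(j) times its word projection. The number of positions, the sizes H and D, and the sigmoid are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

H, D, positions = 64, 50, 5               # hidden units, projection size, window size (assumed)
rng = np.random.default_rng(0)
W = [rng.normal(scale=0.1, size=(H, D)) for _ in range(positions)]   # W^(j) in R^(H x D)
c = np.zeros(H)                           # hidden layer biases

def hidden_given_projections(word_projections):
    """Each position j interacts with the hidden layer through its own matrix W^(j)."""
    pre = c + sum(W[j] @ word_projections[j] for j in range(positions))
    return sigmoid(pre)

projs = rng.random((positions, D))
print(hidden_given_projections(projs))
```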
Neural Network for POS Disambiguation
The web-feature module, shown in the lower left part of Figure 1, consists of an input layer and two hidden layers.
hidden layer is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: