Index of papers in Proc. ACL that mention
  • hidden layer
Boros, Tiberiu and Ion, Radu and Tufis, Dan
Abstract
In our experiments, we used a fully connected, feed-forward neural network with 3 layers (1 input layer, 1 hidden layer and 1 output layer).
Abstract
In order to fully characterize our system, we took into account the following parameters: accuracy, runtime speed, training speed, hidden layer configuration and the number of optimal training iterations.
Abstract
For example, the accuracy, the optimal number of training iterations, the training and the runtime speed are all highly dependent on the hidden layer configuration.
hidden layer is mentioned in 20 sentences in this paper.
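As an illustration of the three-layer architecture quoted above, the following is a minimal NumPy sketch of a fully connected feed-forward network with one input, one hidden and one output layer. It is not the authors' code: the layer sizes, the tanh hidden activation and the softmax output are assumptions, since the excerpts do not specify them.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Fully connected net: input layer -> one hidden layer -> output layer."""
    h = np.tanh(x @ W1 + b1)           # hidden layer activation (tanh assumed)
    logits = h @ W2 + b2               # output layer
    e = np.exp(logits - logits.max())  # softmax over output classes (assumed)
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 50, 100, 20    # illustrative sizes, not from the paper
W1, b1 = rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out)
print(forward(rng.normal(size=n_in), W1, b1, W2, b2).shape)  # (20,)
```

Varying n_hidden corresponds to the "hidden layer configuration" that the abstract reports as driving accuracy, training speed and the optimal number of training iterations.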
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
DNN structures for NLP
Besides that, neural network training also involves some hyperparameters such as the learning rate and the number of hidden layers.
Experiments and Results
Table 3: Effect of different numbers of hidden layers.
Experiments and Results
Two hidden layers outperform one hidden layer, while three hidden layers do not bring further improvement.
Experiments and Results
6.4.3 Effect of number of hidden layers
Training
Tunable parameters in the neural network alignment model include: word embeddings in the lookup table LT, parameters Wl, bl for linear transformations in the hidden layers of the neural network, and distortion parameters of the jump distance.
Training
We set word embedding length to 20, window size to 5, and the length of the only hidden layer to 40.
Training
To make our model concrete, there are still hyper-parameters to be determined: the window sizes sw and tw, and the length of each hidden layer Ll.
hidden layer is mentioned in 11 sentences in this paper.
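For the quoted hyper-parameters (embedding length 20, window size 5, hidden layer length 40), a rough sketch of a lookup-table-plus-hidden-layer scorer might look as follows. The vocabulary size, the hard-tanh activation and the final scoring vector are assumptions rather than details taken from the paper.

```python
import numpy as np

def htanh(x):                      # hard tanh, an assumed activation
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
vocab, emb_len, window, hidden_len = 1000, 20, 5, 40     # 20/5/40 quoted above; vocab assumed
LT = rng.normal(0, 0.1, (vocab, emb_len))                # lookup table of word embeddings
W1 = rng.normal(0, 0.1, (window * emb_len, hidden_len))  # hidden-layer linear transform
b1 = np.zeros(hidden_len)
w2 = rng.normal(0, 0.1, hidden_len)                      # scoring layer (assumed)

word_ids = rng.integers(0, vocab, size=window)           # a 5-word window
x = LT[word_ids].reshape(-1)                             # concatenated embeddings (100-dim)
score = htanh(x @ W1 + b1) @ w2                          # one hidden layer, then a score
print(score)
```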
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
When used in conjunction with a precomputed hidden layer, these techniques speed up NNJM computation by a factor of 10,000x, with only a small reduction in MT accuracy.
Neural Network Joint Model (NNJM)
We use two 512-dimensional hidden layers with tanh activation functions.
Neural Network Joint Model (NNJM)
We chose these values for the hidden layer size, vocabulary size, and source window size because they seemed to work best on our data sets — larger sizes did not improve results, while smaller sizes degraded results.
Neural Network Joint Model (NNJM)
2.4 Pre-Computing the Hidden Layer
hidden layer is mentioned in 17 sentences in this paper.
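The pre-computation idea quoted above can be sketched as follows: because the first hidden layer is a linear function of the concatenated context embeddings, the contribution of every vocabulary word at every context position can be cached once, and the hidden layer for a concrete context is then a sum of cached vectors followed by tanh. Only the 512 hidden units come from the excerpts; the other sizes are assumptions, and this is a sketch of the idea rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb, ctx, hid = 500, 32, 4, 512        # 512 hidden units as quoted; rest assumed
E  = rng.normal(0, 0.1, (vocab, emb))         # input embeddings
W1 = rng.normal(0, 0.1, (ctx * emb, hid))     # first hidden layer weights (tanh)
b1 = np.zeros(hid)

# Pre-compute the contribution of word w at context position p to the pre-activation:
# pre[p, w] = E[w] @ W1[p*emb:(p+1)*emb]
pre = np.stack([E @ W1[p*emb:(p+1)*emb] for p in range(ctx)])   # (ctx, vocab, hid)

def hidden_fast(context_ids):
    """First hidden layer via table lookups and a sum, then tanh."""
    return np.tanh(sum(pre[p, w] for p, w in enumerate(context_ids)) + b1)

def hidden_slow(context_ids):
    x = E[list(context_ids)].reshape(-1)      # concatenated context embeddings
    return np.tanh(x @ W1 + b1)

ctx_ids = rng.integers(0, vocab, size=ctx)
print(np.allclose(hidden_fast(ctx_ids), hidden_slow(ctx_ids)))  # True
```

The final check confirms that the cached computation matches the direct one.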
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Abstract
This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers.
Introduction
An RNN has a hidden layer with recurrent connections that propagates its own previous signals.
RNN-based Alignment Model
The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
RNN-based Alignment Model
Consecutive l hidden layers can be used: z_l = f(H_l × z_(l-1) + B_(H_l)).
RNN-based Alignment Model
For simplicity, this paper describes the model with 1 hidden layer.
Related Work
z_1 Hidden Layer: htanh(H × z_0 + B_H)
Related Work
Figure 1 shows the network structure with one hidden layer for computing a lexical translation probability t_lex(f_j, e_aj).
Related Work
The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
hidden layer is mentioned in 25 sentences in this paper.
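A sketch of a recurrently connected hidden layer in the spirit of the excerpts above, using the htanh activation from the reconstructed figure label. Feeding the concatenation of the current input and the previous hidden state through a single matrix H is a simplification of the model, and all dimensions are assumptions.

```python
import numpy as np

def htanh(x):                              # hard tanh, as in the figure label above
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
emb, hid, steps = 16, 32, 7                # assumed sizes
H   = rng.normal(0, 0.1, (hid, emb + hid)) # weights on [current input; previous hidden state]
B_H = np.zeros(hid)

h = np.zeros(hid)                          # hidden state carries the alignment history
for _ in range(steps):
    x = rng.normal(size=emb)               # stand-in for the lookup-layer output at this step
    h = htanh(H @ np.concatenate([x, h]) + B_H)   # recurrently connected hidden layer
print(h.shape)                             # (32,)
```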
Auli, Michael and Gao, Jianfeng
Decoder Integration
To solve this problem, we follow previous work on lattice rescoring with recurrent networks that maintained the usual n-gram context but kept a beam of hidden layer configurations at each state (Auli et al., 2013).
Decoder Integration
In fact, to make decoding as efficient as possible, we only keep the single best scoring hidden layer configuration.
Decoder Integration
As future cost estimate we score each phrase in isolation, resetting the hidden layer at the beginning of a phrase.
Experiments
The hidden layer uses 100 neurons unless otherwise stated.
Experiments
To keep training times manageable, we reduce the hidden layer size to 30 neurons, thereby greatly increasing speed.
Recurrent Neural Network LMs
Our model follows Mikolov et al. (2010) and is factored into an input layer, a hidden layer with recurrent connections, and an output layer (Figure 1).
Recurrent Neural Network LMs
The hidden layer state h_t encodes the history of all words observed in the sequence up to time step t. The state of the hidden layer is determined by the input layer and the hidden layer configuration of the previous time step h_{t-1}.
Recurrent Neural Network LMs
The weights of the connections between the layers are summarized in a number of matrices: U represents weights from the input layer to the hidden layer, and W represents connections from the previous hidden layer to the current hidden layer.
hidden layer is mentioned in 10 sentences in this paper.
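A sketch of the recurrence described above: the hidden state h_t is computed from the 1-of-V input layer through U and from the previous hidden state h_{t-1} through W. The 100 hidden neurons follow the quoted experimental setting; the sigmoid activation and the toy vocabulary size are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
vocab, hid = 200, 100                      # 100 hidden neurons as quoted; vocab assumed
U = rng.normal(0, 0.1, (hid, vocab))       # input layer -> hidden layer
W = rng.normal(0, 0.1, (hid, hid))         # previous hidden state -> current hidden state

def step(word_id, h_prev):
    x = np.zeros(vocab); x[word_id] = 1.0  # 1-of-V input layer
    return sigmoid(U @ x + W @ h_prev)     # h_t depends on the input and h_{t-1}

h = np.zeros(hid)                          # zero state: one way to "reset the hidden layer"
for w in [3, 17, 42]:
    h = step(w, h)
print(h[:5])
```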
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Abstract
Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition of multiple DAEs for large hidden-layer feature learning.
Introduction
Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition for DAEs (HCDAE) that can be used to create large hidden-layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012), which shows further improvement compared with a single DAE in our experiments.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
The connection weights W, the hidden layer biases c and the visible layer biases b can be learned efficiently using contrastive divergence (Hinton, 2002; Carreira-Perpinan and Hinton, 2005).
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Given a hidden layer h, the factorial conditional distribution of the visible layer v can be estimated by
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Moreover, although we have introduced another four types of phrase features (X2, X3, X4 and X5), the 16 features in X alone are a bottleneck for learning large hidden-layer feature representations because they carry limited information; as a result, the performance of the high-dimensional DAE features that are directly learned from a single DAE is not very satisfactory.
hidden layer is mentioned in 8 sentences in this paper.
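The horizontal composition (HCDAE) idea quoted above can be sketched as concatenating the hidden layers of two separately parameterized auto-encoders over the same input into one larger hidden representation. The 16-dimensional input mirrors the "16 features in X" mentioned above; the hidden sizes and sigmoid units are assumptions, and the weights are untrained stand-ins.

```python
import numpy as np

def encode(x, W, b):
    """Hidden-layer representation of one (denoising) auto-encoder."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))     # sigmoid units (assumed)

rng = np.random.default_rng(0)
n_in, h1, h2 = 16, 50, 50                         # 16 input features as discussed; h sizes assumed
W_a, b_a = rng.normal(0, 0.1, (h1, n_in)), np.zeros(h1)
W_b, b_b = rng.normal(0, 0.1, (h2, n_in)), np.zeros(h2)

x = rng.normal(size=n_in)                         # a phrase-pair feature vector
h_combined = np.concatenate([encode(x, W_a, b_a), encode(x, W_b, b_b)])
print(h_combined.shape)                           # (100,): a larger hidden layer from two DAEs
```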
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
The vocabulary size for the input layer is 100,000, and we choose different lengths for the hidden layer as L = {100, 300, 600, 1000} in the experiments.
Experiments
4.3 Effect of retrieved documents and length of hidden layers
Experiments
We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets.
Topic Similarity Model with Neural Network
Assuming that the dimension of g(X) is L, the linear layer forms an L × V matrix W which projects the n-of-V vector to an L-dimensional hidden layer.
Topic Similarity Model with Neural Network
Training neural networks involves many factors such as the learning rate and the length of hidden layers.
hidden layer is mentioned in 6 sentences in this paper.
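A sketch of the projection described above: an n-of-V input vector is mapped by an L × V matrix W to an L-dimensional hidden layer. L = 100 is one of the quoted settings; the bias, the tanh and the shrunken vocabulary (the experiments use V = 100,000) are choices made only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 5000, 100                  # L = 100 is a quoted setting; V shrunk from 100,000
W = rng.normal(0, 0.01, (L, V))   # L x V projection matrix
b = np.zeros(L)

x = np.zeros(V)                   # n-of-V input vector (bag of document words)
x[rng.integers(0, V, size=30)] = 1.0
hidden = np.tanh(W @ x + b)       # L-dimensional hidden layer
print(hidden.shape)               # (100,)
```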
Silberer, Carina and Lapata, Mirella
Autoencoders for Grounded Semantics
To further optimize the parameters of the network, a supervised criterion can be imposed on top of the last hidden layer such as the minimization of a prediction error on a supervised task (Bengio, 2009).
Autoencoders for Grounded Semantics
We first train SAEs with two hidden layers (codings) for each modality separately.
Autoencoders for Grounded Semantics
Then, we join these two SAEs by feeding their respective second coding simultaneously to another autoencoder, whose hidden layer thus yields the fused meaning representation.
Experimental Setup
This model has the following architecture: the textual autoencoder (see Figure 1, left-hand side) consists of 700 hidden units which are then mapped to the second hidden layer with 500 units (the corruption parameter was set to v = 0.1); the visual autoencoder (see Figure 1, right-hand side) has 170 and 100 hidden units, in the first and second layer, respectively.
hidden layer is mentioned in 6 sentences in this paper.
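A sketch of the fusion step described above: a textual stacked encoder with 700 and then 500 hidden units and a visual one with 170 and then 100 units (as quoted) produce second codings, which are concatenated and fed to a further autoencoder whose hidden layer yields the fused representation. The input dimensionalities, the fused size and the sigmoid units are assumptions, and the weights are random stand-ins rather than trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encoder(sizes, rng):
    """Stack of encoding layers for a list of layer sizes."""
    return [(rng.normal(0, 0.1, (o, i)), np.zeros(o)) for i, o in zip(sizes, sizes[1:])]

def encode(x, layers):
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
text_enc   = encoder([2000, 700, 500], rng)   # textual SAE: 700 then 500 units (input dim assumed)
visual_enc = encoder([400, 170, 100], rng)    # visual SAE: 170 then 100 units (input dim assumed)
fuse_W, fuse_b = rng.normal(0, 0.1, (300, 600)), np.zeros(300)   # fusing autoencoder (size assumed)

t_code = encode(rng.normal(size=2000), text_enc)     # second textual coding (500-d)
v_code = encode(rng.normal(size=400), visual_enc)    # second visual coding (100-d)
fused = sigmoid(fuse_W @ np.concatenate([t_code, v_code]) + fuse_b)  # fused meaning representation
print(fused.shape)                                   # (300,)
```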
Branavan, S.R.K and Silver, David and Barzilay, Regina
Adding Linguistic Knowledge to the Monte-Carlo Framework
[Figure residue: network diagram labels showing the output layer on top of hidden layer encodings, including a hidden layer encoding of sentence relevance.]
Adding Linguistic Knowledge to the Monte-Carlo Framework
The first hidden layer contains two disjoint sets of units, g′ and z′, corresponding to linguistic analyses of the strategy document.
Introduction
We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions.
hidden layer is mentioned in 5 sentences in this paper.
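A rough sketch, under stated assumptions, of the layered structure described above: a first hidden layer split into two disjoint groups of units, one encoding sentence-relevance decisions and one encoding predicate-labeling decisions, followed by an output layer that scores a candidate action. All sizes, the sigmoid units and the scalar output are illustrative guesses, not the authors' design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_feat, n_rel, n_pred = 40, 10, 10             # assumed sizes
W_rel  = rng.normal(0, 0.1, (n_rel, n_feat))   # units encoding sentence-relevance decisions
W_pred = rng.normal(0, 0.1, (n_pred, n_feat))  # units encoding predicate-labeling decisions
w_out  = rng.normal(0, 0.1, n_rel + n_pred)    # output layer scoring a candidate action

features = rng.normal(size=n_feat)             # stand-in for (state, action, document) features
hidden = np.concatenate([sigmoid(W_rel @ features), sigmoid(W_pred @ features)])  # disjoint groups
print(w_out @ hidden)                          # scalar action score
```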
Lazaridou, Angeliki and Bruni, Elia and Baroni, Marco
Experimental Setup
Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer.
Experimental Setup
where Θ consists of the model weights Θ(1) ∈ R^(dv×dh) and Θ(2) ∈ R^(dh×dw) that map the input image-based vectors Vs first to the hidden layer and then to the output layer in order to obtain text-based vectors, i.e., Ws = σ(2)(σ(1)(Vs Θ(1)) Θ(2)), where σ(1) and σ(2) are the activation functions of the two layers.
Introduction
This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space.
Results
In order to gain qualitative insights into the performance of the projection process of NN, we attempt to investigate the role and interpretability of the hidden layer.
Results
The hidden layer acts as a cross-modal concept categorization/organization system.
hidden layer is mentioned in 5 sentences in this paper.
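A sketch of the projection reconstructed above: image-based vectors Vs are mapped through a cross-modal hidden layer (via Θ(1)) and then to text-based vectors (via Θ(2)). The dimensionalities and the tanh nonlinearities standing in for σ(1) and σ(2) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_v, d_h, d_w = 100, 50, 80               # image, hidden and text dimensionalities (assumed)
Theta1 = rng.normal(0, 0.1, (d_v, d_h))   # maps image-based vectors to the hidden layer
Theta2 = rng.normal(0, 0.1, (d_h, d_w))   # maps the hidden layer to text-based vectors

V_s = rng.normal(size=(4, d_v))           # image-extracted feature vectors for 4 concepts
hidden = np.tanh(V_s @ Theta1)            # cross-modal hidden layer
W_s = np.tanh(hidden @ Theta2)            # predicted text-based vectors
print(W_s.shape)                          # (4, 80)
```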
He, Zhengyan and Liu, Shujie and Li, Mu and Zhou, Ming and Zhang, Longkai and Wang, Houfeng
Experiments and Analysis
Training settings: In the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use a sigmoid activation function and a cross-entropy error function.
Learning Representation for Contextual Document
In this stage we optimize the learned representation ("hidden layer n" in the figure).
Learning Representation for Contextual Document
The network weights below "hidden layer n" are initialized from the pre-training stage.
Learning Representation for Contextual Document
hidden layer n
hidden layer is mentioned in 4 sentences in this paper.
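A sketch matching the quoted training settings: rectifier (max(0, x)) hidden units and a sigmoid first reconstruction layer scored with cross-entropy. The 1,000 hidden units follow the quoted setting; the input layer is shrunk from 100,000 units, and the tied reconstruction weights and sparse binary input are assumptions made to keep the example small.

```python
import numpy as np

def relu(x):          # rectifier max(0, x), as quoted
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hid = 2000, 1000       # 1,000 hidden units as quoted; input shrunk from 100,000
W, b = rng.normal(0, 0.01, (n_hid, n_in)), np.zeros(n_hid)
c = np.zeros(n_in)

x = (rng.random(n_in) < 0.01).astype(float)    # sparse bag-of-words style input (assumed)
h = relu(W @ x + b)                            # rectifier hidden layer
r = sigmoid(W.T @ h + c)                       # first reconstruction layer: sigmoid units
xent = -np.sum(x * np.log(r + 1e-12) + (1 - x) * np.log(1 - r + 1e-12))  # cross-entropy error
print(h.shape, round(xent, 2))
```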
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Our Model
As shown in Figure 1, the network contains three layers: an input layer, a hidden layer, and an output layer.
Our Model
previous history h_{t-1} to generate the current hidden layer, which is a new history vector h_t.
Phrase Pair Embedding
The neural network is used to reduce the dimensionality of the sparse features, and the hidden layer of the network is used as the phrase pair embedding.
Phrase Pair Embedding
The length of the hidden layer is empirically set to 20.
hidden layer is mentioned in 4 sentences in this paper.
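A sketch of the phrase-pair embedding described above: a high-dimensional sparse feature vector is reduced by a single network layer to a 20-unit hidden layer (the quoted length), which then serves as the phrase pair embedding. The sparse dimensionality and the tanh are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sparse, n_emb = 10000, 20             # hidden layer of length 20 as quoted; sparse dim assumed
W, b = rng.normal(0, 0.01, (n_emb, n_sparse)), np.zeros(n_emb)

sparse_features = np.zeros(n_sparse)    # sparse indicator features of a phrase pair
sparse_features[rng.integers(0, n_sparse, size=8)] = 1.0
phrase_pair_embedding = np.tanh(W @ sparse_features + b)   # hidden layer used as the embedding
print(phrase_pair_embedding.shape)      # (20,)
```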
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
For each data set, we investigate an extensive set of combinations of hyper-parameters: the n-gram window (l,r) in {(1, 1), (2,1), (1,2), (2,2)}; the hidden layer size in {200, 300, 400}; the learning rate in {0.1, 0.01, 0.001}.
Learning from Web Text
The affine form of E with respect to v and h implies that the visible variables are conditionally independent of each other given the hidden layer units, and vice versa.
Learning from Web Text
For each position j, there is a weight matrix W(j) ∈ R^(H×D), which is used to model the interaction between the hidden layer and the word projection in position j.
Neural Network for POS Disambiguation
The web-feature module, shown in the lower left part of Figure 1, consists of an input layer and two hidden layers.
hidden layer is mentioned in 4 sentences in this paper.
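A sketch of the per-position interaction described above: each context position j has its own weight matrix W(j) ∈ R^(H×D) connecting the D-dimensional word projection at that position to H hidden units, and the hidden layer pools the per-position terms. Treating this as a deterministic feed-forward computation (rather than the RBM's stochastic units), and all the sizes, are simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H, D, positions = 64, 25, 5                      # assumed sizes
W = rng.normal(0, 0.1, (positions, H, D))        # one H x D matrix W(j) per position j
b = np.zeros(H)

projections = rng.normal(size=(positions, D))    # word projections for each context position
pre = sum(W[j] @ projections[j] for j in range(positions)) + b
hidden = sigmoid(pre)                            # hidden layer units
print(hidden.shape)                              # (64,)
```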
Huang, Eric and Socher, Richard and Manning, Christopher and Ng, Andrew
Global Context-Aware Neural Language Model
To compute the score of local context, scorel, we use a neural network with one hidden layer:
Global Context-Aware Neural Language Model
where [x1, x2, ..., xm] is the concatenation of the m word embeddings representing sequence s, f is an element-wise activation function such as tanh, a1 ∈ R^(h×1) is the activation of the hidden layer with h hidden nodes, W1 ∈ R^(h×(mn)) and W2 ∈ R^(1×h) are respectively the first and second layer weights of the neural network, and b1, b2 are the biases of each layer.
Global Context-Aware Neural Language Model
the hidden layer with h^(g) hidden nodes, W1^(g) ∈ R^(h^(g)×(2n)) and W2^(g) ∈ R^(1×h^(g)) are respectively the first and second layer weights of the neural network, and b1^(g), b2^(g) are the biases of each layer.
hidden layer is mentioned in 3 sentences in this paper.
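A sketch of the local-context scorer reconstructed above: the m word embeddings are concatenated, passed through one tanh hidden layer with h nodes (a1), and mapped to a scalar score by the second-layer weights W2 and bias b2. All sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 5, 50, 100                     # window size, embedding size, hidden nodes (assumed)
W1, b1 = rng.normal(0, 0.1, (h, m * n)), np.zeros(h)
W2, b2 = rng.normal(0, 0.1, (1, h)), np.zeros(1)

x = rng.normal(size=(m, n)).reshape(-1)  # concatenation of the m word embeddings
a1 = np.tanh(W1 @ x + b1)                # hidden layer activation, a1 in R^h
score_l = W2 @ a1 + b2                   # local context score
print(score_l)
```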