Abstract | In our experiments, we used a fully connected, feed-forward neural network with three layers (one input layer, one hidden layer, and one output layer).
Abstract | In order to fully characterize our system, we took into account the following parameters: accuracy, runtime speed, training speed, hidden layer configuration, and the optimal number of training iterations.
Abstract | For example, the accuracy, the optimal number of training iterations, the training and the runtime speed are all highly dependent on the hidden layer configuration. |
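As a rough illustration of the architecture described above (all sizes, the random weights, and the tanh activation are assumptions for illustration, not taken from the paper), a three-layer feed-forward network can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes, not the paper's configuration.
n_in, n_hidden, n_out = 4, 8, 3

W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))  # hidden -> output
b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(x @ W1 + b1)   # the single hidden layer
    return h @ W2 + b2         # output layer (pre-softmax scores)

print(forward(rng.normal(size=n_in)).shape)  # (3,)
```

Changing `n_hidden` is the "hidden layer configuration" knob that the snippet says accuracy, training speed, and runtime speed all depend on.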
DNN structures for NLP | Besides that, neural network training also involves hyperparameters such as the learning rate and the number of hidden layers.
Experiments and Results | Table 3: Effect of the number of hidden layers.
Experiments and Results | Two hidden layers outperform one hidden layer, while three hidden layers do not bring further improvement. |
Experiments and Results | 6.4.3 Effect of number of hidden layers |
Training | Tunable parameters in the neural network alignment model include: the word embeddings in the lookup table LT, the parameters W_l and b_l for the linear transformations in the hidden layers of the neural network, and the distortion parameters of jump distance.
Training | We set the word embedding length to 20, the window size to 5, and the length of the single hidden layer to 40.
Training | To make our model concrete, there are still hyper-parameters to be determined: the window sizes s_w and t_w, and the length of each hidden layer L_l.
Introduction | When used in conjunction with a precomputed hidden layer, these techniques speed up NNJM computation by a factor of 10,000x, with only a small reduction in MT accuracy.
Neural Network Joint Model (NNJM) | We use two 512-dimensional hidden layers with tanh activation functions.
Neural Network Joint Model (NNJM) | We chose these values for the hidden layer size, vocabulary size, and source window size because they seemed to work best on our data sets; larger sizes did not improve results, while smaller sizes degraded results.
Neural Network Joint Model (NNJM) | 2.4 Pre-Computing the Hidden Layer
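This heading refers to precomputing the hidden layer. A plausible sketch of that kind of trick (illustrative sizes and names, not the paper's code): because the first layer's pre-activation is a sum of per-position contributions, the contribution of every (position, word) pair can be cached once, and the hidden layer is then computed by table lookups and additions instead of a full matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: vocabulary V, embedding dim E, hidden dim H,
# and a context window of CONTEXT words.
V, E, H, CONTEXT = 50, 16, 32, 5
emb = rng.normal(size=(V, E))           # word embeddings
W = rng.normal(size=(CONTEXT * E, H))   # first-layer weights
b = np.zeros(H)

# Precompute, for every (position, word) pair, its additive contribution
# to the hidden pre-activation: emb[w] @ W[p*E:(p+1)*E].
pre = np.stack([emb @ W[p * E:(p + 1) * E] for p in range(CONTEXT)])

def hidden_fast(context_words):
    # Lookup-and-add version: CONTEXT vector additions per query.
    z = b + sum(pre[p, w] for p, w in enumerate(context_words))
    return np.tanh(z)

def hidden_slow(context_words):
    # Naive version: concatenate embeddings, then one big matmul.
    x = np.concatenate([emb[w] for w in context_words])
    return np.tanh(x @ W + b)

ctx = [3, 7, 1, 42, 9]
assert np.allclose(hidden_fast(ctx), hidden_slow(ctx))
```

The two functions compute the same hidden layer; the fast path avoids the matrix multiplication entirely, which is the kind of saving behind the 10,000x speedup figure quoted earlier.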
Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers.
Introduction | An RNN has a hidden layer with recurrent connections that propagates its own previous signals. |
RNN-based Alignment Model | The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
RNN-based Alignment Model | (Footnote 2) Consecutive l hidden layers can be used: z_l = f(H_l × z_{l-1} + B_{H_l}).
RNN-based Alignment Model | For simplicity, this paper describes the model with one hidden layer.
Related Work | Hidden layer (figure annotation): z_1 = htanh(H × z_0 + B_H).
Related Work | Figure 1 shows the network structure with one hidden layer for computing a lexical translation probability t_lex(f_j, e_{a_j}).
Related Work | The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
Decoder Integration | To solve this problem, we follow previous work on lattice rescoring with recurrent networks that maintained the usual n-gram context but kept a beam of hidden layer configurations at each state (Auli et al., 2013). |
Decoder Integration | In fact, to make decoding as efficient as possible, we only keep the single best scoring hidden layer configuration. |
Decoder Integration | As a future cost estimate, we score each phrase in isolation, resetting the hidden layer at the beginning of each phrase.
Experiments | The hidden layer uses 100 neurons unless otherwise stated. |
Experiments | To keep training times manageable, we reduce the hidden layer size to 30 neurons, thereby greatly increasing speed. |
Recurrent Neural Network LMs | (2010), which is factored into an input layer, a hidden layer with recurrent connections, and an output layer (Figure 1).
Recurrent Neural Network LMs | The hidden layer state h_t encodes the history of all words observed in the sequence up to time step t. The state of the hidden layer is determined by the input layer and the hidden layer configuration of the previous time step, h_{t-1}.
Recurrent Neural Network LMs | The weights of the connections between the layers are summarized in a number of matrices: U represents the weights from the input layer to the hidden layer, and W represents the connections from the previous hidden layer to the current hidden layer.
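A minimal sketch of this recurrence, with U and W as just described; tanh stands in for the activation function and all sizes are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 10, 6  # illustrative sizes

U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # previous hidden -> current hidden

def step(x_t, h_prev):
    # h_t is determined by the current input and the previous hidden state,
    # so it encodes the whole history observed up to time t.
    return np.tanh(U @ x_t + W @ h_prev)

h = np.zeros(n_hidden)
for t in range(4):          # unrolled over a toy input sequence
    x_t = rng.normal(size=n_in)
    h = step(x_t, h)
print(h.shape)  # (6,)
```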
Abstract | Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition of multiple DAEs for large hidden-layer feature learning.
Introduction | Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition for DAEs (HCDAE) that can create large hidden-layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012); in our experiments, this shows further improvement over a single DAE.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | The connection weight W, hidden layer biases c and visible layer biases b can be learned efficiently using the contrastive divergence (Hinton, 2002; Carreira-Perpinan and Hinton, 2005). |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | When given a hidden layer h, the factorial conditional distribution of the visible layer v can be estimated by
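For a binary RBM with weight matrix W and visible biases b, the standard form of this factorial conditional (reconstructed from the usual RBM definitions, not copied from the paper) is:

```latex
P(v \mid h) = \prod_i P(v_i = 1 \mid h), \qquad
P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),
```

where σ is the logistic sigmoid; "factorial" refers to the product over independent visible units given h.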
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Moreover, although we have introduced another four types of phrase features (X2, X3, X4 and X5), the only 16 features in X are a bottleneck for learning large hidden-layer feature representations: because they carry limited information, the performance of high-dimensional DAE features learned directly from a single DAE is not very satisfactory.
Experiments | The vocabulary size for the input layer is 100,000, and we choose different lengths for the hidden layer as L = {100, 300, 600, 1000} in the experiments. |
Experiments | 4.3 Effect of retrieved documents and length of hidden layers |
Experiments | We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets. |
Topic Similarity Model with Neural Network | Assuming that the dimension of g(X) is L, the linear layer forms an L × V matrix W which projects the n-of-V vector to an L-dimensional hidden layer.
Topic Similarity Model with Neural Network | Training neural networks involves many factors such as the learning rate and the length of the hidden layers.
Autoencoders for Grounded Semantics | To further optimize the parameters of the network, a supervised criterion can be imposed on top of the last hidden layer such as the minimization of a prediction error on a supervised task (Bengio, 2009). |
Autoencoders for Grounded Semantics | We first train SAEs with two hidden layers (codings) for each modality separately. |
Autoencoders for Grounded Semantics | Then, we join these two SAEs by feeding their respective second coding simultaneously to another autoencoder, whose hidden layer thus yields the fused meaning representation. |
Experimental Setup | This model has the following architecture: the textual autoencoder (see Figure 1, left-hand side) consists of 700 hidden units which are then mapped to the second hidden layer with 500 units (the corruption parameter was set to v = 0.1); the visual autoencoder (see Figure 1, right-hand side) has 170 and 100 hidden units, in the first and second layer, respectively. |
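A hedged sketch of how two such encoder stacks could be joined into a fused representation, using the hidden sizes quoted above (700→500 textual, 170→100 visual). The raw input dimensionalities (1000 textual, 300 visual), the fused hidden size (200), the sigmoid activations, and the random weights are all assumptions for illustration; in the paper these layers are trained as denoising autoencoders.

```python
import numpy as np

rng = np.random.default_rng(3)

def layer(n_in, n_out):
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

Wt1, bt1 = layer(1000, 700)     # textual stack (input size 1000 assumed)
Wt2, bt2 = layer(700, 500)
Wv1, bv1 = layer(300, 170)      # visual stack (input size 300 assumed)
Wv2, bv2 = layer(170, 100)
Wf, bf = layer(500 + 100, 200)  # fusion autoencoder (hidden size assumed)

def fused_representation(text_vec, visual_vec):
    t = sigmoid(sigmoid(text_vec @ Wt1 + bt1) @ Wt2 + bt2)    # 500-d text coding
    v = sigmoid(sigmoid(visual_vec @ Wv1 + bv1) @ Wv2 + bv2)  # 100-d visual coding
    # Feed both second codings to one autoencoder; its hidden layer
    # is the fused meaning representation.
    return sigmoid(np.concatenate([t, v]) @ Wf + bf)

rep = fused_representation(rng.normal(size=1000), rng.normal(size=300))
print(rep.shape)  # (200,)
```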
Adding Linguistic Knowledge to the Monte-Carlo Framework | [Figure: the output layer f(s, a, d, y) sits above hidden layers encoding sentence relevance.]
Adding Linguistic Knowledge to the Monte-Carlo Framework | The first hidden layer contains two disjoint sets of units, g′ and z′, corresponding to linguistic analyses of the strategy document.
Introduction | We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions. |
Experimental Setup | Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer.
Experimental Setup | where θ_{v→w} consists of the model weights θ^(1) ∈ R^(d_v × d_h) and θ^(2) ∈ R^(d_h × d_w) that map the input image-based vectors V_s first to the hidden layer and then to the output layer in order to obtain text-based vectors, i.e., W_s = σ^(2)(σ^(1)(V_s θ^(1)) θ^(2)), where σ^(1) and σ^(2) are
Introduction | This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space. |
Results | In order to gain qualitative insights into the performance of the projection process of NN, we attempt to investigate the role and interpretability of the hidden layer . |
Results | hidden layer acts as a cross-modal concept categorization/organization system.
Experiments and Analysis | Training settings: in the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use a sigmoid activation function and a cross-entropy error function.
Learning Representation for Contextual Document | In this stage we optimize the learned representation (“hidden layer n” in Fig.
Learning Representation for Contextual Document | The network weights below “hidden layer n” are initialized with the pre-training stage.
Learning Representation for Contextual Document | hidden layer n |
Our Model | As shown in Figure 1, the network contains three layers: an input layer, a hidden layer, and an output layer.
Our Model | previous history h_{t-1} to generate the current hidden layer, which is a new history vector h_t.
Phrase Pair Embedding | The neural network is used to reduce the space dimension of sparse features, and the hidden layer of the network is used as the phrase pair embedding. |
Phrase Pair Embedding | The length of the hidden layer is empirically set to 20. |
Experiments | For each data set, we investigate an extensive set of combinations of hyper-parameters: the n-gram window (l,r) in {(1, 1), (2,1), (1,2), (2,2)}; the hidden layer size in {200, 300, 400}; the learning rate in {0.1, 0.01, 0.001}. |
Learning from Web Text | The affine form of E with respect to v and h implies that the visible variables are conditionally independent of each other given the hidden layer units, and vice versa.
Learning from Web Text | For each position j, there is a weight matrix W^(j) ∈ R^(H×D), which is used to model the interaction between the hidden layer and the word projection at position j.
Neural Network for POS Disambiguation | The web-feature module, shown in the lower left part of Figure 1, consists of an input layer and two hidden layers.
Global Context-Aware Neural Language Model | To compute the score of local context, scorel, we use a neural network with one hidden layer: |
Global Context-Aware Neural Language Model | where [x_1, x_2, ..., x_m] is the concatenation of the m word embeddings representing sequence s, f is an element-wise activation function such as tanh, a_1 ∈ R^(h×1) is the activation of the hidden layer with h hidden nodes, W_1 ∈ R^(h×(mn)) and W_2 ∈ R^(1×h) are respectively the first and second layer weights of the neural network, and b_1, b_2 are the biases of each layer.
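Under the shapes stated in this passage, the hidden-layer activation and the local score can be sketched as follows (a reconstruction with arbitrary random weights and tanh as f, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, h = 5, 50, 100  # m words, n-dimensional embeddings, h hidden nodes

x = rng.normal(size=m * n)                    # [x_1, x_2, ..., x_m] concatenated
W1 = rng.normal(scale=0.01, size=(h, m * n))  # first layer weights
b1 = np.zeros(h)
W2 = rng.normal(scale=0.01, size=(1, h))      # second layer weights
b2 = np.zeros(1)

a1 = np.tanh(W1 @ x + b1)   # hidden-layer activation, shape (h,)
score_l = W2 @ a1 + b2      # scalar local-context score
print(score_l.shape)  # (1,)
```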
Global Context-Aware Neural Language Model | the hidden layer with h^(g) hidden nodes, W_1^(g) ∈ R^(h^(g)×(2n)) and W_2^(g) ∈ R^(1×h^(g)) are respectively the first and second layer weights of the neural network, and b_1^(g), b_2^(g) are the biases of each layer.