Index of papers in Proc. ACL 2013 that mention
  • hidden layer
Boros, Tiberiu and Ion, Radu and Tufis, Dan
Abstract
In our experiments, we used a fully connected, feed-forward neural network with 3 layers (1 input layer, 1 hidden layer and 1 output layer).
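A minimal sketch of the 3-layer architecture quoted above (1 input layer, 1 hidden layer, 1 output layer). The layer sizes and the sigmoid activation are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeedForwardNet:
    """Fully connected feed-forward net: input -> hidden -> output."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)      # hidden layer activations
        return sigmoid(h @ self.W2 + self.b2)   # output layer activations

# Hypothetical sizes: 50 input features, 30 hidden units, 5 output units.
net = FeedForwardNet(n_in=50, n_hidden=30, n_out=5)
y = net.forward(np.zeros(50))
```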
Abstract
In order to fully characterize our system, we took into account the following parameters: accuracy, runtime speed, training speed, hidden layer configuration and the number of optimal training iterations.
Abstract
For example, the accuracy, the optimal number of training iterations, the training and the runtime speed are all highly dependent on the hidden layer configuration.
hidden layer is mentioned in 20 sentences in this paper.
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
DNN structures for NLP
Besides that, neural network training also involves some hyperparameters such as the learning rate and the number of hidden layers.
Experiments and Results
Table 3: Effect of different numbers of hidden layers.
Experiments and Results
Two hidden layers outperform one hidden layer, while three hidden layers do not bring further improvement.
Experiments and Results
6.4.3 Effect of number of hidden layers
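The excerpts above compare networks with one, two, and three hidden layers (two outperform one; three bring no further gain). A minimal sketch of how such a comparison can be parameterized by depth; the layer widths and the tanh activation are placeholder assumptions, not values from the paper.

```python
import numpy as np

def build_layers(n_in, n_out, n_hidden_layers, hidden_size, seed=0):
    """Return a list of (W, b) pairs for a network with the given depth."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [hidden_size] * n_hidden_layers + [n_out]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)        # hidden layers
    W, b = layers[-1]
    return h @ W + b                  # linear output layer

for depth in (1, 2, 3):              # the depths compared in Table 3
    layers = build_layers(n_in=100, n_out=1,
                          n_hidden_layers=depth, hidden_size=40)
    score = forward(layers, np.zeros(100))
```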
Training
Tunable parameters in the neural network alignment model include: word embeddings in the lookup table LT, parameters W_l, b_l for the linear transformations in the hidden layers of the neural network, and distortion parameters of the jump distance.
Training
We set word embedding length to 20, window size to 5, and the length of the only hidden layer to 40.
Training
To make our model concrete, there are still hyper-parameters to be determined: the window sizes s_w and t_w, and the length of each hidden layer L_l.
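A minimal sketch of the parameterization quoted in the Training excerpts above: a word-embedding lookup table LT, a linear transformation W_l, b_l feeding a hidden layer, embedding length 20, window size 5, and a single hidden layer of length 40. The vocabulary size, the tanh activation, and the single concatenated window are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_len, window, hidden_len = 10_000, 20, 5, 40

LT = rng.normal(0.0, 0.1, size=(vocab_size, emb_len))          # lookup table
W1 = rng.normal(0.0, 0.1, size=(window * emb_len, hidden_len))  # W_l
b1 = np.zeros(hidden_len)                                       # b_l
W2 = rng.normal(0.0, 0.1, size=(hidden_len, 1))
b2 = np.zeros(1)                                                # scoring layer

def score(word_ids):
    """Score a window of word ids by concatenating their embeddings."""
    x = LT[word_ids].reshape(-1)          # (window * emb_len,)
    h = np.tanh(x @ W1 + b1)              # hidden layer of length 40
    return (h @ W2 + b2)[0]

s = score(np.array([1, 2, 3, 4, 5]))      # a window of 5 (hypothetical) word ids
```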
hidden layer is mentioned in 11 sentences in this paper.
He, Zhengyan and Liu, Shujie and Li, Mu and Zhou, Ming and Zhang, Longkai and Wang, Houfeng
Experiments and Analysis
Training settings: In the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use a sigmoid activation function and a cross-entropy error function.
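A minimal sketch of the pre-training layer described in the excerpt above: a rectifier (max(0, x)) hidden layer whose reconstruction layer uses a sigmoid activation and a cross-entropy error. Sizes are scaled down here for illustration (the paper reports 100,000 input units and 1,000 hidden units), and tied encoder/decoder weights are an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 200, 50                      # illustrative sizes only

W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
b = np.zeros(n_hidden)
b_rec = np.zeros(n_in)                        # reconstruction bias (tied weights)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(x):
    h = np.maximum(0.0, x @ W + b)            # rectifier hidden layer
    return sigmoid(h @ W.T + b_rec)           # sigmoid reconstruction layer

def cross_entropy(x, x_hat, eps=1e-8):
    # Cross-entropy reconstruction error for inputs in [0, 1].
    return -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))

x = (rng.random(n_in) < 0.05).astype(float)   # a sparse binary input vector
loss = cross_entropy(x, reconstruct(x))
```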
Learning Representation for Contextual Document
In this stage we optimize the learned representation (“hidden layer n” in Fig.
Learning Representation for Contextual Document
The network weights below “hidden layer n” are initialized from the pre-training stage.
Learning Representation for Contextual Document
hidden layer n
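The excerpts above describe initializing the weights below “hidden layer n” from the pre-training stage before the learned representation is optimized. A minimal sketch of that initialization pattern; the layer sizes and the rectifier activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for weights obtained during pre-training (layers 1 .. n).
pretrained = [(rng.normal(0.0, 0.01, size=(200, 100)), np.zeros(100)),
              (rng.normal(0.0, 0.01, size=(100, 50)), np.zeros(50))]

# Fine-tuning network: layers below "hidden layer n" start from the
# pre-trained weights rather than from a random initialization.
finetune_layers = [(W.copy(), b.copy()) for W, b in pretrained]

def representation(x):
    h = x
    for W, b in finetune_layers:
        h = np.maximum(0.0, h @ W + b)   # rectifier hidden layers
    return h                             # "hidden layer n": the learned representation

rep = representation(np.zeros(200))
```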
hidden layer is mentioned in 4 sentences in this paper.