Abstract | In our experiments, we used a fully connected, feed-forward neural network with three layers (one input layer, one hidden layer, and one output layer).
Abstract | In order to fully characterize our system, we took into account the following parameters: accuracy, runtime speed, training speed, hidden layer configuration, and the optimal number of training iterations.
Abstract | For example, the accuracy, the optimal number of training iterations, the training and the runtime speed are all highly dependent on the hidden layer configuration. |
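As a rough illustration of the architecture described above (all sizes, the random weights, and the tanh activation are assumptions for illustration, not taken from the paper), a three-layer feed-forward network can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes, not the paper's configuration.
n_in, n_hidden, n_out = 4, 8, 3

W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))  # hidden -> output
b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(x @ W1 + b1)   # the single hidden layer
    return h @ W2 + b2         # output layer (pre-softmax scores)

print(forward(rng.normal(size=n_in)).shape)  # (3,)
```

Changing `n_hidden` is the "hidden layer configuration" knob that the snippet says accuracy, training speed, and runtime speed all depend on.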
DNN structures for NLP | Besides that, neural network training also involves hyperparameters such as the learning rate and the number of hidden layers.
Experiments and Results | Table 3: Effect of the number of hidden layers.
Experiments and Results | Two hidden layers outperform one hidden layer, while three hidden layers do not bring further improvement. |
Experiments and Results | 6.4.3 Effect of number of hidden layers |
Training | Tunable parameters in the neural network alignment model include: the word embeddings in the lookup table LT, the parameters W_l and b_l for the linear transformations in the hidden layers of the neural network, and the distortion parameters of jump distance.
Training | We set the word embedding length to 20, the window size to 5, and the length of the single hidden layer to 40.
Training | To make our model concrete, there are still hyper-parameters to be determined: the window sizes s_w and t_w, and the length of each hidden layer L_l.
Introduction | When used in conjunction with a precomputed hidden layer, these techniques speed up NNJM computation by a factor of 10,000x, with only a small reduction in MT accuracy.
Neural Network Joint Model (NNJM) | We use two 512-dimensional hidden layers with tanh activation functions.
Neural Network Joint Model (NNJM) | We chose these values for the hidden layer size, vocabulary size, and source window size because they seemed to work best on our data sets; larger sizes did not improve results, while smaller sizes degraded results.
Neural Network Joint Model (NNJM) | 2.4 Pre-Computing the Hidden Layer
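This heading refers to precomputing the hidden layer. A plausible sketch of that kind of trick (illustrative sizes and names, not the paper's code): because the first layer's pre-activation is a sum of per-position contributions, the contribution of every (position, word) pair can be cached once, and the hidden layer is then computed by table lookups and additions instead of a full matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: vocabulary V, embedding dim E, hidden dim H,
# and a context window of CONTEXT words.
V, E, H, CONTEXT = 50, 16, 32, 5
emb = rng.normal(size=(V, E))           # word embeddings
W = rng.normal(size=(CONTEXT * E, H))   # first-layer weights
b = np.zeros(H)

# Precompute, for every (position, word) pair, its additive contribution
# to the hidden pre-activation: emb[w] @ W[p*E:(p+1)*E].
pre = np.stack([emb @ W[p * E:(p + 1) * E] for p in range(CONTEXT)])

def hidden_fast(context_words):
    # Lookup-and-add version: CONTEXT vector additions per query.
    z = b + sum(pre[p, w] for p, w in enumerate(context_words))
    return np.tanh(z)

def hidden_slow(context_words):
    # Naive version: concatenate embeddings, then one big matmul.
    x = np.concatenate([emb[w] for w in context_words])
    return np.tanh(x @ W + b)

ctx = [3, 7, 1, 42, 9]
assert np.allclose(hidden_fast(ctx), hidden_slow(ctx))
```

The two functions compute the same hidden layer; the fast path avoids the matrix multiplication entirely, which is the kind of saving behind the 10,000x speedup figure quoted earlier.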
Abstract | This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers.
Introduction | An RNN has a hidden layer with recurrent connections that propagates its own previous signals. |
RNN-based Alignment Model | The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
RNN-based Alignment Model | (Footnote 2) Consecutive l hidden layers can be used: z_l = f(H_l × z_{l-1} + B_{H_l}).
RNN-based Alignment Model | For simplicity, this paper describes the model with one hidden layer.
Related Work | Hidden layer (figure annotation): z_1 = htanh(H × z_0 + B_H).
Related Work | Figure 1 shows the network structure with one hidden layer for computing a lexical translation probability t_lex(f_j, e_{a_j}).
Related Work | The model consists of a lookup layer, a hidden layer, and an output layer, which have weight matrices.
Decoder Integration | To solve this problem, we follow previous work on lattice rescoring with recurrent networks that maintained the usual n-gram context but kept a beam of hidden layer configurations at each state (Auli et al., 2013). |
Decoder Integration | In fact, to make decoding as efficient as possible, we only keep the single best scoring hidden layer configuration. |
Decoder Integration | As a future cost estimate, we score each phrase in isolation, resetting the hidden layer at the beginning of each phrase.
Experiments | The hidden layer uses 100 neurons unless otherwise stated. |
Experiments | To keep training times manageable, we reduce the hidden layer size to 30 neurons, thereby greatly increasing speed. |
Recurrent Neural Network LMs | (2010), which is factored into an input layer, a hidden layer with recurrent connections, and an output layer (Figure 1).
Recurrent Neural Network LMs | The hidden layer state h_t encodes the history of all words observed in the sequence up to time step t. The state of the hidden layer is determined by the input layer and the hidden layer configuration of the previous time step, h_{t-1}.
Recurrent Neural Network LMs | The weights of the connections between the layers are summarized in a number of matrices: U represents the weights from the input layer to the hidden layer, and W represents the connections from the previous hidden layer to the current hidden layer.
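A minimal sketch of this recurrence, with U and W as just described; tanh stands in for the activation function and all sizes are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 10, 6  # illustrative sizes

U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # previous hidden -> current hidden

def step(x_t, h_prev):
    # h_t is determined by the current input and the previous hidden state,
    # so it encodes the whole history observed up to time t.
    return np.tanh(U @ x_t + W @ h_prev)

h = np.zeros(n_hidden)
for t in range(4):          # unrolled over a toy input sequence
    x_t = rng.normal(size=n_in)
    h = step(x_t, h)
print(h.shape)  # (6,)
```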
Abstract | Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition of multiple DAEs for large hidden-layer feature learning.
Introduction | Moreover, to learn high-dimensional feature representations, we introduce a natural horizontal composition for DAEs (HCDAE) that can create large hidden-layer representations simply by horizontally combining two (or more) DAEs (Baldi, 2012); in our experiments, this shows further improvement over a single DAE.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | The connection weight W, hidden layer biases c and visible layer biases b can be learned efficiently using the contrastive divergence (Hinton, 2002; Carreira-Perpinan and Hinton, 2005). |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | When given a hidden layer h, the factorial conditional distribution of the visible layer v can be estimated by
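For a binary RBM with weight matrix W and visible biases b, the standard form of this factorial conditional (reconstructed from the usual RBM definitions, not copied from the paper) is:

```latex
P(v \mid h) = \prod_i P(v_i = 1 \mid h), \qquad
P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),
```

where σ is the logistic sigmoid; "factorial" refers to the product over independent visible units given h.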
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Moreover, although we have introduced another four types of phrase features (X2, X3, X4 and X5), the only 16 features in X are a bottleneck for learning large hidden-layer feature representations: because they carry limited information, the performance of high-dimensional DAE features learned directly from a single DAE is not very satisfactory.
Experiments | The vocabulary size for the input layer is 100,000, and we choose different lengths for the hidden layer as L = {100, 300, 600, 1000} in the experiments. |
Experiments | 4.3 Effect of retrieved documents and length of hidden layers |
Experiments | We illustrate the relationship among translation accuracy (BLEU), the number of retrieved documents (N) and the length of hidden layers (L) on different testing datasets. |
Topic Similarity Model with Neural Network | Assuming that the dimension of g(X) is L, the linear layer forms an L × V matrix W which projects the n-of-V vector to an L-dimensional hidden layer.
Topic Similarity Model with Neural Network | Training neural networks involves many factors such as the learning rate and the length of the hidden layers.
Autoencoders for Grounded Semantics | To further optimize the parameters of the network, a supervised criterion can be imposed on top of the last hidden layer such as the minimization of a prediction error on a supervised task (Bengio, 2009). |
Autoencoders for Grounded Semantics | We first train SAEs with two hidden layers (codings) for each modality separately. |
Autoencoders for Grounded Semantics | Then, we join these two SAEs by feeding their respective second coding simultaneously to another autoencoder, whose hidden layer thus yields the fused meaning representation. |
Experimental Setup | This model has the following architecture: the textual autoencoder (see Figure 1, left-hand side) consists of 700 hidden units which are then mapped to the second hidden layer with 500 units (the corruption parameter was set to v = 0.1); the visual autoencoder (see Figure 1, right-hand side) has 170 and 100 hidden units, in the first and second layer, respectively. |
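A hedged sketch of how two such encoder stacks could be joined into a fused representation, using the hidden sizes quoted above (700→500 textual, 170→100 visual). The raw input dimensionalities (1000 textual, 300 visual), the fused hidden size (200), the sigmoid activations, and the random weights are all assumptions for illustration; in the paper these layers are trained as denoising autoencoders.

```python
import numpy as np

rng = np.random.default_rng(3)

def layer(n_in, n_out):
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

Wt1, bt1 = layer(1000, 700)     # textual stack (input size 1000 assumed)
Wt2, bt2 = layer(700, 500)
Wv1, bv1 = layer(300, 170)      # visual stack (input size 300 assumed)
Wv2, bv2 = layer(170, 100)
Wf, bf = layer(500 + 100, 200)  # fusion autoencoder (hidden size assumed)

def fused_representation(text_vec, visual_vec):
    t = sigmoid(sigmoid(text_vec @ Wt1 + bt1) @ Wt2 + bt2)    # 500-d text coding
    v = sigmoid(sigmoid(visual_vec @ Wv1 + bv1) @ Wv2 + bv2)  # 100-d visual coding
    # Feed both second codings to one autoencoder; its hidden layer
    # is the fused meaning representation.
    return sigmoid(np.concatenate([t, v]) @ Wf + bf)

rep = fused_representation(rng.normal(size=1000), rng.normal(size=300))
print(rep.shape)  # (200,)
```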
Adding Linguistic Knowledge to the Monte-Carlo Framework | [Figure: the output layer f(s, a, d, y) sits above hidden layers encoding sentence relevance.]
Adding Linguistic Knowledge to the Monte-Carlo Framework | The first hidden layer contains two disjoint sets of units, g′ and z′, corresponding to linguistic analyses of the strategy document.
Introduction | We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions. |
Experimental Setup | Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer.
Experimental Setup | where θ_{v→w} consists of the model weights θ^(1) ∈ R^(d_v × d_h) and θ^(2) ∈ R^(d_h × d_w) that map the input image-based vectors V_s first to the hidden layer and then to the output layer in order to obtain text-based vectors, i.e., W_s = σ^(2)(σ^(1)(V_s θ^(1)) θ^(2)), where σ^(1) and σ^(2) are
Introduction | This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space. |
Results | In order to gain qualitative insights into the performance of the projection process of NN, we attempt to investigate the role and interpretability of the hidden layer . |
Results | hidden layer acts as a cross-modal concept categorization/organization system.
Experiments and Analysis | Training settings: in the pre-training stage, the input layer has 100,000 units and all hidden layers have 1,000 units with the rectifier function max(0, x). Following (Glorot et al., 2011), for the first reconstruction layer we use a sigmoid activation function and a cross-entropy error function.
Learning Representation for Contextual Document | In this stage we optimize the learned representation (“hidden layer n” in Fig.
Learning Representation for Contextual Document | The network weights below “hidden layer n” are initialized with the pre-training stage.
Learning Representation for Contextual Document | hidden layer n |
Our Model | As shown in Figure 1, the network contains three layers: an input layer, a hidden layer, and an output layer.
Our Model | previous history h_{t-1} to generate the current hidden layer, which is a new history vector h_t.
Phrase Pair Embedding | The neural network is used to reduce the space dimension of sparse features, and the hidden layer of the network is used as the phrase pair embedding. |
Phrase Pair Embedding | The length of the hidden layer is empirically set to 20. |
Experiments | For each data set, we investigate an extensive set of combinations of hyper-parameters: the n-gram window (l,r) in {(1, 1), (2,1), (1,2), (2,2)}; the hidden layer size in {200, 300, 400}; the learning rate in {0.1, 0.01, 0.001}. |
Learning from Web Text | The affine form of E with respect to v and h implies that the visible variables are conditionally independent of each other given the hidden layer units, and vice versa.
Learning from Web Text | For each position j, there is a weight matrix W^(j) ∈ R^(H×D), which is used to model the interaction between the hidden layer and the word projection at position j.
Neural Network for POS Disambiguation | The web-feature module, shown in the lower left part of Figure 1, consists of an input layer and two hidden layers.
Global Context-Aware Neural Language Model | To compute the score of local context, scorel, we use a neural network with one hidden layer: |
Global Context-Aware Neural Language Model | where [x_1, x_2, ..., x_m] is the concatenation of the m word embeddings representing sequence s, f is an element-wise activation function such as tanh, a_1 ∈ R^(h×1) is the activation of the hidden layer with h hidden nodes, W_1 ∈ R^(h×(mn)) and W_2 ∈ R^(1×h) are respectively the first and second layer weights of the neural network, and b_1, b_2 are the biases of each layer.
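Under the shapes stated in this passage, the hidden-layer activation and the local score can be sketched as follows (a reconstruction with arbitrary random weights and tanh as f, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, h = 5, 50, 100  # m words, n-dimensional embeddings, h hidden nodes

x = rng.normal(size=m * n)                    # [x_1, x_2, ..., x_m] concatenated
W1 = rng.normal(scale=0.01, size=(h, m * n))  # first layer weights
b1 = np.zeros(h)
W2 = rng.normal(scale=0.01, size=(1, h))      # second layer weights
b2 = np.zeros(1)

a1 = np.tanh(W1 @ x + b1)   # hidden-layer activation, shape (h,)
score_l = W2 @ a1 + b2      # scalar local-context score
print(score_l.shape)  # (1,)
```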
Global Context-Aware Neural Language Model | the hidden layer with h^(g) hidden nodes, W_1^(g) ∈ R^(h^(g)×(2n)) and W_2^(g) ∈ R^(1×h^(g)) are respectively the first and second layer weights of the neural network, and b_1^(g), b_2^(g) are the biases of each layer.