Learning to Win by Reading Manuals in a Monte-Carlo Framework
Branavan, S.R.K., Silver, David, and Barzilay, Regina

Article Structure

Abstract

This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games.

Introduction

In this paper, we study the task of grounding linguistic analysis in control applications such as computer games.

Related Work

Our work fits into the broad area of grounded language acquisition where the goal is to learn linguistic analysis from a situated context (Oates, 2001; Siskind, 2001; Yu and Ballard, 2004; Fleischman and Roy, 2005; Mooney, 2008a; Mooney, 2008b; Branavan et al., 2009; Vogel and Jurafsky, 2010).

Monte-Carlo Framework for Computer Games

Our method operates within the Monte-Carlo search framework (Tesauro and Galperin, 1996), which has been successfully applied to complex computer games such as Go, Poker, Scrabble, multiplayer card games, and real-time strategy games, among others (Gelly et al., 2006; Tesauro and Galperin, 1996; Billings et al., 1999; Sheppard, 2002; Schafer, 2008; Sturtevant, 2008; Balla and Fern, 2009).
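The roll-out evaluation at the heart of Monte-Carlo search can be sketched as follows. This is a minimal illustration, not the paper's implementation; `simulate_rollout` is a hypothetical game simulator that plays a game to completion and returns the final utility:

```python
def monte_carlo_action(state, actions, simulate_rollout, n_rollouts=100):
    """Choose the action whose simulated roll-outs give the highest mean utility."""
    best_action, best_value = None, float("-inf")
    for action in actions:
        # Average the final utility over many simulated games from this action.
        mean_utility = sum(
            simulate_rollout(state, action) for _ in range(n_rollouts)
        ) / n_rollouts
        if mean_utility > best_value:
            best_action, best_value = action, mean_utility
    return best_action
```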

Adding Linguistic Knowledge to the Monte-Carlo Framework

In this section we describe how we inform the simulation-based player with information automatically extracted from text — in terms of both model structure and parameter estimation.

Topics

neural network

Appears in 7 sentences as: neural network (7)
In Learning to Win by Reading Manuals in a Monte-Carlo Framework
  1. We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions.
    Page 2, “Introduction”
  2. As shown in Figure 2, our model is a four-layer neural network.
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. The third feature layer f of the neural network is deterministically computed given the active units yi and zj of the softmax layers, and the values of the input layer.
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  4. In the second layer of the neural network, the units zj represent a predicate labeling δi of every sentence yi ∈ d. However, our intention is to incorporate, into action-value function Q, information from only the most relevant sentence.
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  5. The second baseline, latent variable, extends the linear action-value function Q(s, a) of the game only baseline with a set of latent variables — i.e., it is a four-layer neural network, where the second layer’s units are activated only based on game information.
    Page 8, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  6. As such, it focuses on the initial portion of the game, providing little strategy advice relevant to subsequent game play. If this is the reason for the observed sentence relevance trend, we would also expect the final layer of the neural network to emphasize game features over text features after the first 25 steps of the game.
    Page 8, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  7. Figure 6: Difference between the norms of the text features and game features of the output layer of the neural network.
    Page 9, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
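
The four-layer architecture described in these sentences — softmax layers selecting a relevant sentence and a predicate labeling, a deterministic feature layer, and a linear output — can be sketched as below. Function and argument names are ours, and the scores are assumed to be precomputed from the input layer; this is an illustrative sketch, not the paper's implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def q_value(sentence_scores, label_scores, feature_fn, w_out):
    """Sketch of the four-layer network:
    layer 1: input features (folded into the precomputed scores here);
    layer 2: softmax units pick the most relevant sentence y_i and a
             predicate labeling z_j for it;
    layer 3: deterministic features f computed from the active units;
    layer 4: linear output Q = w . f.
    """
    rel = softmax(sentence_scores)
    y_i = rel.index(max(rel))            # active sentence-relevance unit
    lab = softmax(label_scores[y_i])
    z_j = lab.index(max(lab))            # active predicate-labeling unit
    f = feature_fn(y_i, z_j)             # deterministic third layer
    return sum(w * fk for w, fk in zip(w_out, f))  # linear output layer
```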

See all papers in Proc. ACL 2011 that mention neural network.

Hidden layer

Appears in 5 sentences as: Hidden layer (2), hidden layer (2), hidden layers (1)
  1. We employ a multilayer neural network where the hidden layers represent sentence relevance and predicate parsing decisions.
    Page 2, “Introduction”
  2. (Figure 2 labels) f(s, a, d, yi, ...) ← output layer; hidden layer encoding
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. (Figure 2 labels) hidden layer encoding sentence relevance
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  4. The first hidden layer contains two disjoint sets of units ȳ and z̄ corresponding to linguistic analyses of the strategy document.
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  5. The units of the second hidden layer f(s, a, d, yi, zj) are a set of fixed real-valued feature functions on s, a, d and the active units yi and zj of ȳ and z̄ respectively.
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”

dependency parse

Appears in 4 sentences as: dependency parse (4)
  1. Given sentence yi and its dependency parse qi, we model the distribution over predicate labels δi as:
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  2. The feature function used for predicate labeling, on the other hand, operates only on a given sentence and its dependency parse.
    Page 6, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. It computes features which are the Cartesian product of the candidate predicate label with word attributes such as type, part-of-speech tag, and dependency parse information.
    Page 6, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  4. The Stanford parser (de Marneffe et al., 2006) was used to generate the dependency parse information for sentences in the game manual.
    Page 7, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
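
The Cartesian-product feature construction described in sentence 3 can be sketched as follows. The attribute names are illustrative, not the paper's exact feature set:

```python
def predicate_label_features(word_attrs, candidate_label):
    """Pair a candidate predicate label with each word attribute
    (e.g. part-of-speech tag, dependency relation), yielding one
    indicator feature per (label, attribute) combination."""
    return {
        f"{candidate_label}&{name}={value}": 1.0
        for name, value in word_attrs.items()
    }
```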

Latent variable

Appears in 3 sentences as: Latent variable (2), latent variable (1), latent variables (1)
  1. (results table; "±" values are standard errors)
     Game only           17.3   5.3 ± 2.7
     Sentence relevance  46.7   2.8 ± 3.5
     Full model          53.7   5.9 ± 3.5
     Random text         40.3   4.3 ± 3.4
     Latent variable     26.1   3.7 ± 3.1
    Page 7, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  2. Method            % Wins ± Standard Error
     Game only         45.7 ± 7.0
     Latent variable   62.2 ± 6.9
     Full model        78.8 ± 5.8
    Page 7, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. The second baseline, latent variable, extends the linear action-value function Q(s, a) of the game only baseline with a set of latent variables — i.e., it is a four-layer neural network, where the second layer’s units are activated only based on game information.
    Page 8, “Adding Linguistic Knowledge to the Monte-Carlo Framework”

model parameters

Appears in 3 sentences as: model parameters (3)
  1. Since our model is a nonlinear approximation of the underlying action-value function of the game, we learn model parameters by applying nonlinear regression to the observed final utilities from the simulated roll-outs.
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  2. The resulting update to model parameters θ is of the form:
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. We use the same experimental settings across all methods, and all model parameters are initialized to zero.
    Page 7, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
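
A single stochastic update of this kind — nonlinear regression of Q(s, a) toward the observed final utility of a roll-out — can be sketched as below. This is a generic gradient step, not the paper's exact derivation; the gradient of Q with respect to the parameters is assumed given:

```python
def sgd_step(theta, grad_q, q_value, observed_utility, lr=0.01):
    """One stochastic gradient step minimizing (Q - utility)^2:
    nudge the parameters so Q(s, a) moves toward the roll-out's utility."""
    error = q_value - observed_utility
    return [t - lr * error * g for t, g in zip(theta, grad_q)]
```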

weight vector

Appears in 3 sentences as: weight vector (3)
  1. Here f(s, a) ∈ ℝⁿ is a real-valued feature function, and w̄ is a weight vector.
    Page 4, “Monte-Carlo Framework for Computer Games”
  2. where yi is the ith hidden unit of ȳ, and ūi is the weight vector corresponding to yi.
    Page 4, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
  3. Q(st, at) = w̄ · f̄, where w̄ is the weight vector.
    Page 5, “Adding Linguistic Knowledge to the Monte-Carlo Framework”
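
The linear action-value function in sentence 1 is simply a dot product of the weight vector with the feature function's output; a minimal sketch:

```python
def linear_q(w, f):
    """Q(s, a) = w . f(s, a): dot product of weights and feature values."""
    return sum(wi * fi for wi, fi in zip(w, f))
```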
