Exploration in non-stationary bandits | Specifically, when an option is not sampled its variance grows because of the nonstationarity of the underlying generative model , effectively driving it backwards in the tree (Fig. |
Information sampling | G. Example sequence of samples and estimates of mean and variance, v = 0.90 for means drawn from the generative model . |
Introduction | MDPs provide a general modeling framework, useful in tasks where the future depends upon what one chooses in the present. |