Index of papers in PLOS Comp. Biol. that mention
  • reinforcement learning
Vassilios Christopoulos, James Bonaiuto, Richard A. Andersen
Computing policy desirability
One of the novelties of the present framework is that the weights of the connections between the goods value field and the motor plan formation field are plastic and are modified using a simple reinforcement learning rule.
Discussion
The model presented here bears some similarity to decision-making models in the hierarchical reinforcement learning (HRL) framework [59].
Introduction
Unlike classic neurodynamical models that used hardwired associations between sensory inputs and motor outputs, we allow for plasticity in the connections between specific dynamic neural fields (DNFs) and, using reinforcement learning mechanisms, we show how action-selection is influenced by trained sensorimotor associations or changing reward contingencies.
Learning sensorimotor associations and reward contingencies
This is due to reinforcement learning on connection weights between the goods value field and context cue field and the motor plan formation DNF, and simulates the process by which cortico-basal ganglia networks map contextual cues and reward expectancy onto actions [60].
Motor plan formation field
The activity in the field Ureward was computed by multiplying a multivariate Gaussian encoding stimulus position with a set of weights, Wreward, which were also updated using reinforcement learning following every trial T:
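The update equation following this sentence is not reproduced in the index. In broad terms, the sentence describes a Gaussian population code for stimulus position that is scaled by a set of weights, Wreward, with a reward-driven weight update after each trial. A minimal Python sketch of that general pattern is given below; the one-dimensional field, elementwise weighting, learning rate, and prediction-error form of the update are illustrative assumptions rather than the paper's exact equation.

    import numpy as np

    def gaussian_code(stimulus_pos, field_size=100, sigma=5.0):
        # Gaussian bump over field units, centred on the stimulus position
        units = np.arange(field_size)
        return np.exp(-0.5 * ((units - stimulus_pos) / sigma) ** 2)

    def reward_field_activity(stimulus_pos, W_reward):
        # U_reward: the stimulus-position code scaled unit-by-unit by the weights
        return W_reward * gaussian_code(stimulus_pos, field_size=W_reward.shape[0])

    def update_W_reward(W_reward, stimulus_pos, reward, expected_reward, lr=0.1):
        # After each trial, adjust the weights at active units in proportion to
        # the reward prediction error (a delta-rule style update)
        g = gaussian_code(stimulus_pos, field_size=W_reward.shape[0])
        return W_reward + lr * (reward - expected_reward) * g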
Supporting Information
This layer was fully connected with both reach and saccade fields, with weights initially randomized with low random values and trainable through reinforcement learning (see main manuscript for more details).
Visuomotor decisions with competing alternatives
This layer was fully connected with each motor plan formation DNF, with weights initially randomized with low random values and trainable through reinforcement learning (SI Fig.
Visuomotor decisions with competing alternatives
These weights were updated using reinforcement learning following every trial, T, with a reward given for moving to a target with the cued effector:
reinforcement learning is mentioned in 8 sentences in this paper.
Kai Olav Ellefsen, Jean-Baptiste Mouret, Jeff Clune
A Connection Cost Increases Performance and Modularity
Networks evolved in the P&CC treatment tend to create a separate reinforcement learning module that contains the reward and punishment inputs and most or all neuromodulatory neurons (Fig.
Abstract
Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward.
Abstract
We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module.
Background
This type of plasticity-controlling neuromodulation has been successfully applied when evolving neural networks that solve reinforcement learning problems [25, 46], and a comparison found that evolution was able to solve more complex tasks with neuromodulated Hebbian learning than with Hebbian learning alone [25].
The final two inputs are for reinforcement learning : inputs 9 and 10 are reward and punishment signals that fire when a nutritious or poisonous food item is eaten, respectively.
Neural Network Model Details
We utilize a standard network model common in previous studies of the evolution of modularity [23, 57], extended with neuromodulatory neurons to add reinforcement learning dynamics [25, 69].
The Importance of Neuromodulation
This finding is in line with previous work demonstrating that neuromodulation allows evolution to solve more complex reinforcement learning problems than purely Hebbian learning [25].
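The neuromodulated Hebbian learning referred to here (and in the Background sentence above) typically takes the form of a Hebbian co-activity term gated by a reward or punishment signal. A minimal sketch of that general rule follows; the gating convention, learning rate, and toy dimensions are illustrative assumptions, not the evolved networks' actual parameters.

    import numpy as np

    def neuromodulated_hebbian_update(w, pre, post, modulation, lr=0.01):
        # Hebbian co-activity term gated by a modulatory signal:
        #   modulation > 0 (reward input active)   -> potentiate co-active connections
        #   modulation < 0 (punishment input)      -> depress them
        #   modulation == 0                        -> no plasticity
        return w + lr * modulation * np.outer(post, pre)

    # Toy usage: a reward signal strengthens connections between co-active units
    w = np.zeros((2, 3))
    pre = np.array([1.0, 0.0, 1.0])
    post = np.array([0.0, 1.0])
    w = neuromodulated_hebbian_update(w, pre, post, modulation=1.0)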
reinforcement learning is mentioned in 7 sentences in this paper.
Jaldert O. Rombouts, Sander M. Bohte, Pieter R. Roelfsema
Comparison to previous modeling approaches
There has been substantial progress in biologically inspired reinforcement learning models with spiking neurons [68–71] and with models that approximate population activity with continuous variables [14,16,21,44,67,72–74].
Conclusions
The finding that a single network can be trained by trial and error to perform these diverse tasks implies that these learning problems now fit into a unified reinforcement learning framework.
Introduction
We here outline AuGMEnT (Attention-Gated MEmory Tagging), a new reinforcement learning [7] scheme that explains the formation of working memories during trial-and-error learning and that is inspired by the role of attention and neuromodulatory systems in the gating of neuronal plasticity.
Introduction
AuGMEnT solves this problem like previous temporal-difference reinforcement learning (RL) theories [7].
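Temporal-difference RL of the kind cited here learns from a prediction error that compares the obtained reward plus a bootstrapped estimate of future value against the current estimate. The sketch below shows that generic update; the discount factor, learning rate, and scalar value representation are illustrative assumptions and do not reproduce AuGMEnT's network-level learning rule.

    def td_error(reward, q_next, q_current, gamma=0.9):
        # Temporal-difference error: obtained reward plus discounted estimate of
        # future value, minus the current estimate
        return reward + gamma * q_next - q_current

    def td_update(q_current, reward, q_next, gamma=0.9, lr=0.1):
        # Move the current action-value estimate toward the TD target
        return q_current + lr * td_error(reward, q_next, q_current, gamma)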
reinforcement learning is mentioned in 4 sentences in this paper.
Bruno B. Averbeck
Conclusion
For example, the beta or inverse temperature parameter in delta-rule reinforcement learning (DRRL) is often thought to control the “explore-exploit” tradeoff.
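In a standard DRRL model, option values are updated with a delta rule and choices are drawn from a softmax whose inverse temperature beta sets how strongly the highest-valued option is preferred, which is why beta is often read as an explore-exploit knob. A minimal sketch of that standard formulation follows; the learning rate alpha and the bandit framing are illustrative assumptions.

    import numpy as np

    def softmax_policy(q_values, beta):
        # Choice probabilities: a larger inverse temperature beta concentrates
        # choice on the highest-valued option (exploit); a smaller beta spreads
        # choice across options (explore)
        z = beta * (q_values - np.max(q_values))  # shift for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def delta_rule_update(q_values, choice, reward, alpha=0.2):
        # Delta rule: move only the chosen option's value toward the obtained reward
        q = q_values.copy()
        q[choice] += alpha * (reward - q[choice])
        return q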
Exploration in a stationary two-armed bandit
When the environment is unknown, and model-free reinforcement learning (RL) is used to learn the environment [27], exploration can be used to drive the RL algorithm to sample from the complete space of possible options.
Introduction
Correspondingly, even reinforcement learning tasks, where choices do affect the information that will be available for future choices, are often modeled using delta rule reinforcement learning (DRRL) or logistic regression, neither of which provides a normative description of the task.
reinforcement learning is mentioned in 3 sentences in this paper.