A Biologically Plausible Computational Theory for Value Integration and Action Selection in Decisions with Competing Alternatives
Vassilios Christopoulos, James Bonaiuto, Richard A. Andersen

Abstract

Although decisions can be as simple as choosing a goal and then pursuing it, humans and animals usually have to make decisions in dynamic environments where the value and the availability of an option change unpredictably with time and previous actions. A predator chasing multiple prey exemplifies how goals can dynamically change and compete during ongoing actions. Classical psychological theories posit that decision making takes place within frontal areas and is a separate process from perception and action. However, recent findings argue for additional mechanisms and suggest that decisions between actions often emerge through a continuous competition within the same brain regions that plan and guide action execution. According to these findings, the sensorimotor system generates concurrent action-plans for competing goals and uses online information to bias the competition until a single goal is pursued. This information is diverse, relating to both the dynamic value of the goal and the cost of acting, creating a challenging problem of integrating information across these diverse variables in real time. We introduce a computational framework for dynamically integrating value information from disparate sources in decision tasks with competing actions. We evaluated the framework in a series of oculomotor and reaching decision tasks and found that it captures many features of choice and motor behavior, as well as their neural underpinnings, which have previously eluded a common explanation.

Author Summary

In high-pressure situations, such as driving on a highway or flying a plane, people have limited time to select between competing options while acting. Each option is usually accompanied by reward benefits (e.g., avoiding traffic) and action costs (e.g., fuel consumption) that characterize the value of the option. The value and the availability of an option can change dynamically even during ongoing actions, which compounds the decision-making challenge. How the brain dynamically integrates value information from disparate sources and selects between competing options is still poorly understood. In the current study, we present a neurodynamical framework to show how a distributed brain network can solve the problem of value integration and action selection in decisions with competing alternatives. It combines dynamic neural field theory with stochastic optimal control theory, and includes circuitry for perception, expected reward, effort cost and decision-making. It provides a principled way to explain both the neural and the behavioral findings from a series of visuomotor decision tasks in human and animal studies. For instance, the model shows how the competitive interactions between populations of neurons within and between sensorimotor regions can result in "spatial-averaging" movements, and how decision variables influence neural activity and choice behavior.

Introduction

Selecting between alternatives requires assigning and integrating values along a multitude of dimensions with different currencies, like the energetic cost of movement and monetary reward. Solving this problem requires integrating disparate value dimensions into a single variable that characterizes the "attractiveness" of each option. In dynamic decisions, in which the environment changes over time, this multidimensional integration must be updated across time. Despite the significant progress that has been made in understanding the mechanisms underlying dynamic decisions, little is known about how the brain integrates information online, while acting, to select the best option at any moment.

According to the classical "goods-based" theory, multiple decision determinants are integrated into a subjective economic value at the time of choice. The subjective values are independently computed for each alternative option and compared within the space of goods, independent of the sensorimotor contingencies of choice. Once a decision is made, action planning begins. This view is in accordance with evidence suggesting the convergence of subjective value in the orbitofrontal cortex (OFC) and ventromedial prefrontal cortex (vmPFC), where the best alternative is selected (for a review of the "goods-based" theory, see [4]). Despite its intuitive appeal, this theory is limited by the serial order assumption. Although many economic decisions can be as simple as choosing a goal and pursuing it, like choosing between renting or buying a house, humans evolved to survive in hostile and dynamic environments, where goal availability and value can change with time and previous actions, entangling goal decisions with action selection. Consider a hypothetical hunting scenario, in which a predator is faced with multiple alternative valuable goods (i.e., prey). Once the chase begins, both the relative value of the goods and the cost of the actions to pursue these goods will change continuously (e.g., a new prey may appear or a current prey may escape from the field), and what is currently the best option may not be the best or even available in the near future. In such situations, the goals dynamically compete during movement, and it is not possible to clearly separate goal decision-making from action selection.

According to this “action-based” theory, when the brain is faced with multiple goals, it initiates concurrent and partially prepared action-plans that compete for selection and uses value information accumulated during ongoing actions to bias this competition, until a single goal is pursued [6—11]. This theory received support from neurophysiological [8, 12—15] and behavioral [9, 10, 16—21] studies, and it is in accord with the continuous flow model of perception, which suggests that response preparation can begin even before the goal is fully identified and a decision is made [22—24]. However, it is vague as to how action costs are dynamically integrated with good values and other types of information.

The proposed framework allows dynamic integration of value information from disparate sources, and it is rich enough to explain a broad range of phenomena that have previously eluded a common explanation. It builds on successful models in dynamic neural field theory [25] and stochastic optimal control theory [26] and includes circuitry for perception, expected reward, selection bias, decision-making and effort cost. We show that the complex problem of action selection in the presence of multiple competing goals can be decomposed into a weighted mixture of individual control policies, each of which produces optimal action-plans (i.e., sequences of actions) to pursue particular goals. The key novelty is a relative desirability computation that dynamically integrates value information into a single variable, which reflects how "desirable" it is to follow a policy (i.e., move in a particular direction) and acts as a weighting factor on each individual policy. Because desirability is state and time dependent, the weighted mixture of policies automatically produces a range of behavior, from winner-take-all to weighted averaging. Another important characteristic of this framework is that it is able to learn sensorimotor associations and adapt choice behavior to changes in the decision values of the goals. Unlike classic neurodynamical models that used hardwired associations between sensory inputs and motor outputs, we allow for plasticity in the connections between specific dynamic neural fields (DNFs) and, using reinforcement learning mechanisms, we show how action selection is influenced by trained sensorimotor associations or changing reward contingencies.

The framework provides insights into a variety of findings from neurophysiological and behavioral studies, such as the competitive interactions between populations of neurons within [27, 28] and between [29] brain regions that frequently result in spatial averaging movements [10, 18, 30], and the effects of decision variables on neuronal activity [31, 32]. We also make novel predictions concerning how changes in reward contingencies or the introduction of new rules (i.e., assigning behavioral relevance to arbitrary stimuli) influence network plasticity and choice behavior.

Results

Model architecture

Each DNF simulates the dynamic evolution of firing rate activity within a neural population. It is based on the concept of population coding, in which each neuron has a response tuning curve over some set of inputs, such as the location of a good or the endpoint of a planned movement, and the responses of a neuronal ensemble represent the values of the inputs. The functional properties of each DNF are determined by the lateral interactions within the field and the connections with the other fields in the architecture. Some of these connections are static and predefined, whereas others are dynamic and change during the task. The projections between the fields are topologically organized, that is, each neuron in one field drives activation of the corresponding neuron (coding for the same direction) in the fields to which it projects. Let us consider the hunting scenario, in which a predator is faced with multiple goods (i.e., prey) located at different distances and directions from the current state x_t. The architectural organization of the present framework to model this type of problem is shown in Fig. 1. The "spatial sensory input field" encodes the angular spatial representation of the alternative goods in an egocentric reference frame. The "goods value field" encodes the goods values of pursuing each prey animal irrespective of sensorimotor contingencies (e.g., effort). The "context cue field" represents information related to the contextual requirements of the task. The outputs of these three fields send excitatory projections to the "motor plan formation field" in a topological manner. Each neuron in the motor plan formation field is linked with a motor control schema that generates both a direction-specific optimal policy π_i, which is a mapping between states and best actions, and an action cost function V_{π_i} that computes the expected control cost to move in the direction φ_i from any state (see Methods section and S1 Text for more details). It is important to note that a policy is not a particular sequence of actions; rather, it is a function that calculates the best action-plan u_i (i.e., a sequence of actions/motor commands) to take from the current state x_t to move in the direction φ_i for t_end time-steps (i.e., π_i(x_t) = u_i = [u_t, u_{t+1}, ..., u_{t_end}]).

Once a motor schema is active, it generates a policy π_i(x_t) towards the preferred direction of the corresponding neuron. The associated action cost is encoded by the "action cost field", which in turn inhibits the motor plan formation field via topological projections. Therefore, the role of the motor plan formation field is twofold: i) to trigger the motor schemas to generate policies and ii) to integrate information associated with actions, goals and contextual requirements into a single value that characterizes the "attractiveness" of each of the available policies. The output of the motor plan formation field encodes what we call the relative desirability of each policy, and the activity of the field is used to weigh the influence of each policy in the final action. As soon as the chase begins, the action costs and the estimates of the values related to the goals will change continuously; for example, a new prey may appear in the field, modulating the relative reward of the alternatives. The advantage of our theory is that it integrates value information from disparate sources dynamically while the action unfolds. Hence, the relative desirability of each policy is state and time dependent, and the weighted mixture of policies produces a range of behavior, from winner-take-all selection of a policy to averaging of several policies.

Visuomotor decisions with competing alternatives

The general framework described above can be translated into more concrete and testable scenarios, such as visuomotor decision tasks with competing goals and/or effectors, for comparison with experimental data. We show how the proposed computational model can be extended to include motor plan formation DNFs for various effectors, which interact competitively to implement effector as well as spatial decision-making.

In the current section, we present a simplified version of the framework that involves only reaching, Fig. 2. The reach motor plan formation DNF encodes the direction of intended arm movements in a spatial reference frame centered on the hand. Various decision values are therefore transformed from allocentric to egocentric representations centered on the hand before being input to the reach motor plan formation DNF. The DNF receives input encoding the location of the stimulus, the expected reward associated with moving in each direction and the action cost. The location of each stimulus is encoded in the spatial sensory input field as a Gaussian population code centered on the direction of the stimulus with respect to the hand. The expected reward for moving to given locations is effector-independent and encoded as a multivariate Gaussian population in allocentric coordinates. This representation is transformed into a one dimensional vector in the goods value field representing the expected reward for moving in each direction, centered on the hand position.
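To make this encoding concrete, the sketch below shows one way such a hand-centered Gaussian population code could be computed (Python/NumPy); the field size, tuning width, and function name are illustrative assumptions rather than the parameter values used in the model.

```python
import numpy as np

def hand_centered_population_code(stim_xy, hand_xy, n_neurons=181, sigma_deg=10.0):
    """Encode a stimulus as a Gaussian bump over movement directions,
    centered on the direction of the stimulus with respect to the hand.
    Field size and tuning width are illustrative, not the model's values."""
    preferred_dirs = np.linspace(0.0, 180.0, n_neurons)   # preferred directions (degrees)
    dx, dy = stim_xy[0] - hand_xy[0], stim_xy[1] - hand_xy[1]
    stim_dir = np.degrees(np.arctan2(dy, dx))             # stimulus direction relative to hand
    return np.exp(-(preferred_dirs - stim_dir) ** 2 / (2.0 * sigma_deg ** 2))

# A target up and to the right of the hand produces a bump peaking near 45 degrees.
u_vis = hand_centered_population_code(stim_xy=(10.0, 10.0), hand_xy=(0.0, 0.0))
print(int(np.argmax(u_vis)))  # -> 45
```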

The DNF is run until it reaches a peak activity level above γ, after which its values are used to weigh the policy of each motor schema, resulting in a reach movement. The operation of the framework can be easily understood in the context of particular reaching choice tasks that involve action selection in the presence of competing targets. We designed and simulated an experiment that involves rapid reaching movements towards multiple potential targets, with only one being cued after movement onset [9, 10, 18-21]. We implemented the rapid responses required in this task by reducing the activation threshold γ, so that the controllers are triggered shortly after the targets are displayed. In some of these trials, a single target was presented; in these trials the model knew in advance the actual location of the target. The results showed that the present framework can reproduce many characteristics of neuronal activity and behavior reported in the experimental literature. In particular, the presence of two potential targets caused an increase of the activity of two ensembles of neurons, each of them tuned to one of the targets, Fig. 3. Importantly, the motor plan formation DNF activity was weaker compared to the activity generated in single-target trials (i.e., no uncertainty about the actual location of the target), due to competitive field interactions, Fig. 3. When one of the targets was cued for action, the activity of the corresponding neuronal ensemble increased, while the activity for the non-cued target was reduced. These findings are consistent with neurophysiological studies on hand reaching tasks, showing the existence of simultaneous discrete directional signals associated with the targets in sensorimotor regions, before making a decision between them [7, 8, 13, 27, 33]. The neuronal activity was weaker while both potential targets were presented prior to the reach onset, compared to the single-target trials [27, 33].

In particular, in the two-target trials, the neuronal ensembles in the motor plan formation field remained active and competed against each other for a period of time, until the actual target was cued, and the activity of the non-cued target vanished, Fig. 3. Finally, the competition between the two populations of neurons caused spatial averaging movements, as opposed to straight reaches in the single-target trials, Fig. 3. This finding is in agreement with experimental studies, which showed that when multiple potential targets are presented simultaneously, and participants are forced to rapidly act upon them without knowing which one will be the actual target, they start reaching towards an intermediate location, before correcting the trajectories “in-flight” to the cued target location [9, 10, 16, 18—21]. In a similar manner, the present framework can be easily extended to model other rapid reaching choice experiments, such as eye and hand movements made to targets in the presence of non-target visual distractors [16, 34, 35].

So far, we have focused on visuomotor decision tasks with competing targets and shown that decisions emerge through mutual inhibition of populations of neurons that encode the direction of movements to the targets. However, the motor system is inherently redundant, and frequently the brain has to solve another competition problem: which effector to use to pursue an action. Neurophysiological studies in nonhuman primates have shown that when animals are presented with a target that can be pursued either by reaching or by saccades, both the parietal reach region (PRR) and the lateral intraparietal area (LIP), which are specialized for planning reaching and saccadic movements, respectively, are simultaneously active before an effector is selected [29, 36]. Once an effector is chosen, the neuronal activity in the corresponding cortical area increases, while the activity in the area selective for the non-selected effector decreases. Similar findings have also been reported in human studies for arm selection in unimanual reaching tasks [37]. These results suggest that a mechanism similar to the one used for action selection with competing goals also underlies selection between competing effectors.

Input to the saccade network was encoded in eye-centered coordinates, while input to the reach network was encoded in hand-centered coordinates. Each motor plan formation DNF also received inhibitory projections from every neuron in the other motor plan formation DNF, implementing effector selection in addition to the spatial selection performed by each DNF in isolation. The context cue field encoded the task context, with half of its neurons responding to a saccade cue and half responding to a reach cue. This layer was fully connected with each motor plan formation DNF, with weights initialized to low random values and trainable through reinforcement learning (S1 Fig. in the supporting information shows the detailed architecture of the framework for effector choice tasks).
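A minimal sketch of how such a context cue population and its trainable, fully connected projection to a motor plan formation DNF could be set up is given below; the population sizes, noise level, and initial weight scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CUE = 100      # context cue field: half saccade-selective, half reach-selective
N_MOTOR = 181    # neurons per motor plan formation DNF (illustrative size)

def cue_field_activity(cue, noise_sd=0.05):
    """Noisy context cue field response: the first half of the population
    responds to the saccade cue, the second half to the reach cue."""
    u = rng.normal(0.0, noise_sd, N_CUE)
    if cue == "saccade":
        u[: N_CUE // 2] += 1.0
    elif cue == "reach":
        u[N_CUE // 2:] += 1.0
    return np.clip(u, 0.0, None)

# Fully connected cue -> motor plan formation weights, one matrix per effector,
# initialized to low random values and later shaped by reinforcement learning.
W_cue = {eff: rng.uniform(0.0, 0.01, size=(N_MOTOR, N_CUE)) for eff in ("reach", "saccade")}

u_cue = cue_field_activity("reach")
drive_to_reach_dnf = W_cue["reach"] @ u_cue   # cue-driven excitation of the reach DNF
```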

Fig. 4 depicts a trial from an "effector-decision" task with a single target, similar to the neurophysiological study described above [29]. A target is presented 50 time-steps after the trial onset, followed by a "green" cue signal 20 time-steps later. Consistent with the experimental findings, once the target is presented, the activity of the populations of neurons tuned to the direction of the target increases in the motor plan formation DNFs for both reaching and saccadic movements, since the model does not yet know whether it is a "reach" or "saccade" trial. Notice that during that time, the neuronal activity in the reach DNF is lower than the activity in this field for the single-target trial with no effector competition (see Fig. 3 upper panel), due to the inhibitory interactions with the motor plan formation DNF for saccadic movements. Once the green cue is presented, the neuronal activity in the reach DNF becomes strong enough to inhibit the saccade DNF and conclusively win the competition, and the model generates a direct reaching movement to the target.

This raises the question of how the brain may solve a competition that involves both multiple goals and effectors. To address this question, we designed and simulated a novel visuomotor decision task that involves multiple competing targets which can be acquired by either saccadic or reaching movements. To the best of our knowledge, this task has not yet been studied experimentally in humans or animals. The left and right panels in Fig. 5 illustrate the simulated neural activity in the motor plan formation DNFs for eye and hand movements for a characteristic trial in the "free-choice" and "cued-reaching" sessions, respectively. In the free-choice condition, two equally rewarded targets are presented in both hemifields 50 time-steps after the beginning of the trial, followed by a free-choice cue (i.e., red and green cues simultaneously presented at the center of the screen) 50 time-steps later. In this condition, the framework is free to choose either of the two effectors to acquire any of the targets. In the cued condition, a green or a red cue is presented 50 time-steps after the onset of the targets, indicating which effector should be used to acquire either of the targets. There are several interesting features in the simulated neuronal activities and behavior:

This finding is better illustrated in Fig. 6A and B, which depict the average activity of the two populations of neurons tuned to the selected (solid black lines) and the non-selected (discontinuous black lines) targets from both DNFs that plan reaching (green) and eye (red) movements in the free-choice and cued-reaching sessions, respectively. Notice that in both sessions, the framework first chooses which effector to use (i.e., hand or eye) and then selects which target to acquire. This is explained by the massive inhibitory interconnections between the DNFs that plan reaches and saccades. (ii) The effector competition takes more time to be resolved in free-choice trials than in cued trials. (iii) Because it takes longer to decide which effector to use in free-choice trials, the competition between the two targets is usually resolved before the movement onset, frequently resulting in direct reaching or saccadic movements to the selected target (the green trace in the left panel of Fig. 5 is a characteristic example of a reaching trajectory from free-choice trials).

Once the cue appears, the activity of the neurons which are tuned to both targets in the motor plan formation DNF of the cued effector increases, inhibiting the neurons in the motor plan formation DNF of the non-cued effector. However, the excitation of the populations of neurons in the DNF of the cued effector amplifies the spatial competition which selects a particular target (see also Fig. 6B). Frequently this competition is not resolved before the movement onset, resulting in curved trajectories (i.e., spatial averaging movements) (see the green trace in Fig. 5, right panel).

The results illustrated in Fig. 6C and D show that free-choice trials are characterized by slower response times and straighter trajectories, whereas cued trials are characterized by faster response times and more highly curved trajectories to the selected targets.

In such a scenario, the motor plan formation DNF for the effector associated with the given cue received enough input to almost completely inhibit the other DNF. Once the targets appeared, the decision activity proceeded exclusively in the DNF for the cued effector, and the only thing that remained to be resolved was the competition between the targets. Fig. 7 depicts such a scenario with 3 targets, in which the cue is presented 50 time-steps after the trial starts, followed by the target onset 50 time-steps later. Notice that the effector competition is resolved almost immediately after cue onset. All of these findings are novel and have not been validated in human or animal experimental studies, suggesting new avenues to understand the neural and behavioral mechanisms of action selection in decisions with competing alternatives.

Although the present framework makes qualitative predictions of many features of neuronal activity and behavior in an effector choice task, it assumes that the brain knows a priori the association between color cues and actions. How the brain learns sensorimotor associations remains poorly understood. We studied the neural mechanisms underlying sensorimotor association learning by training the model to distinguish between two cues which signal whether it is required to perform a reach ("green" cue) or a saccade ("red" cue) to a target. During each trial, a cue was presented, followed by a target randomly presented in one of three positions. The context cue field, with half of its neurons selective for the reach cue and half for the saccade cue, was fully connected with each motor plan formation DNF. These weights were updated using reinforcement learning following every trial T, with a reward given for moving to a target with the cued effector; the update involves the weight matrix W_cue for connections between the cue-selective population and the motor plan formation DNF for the effector, a learning rate α_cue, the reward signal r (1 if the task was successfully performed, 0 otherwise), the vector of cue-selective neuron activity u_cue, and an eligibility trace E_effector for the effector (a decaying copy of the associated motor plan formation DNF activity) [38].
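The sketch below illustrates one plausible instantiation of this update rule. The outer-product form (reward-gated product of the post-synaptic eligibility trace and the pre-synaptic cue activity) and the learning-rate and decay values are assumptions made for illustration, not the paper's exact equation.

```python
import numpy as np

def update_cue_weights(W_cue, u_cue, elig_effector, reward, alpha_cue=0.01):
    """Reward-modulated update of cue -> motor plan formation weights.
    Assumed form: learning rate x reward x (eligibility trace outer cue activity)."""
    return W_cue + alpha_cue * reward * np.outer(elig_effector, u_cue)

def decay_eligibility(elig_effector, dnf_activity, decay=0.9):
    """Eligibility trace: a decaying copy of the motor plan formation DNF activity."""
    return decay * elig_effector + dnf_activity
```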

A characteristic example of an incorrect trial during the learning period is shown in Fig. 8. The "green" cue is presented about 50 time-steps after the trial starts, increasing the activity in both the DNFs that plan saccadic and reach movements, because the framework is still learning the sensorimotor associations. Once the target appears, the DNF that plans eye movements wins the competition and the model performs a saccade, even though a "green" (reach) cue was presented. The evolution of the context cue connection weights during training is shown in Fig. 9A-D. Fig. 9A shows the average connection weights between the red cue (cue 1) population and the saccade motor plan formation DNF as training progressed for 500 trials. Similarly, Fig. 9B shows the mean connection weights from the green cue (cue 2) neurons to the saccade motor plan formation DNF, and Fig. 9C and 9D show the mean connection weights from the red cue and the green cue populations to the reach motor plan formation DNF. After just over 200 training trials, the model learned the sensorimotor associations and its performance reached 100% (Fig. 9E).

Thus far, we have considered only cases in which the competing targets are equally rewarded for correct choices. However, people frequently make decisions between alternative options with different values. In the current section, we show how decision variables, such as the expected reward attached to a target, influence choice/motor behavior and its neural underpinnings in decisions with multiple competing alternatives. We designed and simulated a reaching experiment that included choices between two targets which appeared simultaneously in the left and right hemifields at equal distance from the starting hand position. Both targets appeared 50 time-steps after the trial onset and had either the same expected reward ("equal-reward" trials) or one of them had higher expected reward than the other ("unequal-reward" trials).

In particular, we found that choices were biased towards the higher valued target, and the movement time was significantly lower when choosing the most preferred option over the other in the two-target trials with unequal expected reward. To illustrate this, consider a scenario where two targets are presented simultaneously in both visual fields, and the left target has 3 times higher expected reward than the right one in the unequal-reward trials. Fig. 10A depicts the proportion of choices to the left and the right target, in both equal-reward and unequal-reward conditions. Notice the significant choice bias to the higher valued target in the unequal-reward choices. Fig. 10B illustrates the distribution of the movement time after 100 trials for equal-reward (gray bars) and unequal-reward (black bars) choices. The movement time distribution is approximately Gaussian when choices are made between equally rewarded options. However, it becomes increasingly skewed to the right in unequal-reward choices (two-sample Kolmogorov-Smirnov test, p = 0.0131). We also computed the average movement time for selecting the left and the right target both in equal-reward and unequal-reward trials. The results presented in Fig. 10C show that the movement time was about the same when selecting either target in the equal-reward choices (two-sample t-test, p = 0.3075). However, the movement time significantly decreased when choosing the most favored option over the less favored option in the unequal-reward trials (two-sample t-test, p < 10^-6). Similar results were found for saccade choices (results are not shown here for the sake of brevity). These predictions have been extensively documented in a variety of visuomotor tasks, which showed that reward expectancy modulates both the choice and the motor behavior. In particular, when subjects had to decide among options with different expected reward values, the choices were more likely to be allocated to the most rewarded option [32, 39]. Moreover, psychophysical experiments in humans and animals showed that the response time (i.e., movement time) is negatively correlated with the expected value of the targets [39-41].

Fig. 10D and E illustrate the time course of the average activity of the two populations of neurons from 100 trials in the "equal-reward" and "unequal-reward" conditions, respectively. When both targets are associated with the same expected reward, both populations have about the same average activity across time, resulting in a strong competition for selecting between the two targets and hence no significant choice bias and slower movement times. On the other hand, when one of the targets provides more reward than the other, the average activity of the neuronal ensemble selective for that target is significantly higher than the activity of the neurons tuned to the alternative target. Thus, the competition is usually resolved faster, resulting frequently in a selection bias towards the most valuable target and faster movement times. Occasionally, the neuronal population in the reach DNF which is selective for the target with the lower expected reward wins the competition, and the lower valued target is selected. However, it takes more time on average to win the competition (i.e., longer movement time), because it receives weaker excitatory inputs from the goods value field compared to the neuronal ensemble which is selective for the higher valued target (see S2 Fig. in the supporting information for more details). These findings are consistent with a series of neurophysiological studies in nonhuman primates, which showed that decision variables, such as the expected reward, reward received over a recent period of time (local income), hazard rate and others, modulate neurons in parietal cortex and premotor cortex both in reaching and saccade tasks. The proposed computational framework suggests that this effect occurs due to the excitatory inputs from the goods value field that encodes the goods-related decision variables. When one of the alternative options is associated with higher expected reward, the "goods value" neurons that are tuned to this target have higher activity than the neurons related to less valuable options. Hence, the competition is biased towards the alternative associated with greater reward. Note that similar findings can be predicted by the present framework for effort-based decisions, when one of the alternatives requires more effort to acquire than the rest of the options. In particular, the framework predicts that when two equally rewarded targets are simultaneously presented in the field, the choices are biased towards the "less expensive" (i.e., less effort) target, and reaching movements become faster when they are made to that target (results are not shown here for the sake of brevity).

In the previous analysis, we assumed that the framework has already learned the values of the alternative options before taking any action. However, in many conditions, people and animals have to explore the environment to learn the values of the options. In the current section, we present the mechanism that the framework uses to learn reward contingencies. Each motor plan formation DNF received input signalling the expected reward for moving to a particular location. The expected reward was represented in a two-dimensional neural field, U_reward, in allocentric coordinates and converted to an egocentric representation in the goods value field centered on the appropriate effector before being input to each motor plan formation DNF (see Methods section). The activity in the field U_reward was computed by multiplying a multivariate Gaussian encoding stimulus position with a set of weights, W_reward, which were also updated using reinforcement learning following every trial T, using a learning rate α_reward, the reward signal r (1 or 0), and an eligibility trace E_spatial. In this case the reward is effector-independent, so the eligibility trace simply encodes the location of the last movement (whether reach or saccade) as a multivariate Gaussian.
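A sketch of how this update could look is given below. The delta-rule form (pulling the weights toward the obtained reward at the visited location) is an assumption; it is chosen because it allows weights to decrease for unrewarded locations, consistent with the weight changes reported in the next paragraph, and is not the paper's exact equation.

```python
import numpy as np

def spatial_eligibility(grid_x, grid_y, endpoint, sigma=2.0):
    """Eligibility trace over allocentric space: a Gaussian bump centered
    on the endpoint of the last movement (reach or saccade)."""
    return np.exp(-((grid_x - endpoint[0]) ** 2 + (grid_y - endpoint[1]) ** 2)
                  / (2.0 * sigma ** 2))

def update_reward_weights(W_reward, elig_spatial, reward, alpha_reward=0.05):
    """Assumed delta-rule update: weights near the visited location move toward
    the obtained reward (1 or 0); locations far from the endpoint are unchanged."""
    return W_reward + alpha_reward * (reward - W_reward) * elig_spatial
```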

The evolution of the weights W_reward over 500 training trials is shown in Fig. 11A, converted to a two-dimensional egocentric frame. The weights projecting to neurons representing each target were initialized with equal levels of expected reward. After approximately 300 training trials, the weights to neurons encoding the right target had decreased enough for the model to reach almost 100 percent accuracy (Fig. 11B). Because the expected reward signal was broadcast to both motor plan formation DNFs, the model made both reaching and saccade movements at equal frequency.

Discussion

Classic serial theories, such as the "goods-based" model, suggest that decision-making is an independent cognitive process from the sensorimotor processes that implement the resulting choice [1]. According to this view, decisions are made in higher cognitive centers by integrating all the decision values of an option into a subjective value and selecting the best alternative option. Once a decision is made, the sensorimotor system implements the selected choice [2-4]. Clinical and neurophysiological studies have shown that a neural representation of subjective values exists in certain brain areas, most notably in the orbitofrontal cortex (OFC) and ventromedial prefrontal cortex (vmPFC) [3, 45].

This raises the question of how we decide between competing actions in dynamic environments, where the value and the availability of an option can change unpredictably. One possibility that has been suggested by a number of researchers is that decisions between actions are made through a continuous competition between concurrent action-plans related to the alternative goals [8, 11, 13, 46]. According to this "action-based" theory, the competition is biased by decision variables that may come from higher cognitive regions, such as the frontal cortex, but it is the sensorimotor system that decides which action to select [8, 15]. This view has received apparent support from a series of neurophysiological studies, which found neural correlates of decision variables within sensorimotor brain regions (for review, see [7, 13, 15]). Additionally, recent studies in nonhuman primates have shown that reversible pharmacological inactivation of cortical and subcortical regions that have been associated with action planning or multisensory integration, such as the lateral intraparietal area (LIP) [47, 48], superior colliculus (SC) [49], and dorsal pulvinar [50], causes decision biases towards targets in the intact visual field. These findings suggest that sensorimotor and midbrain regions may be causally involved in the process of decision-making and action selection.

Several computational frameworks exist that explain how information from disparate sources, such as goods values, action costs, prior knowledge, and perceptual information, is integrated dynamically to evaluate and compare the available options [51-53]. However, none of these generates continuous actions which change the desirability and availability of options as they unfold. Instead, they generate discrete choices rather than population codes that can be used to guide action in continuous parameter spaces, and they do not capture the interactions between networks of brain regions in any realistic way (see below). In the current study, we propose a neurodynamical framework that models the neural mechanisms of value integration and action selection in decisions with competing options. It consists of a series of dynamic neural fields (DNFs) that simulate the neural processes underlying motor plan formation, expected reward and effort cost. Each neuron in the motor plan formation DNF is linked with a stochastic optimal control schema that generates policies towards the preferred direction of the neuron. A policy is a function that maps the current state into optimal sequences of actions. The key novelty of the framework is that information related to goals, actions and contextual requirements is dynamically integrated by the motor plan formation DNF neurons and that this information changes and is re-evaluated as the action is performed. The current activity of each of these DNF neurons encodes what we call the "relative desirability" of the alternative policies, because it reflects how "desirable" it is to follow a particular policy (i.e., move in a particular direction) with respect to the alternative options at the current state, and it weights the influence of each individual policy in the final action-plan.

Although the framework also employs an "accumulator" mechanism to dynamically integrate value information and decide between options, and an action threshold to determine when to trigger motor schemas, it is quite different from these classic models. The classic models assume that populations of neurons in particular brain areas accumulate the sensory evidence and other brain areas compare the accumulated evidence to make decisions. For instance, studies in oculomotor decision-making suggest that LIP neurons accumulate the evidence associated with the alternative options and the accumulated activity of these neurons is compared by "decision brain areas" to select an option [13, 56]. Unlike the classic models, in the present framework the decisions are made through a dynamic transition from "weighted averaging" of individual policies to "winner-take-all" (i.e., a set of policies associated with an available goal dominates the rest of the alternatives) within the same population of neurons in the motor plan formation field. It does not assign populations of neurons to individual options; rather, the alternative options emerge within a distributed population of neurons based on the current sensory input and prior knowledge. Once any neuron reaches the action threshold, its associated motor schema is activated, but the decision process continues and other motor schemas can become active during the performance of an action. Because of these characteristics, the present framework can handle not only binary choices but also decisions with multiple competing alternatives, does not require a "decision" threshold (although it does use a pre-defined "movement initiation" threshold), and can model decisions in which subjects cannot wait to accumulate evidence before selecting an action; rather, they have to make a decision while acting.

However, these approaches do not incorporate the idea of dynamically integrating value information from disparate sources (with the exception of Cisek's (2006) model [58], which demonstrated how other regions, including prefrontal cortex, influence the competition), they do not model action selection tasks with competing effectors, and they do not model the eye/hand movement trajectories generated to acquire the choices. By combining dynamic neural fields with stochastic optimal control systems, the present framework explains a broad range of findings from experimental studies in both humans and animals, such as the influence of decision variables on the neuronal activity in parietal and premotor cortex areas, the effect of action competition on both motor and decision behavior, and the influence of effector competition on the neuronal activity in cortical areas that plan eye and hand movements.

Hierarchical reinforcement learning (HRL) suggests that decision-making takes place at various levels of abstraction, with higher levels determining overall goals and lower levels determining sub-sequences of actions to achieve them. HRL does capture the dynamic aspect of decision-making shown by our model, in that it re-evaluates all goals and selects the best current goal at each time point, but there are three major differences. The first is that HRL chooses a single policy and typically pursues it until all actions in a sequence are performed, whereas our model uses a weighted mixture of policies and has no notion of a fixed sequence of actions. The second is that HRL uses a softmax rule to choose a goal, while our model uses both "goods-" and "action-based" signals. Third, while an HRL framework could be used in place of our model's motor plan formation DNFs to select an effector and target, our model goes beyond that by interfacing this system with optimal motor control models to control the arm and eyes in real time.

For instance, we predict that when the brain is faced with multiple equally rewarded goals and also has to select between competing effectors to implement the choice, it first resolves the effector competition (i.e., decides which effector to use) before selecting which goal to pursue. This effect is related to the inhibitory interactions within and between the fields that plan the movements. The motor plan formation fields are characterized by local excitatory and global one-to-many inhibitory connections. However, the interactions between the motor plan formation fields of the competing effectors are characterized by global inhibitory interactions, in which each neuron in one motor plan formation field inhibits all neurons in the other motor plan formation field, resulting in greater net inhibition between rather than within fields. It is because of this architecture that the effector competition is resolved prior to the target competition. Although this prediction has not been tested in experimental studies, a recent fMRI study in humans showed that when people have to select whether to use the left or the right hand for reaching to a single target, the effector selection precedes the planning of the reaching movement in the dorsal parietofrontal cortex [37]. We should point out that this prediction is made when both targets have the same expected value and require the same effort. By varying the value of the targets or the action cost, it is possible that the framework will first select the most "desirable" target and then choose the best effector to implement the choice.

Learning sensorimotor associations and reward contingencies

This is due to reinforcement learning on the connection weights between the goods value field and context cue field and the motor plan formation DNF, and it simulates the process by which cortico-basal ganglia networks map contextual cues and reward expectancy onto actions [60]. An unexpected result of our simulations was that although the model was trained on the cued effector task with targets in multiple locations, it could only learn the task when it was presented with a single target in one of the possible locations on each trial, rather than multiple targets at once. This was due to interference between the sensorimotor association and reward contingency learning processes. As the model was rewarded for reaching to a particular target with the correct effector, it began to associate that target with reward in addition to the effector. While various parameter settings which increased the noise levels in the network would promote exploration and thus spatial generalization of the effector cue, learning proceeded optimally when a single target was presented in one of many possible locations on each trial. This effect has been extensively documented in many studies and is known as "dual-task interference" [61]: when conflicting and interfering streams of information must be processed simultaneously in dual tasks (e.g., talking on the phone while driving), performance on the tasks deteriorates substantially.

Mapping to neurophysiology

The computational framework presented in the current study is a systems-level framework aimed at qualitatively modeling and predicting response patterns of neuronal activity in ensembles of neurons, as well as decision and motor behavior, in action selection tasks with competing alternatives. It is not intended to serve as a rigorous anatomical model, and because of this we avoid making any strict association between the components of the framework (i.e., individual DNFs and control schemes) and particular cortical and subcortical regions. However, it captures many features of neuronal activity recorded from different cortical areas, such as the parietal reach region (PRR), area 5, the lateral intraparietal area (LIP), premotor cortex, prefrontal cortex (PFC) and orbitofrontal cortex (OFC), in nonhuman primates that perform reaching and saccadic decision tasks with competing options.

The "spatial sensory input field" encodes the spatial location of the targets in an egocentric reference frame and mimics the organization of the posterior parietal cortex (PPC). The "context cue field" represents information related to the task context (i.e., which effector to use to acquire the targets). Several neurophysiological studies have reported context-dependent neurons in the lateral PFC (LPFC) in nonhuman primates [62-64]. These neurons respond differently to the same stimulus when it requires different responses depending on the task context, whereas they are not sensitive to the color or pattern of the cue. The "goods value field" integrates the good values of the alternative options and represents how "desirable" it is to select a policy towards a particular direction without taking into account the sensorimotor contingencies of the choices (i.e., the action cost to implement the policy). The goods value field can be equated to the ventromedial PFC (vmPFC) and OFC, which, according to neurophysiological and clinical studies, have an important role in the computation and integration of good values [3, 4, 65]. The action cost is encoded by the "action cost field". Although it is not clear how action costs are encoded and computed in the brain, recent findings suggest that the anterior cingulate cortex (ACC) is involved in encoding action costs [11, 66, 67]. However, other studies have shown that ACC neurons also encode good values, such as the payoff of a choice and the probability that a choice will yield a particular outcome [68].

Hence, the motor plan formation field could be equated with parts of the premotor cortex and parietal cortex, especially the parietal reach region (PRR) and dorsal area 5 for reaches, and the lateral intraparietal area (LIP) for saccades, which are involved in planning hand and eye movements, respectively. One of the novelties of the proposed computational framework is that the motor plan formation field dynamically integrates all the decision variables into a single variable named relative desirability, which describes the contribution of each individual policy to the motor decision. Because of this property, the simulated neuronal activity in this field is modulated by decision variables, such as expected reward, outcome probability and action cost. This is consistent with a series of neurophysiological studies, which show that the activity of neurons in LIP and the dorsal premotor area (PMd) is modulated by the probability that a particular response will result in reward and by the relative reward between competing targets, respectively [12, 32]. To reduce the complexity of the framework, we included only some of the main brain areas that are involved in visuomotor tasks and omitted other relevant cortical regions, such as the primary motor cortex (M1), the somatosensory cortex, the supplementary motor areas, as well as subcortical regions such as the basal ganglia. However, important motor and cognitive processes in action selection, such as the execution of actions and the learning of sensorimotor associations and reward contingencies, are implemented using techniques that mimic neural processes of brain areas that are not included in the framework.

Conclusions

The proposed framework is based on the concept that decisions between actions are not made in the medial frontal cortex through an abstract representation of values, but instead are made within the sensorimotor cortex through a continuous competition between potential actions. By combining dynamic neural field theory with stochastic optimal control theory, we provide a principled way to understand how this competition takes place in the cerebral cortex for a variety of visuomotor decision tasks. The framework makes a series of predictions regarding cell activity in different cortical areas and choice/motor behavior, suggesting new avenues of research for elucidating the neurobiology of decisions between competing actions.

Methods

This section describes in detail the computational framework developed in this study to model the behavioral and neural mechanisms underlying decisions between multiple potential actions.

Dynamic neural fields

Instead of some anatomically defined space, DNF models are defined over the space that is spanned by the parameters of the tasks. They are based on the concept of neuronal population coding, in which the values of task parameters, such as the location of a stimulus or movement parameters, are determined by the distribution of the neuronal activity within a population of neurons. Activation in DNFs is distributed continuously over the space of the encoded parameter and evolves continuously through time under the influence of external inputs, local excitation and lateral inhibition interactions, as described by Equation (3):

τ u̇(x, t) = −u(x, t) + h + S(x, t) + ∫ w(x − x′) f[u(x′, t)] dx′   (3)

where u(x, t) is the local activity of the neuronal population at position x and time t, and u̇(x, t) is the rate of change of the activation field over time, scaled by a time constant τ. In the absence of any external input S(x, t), the field converges over time to the resting level h from the current level of activation. The interactions between the neurons in the field are defined through the kernel function w(x − x′), which consists of local excitatory and global inhibitory components, Equation (4):

w(x − x′) = c_exc exp(−(x − x′)² / (2σ_exc²)) − c_inh exp(−(x − x′)² / (2σ_inh²))   (4)

where c_exc and σ_exc (c_inh and σ_inh) are the amplitude and width of the excitatory (inhibitory) component.

In this study we used a narrow Gaussian kernel for the excitatory interactions and a broader Gaussian kernel for the inhibitory interactions, such that neurons with similar tuning curves co-excite one another, whereas neurons with dissimilar tuning curves inhibit one another. The only fields with competitive interactions in this study were the reach and saccade motor plan formation DNFs; the other fields had c_exc and c_inh set to zero and therefore combined and encoded their inputs without performing further computation. The kernel function is convolved with a sigmoidal transformation of the field activity f[u(x′, t)], such that only neurons with an activity level that exceeds a threshold participate in the intrafield interactions, Equation (5):

f[u(x, t)] = 1 / (1 + exp(−β u(x, t)))   (5)

where β controls the steepness of the sigmoid function.
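As a concrete illustration of Equations (3)-(5), the sketch below performs one Euler integration step of a one-dimensional field; all parameter values are illustrative rather than those listed in the supporting tables.

```python
import numpy as np

def dnf_step(u, S, dt=1.0, tau=10.0, h=-5.0,
             c_exc=1.0, sigma_exc=5.0, c_inh=0.5, sigma_inh=20.0, beta=1.0):
    """One Euler step of the field dynamics. u and S are 1-D arrays over the
    encoded parameter (e.g., movement direction); parameter values are illustrative."""
    x = np.arange(u.size)
    d = x[:, None] - x[None, :]
    # Local excitation minus broader inhibition (Eq. 4).
    w = c_exc * np.exp(-d ** 2 / (2 * sigma_exc ** 2)) \
        - c_inh * np.exp(-d ** 2 / (2 * sigma_inh ** 2))
    f_u = 1.0 / (1.0 + np.exp(-beta * u))       # sigmoidal field output (Eq. 5)
    du = (-u + h + S + w @ f_u) / tau           # field dynamics (Eq. 3)
    return u + dt * du
```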

The context cue field consists of 100 neurons, half of which respond to a saccade cue and half of which respond to a reach cue. The following sections describe how we integrate dynamic neural field theory with stochastic optimal control theory to develop a computational framework that can explain both the neural and the behavioral mechanisms underlying a wide variety of visuomotor decision tasks.

Action selection for decision tasks with competing alternatives

It involves solving for a policy π that maps states onto actions u_t = π(x_t) by minimizing a loss function penalizing costly actions (i.e., effort) and deviations from the goal (i.e., accuracy). Despite the growing popularity of stochastic optimal control models, most of them are limited to single goals. However, real environments present people and animals at any moment with multiple competing options and demands for actions. It is still unclear how to define control policies in the presence of competing options.

The core component of the present framework is the "motor plan formation" DNF that integrates value information from disparate sources and plans the movements to acquire the targets. Each neuron in this DNF is linked with a stochastic optimal control system. When the activity of this neuron exceeds a threshold γ at the current state x_t, the controller suggests an optimal policy π* that results in a sequence of actions (u_t = π*(x_t) = [u_t, u_{t+1}, ..., u_{t+T}]) to drive the effector from the current state towards the preferred direction of the neuron for a period of time T. Note that the policy π* is related to the preferred direction of the neuron and not to the location of the target. This is the main difference between the optimal control module used in the present framework and classic optimal control studies. However, the mathematical formulation of optimal control theory requires defining an "end" (i.e., goal) state. In the present framework, any "active" controller i generates a sequence of actions to move in the preferred direction φ_i of the neuron for a distance r, where r is the distance between the current location of the effector and the location of the stimulus in the field encoded by that neuron. For instance, let us consider a scenario in which a target is located at a distance r from the current hand location. The control schema with a preferred direction φ_i will suggest an optimal policy π_i*, which is given by the minimization of the loss function in Equation (6):

l_i = (S x_{T_i} − p_i)ᵀ Q_{T_i} (S x_{T_i} − p_i) + Σ_t u_tᵀ R u_t   (6)

where u_i = [u_t, u_{t+1}, ..., u_{T_i}] is the sequence of actions to move the effector towards the φ_i direction; T_i is the time-to-arrive at the position p_i; p_i is the goal-position of the effector, i.e., the position that the effector is planned to reach by the end of the movement, given as p_i = [r cos(φ_i), r sin(φ_i)]; x_{T_i} is the state vector at the end of the movement; the S matrix picks out the actual position of the effector and the goal-position p_i at the end of the movement from the state vector. Finally, Q_{T_i} and R define the precision and the control-dependent cost, respectively (see S1 Text for more details). The first term of the loss function in Equation (6) determines the current goal of the controller, which is related to the neuron with preferred direction φ_i, i.e., to move the effector a distance r from the current location towards the preferred direction φ_i. The second term is the motor command cost (i.e., the action cost) that penalizes the effort required to move the effector towards this direction for T_i time-steps.
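For illustration, the sketch below evaluates the two terms of Equation (6) for a given terminal state and action sequence; the specific matrices and the example values are assumptions, not the model's parameters.

```python
import numpy as np

def goal_position(r, phi_i):
    """Goal of the schema with preferred direction phi_i: move a distance r along phi_i."""
    return np.array([r * np.cos(phi_i), r * np.sin(phi_i)])

def schema_loss(x_T, p_i, u_seq, S, Q_T, R):
    """Quadratic loss of one motor schema: terminal accuracy cost plus
    accumulated motor-command (effort) cost. Matrices are illustrative."""
    err = S @ x_T - p_i                             # endpoint error with respect to p_i
    accuracy_cost = err @ Q_T @ err
    effort_cost = sum(u @ R @ u for u in u_seq)
    return accuracy_cost + effort_cost

# Example: schema with preferred direction 45 degrees and a 10-unit movement.
p = goal_position(r=10.0, phi_i=np.radians(45.0))
loss = schema_loss(x_T=np.array([7.0, 7.0]), p_i=p,
                   u_seq=[np.array([0.5, 0.5])] * 20,
                   S=np.eye(2), Q_T=np.eye(2), R=0.01 * np.eye(2))
```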

This variable encodes the overall relative desirability of each policy with respect to the alternative options; in other words, it categorizes the state space into regions where following one of the policies is the best option. We can write the loss function as a v-weighted mixture of the individual loss functions l_i, Equation (7):

l_mix(x_t) = Σ_i v_i(x_t) l_i(x_t)   (7)

When there is no uncertainty as to which policy to follow at a given time, i.e., only the activity of a single neuron exceeds the threshold γ, the v-weighted loss function in Equation (7) is equivalent to Equation (6), with v_i(x_t) = 1 for the best current direction and v_{j≠i}(x_t) = 0 for the rest of the alternative options. However, when more than one neuron is active, there is uncertainty about which policy to follow at each time and state. In this case, the framework follows a weighted average of the individual policies, π_mix, to move the effector from the current state to a new one, Equation (8):

π_mix(x_t) = Σ_{i=1}^{M} v_i(x_t) π_i*(x_t)   (8)

where M is the number of neurons that are currently active, π_i*(x_t) is the optimal policy to move in the preferred direction of the i-th neuron from the current state x_t, and v_i is the relative desirability of the optimal policy π_i* that determines the contribution of this policy to the weighted mixture of policies. For notational simplicity, we omit the * sign, and from now on π_i(x_t) will indicate the optimal policy related to neuron i.

According to this strategy, the framework implements only the initial portion of the sequence of actions generated by π_mix(x_t) for a short period of time k (k = 10 in this study) and then recomputes the individual optimal policies π_i(x_{t+k}) from time t + k to t + k + T_i and remixes them. This strategy continues until the effector arrives at the selected target.
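The sketch below outlines this receding-horizon mixing strategy; the function names, toy state update, and termination test are assumptions made to keep the example self-contained.

```python
import numpy as np

def run_mixed_policy(x0, schemas, desirability, k=10, max_steps=500, at_target=None):
    """Receding-horizon control with a desirability-weighted mixture of policies.
    `schemas` is a list of policy functions pi_i(x) -> action sequence (array of
    actions), and `desirability(x, t)` returns one weight v_i per schema."""
    x = np.asarray(x0, dtype=float)
    trajectory, t = [x.copy()], 0
    while t < max_steps and not (at_target and at_target(x)):
        plans = [np.asarray(pi(x)) for pi in schemas]   # re-plan from the current state
        v = desirability(x, t)                          # relative desirability of each policy
        u_mix = sum(v_i * plan[:k] for v_i, plan in zip(v, plans))  # mix first k actions
        for u in u_mix:                                 # execute only the initial portion
            x = x + u                                   # toy dynamics: state += action
            trajectory.append(x.copy())
        t += k
    return np.array(trajectory)
```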

Computing policy desirability

Recall that each policy π_i is associated with a cost V_{π_i}(x_t), which represents the action cost, i.e., the cost that is expected to accumulate while moving from the current state x_t in the direction φ_i under the policy π_i. The cost to implement each individual policy is represented by the “action cost” field, such that the higher the activity of the neurons in this field, the higher the action cost to move in the preferred direction of those neurons. Therefore, the output activity of this field, u_cost, is projected to the “motor plan formation” field through one-to-one inhibitory connections in order to suppress the activity of neurons with “costly” preferred directions at a given time and state.

The present framework uses the “goods value” field to encode the good values (e.g., reward, outcome probability) associated with the available options. In the current study we assume that the goods value field represents the expected reward of the available goals, although it is straightforward to extend the field to encode other goods-related features. The expected reward is represented in a two-dimensional neural field, U_reward, in which each neuron is selective for a particular goal position in an allocentric reference frame. For each effector (eye and hand in these simulations), its position is used to convert activity in this two-dimensional field into a one-dimensional neural field encoding an effector-centered representation of the goods value of each goal. The output activity of each effector-centered field, u_reward, projects to the corresponding motor plan formation field with one-to-one excitatory connections. It thus excites neurons in each motor plan formation field that drive the effector towards locations with high expected rewards. One of the novelties of the present framework is that the weights of the connections between the goods value field and the motor plan formation field are plastic and are modified using a simple reinforcement learning rule.
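One simple way to picture the allocentric-to-effector-centered conversion is to project the 2-D reward activity onto direction bins defined relative to the current effector position. The Python sketch below does exactly that; the binning scheme, grid, and resolution are assumptions for illustration, not the authors' exact transformation.

```python
import numpy as np

def effector_centered_reward(U_reward, grid_x, grid_y, effector_pos, n_dirs=181):
    """Collapse a 2-D allocentric reward field onto a 1-D field over movement
    directions relative to the current effector position (illustrative sketch only)."""
    dirs = np.linspace(0.0, np.pi, n_dirs)            # preferred directions, 0..180 deg
    u_reward = np.zeros(n_dirs)
    for ix, gx in enumerate(grid_x):
        for iy, gy in enumerate(grid_y):
            dx, dy = gx - effector_pos[0], gy - effector_pos[1]
            angle = np.arctan2(dy, dx)                 # direction from effector to this location
            idx = np.argmin(np.abs(dirs - angle))      # nearest direction-tuned neuron
            u_reward[idx] += U_reward[ix, iy]          # accumulate allocentric reward activity
    return u_reward

# Example: a single rewarded location up and to the right of the hand
grid_x = grid_y = np.linspace(-1.0, 1.0, 21)
U = np.zeros((21, 21)); U[15, 15] = 1.0               # reward peak at (0.5, 0.5)
u = effector_centered_reward(U, grid_x, grid_y, effector_pos=np.array([0.0, 0.0]))
print("most excited direction (deg):", np.degrees(np.linspace(0, np.pi, 181)[u.argmax()]))
```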

The model includes a “context cue field” whose neurons respond noisily to the presence of one of the cues in the task. We used a context cue field with 100 neurons, half of which responded to the reach cue and the rest to the saccade cue. The output of this field, u_cue, projects to each of the motor plan formation fields with one-to-all excitatory connections. Once the weights of these connections have been learned (see “Sensorimotor association learning in effector choice tasks” section), the field excites all neurons in the motor plan formation field corresponding to the cued effector.
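Computationally, a one-to-all projection is just a dense weight matrix from the 100-neuron cue vector onto every neuron of a motor plan formation field, so that a reach cue can uniformly excite the whole reach field. The sketch below illustrates this; the hard-coded weights stand in for what a successful learner might acquire and are illustrative, not the trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cue, n_motor = 100, 181                     # 100 cue neurons; one motor neuron per direction

def cue_response(cue):
    """Noisy cue responses: first 50 neurons prefer the reach (green) cue,
    the remaining 50 prefer the saccade (red) cue."""
    base = np.zeros(n_cue)
    base[:50] = 1.0 if cue == "reach" else 0.0
    base[50:] = 1.0 if cue == "saccade" else 0.0
    return np.clip(base + 0.1 * rng.standard_normal(n_cue), 0.0, None)

# One-to-all excitatory projections: every cue neuron contacts every motor neuron.
W_reach = np.zeros((n_cue, n_motor)); W_reach[:50, :] = 0.02    # reach cues excite reach field
W_saccade = np.zeros((n_cue, n_motor)); W_saccade[50:, :] = 0.02

u_cue = cue_response("reach")
print("drive to reach field:  ", (u_cue @ W_reach).mean())      # strong, roughly uniform excitation
print("drive to saccade field:", (u_cue @ W_saccade).mean())    # close to zero
```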

This is provided by the “stimulus input” DNFs, whose neurons had Gaussian tuning curves and preferred directions from 0 to 180 degrees. The output of this field, u_vis, projected to the corresponding motor plan formation DNF via one-to-one excitatory connections. The input to the motor plan formation DNF for each effector, S_motor(φ), is a sum of the outputs of the fields encoding the visual stimulus, cues, estimated cost, and expected reward, corrupted by additive noise ξ that follows a Gaussian distribution. The parameters η_vis, η_cue, η_cost, and η_reward scale the influence of the input stimulus, cue, cost, and expected reward inputs, respectively. While some studies attempt to find values for these parameters that capture the tradeoff subjects make between cost and reward [72, 73], we set them empirically in order to allow the model to successfully perform the task (see S1 Table, S2 Table, S3 Table and S4 Table in the supporting information for the values of the model parameters used in the current study). For simulations of tasks using multiple effectors, each effector had its own copy of the cue weights and of the motor plan formation, cost, and effector-centered goods value fields. Competition between effectors was implemented via massive all-to-all inhibitory connections between their motor plan formation fields, where η_effector scales the inhibitory influence of the motor plan formation DNFs on each other.
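The two displayed equations referenced in this passage amount to a scaled sum of field outputs plus Gaussian noise, and a cross-effector inhibition term. A sketch of their likely form, assembled from the scaling parameters and field outputs named above (the exact indexing and signs are assumptions, not a verbatim copy), is:

```latex
% Plausible form of the motor plan formation input and the cross-effector
% inhibition; assembled from the parameters and field outputs named in the text.
S_{\mathrm{motor}}(\phi, t) = \eta_{\mathrm{vis}}\, u_{\mathrm{vis}}(\phi, t)
  + \eta_{\mathrm{cue}}\, u_{\mathrm{cue}}(\phi, t)
  - \eta_{\mathrm{cost}}\, u_{\mathrm{cost}}(\phi, t)
  + \eta_{\mathrm{reward}}\, u_{\mathrm{reward}}(\phi, t)
  + \xi, \qquad \xi \sim \mathcal{N}(0, \sigma^{2})

S_{\mathrm{motor}}^{\mathrm{eff}}(\phi, t) \;\leftarrow\; S_{\mathrm{motor}}^{\mathrm{eff}}(\phi, t)
  - \eta_{\mathrm{effector}} \sum_{\phi'} u_{\mathrm{motor}}^{\mathrm{eff}'}(\phi', t)
```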

Supporting Information

We extended the present computational theory to model effector choice tasks by duplicating the architecture of the framework and designating one network for saccades and one for reaches. Input to the saccade network is encoded in eye-centered coordinates, whereas input to the reach network is encoded in hand-centered coordinates. We call the motor plan formation DNFs for hand and eye movements the reach field and saccade field, respectively. The reach field receives inhibitory projections from every neuron in the saccade field and vice versa, implementing the competitive interactions between potential saccade and reach plans, as reported by neurophysiological studies [29, 36]. We also introduced the context cue field that encodes the task context, with half of its neurons responding to the saccade cue (i.e., red cue) and half responding to the reach cue (i.e., green cue). This layer was fully connected with both the reach and saccade fields, with weights initially randomized with low values and trainable through reinforcement learning (see main manuscript for more details). (TIF)

Consider a two-target trial scenario with unequal rewards, such as EV(left target) = 3 × EV(right target), where EV(.) denotes expected reward. Panel A illustrates the time course of the average activity of the two neuronal ensembles tuned to the two targets, in a trial in which the higher valued target was selected. Notice that the competition is resolved (i.e., neural activity exceeds the action threshold γ) almost immediately after the target onset. Panel B depicts an infrequent trial, in which the lower valued target (i.e., right target) wins the competition. Notice that it takes considerably more time to resolve the competition, resulting in a slower movement time. For more details, see the section “The effects of decision variables on action selection” in the main manuscript. (TIF)

Stochastic optimal control theory. A detailed description of the stochastic optimal control theory used to model reach and eye movements to single targets. (PDF)

Model parameters. The values of the model parameters used in the simulations. (PDF)

Stimulus input field parameters. The values of the stimulus input field parameters used in the simulations. (PDF)

Expected reward field parameters. The values of the expected reward (i.e., goods field) parameters used in the simulations. (PDF)

Motor plan formation field parameters. The values of the motor plan formation field parameters used in the simulations. (PDF)

Author Contributions

Conceived and designed the experiments: VC JB RAA. Performed the experiments: VC JB. Analyzed the data: VC JB. Contributed reagents/materials/analysis tools: VC JB. Wrote the paper: VC JB RAA. Model design and development: VC JB.
