Abstract | In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting.
Abstract | In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks.
Abstract | To produce modularity, we evolve neural networks with a cost for neural connections. |
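One concrete way such a cost can enter selection (a minimal sketch; the coefficient and the exact combination with performance are assumptions, not the paper's formulation) is to penalize the number of nonzero connections in the fitness:

```python
import numpy as np

def fitness(task_performance, weights, cost_coeff=0.01):
    """Hypothetical evolutionary fitness with a connection cost:
    each nonzero weight counts as one neural connection, and the
    penalty coefficient is an illustrative placeholder."""
    n_connections = np.count_nonzero(weights)
    return task_performance - cost_coeff * n_connections
```

Selection under such a penalty favors sparse networks, which in turn tend toward modular structure.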
Author Summary | A longstanding goal in artificial intelligence (AI) is creating computational brain models (neural networks) that learn what to do in new situations.
Author Summary | Here we test whether such forgetting is reduced by evolving modular neural networks, meaning networks with many distinct subgroups of neurons.
Background | Catastrophic forgetting (also called catastrophic interference) has been identified as a problem for artificial neural networks (ANNs) for over two decades: When learning multiple tasks in a sequence, previous skills are forgotten rapidly as new information is learned [9, 10]. |
Introduction | Such forgetting is especially problematic in fields that attempt to create artificial intelligence in brain models called artificial neural networks [1, 4, 5]. |
Introduction | To learn new skills, neural network learning algorithms change the weights of neural connections [6–8]; old skills are lost because the weights that encoded them are altered to improve performance on new tasks.
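This failure mode is easy to reproduce. The toy script below (the tasks, learning rate, and single logistic unit are all illustrative assumptions, not the setup of any paper cited here) trains on one task and then on a conflicting one; accuracy on the first task collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(axis, sign, n=200):
    # Two toy tasks whose optimal weights conflict: each labels
    # points by the sign of a different input dimension.
    X = rng.normal(size=(n, 2))
    y = (sign * X[:, axis] > 0).astype(float)
    return X, y

def train(w, X, y, lr=0.5, epochs=200):
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # logistic output
        w = w + lr * X.T @ (y - p) / len(y)  # gradient ascent on log-likelihood
    return w

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0) == y)

Xa, ya = make_task(axis=0, sign=+1)   # task A: sign of first input
Xb, yb = make_task(axis=1, sign=-1)   # task B: negated sign of second input

w = train(np.zeros(2), Xa, ya)
print("A after A:", accuracy(w, Xa, ya))   # near 1.0
w = train(w, Xb, yb)
print("A after B:", accuracy(w, Xa, ya))   # near chance: forgotten
print("B after B:", accuracy(w, Xb, yb))   # near 1.0
```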
Introduction | To advance our goal of producing sophisticated, functional artificial intelligence in neural networks, and to make progress in our long-term quest to create general artificial intelligence with them, we need to develop algorithms that can learn how to handle more than a few different problems.
Abstract | Using neural network simulations, we demonstrate that ADO is a suitable mechanism for BIG learning. |
Author Summary | We introduce and evaluate a new biologically-motivated learning rule for neural networks.
Author Summary | This basic geometric requirement, explicitly recognized in Donald Hebb’s original formulation of synaptic plasticity, is not usually accounted for in neural network learning rules.
Author Summary | Thus, the selectivity of synaptic formation implied by the ADO requirement is shown to provide a fundamental cognitive advantage over classic artificial neural networks.
Introduction | Here we formulate this notion quantitatively with a new neural network learning rule, demonstrating by construction that ADO is a suitable mechanism for BIG learning. |
Neural Network Model and the BIG ADO Learning Rule | Neural Network Model and the BIG ADO Learning Rule |
Neural Network Model and the BIG ADO Learning Rule | This work assumes the classic model of neural networks as directed graphs in which nodes represent neurons and each directional edge represents a connection between the axon of the pre-synaptic neuron and the dendrite of the post-synaptic neuron. |
Neural Network Model and the BIG ADO Learning Rule | The learning rule introduced in this work implements a form of structural plasticity in neural networks that incorporates the constraint of proximity between pre- and post-synaptic partners, or axonal-dendritic overlap (ADO): if two neurons a and b fire together, a connection from a to b is only formed if the axon of a comes within a threshold distance of a dendrite of b.
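In code, one schematic reading of this rule (the point-cloud representation of arbors and the pairwise scan over co-active neurons are illustrative assumptions):

```python
import numpy as np

def big_ado_step(coactive, axon_pts, dendrite_pts, adjacency, theta):
    """For each ordered pair (a, b) of co-firing neurons, form the
    synapse a -> b only if some point on a's axon lies within
    distance theta of some point on b's dendrite (the ADO constraint)."""
    for a in coactive:
        for b in coactive:
            if a == b or adjacency[a, b]:
                continue
            # minimum axon-to-dendrite distance over sampled arbor points
            d = np.min(np.linalg.norm(
                axon_pts[a][:, None, :] - dendrite_pts[b][None, :, :],
                axis=-1))
            if d <= theta:
                adjacency[a, b] = True
    return adjacency
```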
Pre-Training and Testing Design | Specifically, when initially connecting the neural network, we select the pre-training subset of edges non-uniformly from the reality-generating graph, such that distinct groups of nodes are differentially represented.
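A sketch of one such non-uniform selection (the group weights and the source-node weighting rule are assumptions for illustration):

```python
import numpy as np

def sample_pretraining_edges(edges, node_group, group_weight, k, seed=1):
    """Draw k pre-training edges from the reality-generating graph,
    with each edge's probability scaled by the weight of its source
    node's group, so groups are differentially represented."""
    rng = np.random.default_rng(seed)
    w = np.array([group_weight[node_group[u]] for u, v in edges], float)
    idx = rng.choice(len(edges), size=k, replace=False, p=w / w.sum())
    return [edges[i] for i in idx]
```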
Discussion | Bow-tie structures are also common in multilayered artificial neural networks used for classification and dimensionality reduction problems. |
Discussion | While the functional role of bow-ties in these networks parallels that of the biological bow-ties that are the focus of this study, these artificial neural networks are designed a priori to have a bow-tie structure.
Discussion | Multilayered neural networks often use an intermediate (hidden) layer whose number of nodes is smaller than the number of input and output nodes [30,75]. |
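Schematically, with illustrative sizes (the narrow hidden layer is the "knot" of the bow-tie):

```python
import numpy as np

n_in, n_hidden, n_out = 64, 4, 64          # hidden layer much smaller
rng = np.random.default_rng(2)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # compressing stage
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))   # expanding stage

def forward(x):
    h = np.tanh(W1 @ x)   # low-dimensional intermediate representation
    return W2 @ h         # wide output reconstructed from the bottleneck
```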
Retina problem | To test this hypothesis we employed a well-studied problem of image analysis using nonlinear perceptron neural networks [65,66].
Introduction | Generically, in fields as diverse as artificial neural networks [30] and the evolution of biological networks, simulations result in highly connected networks with no bow-tie [31–37].
Retina problem | We tested the evolution of bow-tie networks in this nonlinear problem, which resembles standard neural network studies [39,65,84].
Abstract | The resulting learning rule endows neural networks with the capacity to create new working memory representations of task relevant information as persistent activity. |
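The gist of such memory units can be conveyed by a toy integrator (a schematic stand-in, not AuGMEnT's actual dynamics): transient on/off input events are accumulated, and the accumulated value persists as activity after the inputs end.

```python
import numpy as np

def memory_trace(on_events, off_events):
    """Integrate transient input events into persistent activity:
    the trace steps up/down at each event and otherwise holds its
    value, mimicking a working-memory representation."""
    m, trace = 0.0, []
    for on, off in zip(on_events, off_events):
        m += on - off          # respond to input transients only
        trace.append(m)        # activity persists between events
    return np.array(trace)

# Example: a brief cue at t=2 leaves a persistent trace afterwards.
print(memory_trace([0, 0, 1, 0, 0], [0, 0, 0, 0, 0]))  # [0. 0. 1. 1. 1.]
```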
Biological plausibility, biological detail and future work | These connections might further expand the set of tasks that neural networks can master if trained by trial-and-error. |
Comparison to previous modeling approaches | Earlier neural network models used “backpropagation-through-time”, but its mechanisms are biologically implausible [77].
Discussion | To the best of our knowledge, AuGMEnT is the first biologically plausible learning scheme that implements SARSA in a multilayer neural network equipped with working memory. |
Discussion | These on-policy methods appear to be more stable than off-policy algorithms (such as Q-learning, which considers transitions not experienced by the network) when combined with neural networks (see e.g.
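For reference, the on-/off-policy distinction can be seen in the tabular forms of the two updates (standard textbook sketches; Q is assumed to be a dict of per-state action-value dicts): SARSA bootstraps from the action actually taken, Q-learning from the greedy action.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy: target uses the action the agent will actually take.
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Off-policy: target maxes over actions, including transitions
    # the network never experiences under its own policy.
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
```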
Vibrotactile discrimination task | Several models addressed how neural network models can store F1 and compare it to F2 [46–48].
Case 1: all cells recorded | We implemented a recurrent neural network with N = 5000 integrate-and-fire neurons that encodes some input stimulus s in the spiking activity of its neurons, and we built a perceptual readout from that network according to our model, with parameters K* = 80 neurons, w* = 50 ms, t_R = 100 ms, and a_R = 1 stimulus unit (see Methods for a description of the network, and supporting S1 Text).
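A simplified version of this kind of readout (the pooling rule and Gaussian noise model here are assumptions; the paper's exact model is given in its Methods): average the windowed spike counts of K randomly sampled neurons and add readout noise.

```python
import numpy as np

def perceptual_readout(spike_counts, K=80, noise_sd=1.0, seed=3):
    """Estimate the stimulus from the mean activity of K randomly
    chosen neurons plus Gaussian readout noise. `spike_counts`
    holds each neuron's spike count in the readout window."""
    rng = np.random.default_rng(seed)
    pool = rng.choice(len(spike_counts), size=K, replace=False)
    return spike_counts[pool].mean() + rng.normal(0.0, noise_sd)
```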
Case 3: less than K* cells recorded | Another pathological situation could be a neural network specifically designed to dispatch information non-redundantly across the full population [31, 32], resulting in a few ‘global’ modes of activity with very large SNR, that is, high signal and low noise within those modes.
Sensitivity and CC signals as a function of K | Validation on a simulated neural network |
Sensitivity and CC signals as a function of K | The neural network used to test our methods is described in detail in supporting S1 Text (section 3).
Sensitivity and CC signals as a function of K | We implemented and simulated the network using Brian, a spiking neural network simulator in Python [39]. |
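For orientation, a minimal Brian 2 script for a recurrent integrate-and-fire network of this general kind (sizes, parameters, and the Poisson drive below are placeholders, not the study's values):

```python
from brian2 import NeuronGroup, Synapses, PoissonInput, run, ms, mV, Hz

N = 100
tau = 20*ms
G = NeuronGroup(N, 'dv/dt = -v/tau : volt',
                threshold='v > 10*mV', reset='v = 0*mV', method='exact')
S = Synapses(G, G, on_pre='v += 0.2*mV')   # recurrent excitation
S.connect(p=0.1)                            # sparse random connectivity
drive = PoissonInput(G, 'v', N=200, rate=5*Hz, weight=0.5*mV)  # external drive
run(100*ms)
```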
Supporting Information | Contains additional information about Choice Probabilities (section 1), the influence of parameter w on stimulus sensitivity (section 2), the encoding neural network used for testing the method (section 3), the Bayesian regularization procedure on Fisher’s linear discriminant (section 4), unbiased computation of CC indicators in the presence of measurement noise (section 5), and an extended readout model with variable extraction time t_R (section 6).
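As an indication of what such a readout-weight computation can look like, here is Fisher's linear discriminant with simple covariance shrinkage used as a stand-in for the Bayesian regularization described in the supporting text:

```python
import numpy as np

def fisher_readout(X0, X1, shrink=0.1):
    """Fisher's linear discriminant with shrinkage: rows of X0/X1
    are trials for the two conditions. Shrinkage toward a scaled
    identity is a simple substitute for the paper's Bayesian prior."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T)
    d = S.shape[0]
    S_reg = (1 - shrink) * S + shrink * (np.trace(S) / d) * np.eye(d)
    return np.linalg.solve(S_reg, mu1 - mu0)   # readout weights
```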
AMSN | This scenario is shown for the firing rate model in Fig 1B and for the spiking neural network in Fig 4.
Effect of cortical spiking activity correlations on the DTT | These simulations were only performed for the spiking neural network model since modelling correlations in a mean field model is nontrivial, especially when post-synaptic neurons are recurrently connected. |
Supporting Information | Presence of DTT in a spiking neural network in which D1 and D2 MSNs have different F-I curves.