A Neural Mechanism for Background Information-Gated Learning Based on Axonal-Dendritic Overlaps

Matteo Mainetti, Giorgio A. Ascoli

Published in PLOS Comp. Biol., March 2015

Abstract

Although necessary, actual experience is not sufficient for memory formation. One-trial learning is also gated by knowledge of appropriate background information to make sense of the experienced occurrence. Strong neurobiological evidence suggests that long-term memory storage involves formation of new synapses. On the short time scale, this form of structural plasticity requires that the axon of the pre-synaptic neuron be physically proximal to the dendrite of the post-synaptic neuron. We surmise that such “axonal-dendritic overlap” (ADO) constitutes the neural correlate of background information-gated (BIG) learning. The hypothesis is based on a fundamental neuroanatomical constraint: an axon must pass close to the dendrites that are near other neurons it contacts. The topographic organization of the mammalian cortex ensures that nearby neurons encode related information. Using neural network simulations, we demonstrate that ADO is a suitable mechanism for BIG learning. We model knowledge as associations between terms, concepts or indivisible units of thought via directed graphs. The simplest instantiation encodes each concept by single neurons. Results are then generalized to cell assemblies. The proposed mechanism results in learning real associations better than spurious co-occurrences, providing definitive cognitive advantages.

Author Summary

The proposed mechanism explains why it is easier to acquire knowledge when it relates to known background information than when it is completely novel. We posit that this “background information-gated” (BIG) learning emerges from the necessity of neuronal axons and dendrites to be adjacent to each other in order to establish new synapses. This basic geometric requirement, which was explicitly recognized in Donald Hebb’s original formulation of synaptic plasticity, is not usually accounted for in neural network learning rules. More generally, the level of abstraction of current computational models is insufficient to capture the details of axonal and dendritic shape. Here we show that “axonal-dendritic overlap” (ADO) can be parsimoniously related to connectivity by assuming optimal neuronal placement to minimize axonal wiring. Incorporating this new relationship into classic connectionist learning algorithms, we show that networks trained in a given domain more easily acquire further knowledge in the same domain than in others. Surprisingly, the morphologically motivated constraint on structural plasticity also endows neural nets with the powerful computational ability to discriminate real associations of events, like the sight of lightning and the sound of thunder, from spurious co-occurrences, such as between the thunder and the beetle that flew by during the storm. Thus, the selectivity of synaptic formation implied by the ADO requirement is shown to provide a fundamental cognitive advantage over classic artificial neural networks.

Introduction

When studying the same material, someone with different expertise finds it much harder to learn the same facts. While it is common sense that new information is easier to memorize if it relates to prior knowledge, the cognitive and neural mechanisms underlying this familiar phenomenon are not established. More specifically, one-trial learning of “neutral” events, as opposed to emotionally charged or surprising experiences [1], is gated by knowledge of appropriate background information to make sense of the experienced occurrence [2, 3]. Consider experiencing for the first time the co-occurrence of a buzzing sound with the sight of a beetle (Fig. 1A). Learning that “beetles can buzz” may depend on background information that renders the “buzzing beetle” association sensible. Prior knowledge might include that wasps, flies, and bees also buzz. Such facts are relevant because they involve related concepts: these insects share several common associations with beetles (e.g. small size, crawling, flying, erratic trajectories). The remainder of this paper refers to this cognitive phenomenon as “background information gating” or BIG learning.

Building on those ideas, we propose a possible neuroanatomical correlate of BIG learning. The hypothesized mechanism is initially best illustrated under the oversimplifying assumption that associations are stored by connecting “grandmother” neurons, each corresponding to individual concepts (Fig. 1B). The computational simulations presented in this work, however, demonstrate that this same concept also seamlessly works with distributed neuronal representations.

We henceforth refer to this “potential synapse” configuration [8] as axonal-dendritic overlap or ADO. Intuitively, the axon passes near the dendrite because it is connected to other dendrites in that vicinity. Why then is the potential post-synaptic dendrite close to other dendrites contacted by the potential pre-synaptic axon? Wiring cost considerations suggest that neurons should be placed nearby if they receive synapses from the same axons [9]. If knowledge representation is stored in pairwise neural connections [10], this particular topology should correspond to relevant background information. Here we formulate this notion quantitatively with a new neural network learning rule, demonstrating by construction that ADO is a suitable mechanism for BIG learning. In our model, neural activation reflects associations sampled from various graphs taken as a simplified representation of everyday experience. Specifically, every instant of experience is represented as a subset of co-occurring elementary observables, each corresponding to a node of a “reality graph,” in which edges denote probability of co-occurrence (see S1 Text 1.1 for a more extended description). We study networks pre-trained with an initial connectivity by comparing their ability to learn new information that is related or unrelated to prior knowledge. Such preexisting background information may derive from repetition learning [11] or from experience earlier in life: if the BIG ADO were enforced from the start in a fully disconnected network, no new synapses could ever form. The simplest instantiation encodes each concept by single neurons; results are then shown to generalize robustly to realistic cell assemblies. Notably, the proposed mechanism results in learning real associations better than spurious co-occurrences, providing definitive cognitive advantages.

Materials and Methods

Here we explain the research design pertaining to the findings reported in the main text. The detailed methodologies are more thoroughly described in S1 Text 2.1-2.4.

Neural Network Model and the BIG ADO Learning Rule

The network only contains excitatory neurons. In this model, formation of new binary connections (a form of structural plasticity) underlies associative learning, and knowledge is encoded by the connectivity of the network [10].

Many variants of Hebbian synaptic modification exist [12], often summarized as ‘neurons that fire together wire together’. This popular quip, however, misses the essential requirement, clearly stressed in Hebb’s original formulation, that the axon of the pre-synaptic neuron must be sufficiently close to its post-synaptic target for plasticity to take place. The learning rule introduced in this work implements a form of structural plasticity in neural networks that incorporates the constraint of proximity between pre- and post-synaptic partners, or axonal-dendritic overlap (ADO): if two neurons a and b fire together, a connection from a to b is only formed if the axon of a comes within a threshold distance from a dendrite of b. In mathematical terms, this condition can be defined as a non-symmetric real-valued function between neurons corresponding to the distance from the axon of the candidate pre-synaptic neuron to the dendrite of the post-synaptic neuron.

The first assumption is that the axon of a passes near the dendrite of neuron b because it connects to another neuron c that is near neuron b. This assumption corresponds to a principle of parsimony in the use of axonal wiring: since the goal of axons is to carry signals to other neurons, the locations of axonal branches are part of trajectories towards synaptic contacts. The second assumption is that if neurons b and c are near each other, it is because they are both contacted by the same set of axons, which we generically call d (Fig. 1). This assumption presumes optimal neuronal placement once again to minimize axonal wiring, consistent with the existence of topographic maps e.g. in the mammalian cortex [13], but also in invertebrate nervous systems [14]. These two assumptions can be combined into the assertion that the tendency of the axon of neuron a to overlap with a dendrite of neuron b increases with the number of neurons c and d such that a is connected to c and d is connected to both b and c. This idea is quantified by the following proximity (π) function, in which ω_{x,y} equals 1 if neuron x is connected to neuron y and 0 otherwise:

$$\pi(a,b) = \sum_{c,d} \omega_{a,c}\,\omega_{d,c}\,\omega_{d,b}$$

The above formula can be elegantly expressed as the product of three matrices:

$$\Pi = \Omega\,\Omega^{t}\,\Omega$$

where Ω = {ω_{a,b}} is the (binary) network connectivity (also called adjacency matrix), with the number of rows and columns equal to the number of neurons in the network, and each row and column representing a neuron’s pre- and post-synaptic contacts, respectively, with all other neurons; Ωᵗ is the transpose matrix, in which every row is substituted with the corresponding column and vice versa (this operation is equivalent to switching axons and dendrites for each neuron); and Π = {π(a,b)} is the proximity matrix, which (like Ω) is square and non-symmetric.
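As a minimal sketch of the triple matrix product (connectivity, its transpose, connectivity again), the following pure-Python snippet computes the proximity matrix for a toy four-neuron network. The function name `proximity_matrix`, the list-of-lists encoding, and the toy wiring are illustrative assumptions, not the authors' implementation.

```python
def proximity_matrix(omega):
    """Return Pi = Omega * Omega^t * Omega for a square 0/1 connectivity matrix.

    omega[x][y] == 1 means that neuron x sends a connection to neuron y.
    """
    n = len(omega)
    # shared[a][d] = number of targets c contacted by both a and d,
    # i.e. the (a, d) entry of Omega * Omega^t.
    shared = [[sum(omega[a][c] * omega[d][c] for c in range(n))
               for d in range(n)]
              for a in range(n)]
    # pi[a][b] = sum over d of shared[a][d] * omega[d][b].
    return [[sum(shared[a][d] * omega[d][b] for d in range(n))
             for b in range(n)]
            for a in range(n)]


# Toy wiring (hypothetical): a -> c, d -> c, d -> b.
A, B, C, D = range(4)
omega = [[0] * 4 for _ in range(4)]
omega[A][C] = 1
omega[D][C] = 1
omega[D][B] = 1

pi = proximity_matrix(omega)
print("pi(a,b) =", pi[A][B])  # one (c, d) pair supports a -> b
print("pi(b,a) =", pi[B][A])  # no support in the opposite direction
```

In this toy case the proximity from a to b is 1 while the proximity from b to a is 0, which illustrates why the proximity matrix is non-symmetric.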

A new connection between two co-active neurons a and b is allowed only if their proximity exceeds a threshold θ, that is, if π(a,b) > θ. The proximity threshold is one of several parameters that have to be fixed when running simulations of an actual system; robustness of the mechanism is discussed in S1 Text 3.2. As an alternative to such a discontinuous threshold, we also implemented a probabilistic criterion for relating potential connectivity to proximity. In this case, the probability of a and b being proximal was not a binary function of proximity but instead followed a sigmoid curve. This probabilistic variant, while introducing an additional source of noise in the simulations, yielded results (also described in S1 Text 3.2) that confirmed the main results of this work. However, this more general approach also increases the complexity of the model, by requiring the specification of an additional parameter to define the slope of the sigmoid.
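The two criteria for relating proximity to potential connectivity can be sketched as follows. This is only an illustration under stated assumptions: the strict inequality follows the text as printed, the sigmoid is assumed to be centred on the threshold, and the names `passes_threshold`, `overlap_probability`, `passes_sigmoid` and the parameter `slope` are hypothetical. The actual parameterization is described in the supporting text, which is not reproduced here.

```python
import math
import random


def passes_threshold(proximity, theta):
    """Deterministic criterion: overlap exists only if proximity exceeds theta."""
    return proximity > theta


def overlap_probability(proximity, theta, slope):
    """Sigmoid probability of overlap (assumed here to be centred on theta)."""
    return 1.0 / (1.0 + math.exp(-slope * (proximity - theta)))


def passes_sigmoid(proximity, theta, slope, rng=random):
    """Stochastic criterion: draw overlap with the sigmoid probability."""
    return rng.random() < overlap_probability(proximity, theta, slope)


rng = random.Random(0)  # fixed seed so that the demonstration is repeatable
for proximity in (0, 3, 6, 9, 12):
    print(proximity,
          passes_threshold(proximity, theta=6),
          round(overlap_probability(proximity, theta=6, slope=1.0), 3),
          passes_sigmoid(proximity, theta=6, slope=1.0, rng=rng))
```

A steep slope makes the stochastic rule approach the hard threshold, while a shallow slope lets occasional low-proximity pairs through, which is the extra noise source mentioned above.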

For instance, network connectivity could be expressed as a matrix Ω recording not just the existence of a connection between two neurons, but the number of their physical contacts or other relevant measures, such as the stability of the synapses [15]. In the simple formulation used in this work, which presumes optimal neuronal placement to minimize axonal wiring, high proximity values make axonal-dendritic overlap likely, but not guaranteed.

Most strikingly, a learning procedure with a very similar structure was described [16] to explain a generalization of a novel sequence (b-d) based on experienced sequences (a-c), (a-d), and (b-c). Despite this similarity (which we discovered during peer review), the formulation introduced in the current work was derived independently, starting from the interpretation in terms of axonal-dendritic overlaps and structural plasticity. More generally, circuit connectivity, synaptic plasticity, and neuronal placement are interrelated in a broad class of other common neural network approaches, including Kohonen-type self-organizing maps [17]. In our model, the ADO constraint on structural plasticity is reduced to simple topological proximity rather than physical distance between neurons. Moreover, the application to background information-gated learning, the neural network implementation, and the analyses presented here are all novel.

In the simplest implementation, each observable is represented by a single neuron (Fig. 1B). With such a one-to-one mapping in place, existing synapses reflect learned associations between observables that previously co-occurred (solid arrows in Fig. 1A), altogether constituting already acquired knowledge. When witnessing a new co-occurrence between the two observables a and b, the association of their internal representations will only be allowed if consistent with prior relevant knowledge, ultimately corresponding to background information.

Pre-Training and Testing Design

In the general simulation design, the network of the agent’s internal representation is created by copying the set of nodes from the reality-generating graph, but connecting them by sampling only a subset of edges. This process produces a network effectively encoding a certain amount of knowledge of reality consistent with prior experience. The same result would be obtained by “pre-training” an initially fully disconnected network with the common “firing together, wiring together” rule (without BIG ADO filter) and sequentially activating pairs of neurons corresponding to the sampled subset of the reality-generating graph.

Such a setup allows investigation of the effect of the BIG ADO filter on subsequent learning. In the testing phase, further experience is sampled from not-yet learned edges of the reality-generating graph. These can be chosen so as to represent co-occurrences of observables more or less closely related to the pre-trained knowledge (mimicking expert or novice agents, respectively). Specifically, when initially connecting the neural network, we select the pre-training subset of edges non-uniformly from the reality-generating graph, such that distinct groups of nodes are differentially represented. For example, if the neural network is pre-trained with 50% of the edges from the reality-generating graph, three quarters of these edges can be sampled from half of the nodes, and one quarter of the edges from the other half. The resulting neural network is an “expert” on half of the reality-generating graph (because it knows a majority of the corresponding structure), and a “novice” on the other half (where it only knows a minority of the structure). In the “learning test” phase, the network is presented with new edges selected either from within the domain of expertise (that is, from the one quarter of edges not used in pre-training) or from the outside (from the three quarters of unused edges in the other half of nodes). The network learns new edges only if the proximity of the corresponding nodes is above threshold.
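The expert/novice split in the worked example above (half of all edges used for pre-training, three quarters of them drawn from one half of the nodes) can be sketched as below. The function `split_for_pretraining`, its parameter names, and the toy edge lists are hypothetical; the sketch assumes the two node groups contribute equal numbers of edges.

```python
import random


def split_for_pretraining(expert_edges, novice_edges,
                          pretrain_fraction=0.5, expert_share=0.75, seed=0):
    """Split edges into a biased pre-training set and two test sets.

    Returns (pretrain, test_inside, test_outside), where test_inside holds
    the unused edges from the domain of expertise and test_outside the
    unused edges from the other domain.
    """
    rng = random.Random(seed)
    total = len(expert_edges) + len(novice_edges)
    n_pretrain = round(pretrain_fraction * total)
    n_expert = round(expert_share * n_pretrain)
    n_novice = n_pretrain - n_expert

    expert_known = set(rng.sample(expert_edges, n_expert))
    novice_known = set(rng.sample(novice_edges, n_novice))

    pretrain = expert_known | novice_known
    test_inside = [e for e in expert_edges if e not in expert_known]
    test_outside = [e for e in novice_edges if e not in novice_known]
    return pretrain, test_inside, test_outside


# Toy reality graph (hypothetical): 40 edges in each half of the nodes.
expert_edges = [("x%d" % i, "y%d" % i) for i in range(40)]
novice_edges = [("u%d" % i, "v%d" % i) for i in range(40)]

pretrain, inside, outside = split_for_pretraining(expert_edges, novice_edges)
print(len(pretrain), len(inside), len(outside))
```

With 40 edges per half, the network is pre-trained on 30 expert and 10 novice edges, leaving one quarter of the expert edges and three quarters of the novice edges for the learning test, as in the example in the text.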

The former types reflect actual edges in the reality-generating graph (i.e. x-y and w-z), while the latter correspond to “random” co-occurrences (x-w, x-z, y-w, and y-z).

Thus, if the BIG ADO filter were in place from the beginning, no synapses would ever form in the network. The above pre-training design, which circumvents this impasse, can be justified by a two-stage developmental model [18]. Early in development, neurons are still optimizing their placements, and axonal branches undergo frequent rearrangements; in the subsequent mature stage, experience-dependent synapse formation and pruning are still common, but neuronal wiring is much more stable. Nevertheless, the “pre-training” model adopted here is also consistent with non-developmental scenarios. Even in adulthood, growth processes can be triggered by continuous repetition or by neuromodulation reflecting emotional salience (e.g. shock, pleasure, etc.). These conditions can explain the acquisition of prior knowledge (background information). The BIG ADO filter, in contrast, constitutes a neuroanatomically inspired model of one-trial, emotionally neutral learning.

Word Association Graph

The word association graph (Fig. 2A-B) was derived from a compilation of noun/adjective pairings in Wikipedia. In its original form, it consisted of 32 million adjective-modified nouns (http://wiki.ims.uni-stuttgart.de/extern/WordGraph). After identifying nouns corresponding to animals and household objects, we filtered out infrequent adjectives and removed ambiguous terms (see S1 Text 2.1 for exact protocol). The resulting bipartite graph consisted of 50 animal nouns, 50 household object nouns, 285 adjectives and 2,682 edges (1,324 for animals and 1,358 for objects). Next, two networks were pre-trained by connecting half of the noun-adjective pairs from the graph. One of the networks associated more edges pertaining to animal nodes (becoming an animal expert and object novice), while the other associated more edges pertaining to object nodes (object expert, animal novice). Moreover, the degree of specialization was varied by changing the ratio between animals and objects learned in pre-training. Learning was then tested on the other half of the noun-adjective pairs using the BIG ADO rule with a proximity threshold (θ in equation 1) of 6. In the random equivalent graphs, edges between 100 “noun” nodes and 285 “adjective” nodes were generated stochastically by preserving both the overall noun and adjective degree distributions of the word graph. In this “control” condition, networks were pre-trained with expertise on one arbitrary subset of nodes.

Specifically, consider the proximities of a noun with the set of all adjectives: the correlation of these values can then be computed between any two nouns. The intrinsic background information of a noun class will be reflected by a statistically larger mean correlation coefficient over all pairs of nouns within that class than over all pairs of nouns from two different classes. The mean correlation was significantly greater for animal-animal than for animal-object pairs (0.69 vs. 0.47, p < 10^-4), while there was no statistical difference (p > 0.1) between the mean correlations of the object-object (0.48) and object-animal (0.46) pairs (see S1 Text 3.1 for details).
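The comparison of within-class and between-class correlations can be illustrated with a small sketch. Each noun is described by its vector of proximities to all adjectives; the vectors below are invented toy numbers, and the helper names `pearson`, `mean_within` and `mean_between` are hypothetical. The sketch omits the significance test reported in the text.

```python
import math
from itertools import combinations, product


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_within(profiles):
    """Mean correlation over all unordered pairs of nouns in one class."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def mean_between(profiles_1, profiles_2):
    """Mean correlation over all pairs of nouns from two different classes."""
    pairs = list(product(profiles_1, profiles_2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Toy proximity profiles (hypothetical): one row per noun, one column per adjective.
animals = [[3, 2, 0, 0], [2, 3, 1, 0]]
objects = [[0, 1, 3, 2], [1, 0, 2, 3]]

print("animal-animal:", round(mean_within(animals), 2))
print("animal-object:", round(mean_between(animals, objects), 2))
```

A class with strong intrinsic background information shows a higher within-class mean than between-class mean, which is the pattern reported for animals but not for household objects.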

BIG Learning in Watts-Strogatz Networks

Specifically, unless otherwise noted, Watts-Strogatz graphs were initially produced with degree 20 and 10% rewiring probability. Next, a random direction was selected for 90% of the edges, while the remaining 10% was made bidirectional. A random 20% of the nodes, along with all their incoming edges, were then labeled as belonging to the agent’s area of expertise. In the pre-training phase, networks were wired with a random set of edges of the graph, with the constraint that half of them must belong to the area of expertise, unless otherwise specified. The resulting connectivity consisted of a subgraph of the initial graph, whose nodes in the area of expertise had higher average degree than those outside the agent’s expertise. In the “grandmother cell” implementation (Fig. 3), the BIG ADO threshold was set at 1. When the size of the graph (N) was varied to assess the robustness of the BIG ADO findings with respect to the parameter space, the degree (d) and the number of associations (edges) used to pre-train the network (T) also varied as d = N/50 and T = N×d/4, in order to keep the fraction of associations learned during pre-training constant.
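The graph construction described above can be sketched in plain Python. This is a simplified illustration, not the authors' code: the generator is a basic Watts-Strogatz variant, the function names `scaled_parameters` and `directed_watts_strogatz` are hypothetical, and the labeling of the area of expertise is left out.

```python
import random


def scaled_parameters(n):
    """Degree d = N/50 and pre-training size T = N*d/4, as stated in the text."""
    degree = n // 50
    return degree, n * degree // 4


def directed_watts_strogatz(n, degree, rewire_p,
                            bidirectional_fraction=0.1, seed=0):
    """Return a set of directed arcs (i, j) built from a Watts-Strogatz graph."""
    rng = random.Random(seed)

    def key(i, j):
        # Canonical form of an undirected edge.
        return (i, j) if i < j else (j, i)

    # 1. Ring lattice: each node is linked to its degree/2 nearest
    #    neighbours on each side.
    edges = {key(i, (i + step) % n)
             for i in range(n) for step in range(1, degree // 2 + 1)}

    # 2. Rewire each lattice edge with probability rewire_p, keeping one
    #    endpoint and avoiding self-loops and duplicate edges.
    for i, j in sorted(edges):
        if rng.random() < rewire_p:
            free = [k for k in range(n) if k != i and key(i, k) not in edges]
            if free:
                edges.remove((i, j))
                edges.add(key(i, rng.choice(free)))

    # 3. Orient: a fixed fraction becomes bidirectional, the rest get a
    #    random direction.
    ordered = sorted(edges)
    rng.shuffle(ordered)
    n_bidirectional = round(bidirectional_fraction * len(ordered))
    arcs = set()
    for index, (i, j) in enumerate(ordered):
        if index < n_bidirectional:
            arcs.update({(i, j), (j, i)})
        elif rng.random() < 0.5:
            arcs.add((i, j))
        else:
            arcs.add((j, i))
    return arcs


n = 1000
degree, n_pretrain = scaled_parameters(n)
arcs = directed_watts_strogatz(n, degree, rewire_p=0.1)
print(degree, n_pretrain, len(arcs))
```

For N = 1000 the scaling rule gives d = 20, matching the default degree, and T = 5000, which is half of the 10,000 undirected edges in the graph.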

Extension of the ADO Rule to Cell Assemblies

The cell assembly simulations (Fig. 4) implemented the Zip Net model [20], a computational enhancement of classic Associative Nets [21] that ensures optimal Bayesian learning [22]. Briefly, learning the association between two concepts A and B, represented by neurons a1, a2, ..., as and b1, b2, ..., bs, entails strengthening (or forming) synapses between co-active neurons and weakening or eliminating those between active and inactive neurons. Specifically, in the “incidence” matrix M with rows and columns respectively representing pre- and post-synaptic neurons, the entries in the columns bj of all ai rows are increased, while the remaining entries are decreased by an appropriate amount to keep the total synaptic input constant (S1 Text 2.3). In the pre-training phase, the connectivity matrix is generated from the incidence matrix simply by keeping a fixed number of synapses per neuron (those with highest weight), and setting the rest to zero. During BIG ADO testing, two neurons a and b can only form a new synapse upon co-activation if they have an axonal-dendritic overlap, which is expressed as the triple matrix product ΩΩᵗΩ computed from the positive values of the incidence matrix.

Lastly, retrieval works as a classic dendritic sum: given a stimulus A’ represented by neurons a’1, a’2, ..., a’s, all the entries in the rows corresponding to these neurons are added up for each column, and those sums exceeding a given firing threshold correspond to activated (post-synaptic) neurons. If enough neurons belonging to the same cell assembly B’ fire, concept B’ gets activated.
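Retrieval by dendritic sum can be sketched as follows. The weights, assembly membership, firing threshold, and the `min_fraction` criterion for “enough neurons” are all hypothetical choices made for illustration; the text does not specify them here.

```python
def retrieve(weights, stimulus, firing_threshold):
    """Return the post-synaptic neurons whose dendritic sum exceeds threshold.

    weights[pre][post] is the synaptic weight from neuron pre to neuron post;
    stimulus is the collection of active pre-synaptic neurons.
    """
    n_post = len(weights[0])
    sums = [sum(weights[pre][post] for pre in stimulus)
            for post in range(n_post)]
    return {post for post, total in enumerate(sums) if total > firing_threshold}


def assembly_active(fired, assembly, min_fraction):
    """A concept is activated if enough of its assembly fired (assumed rule)."""
    return len(fired & set(assembly)) >= min_fraction * len(assembly)


# Toy network (hypothetical): assembly A' = {0, 1, 2}, assembly B' = {3, 4, 5}.
n = 6
weights = [[0] * n for _ in range(n)]
for pre in (0, 1, 2):
    for post in (3, 4):
        weights[pre][post] = 1
weights[0][5] = 1  # a single weak input to neuron 5

fired = retrieve(weights, stimulus=[0, 1, 2], firing_threshold=2)
print(sorted(fired), assembly_active(fired, [3, 4, 5], min_fraction=0.6))
```

Neurons 3 and 4 each receive three active inputs and fire, neuron 5 receives only one and stays silent, and two out of three assembly members is enough to activate B’ under the assumed 60% criterion.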

Results

Prior Knowledge Gates Learning of Word Associations by Grandmother Neurons

We identified two classes of nouns (animals and household objects) and pre-trained two networks to learn a subset of the noun/adjective associations, each with “expertise” mostly in one of the two noun classes (Fig. 2A). Specifically, one network was pre-trained with a greater proportion of animal/adjective associations than of object/adjective associations (and vice versa for the other network). BIG learning made it easier for networks to acquire new information that was related to the information already stored. Moreover, the magnitude of this phenomenon increased with the level of specialization between animals and objects (Fig. 2B). Note that, even in their “novice” domain of knowledge, networks cannot be completely “naive.” Even if the pre-trained proportion of “novice” edges is lower than in the domain of expertise, it must still be nonzero or else no subsequent associations could be learned.

Furthermore, more animal associations were learned when the network was pre-trained with the same number of animal and object edges. Both of these differences can be explained by two independent forms of background information: one intrinsic in the source data, and another dependent on the sample used to pre-train the network. The former was eliminated by repeating the simulations on random equivalent graphs (Fig. 2B: right bar pairs). Direct analysis of Pearson’s coefficients of the bipartite graph proximity function (see Materials and Methods) confirmed that the noun/adjective association is more specific for animals than for objects (0.69 vs. 0.48, p < 10^-4).

BIG Learning in Small-World Graphs: Ability to Differentiate Real from Spurious Associations

We then examined BIG learning in small-world graphs of the Watts-Strogatz type (Fig. 3A). Networks were pre-trained with samples of associations biased towards an arbitrary subset of nodes. As in the bipartite graph, the ADO filter gated subsequent learning of new associations by favoring those pertaining to this background information (Fig. 3B). Next we investigated the ability of BIG to differentiate between “real” and “spurious” associations. Most co-occurrences experienced in everyday life do not reflect real associations, but rather events that happened together by chance. For example, suppose you were eating a grapefruit while experiencing the buzzing beetle described in the Introduction. Why should buzzing be associated with beetle and not with grapefruit? Hebbian models form both associations, relying on later experience to reinforce those that reoccur and eliminate the others [12], e.g. upon repeatedly dissociated experiences of eating a grapefruit without buzz and vice versa. Strikingly, the BIG ADO filter distinguished real from spurious associations (Fig. 3C), facilitating the ability to learn relevant co-occurrences over “occasional” ones the first time around. In a simple protocol, each experience consisted of the co-activation of two independent pairs of connected nodes in the Watts-Strogatz graph. The resulting six co-occurrences correspond to two real associations (between the two connected nodes in each of the pairs) and four spurious associations (between neurons across the pairs).
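The count of six co-occurrences per experience follows directly from pairing four co-active nodes. A short sketch, with the hypothetical helper `classify_cooccurrences` and placeholder node names, makes the bookkeeping explicit:

```python
from itertools import combinations


def classify_cooccurrences(pair_1, pair_2):
    """Label every co-occurrence among four co-active nodes as real or spurious.

    pair_1 and pair_2 are the two connected node pairs activated together.
    """
    active = list(pair_1) + list(pair_2)
    real = {frozenset(pair_1), frozenset(pair_2)}
    labelled = {}
    for u, v in combinations(active, 2):
        labelled[(u, v)] = "real" if frozenset((u, v)) in real else "spurious"
    return labelled


events = classify_cooccurrences(("x", "y"), ("w", "z"))
for pair, label in events.items():
    print(pair, label)
```

Four active nodes give 4 × 3 / 2 = 6 unordered pairs, of which the two presented pairs are real and the four cross pairs are spurious.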

In the pre-trained network, the axon of buzzing overlaps with the dendrite of beetle (high ADO) thanks to the already acquired buzzing-wasp, flying erratically-wasp, and flying erratically-beetle associations. Thus, the potential association buzzing-beetle ‘passes’ the BIG ADO filter. In contrast, buzzing and grapefruit have little if any axonal-dendritic overlap; thus, the corresponding association is not formed according to the BIG ADO mechanism. The learning differentials of both expert-over-novice networks and real-over-spurious associations increased with the bias towards a subset of nodes in the Watts-Strogatz graph, and were observed over a broad range of model parameters (see S1 Text 3.2 for additional results).
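The buzzing-beetle example can be worked through with a tiny grandmother-neuron network. In this sketch the proximity from a to b counts the pairs (c, d) such that a contacts c and d contacts both c and b. The dictionary encoding, the extra “sour-grapefruit” association, and the threshold value of 0 are illustrative assumptions only.

```python
def proximity(targets, a, b):
    """Count pairs (c, d) such that a -> c, d -> c and d -> b.

    targets maps each pre-synaptic neuron to the set of neurons it contacts.
    """
    count = 0
    for c in targets.get(a, ()):
        for d_targets in targets.values():
            if c in d_targets and b in d_targets:
                count += 1
    return count


# Pre-trained associations from the text, plus one hypothetical grapefruit fact.
targets = {
    "buzzing": {"wasp"},
    "flying erratically": {"wasp", "beetle"},
    "sour": {"grapefruit"},
}

theta = 0  # illustrative threshold, not a value taken from the simulations
for candidate in ("beetle", "grapefruit"):
    value = proximity(targets, "buzzing", candidate)
    verdict = "learned" if value > theta else "filtered out"
    print("buzzing ->", candidate, "proximity", value, verdict)
```

Here wasp plays the role of c and flying erratically the role of d, so the buzzing-beetle pair has proximity 1 and is learned, whereas buzzing-grapefruit has proximity 0 and is rejected on first exposure.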

Generalization to Realistic Cell Assemblies

Theories and experiments estimate that at least 50-200 cells take part in encoding each unit of thought [25, 26, 27]. Cell assemblies provide for redundancy, error-correction, and larger storage capacity. We thus extended the BIG ADO paradigm to cell assemblies. In cell assembly models, acquiring a new association between two co-occurring events entails formation of new synapses between the neurons representing one event and the neurons representing the other event. With the BIG ADO filter, forming a synapse between a pair of co-active neurons requires appropriate preexisting connections, as in Fig. 1B, with the notable difference that the same neuron typically belongs to several cell assemblies.

Simulations with the Willshaw model confirmed the BIG ADO results with the word association graph (see S1 Text 2.3 for implementation details and S1 Text 3.2 for analysis). However, the original Associative Nets achieve maximal storage capacity when cell assembly size is log-proportional to the number of neurons [20]. Such limitation on cell assembly size makes this approach unsuitable for learning realistic Watts-Strogatz graphs. A more sophisticated variant of this model, which achieves optimal Bayesian learning [22], attains excellent performance for cell assembly sizes compatible with those estimated for real brains. This latter model (Zip Nets) enabled cell assembly implementation of the BIG ADO mechanism with generic Watts-Strogatz graphs. In a typical configuration, the network learned 50% of novel associations within its domain of expertise, but only 9% unrelated to prior knowledge. When two node pairs (sampled randomly within and outside domain of expertise) were co-activated at once, 30% of real associations were learned vs. 7% of the spurious ones. Sampling only within or outside the domain of expertise, the learning proportions for real and spurious pairs were 50% and 12% or 9% and 3%, respectively.

In particular, a substantially higher proportion of associations were learned within the domain of expertise than outside for any graph degree d (the average number of edges per node) from 8 to 24 and rewiring probability up to 80% (Fig. 4A). The rewiring probability R defines by construction Watts-Strogatz graphs as hybrids between regular lattices (R = 0) and random graphs (R = 1).

The fraction of spurious associations learned was substantially lower than that of real associations for degrees above 5 and rewiring probability below 50% (Fig. 4A). This suggests that prior connectivity (ADO) provides a biologically realistic neural correlate of background information and its ability to gate learning in any highly clustered network. In clustered networks, two nodes are more likely to be interconnected if they are both connected to a third node. This is a common property of many types of graphs that extends beyond Watts-Strogatz networks [28].

Robustness Analysis and Optimal Conditions

Specifically, the described mechanism does not depend on specific choices of parameters such as graph dimension, number of associations presented, learning threshold, and others. In particular, the main effect of axonal-dendritic overlap to selectively gate learning by background information was consistently reproduced in every combination of parameters conducive to adequate memory storage (Fig. 4B). Moreover, the discrimination between real and spurious associations with cell assemblies in small-world graphs was also largely unaffected by the choice of numerical values. Importantly, however, this latter effect varied quantitatively as a function of selected model parameters (Fig. 4C), such as the proximity load, which determines how topologically close an axon and a dendrite must be to constitute a potential synapse (see section 2.4 of S1 Text). This is the key parameter distinguishing BIG ADO from traditional Hebbian learning: a new synapse is formed between two neurons when they fire together and only if a potential synapse is already present. Thus, certain circuits might be better designed than others to support efficient one-trial learning depending on their specific plasticity and excitability (see S1 Text 3.2 for additional results).

Discussion

The key idea is that this “background information-gated” (BIG) learning emerges from the necessity of neuronal axons and dendrites to be adjacent to each other in order to establish new synapses. Such basic geometric requirement was explicitly recognized in Hebb’s original formulation of synaptic plasticity, yet is not usually accounted for in neural network learning rules. The claim that existing structure matters for learning is not new [29]. However, the level of abstraction of current computational models of brain function fails to capture the details of axonal and dendritic shape.

This corresponds to a fundamental neuroanatomical constraint: an axon must pass close to the dendrites that are near other neurons it contacts. The topographic organization of the mammalian cortex ensures that nearby neurons on average encode related information [30]. Incorporating this new relationship into classic connectionist learning algorithms, we found that networks trained in a given domain more easily acquire further knowledge in the same domain than in others. If the proximity threshold is set to zero, the model reverts to a traditional neural network unconditionally learning all associations. From this perspective, the BIG ADO rule could be considered as a biological constraint on learning. However, to our initial surprise, the morphologically motivated constraint on structural plasticity also endows neural nets with the powerful computational ability to discriminate real associations of events, like the sight of lightning and the sound of thunder, from spurious co-occurrences, such as between the thunder and the beetle that flew by during the storm. Thus, we surmise that the selectivity of synaptic formation implied by the ADO requirement provides a fundamental cognitive advantage over the unconstrained “fire together, wire together” plasticity rule of classic artificial neural networks. Of course, the ability to associate completely unrelated facts or events may also be useful in many circumstances. Several different models have proposed that the hippocampus might be specialized for precisely that function, possibly leveraging its superior plasticity rate [31] or adult neurogenesis [32]. Our model suggests that this ability might also derive from the lack of topographic mapping in this structure (e.g. hippocampal area CA3). Moreover, the profuse axonal arbors of cortical neurons may enable access to a surprisingly large pool of intertwining dendrites through neurite outgrowth [33], perhaps providing a counter-mechanism to balance the BIG ADO rule.

If k pairs of real associations (A1-B1, A2-B2, ..., Ak-Bk) are presented at the same time, BIG ADO selectively learns the correctly paired events over spuriously co-occurring ones (e.g. A1-B2, A2-B1, etc.). A “fire-together, wire-together” rule without ADO constraint can achieve similar selectivity by repetition. In this case, each association must be presented multiple times in order to attain the same discrimination power displayed by BIG ADO in one-trial learning. The number of required repetitions grows with the number k of real associations presented together and also depends on the structure of the association graph. For example, in the conditions of Fig. 3, BIG ADO learns real associations at a rate of 6:1 relative to spurious co-occurrences upon the first presentation. To obtain the same ratio in the absence of ADO if just five pairs are presented together, each association has to be repeated on average four times.

Our research design is consistent with an initial phase of maximal plasticity, followed by a ‘mature’ state of conditional plasticity. Specifically, during pre-training, all witnessed associations are learned. Clearly, the anatomical constraint of axonal-dendritic overlap holds in all phases of development. However, the more prominent neuronal and axonal movements in earlier developmental stages would largely circumvent or alleviate the ADO filter. In practice, we pre-load the network directly with synaptic connectivity equivalent to that resulting from such an initial developmental phase (representing ‘background knowledge’). Afterward, the model preferentially learns associations related to previously acquired information. The resulting mature network not only avoids associating the (most numerous) spurious co-occurrences, but is also optimally structured to learn the associations most relevant to the environment in which it developed. Besides providing clear evolutionary advantages, these key features could also be applied in artificial intelligence and search engines.

This process is complementary to (and as fundamental as) other factors known to control learning, such as valence and novelty. The proposed mechanism of axonal-dendritic overlap, based on the elementary anatomical organization of neuronal circuits, is also independent of neuromodulatory pathways likely to underlie alternative or parallel regulation of one-trial learning. This framework can also be useful to describe how semantic knowledge can be incorporated into existing knowledge. Moreover, the model offers a possible neural network correlate for the rapid memory consolidation occurring when new information is assimilated into a preexisting associative “schema” or mental representation [36]. Other recent models have been proposed to explain the dependence of learning on prior knowledge [37].

Realistically, potential synapses might work in synergy with additional mechanisms conducive to the same learning rule. For example, presentation of individual elemental associations (buzzing wasp, flying wasp, and flying beetle) may lead to the formation of cell assemblies representing associations between higher-order concepts and their properties (“flying insect”), as previously hypothesized [39], possibly supported by ongoing structural plasticity [40]. Moreover, axonal-dendritic overlap may provide powerful constraints for the recruitment of individual neurons into cell assemblies. While cell assembly selection has been proposed as the core of knowledge representation in neural systems [41], the underlying anatomical mechanisms have so far remained elusive [26]. Thus, the proposed link between neuronal structure and function may constitute an essential foundation for brain-based theories of cognition.

Supporting Information

Much ADO About BIG Learning: Supplementary Information. The single Supporting Information file (S1 Text) describing the model’s underlying assumptions, detailed methodologies, and supplementary results includes additional text, illustration, and references.

Acknowledgments

We thank Dr. James L. Olds for feedback on an earlier version of the manuscript.

Author Contributions

Performed the experiments: MM. Analyzed the data: MM GAA. Contributed reagents/materials/analysis tools: GAA. Wrote the paper: MM GAA.

Topics

neural network

Appears in 20 sentences as: Neural Network (1) Neural network (1) neural network (12) neural networks (6)