Brain scientists led by Sebastian Haesler (NERF, empowered by IMEC, KU Leuven and VIB) have identified a causal mechanism of how novel stimuli promote learning.
Novelty directly activates the dopamine system, which is responsible for associative learning. The findings have implications for improving learning strategies and for the design of machine learning algorithms.
Novelty and learning
A fundamental type of learning, known as associative learning, is commonly observed in animals and humans.
It involves the association of a stimulus or an action with a positive or negative outcome. Associative learning underlies many of our every-day behaviors: we reward children for doing their homework, for example, or limit their TV time if they misbehave.
Scientists have known since the 1960s that novelty facilitates associative learning. However, the mechanism behind this phenomenon remained unknown.
“Previous work suggested that novelty might activate the dopamine system in the brain. We therefore reasoned that dopamine activation might also promote associative learning,” says Prof. Sebastian Haesler, who led the study.
Sniffing out novelty
To demonstrate that novelty indeed activates dopamine neurons, the researchers exposed mice to both new and familiar smells.
“When mice smell a novel stimulus, they get very excited and start sniffing very rapidly. This natural, spontaneous behavior provides a great readout for novelty perception,” explains Dr. Cagatay Aydin, a postdoc in the group of Sebastian Haesler.
With these experiments, the team confirmed that dopamine neurons were activated by new smells, but not by familiar ones.
In a second step, the mice were trained to associate novel and familiar smells with reward.
“When we specifically blocked dopamine activation by novel stimuli in only a few trials, learning was slowed down. On the other hand, stimulating dopamine neurons during the presentation of familiar stimuli accelerated learning,” says Joachim Morrens, a PhD student in the group.
The value of novelty
The findings demonstrate that dopamine activation by novel stimuli promotes learning. They further provide direct experimental support for a group of theoretical frameworks in computer science, which incorporate a ‘novelty bonus’ to account for the beneficial effect of novelty.
Incorporating such a bonus can speed up machine learning algorithms and improve their efficiency.
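As a rough illustration of how such a bonus can work, here is a minimal sketch (in Python; the function names and parameters are ours, not from the study) of a multi-armed bandit learner whose action values are inflated by a count-based novelty bonus that decays as an option becomes familiar:

```python
import math

def novelty_bonus(n_visits, beta=1.0):
    # Bonus is largest for never-tried options and shrinks with familiarity.
    return beta / math.sqrt(1 + n_visits)

def choose_action(q_values, visit_counts, beta=1.0):
    # Greedy choice over estimated value plus novelty bonus:
    # among equally valued options, the least-familiar one wins.
    scores = [q + novelty_bonus(n, beta) for q, n in zip(q_values, visit_counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def update(q_values, visit_counts, action, reward, alpha=0.1):
    # Standard incremental value update; the bonus affects only action selection.
    visit_counts[action] += 1
    q_values[action] += alpha * (reward - q_values[action])
```

With equal estimated values, such a learner samples the least-familiar option first, which is the behavioral signature that 'novelty bonus' frameworks aim to capture.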
From a very practical perspective, the results remind us to break our routine more often and seek out novel experiences to be better learners.
Funding: Funding came from HFSP, EC Marie Curie and FWO.
Midbrain dopamine neurons are widely proposed to signal value prediction errors (Mirenowicz and Schultz, 1994). However, the same neurons also respond to errors in predicting the features of rewarding events, even when their value remains unchanged (Howard and Kahnt, 2018; Takahashi et al., 2017). Such sensory prediction errors would be useful for learning detailed information about the relationships between real-world events (Gardner et al., 2018; Howard and Kahnt, 2018; Langdon et al., 2018; Takahashi et al., 2017). Indeed, dopamine transients facilitate learning such relationships, independent of value, when they are appropriately positioned to mimic endogenous errors (Chang et al., 2017; Keiflin et al., 2019; Sharpe et al., 2017). Yet dopaminergic responses to sensory prediction errors do not seem to encode the content of the mis-predicted event, either at the level of individual neurons or summed across populations (Howard and Kahnt, 2018; Takahashi et al., 2017).
How then do downstream areas that receive this teaching signal know what to learn? The conventional response is that such signals are permissive, with downstream areas controlling the content of the resultant learning (Glimcher, 2011). However, another possibility is that information about the content of the learning might be contained, at least partly, in the pattern of firing across ensembles of dopamine neurons. It is now widely accepted that information is represented in areas like cortex and hippocampus not by individual neurons, but rather in a distributed fashion in the firing of groups of cells (Gochin et al., 1994; Jennings et al., 2019; Jones et al., 2007; Rich and Wallis, 2016; Rigotti et al., 2013; Schoenbaum and Eichenbaum, 1995; Wikenheiser and Redish, 2015; Wilson and McNaughton, 1993). If this is true for the cortex and hippocampus, then why not for the midbrain dopamine system? Consistent with this, here we show that the pattern of firing across a small group of dopamine neurons recorded in rats contains specific information about the identity of a mis-predicted event. We further show that this same content-rich signal is evident in the BOLD response elicited by sensory prediction errors in human midbrain. These data provide the first evidence of which we are aware that dopamine neuron ensembles generate firing patterns capable of conveying not only the occurrence of a prediction error to downstream areas but also information regarding what exactly was mis-predicted. These findings open new possibilities for how dopaminergic error signals might contribute to the learning of complex associative information.
The results presented here show that, in both rats and humans, putative dopaminergic sensory prediction error responses in the midbrain contain specific information about the features of the mis-predicted event itself, appropriate for instructing or updating representations in downstream brain regions. These results are consistent with the proposal that the midbrain dopamine system signals a multidimensional prediction error, able to reflect a failure to predict information about an unexpected event beyond and even orthogonal to value (Gardner et al., 2018; Howard and Kahnt, 2018; Langdon et al., 2018; Takahashi et al., 2017). Importantly, this proposal is not necessarily contrary to current canon; it can account for value errors as a special example of a more general function (Gardner et al., 2018), one readily apparent in the firing of individual neurons, perhaps because such information is prioritized when value is the goal of the experimental subject. However, this proposal also explains in a relatively straightforward way why dopamine neurons are often phasically active in settings where value errors were not anticipated a priori, at least by the experimenters, such as when novel cues or even information is first presented (Bromberg-Martin and Hikosaka, 2009; Horvitz, 2000; Horvitz et al., 1997; Kakade and Dayan, 2002), or in response to violations of beliefs or auditory expectations (Gläscher et al., 2010; Gold et al., 2019; Iglesias et al., 2013; Schwartenbeck et al., 2016). That the pattern of firing across a relatively small population of dopamine neurons can provide details regarding the mis-predicted event endows the dopamine system with the ability to serve as an instructive ‘teaching’ signal outside the dimension of value.
One interesting question raised by the prior and current results is whether and how such a system would distinguish the omission of an expected sensory event from its unexpected appearance. The designs of the two experiments analyzed here do not allow us to distinguish representation of these two types of errors. We would speculate that both should be encoded in the neural activity of the system, including in the current data. Thus, the decoding demonstrated here would reflect the combination of these two changes. Of course, the actual presence of something is likely to support a much stronger signal than its absence, so in practice, it may be difficult or require substantially higher statistical power to see a representation of an omitted event, particularly one that involves subtle features orthogonal to value.
Another interesting question raised by these results is whether downstream areas use the information in this signal to support learning. While the current data are only correlative, it is notable that the information is present only at the start of the blocks, when it is relevant to learning; it is thus appropriately positioned to drive learning in downstream structures. And of course, a causal role for the signal shown here is in line with recent demonstrations that dopamine transients are necessary and sufficient for learning that cannot easily be accounted for by classic reinforcement learning mechanisms (Chang et al., 2017; Keiflin et al., 2019; Sharpe et al., 2017). Keiflin et al. (2019) is particularly relevant in this regard: in that study, conditioned responding to a cue unblocked by artificial activation of VTA dopamine neurons at the time of an expected reward was shown to be sensitive to subsequent devaluation of that reward. Sensitivity to devaluation indicates that the artificial dopamine transients induced the formation of an association between the conditioned stimulus and the sensory properties (i.e. the flavor) of the reward, precisely the type of learning the signal described here is proposed to support (Gardner et al., 2018).
How the artificial activation of neurons engaged in representing information through a pattern of activity can cause normal learning in studies such as those cited above is another outstanding question raised by the current data. One possible explanation may be found in the appearance of external events at the time of stimulation in these studies. Even though these events are largely expected in the blocking designs used by Sharpe et al. (2017) and Keiflin et al. (2019), input reflecting their appearance still impinges on the dopamine neuron population at the proper time to support learning. By randomly injecting current across a subset of this population, the artificial stimulation may recover a ghost of the error pattern that would be caused by these events if they were unexpected: a pattern close enough to cause learning that seems normal, given the very simple behavioral readouts used in these studies.
If dopamine neurons do provide information about errors beyond the single dimension of value, this raises questions about the limits of such a code and how the system deals with the vastness of the possible error space relative to the number of dopamine neurons. There are approximately 40,000 dopamine neurons in the VTA of rats, and another 25,000 in SN (Nair-Roberts et al., 2008). In humans, the total number is about 300,000 (Hirsch et al., 1988). If each neuron provides only a single bit of information, the capacity of just the VTA in rats is still 2^40,000 patterns. Of course, there is surely substantial redundancy across neurons, yet even if we reduce the cell number to 1000 real bits of information, we still end up with roughly 1.07 × 10^301 potential patterns. This is a huge number. And of course, information represented in spiking may be augmented (or attenuated) by factors such as co-release of other neurotransmitters and by the location (region, cell type, dendritic compartment) and type (receptors, second-messenger cascades, modulatory interactions) of contacts with downstream regions. Even if all this combines to yield only 20 or 30 unique coded dimensions, we still end up with around a billion possible patterns of output. This number seems big enough, with assistance from other systems (we do not propose this to be the only learning signal) and with contextual modulation of the processing (i.e. some factors might be given priority or not, depending on the situation, by modulating inputs), to deal with much of the problem of dimensionality.
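The back-of-envelope numbers above are easy to check with exact integer arithmetic; a quick sketch in Python (the helper name is ours):

```python
def pattern_count(n_bits):
    # Number of distinct binary firing patterns across n_bits independent
    # on/off units: each unit doubles the number of possible patterns.
    return 2 ** n_bits

# 1000 informative neurons-as-bits yields about 1.07e301 patterns
# (a 302-digit number); 30 coded dimensions yields about a billion.
```

Since 2^30 is already over 10^9, even a heavily compressed code of a few dozen dimensions leaves an enormous pattern space.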
Finally, it is worth noting that the demonstration here mirrors advances in the computational field, where distributed, multidimensional error signaling is a key component of more advanced algorithms, such as distributional reinforcement learning and the successor representation (Dabney et al., 2017; Dayan, 1993). In both, the error driving learning is not unitary but rather is represented as a vector. Distributional reinforcement learning has recently been suggested as an explanation for the heterogeneity of the responses of individual dopamine neurons to errors in predicting reward value (Kurth-Nelson et al., 2019). The current results extend this to show for the first time that an assembly of dopamine neurons can function to represent the content of errors, even outside the realm of value. That the same information available in the pattern of activity is not readily apparent in the activity of individual neurons is in accord with ideas guiding behavioral neurophysiology in other areas (Yuste, 2015), and suggests it is time to consider the functions of the dopamine system across, rather than within, individual neurons.
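As a concrete instance of a vector-valued teaching signal, here is a minimal sketch of the successor-representation TD update (Dayan, 1993) cited above; the code is our own illustration, not taken from any of the studies discussed. The prediction error has one component per state feature, so it conveys not just that a prediction failed but which feature was mis-predicted:

```python
import numpy as np

def sr_td_update(M, s, s_next, gamma=0.9, alpha=0.1):
    # M[i, j] estimates the expected discounted future occupancy of
    # state j when starting from state i (the successor representation).
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    # Vector-valued prediction error: one component per state feature,
    # unlike the single scalar error of classic value-based TD learning.
    error = onehot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * error
    return M, error
```

Each nonzero component of `error` identifies a specific mis-predicted feature, analogous to the content-rich ensemble signal described above.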