Scientists at UNSW Sydney’s Decision Neuroscience Lab have made a major discovery about the way brains influence behaviour, one that challenges a theory that has stood for 30 years.
And the findings could one day have key implications for the way we treat brain-related diseases such as Parkinson’s or deal with conditions like Tourette’s syndrome.
In a paper published today in the prestigious journal Science, the research team of Dr Miriam Matamales and Dr Jay Bertran-Gonzalez, together with Neuroscience Lab Director, Scientia Professor Bernard Balleine, set out to determine the relationship between the two main types of neuron found in the striatum, a major area of the brain responsible for voluntary movement in animals and humans.
They set up experiments to observe mice while they learned new actions that led to a reward of food, then examined the activity of these neurons in large areas of the striatum.
They looked specifically at the activity of the two classes of neuron in this area – those expressing D1 or D2 types of dopamine receptors.
For the last three decades, these D1- and D2-neurons were thought to have an independent influence on voluntary action, respectively initiating and inhibiting reward-seeking behaviour.
While studying how these two types of neuron became active during learning, the team found an unexpectedly high degree of interaction between them, occurring locally within the striatum itself.
As an example of behaviour in which these neurons would be active, Dr Matamales suggests a simple but common scenario: walking into a room and flicking on a light switch, only to find the light doesn’t work.
“So you walk into a room, flick the switch without even thinking about it, and there’s no light,” she says. “You learn something has changed and so the behavioural response has to be modified by that learning.
“What we’re interested in is what changes in the brain are necessary to update that learning to realise ‘oh, the bulb’s blown, I should stop flicking the switch expecting the light to go on’.
“Although this may seem trivial at one level, this kind of plasticity in decision-making processes is going on all the time.
“Updating learning to control our actions is a critical aspect of brain function acquired through evolution, to stop us wasting valuable energy by repeating a task for no reward.”
Professor Balleine explains that prior learning about behaviour tied to one outcome is put on hold while an updated version, relevant to the change in the environment, is written.
“This regulation of voluntary action is not about getting rid of or replacing the knowledge or behaviour, it’s about being more efficient in stopping actions that use energy for no reward,” he says.
“You’ve got a neuron, the D1-neuron, that’s involved in acquiring and maintaining ongoing behaviour and another, the D2-neuron, that’s engaged in updating that behaviour when there are changes in the environment.
“And what is game-changing is that this critical interaction is going on in the striatum, not further downstream in more distant motor output structures of the brain as was thought previously.”
Rethinking brain health
Professor Balleine says this new understanding of the D1 and D2 neurons intermingling in the striatum during learning could have important implications for medicine and even our concept of how voluntary actions are acquired and altered.
“Our research suggests that the whole theory of basal ganglia function that people have been working with in order to try and treat diseases of various kinds is seriously incomplete,” he says.
Diseases that are associated with basal ganglia function include Parkinson’s and Huntington’s disease, dementia, dystonia, Tourette syndrome and obsessive-compulsive disorder.
Dr Bertran-Gonzalez suggests that a clue to understanding at least some of these conditions could be found in the learning-related functions of the striatum.
“Most basal ganglia dysfunction appears later in life and takes years to settle,” he says. “Some conditions are expressed by aberrant behaviour, where movements or whole actions that should be inhibited are not inhibited, perhaps because they never learned to be inhibited in the striatum, or because that learning was deficient.
“In such cases, in addition to simply attempting to counter uncontrolled motor movements, we should perhaps explore more progressive therapy that tries to correct this early learning. I think that we should add a learning perspective to virtually all treatments of basal ganglia dysfunction. After all, most of our current behaviour is no more than learning’s ‘work in progress’.”
Professor Balleine notes that with health conditions related to the basal ganglia, the striatum could be the new target area for medical intervention.
D1 and D2 neurons in the striatum can be linked with learning and cognition rather than simply motor output, the research shows.
“We believe these findings have the potential to re-target treatments of basal ganglia disorders to the striatum,” Professor Balleine says.
“One of the most exciting parts of this research is that it speaks to particular connections between particular neurons within a particular structure. So it really gives great targeting information for treatment, and gives us new ways to think about these problems.”
Dr Matamales says that while the research raises hopes for new ways to treat health problems relating to brain function, there is still plenty of research ahead before the observations in mice are replicated in humans.
“It is exciting to think that our new understanding could one day be used to target problems in the brain with more depth,” she says. “But the important thing you can say about this work right now is that we are providing more evidence to relate these neurons in the striatum with learning and cognition rather than simply motor output.
“Hopefully this will lead to further breakthroughs that help us understand how the brain learns and how we adapt our behaviour to our environment.”
Learning from trial and error is a core adaptive mechanism in behaviour (Packard et al., 1989; Glimcher, 2002). This learning process is driven by reward prediction errors (RPEs) that signal the difference between expected and actual outcomes (Houk, 1995; Montague et al., 1996; Schultz et al., 1997).
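As a minimal illustration of the definition above, an RPE can be written as the difference between the outcome received and the outcome expected, with the expectation then nudged towards the outcome. The sketch below is a generic delta-rule update, not the specific model of any cited study; the function name `rpe_update` and the `learning_rate` value are hypothetical choices made for illustration.

```python
def rpe_update(expected, reward, learning_rate=0.1):
    """Return (prediction_error, new_expected) for a single trial.

    Generic delta rule (an illustrative assumption, not a cited model):
    the RPE is positive when the outcome is better than expected and
    negative when it is worse.
    """
    delta = reward - expected                      # reward prediction error
    new_expected = expected + learning_rate * delta
    return delta, new_expected


# Example: an option expected to pay off half the time is rewarded once,
# then unrewarded once.
value = 0.5
positive_rpe, value = rpe_update(value, reward=1.0)
negative_rpe, value = rpe_update(value, reward=0.0)
```

In this sketch a rewarded trial yields a positive RPE and raises the expectation, while an unrewarded trial yields a negative RPE and lowers it.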
Substantia nigra and ventral tegmental area (VTA) midbrain neurons use bursts and dips in dopaminergic signalling to relay positive and negative RPEs to prefrontal cortex (Deniau et al., 1980; Swanson, 1982) and the striatum, activating the so-called Go and NoGo pathways (Beckstead et al., 1979; Surmeier et al., 2007).
Parkinson’s disease is caused by a substantial loss of dopaminergic neurons in the substantia nigra (Edwards et al., 2008), leading to the depletion of dopamine in the striatum (Koller and Melamed, 2007). Dopaminergic medication has been shown to alter how Parkinson’s disease patients learn from feedback (Cools et al., 2001; Bódi et al., 2009) and how they use past learning to make value-based choices in novel situations (Frank et al., 2004; Frank, 2007; Shiner et al., 2012).
A common finding is that, when required to make value-based decisions after learning, patients ON compared to OFF medication are better at choosing the option associated with the highest value (approach), whereas when OFF medication, they are better at avoiding the option with the lowest value (avoidance) (Frank et al., 2004; Frank, 2007).
However, it is currently unknown how dopamine-induced changes during the learning process relate to these subsequent dopamine-induced changes in approach/avoidance choice behaviour.
An influential framework of dopamine function in the basal ganglia proposes that the dynamic range of phasic dopamine modulation in the striatum, in combination with tonic baseline dopamine levels, gives rise to the medication differences observed in Parkinson’s disease (Frank, 2005).
This theory suggests that lower baseline dopamine levels in unmedicated Parkinson’s disease are favourable for the upregulation of the NoGo pathway, leading to an emphasis on learning from negative outcomes. In contrast, higher tonic dopamine levels in medicated Parkinson’s disease lead to continued suppression of the NoGo pathway, resulting in (erroneous) response perseveration even after negative feedback.
Extremes in these medication-induced changes in brain signalling are thought to manifest behaviourally in dopamine dysregulation syndrome, in which patients exhibit compulsive tendencies, such as pathological gambling or shopping (Voon et al., 2010).
In support of the theory on Go/NoGo signalling, impairments in learning performance associated with higher dopamine levels have been found mainly in negative-outcome contexts: during probabilistic selection (Frank et al., 2004), reversal learning (Cools et al., 2006), and probabilistic classification (Bódi et al., 2009).
In addition to these behavioural adaptations, increased striatal activations have been reported in medicated Parkinson’s disease patients during the processing of negative RPEs (Voon et al., 2010).
Similarly, a recent study of rats performing a reversal learning task revealed a distinct impairment in the processing of negative RPEs at increased dopamine levels (Verharen et al., 2018). However, little is known about how these medication-related changes in striatal responsivity to RPEs relate to (i) later behavioural choice patterns; and (ii) changes in brain activity during subsequent value-based choices.
We examined the role of dopaminergic medication in choice behaviour and associated brain mechanisms. Twenty-four Parkinson’s disease patients ON and OFF medication and a reference group of 24 age-matched control subjects performed a two-stage probabilistic selection task (Frank et al., 2004) (Fig. 1A) while undergoing functional MRI.
The experiment’s first stage was a learning phase, during which participants gradually learned to make better choices for three fixed pairs of stimulus options, based on reward feedback. In the second, transfer stage, participants used their learning phase experience to guide choices when presented with novel combinations of options, without receiving any further feedback (Fig. 1A). Value-based decisions during the transfer phase were examined using an approach/avoidance framework (Fig. 1B).
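One simple way to score transfer-phase choices in an approach/avoidance framework is sketched below. It assumes a labelling convention in which stimulus "A" carried the highest reward probability during learning and "B" the lowest, and it scores only novel pairings that contain one of the two but not both. These labels, the trial format, and the function name are assumptions for illustration; the scoring actually used in the study (Fig. 1B) may differ.

```python
def approach_avoid_accuracy(trials, best="A", worst="B"):
    """Score transfer trials given as (option_1, option_2, chosen) tuples.

    Assumed convention (hypothetical, for illustration only):
    - approach accuracy: share of novel pairs containing `best`
      (but not `worst`) on which `best` was chosen;
    - avoidance accuracy: share of novel pairs containing `worst`
      (but not `best`) on which `worst` was NOT chosen.
    Returns (approach, avoid); a score is None if no trial qualifies.
    """
    approach_hits = approach_n = avoid_hits = avoid_n = 0
    for option_1, option_2, chosen in trials:
        pair = {option_1, option_2}
        if best in pair and worst not in pair:
            approach_n += 1
            approach_hits += int(chosen == best)
        elif worst in pair and best not in pair:
            avoid_n += 1
            avoid_hits += int(chosen != worst)
    approach = approach_hits / approach_n if approach_n else None
    avoid = avoid_hits / avoid_n if avoid_n else None
    return approach, avoid


# Made-up example data: two approach trials, two avoidance trials, and one
# trial pairing best with worst, which this scoring skips.
example_trials = [
    ("A", "C", "A"),
    ("A", "D", "D"),
    ("B", "E", "E"),
    ("B", "F", "F"),
    ("A", "B", "A"),
]
approach_score, avoid_score = approach_avoid_accuracy(example_trials)
```

Keeping the two scores separate is what allows a medication effect on approach to be distinguished from an effect on avoidance.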
To better describe the underlying processes that contribute to learning, behavioural responses were fit using a hierarchical Bayesian reinforcement learning model (Jahfari et al., 2018; Van Slooten et al., 2018), adapted to estimate both within-patient effects of medication and across-subject effects of disease (Sharp et al., 2016).
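The general shape of a reinforcement learning model for one stimulus pair can be sketched as follows. This is a plain Q-learning simulation with a softmax choice rule and separate learning rates for positive and negative RPEs. That parameterisation is an assumption for illustration: this section does not specify the cited model's parameters, and the hierarchical Bayesian fitting step is omitted. The reward probabilities in the usage line are likewise hypothetical.

```python
import math
import random


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing option a over option b (two-option softmax)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def simulate_pair(p_reward_a, p_reward_b, alpha_gain, alpha_loss, beta,
                  n_trials, seed=0):
    """Simulate learning for one fixed pair and return the final Q-values.

    Assumed model (illustrative only): Q-values start at 0.5, rewards are
    0 or 1, and the learning rate depends on the sign of the RPE.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    p_reward = [p_reward_a, p_reward_b]
    for _ in range(n_trials):
        p_choose_a = softmax_choice_prob(q[0], q[1], beta)
        choice = 0 if rng.random() < p_choose_a else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        delta = reward - q[choice]                  # reward prediction error
        alpha = alpha_gain if delta >= 0 else alpha_loss
        q[choice] += alpha * delta
    return q


# Hypothetical pair with 80% vs 20% reward probability.
final_q = simulate_pair(0.8, 0.2, alpha_gain=0.2, alpha_loss=0.2,
                        beta=5.0, n_trials=200)
```

Separating `alpha_gain` from `alpha_loss` is one way a model can express differential sensitivity to positive and negative outcomes, which is the kind of quantity a medication comparison would examine.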
This quantification of behaviour then informed our model-based functional MRI analysis, in which we examined medication-related changes in blood oxygen level-dependent (BOLD) brain signals in response to RPEs during learning, as well as medication-related changes in approach/avoidance behaviour and brain responses during subsequent value-based choices.