Written by
Sina Tafazoli
Sept. 25, 2024

A new mathematical model offers a better explanation of how dopamine surges in the brain in response to reward. The findings, from researchers at the Princeton Neuroscience Institute (PNI), may help improve AI systems, which rely on principles similar to those thought to underlie dopamine's role in reward learning. They may also sharpen our understanding of how neurological conditions tied to reward processing, such as addiction, arise.

The findings were published in the journal Nature Neuroscience on July 3.

The brain helps us make complex decisions every day. People who invest in the stock market, for example, must weigh many factors simultaneously when deciding whether to buy or sell, such as market conditions, stock performance trends, earnings reports, and risk tolerance. In such decisions, each variable must be accounted for, along with its potential impact on the outcome, whether the thrill of profit or the disappointment of loss. How the brain represents these varying conditions and the potential for reward or loss, however, is less well understood.

For decades, neuroscientists have studied how reward and punishment play out in the brain by studying the “pleasure” molecule dopamine. Until recently, scientists thought dopamine flooded all parts of the brain equally in response to reward.

Recent findings by former Princeton University graduate student Rachel Lee, Ph.D., working in the lab of PNI professors Ilana Witten, Ph.D., and Nathaniel Daw, Ph.D., reveal a more nuanced picture: not all brain regions receive dopamine equally in response to reward.

Mice trained to run on a ball in a virtual reality environment watched an immersive, responsive video not unlike the Windows 95 screensaver maze of yesteryear. As the mice navigated the virtual arena, tower-like images briefly appeared on either the left or right side. At the end of the corridor, the thirsty mice were rewarded with a drop of water if they correctly turned toward the side that had shown more towers.

When the researchers monitored the activity of dopamine neurons, they discovered a range of responses. Neurons increased or decreased their activity in response to the moment-to-moment conditions the animals experienced during the experiment, including their viewing angle relative to the screen, their position in the virtual arena, and their running speed.

“State is a very important part of classic reinforcement learning but we don't really know how state arises and how the brain exactly creates it because there is no clear idea where the state is,” said Lee, the lead author of the work and now scientist at the Allen Institute. “We had this kind of bold hypothesis that maybe this interesting heterogeneity might be a reflection of state features that are represented in the brain.”
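In textbook reinforcement learning, the dopamine signal is modeled as a single reward prediction error: the difference between the reward actually received and the reward expected. A minimal sketch of that classic computation, with illustrative values that are not drawn from the study, looks like this:

```python
# A minimal sketch (illustrative, not from the study) of the classic
# temporal-difference reward prediction error dopamine is thought to carry.

def td_error(reward, value_now, value_next, gamma=0.9):
    """delta = r + gamma * V(s') - V(s): surprise relative to expectation."""
    return reward + gamma * value_next - value_now

# An unexpected drop of water (reward = 1) in a state predicting little reward
# produces a large positive, dopamine-like burst.
print(td_error(reward=1.0, value_now=0.2, value_next=0.0))  # 0.8
```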

To test their hypothesis, Lee and her team developed a mathematical model that explains the experimental findings by simulating dopamine neuron responses. The model consists of interconnected artificial neurons arranged in layers that work together to recognize patterns and make decisions.

The researchers then trained this model on the same task the mice performed in the virtual reality experiment, feeding it videos of the maze environment and adjusting the connections between artificial neurons to mimic how the mice behaved.
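A rough sketch of that kind of setup appears below, assuming a tiny two-layer value network trained on frame-like inputs with temporal-difference learning. The architecture, sizes, and learning-rule details here are illustrative assumptions, not the paper's implementation:

```python
# A minimal sketch (not the authors' code) of a small layered value network
# trained on frame-like inputs with temporal-difference learning. All names,
# shapes, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network: flattened "video frame" -> hidden neurons -> value estimate.
W1 = rng.normal(0.0, 0.1, size=(64, 16))   # input (8x8 frame) -> hidden layer
W2 = rng.normal(0.0, 0.1, size=16)         # hidden layer -> scalar value

def forward(frame):
    h = np.tanh(frame @ W1)                # hidden-layer activity
    return h, h @ W2                       # hidden features and predicted value

gamma, lr = 0.9, 0.01
frames = rng.random((100, 64))             # stand-in for maze video frames
rewards = np.zeros(100)
rewards[-1] = 1.0                          # water drop at the end of the corridor

# "Adjusting connections": nudge weights so successive value estimates become
# consistent with the rewards actually received (semi-gradient TD learning).
for t in range(99):
    h, v = forward(frames[t])
    _, v_next = forward(frames[t + 1])
    delta = rewards[t] + gamma * v_next - v            # prediction error
    grad_W1 = np.outer(frames[t], (1 - h**2) * W2)     # backprop through tanh
    W2 += lr * delta * h                               # update output weights
    W1 += lr * delta * grad_W1                         # update input weights
```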

In their model, simulated dopamine neurons responded to expected rewards, depending on whether the virtual mouse turned left or right. Contrary to previous models, Lee found that, like dopamine neurons in the mouse brain, the simulated dopamine neurons responded not only to rewards but also to other features of the simulated environment, such as the visual scene of the corridor presented to the network. This finding suggests that dopamine neurons process environmental states and rewards in an integrated manner, challenging the field's previous assumption that the brain handles these aspects separately.
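One simple way to picture the feature-specific idea is to give each state feature its own share of the prediction error, so individual signals differ even though their sum still behaves like the classic scalar error. The decomposition below is a hypothetical illustration, not the paper's exact model; the feature names, values, and equal reward split are all invented for the example:

```python
# A hypothetical sketch of feature-specific prediction errors: each state
# feature carries its own error signal, so individual "dopamine neurons"
# can respond differently while their sum matches the classic scalar error.
import numpy as np

gamma = 0.9
w = np.array([0.4, 0.2, 0.1])        # learned per-feature value weights
s_now = np.array([0.5, 0.1, 0.8])    # e.g., position, view angle, speed
s_next = np.array([0.6, 0.0, 0.7])
reward = 1.0                         # the drop of water

# One error per feature (reward shared equally here, for simplicity).
delta = reward / len(w) + gamma * w * s_next - w * s_now
print(delta)          # heterogeneous, feature-specific signals
print(delta.sum())    # aggregate equals the classic reward prediction error
```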

Lee and her team’s work comes at a pivotal moment in the history of artificial intelligence and neuroscience. As reinforcement learning techniques become integral to AI systems such as ChatGPT, the push to bridge the gap between artificial and biological brains is growing.

These results help pave the way for more flexible algorithms for intelligent agents such as robots and self-driving cars, enhancing their ability to interact with the environment. The work also deepens our understanding of what might go wrong in brain disorders such as addiction and depression, revealing that an individual's environmental conditions are closely linked to their reward system. Ultimately, it highlights the benefits of integrating experimental and theoretical neuroscience, demonstrating the value of collaborative approaches in advancing both fields.

“One cool thing about our paper is that you can finally see really interesting stuff pop up when you allow mice to do more complicated things that we don’t think they can do,” Lee said.

CITATION: “A feature-specific prediction error model explains dopaminergic heterogeneity,” Rachel S. Lee, Yotam Sagiv, Ben Engelhard, Ilana B. Witten, Nathaniel D. Daw. Nature Neuroscience, July 03, 2024. DOI: 10.1038/s41593-024-01689-1