reward prediction error

4 days ago

Updated: 3 days ago

Dopamine is a neurotransmitter that is fundamental to survival. It is released when we take actions that, ideally, help propagate our genetic information over time. Our brains are hardwired with primitive neural circuits that release dopamine (essentially a reward signal) on completion of actions such as eating, movement, procreation, learning, and social cooperation, all of which are necessary for survival. These circuits have been molded over the course of evolution and are tuned to the conditions of the pre-modern era, when resources were scarce. In that era, the release and regulation of dopamine stayed within an optimal range, which sustained the motivation to keep taking actions that enabled survival. Rapid technological progress has since catapulted us into an era of abundance, while our reward circuits have not evolved to catch up. We now have easy, on-demand access to processed foods and audio-visual stimulation that trigger massive spikes in dopamine, whereas in the pre-modern era such spikes were very rare. The body reacts to an unnatural dopamine spike by downregulating dopamine receptors, that is, reducing their number, in order to maintain biological balance, or homeostasis. The spike itself does not last: dopamine soon drops back to its standard baseline level. But because the receptors have been downregulated, the body perceives that baseline as lower than it really is, which leaves us unmotivated to take effortful action. The reward circuit responsible for motivation thus turns out, paradoxically, to be a double-edged sword in the modern era.
Overstimulation results in a self-reinforcing destructive loop: it reduces dopamine receptor density, so higher stimulation is required to feel the same sense of reward, which in turn reduces receptor density even further.
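The loop described above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration, not a physiological model: the idea that felt reward equals stimulation multiplied by receptor density, the `loss_rate` parameter, the `floor` on receptor density, and the function name `simulate_downregulation` are all hypothetical.

```python
def simulate_downregulation(steps=5, target_reward=1.0, receptors=1.0,
                            loss_rate=0.1, floor=0.05):
    """Toy loop: each round, seek the same felt reward, then lose receptors.

    Assumption (illustrative only): felt reward = stimulation * receptor density.
    """
    history = []
    for _ in range(steps):
        # Stimulation needed to reach the same felt reward with current receptors.
        stimulation = target_reward / receptors
        history.append((round(stimulation, 3), round(receptors, 3)))
        # Assumed rule: stronger stimulation removes a larger share of receptors.
        # The floor keeps the toy model from dividing by zero on long runs.
        receptors = max(receptors * (1.0 - loss_rate * stimulation), floor)
    return history


for stimulation, receptors in simulate_downregulation():
    print(f"stimulation needed: {stimulation}, receptor density: {receptors}")
```

In this sketch the stimulation needed climbs every round while receptor density falls, which is the shape of the loop, not a claim about real magnitudes or timescales.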

What really fascinated me was the internal mechanism of how dopamine release takes place in a conscious mind. Our minds are wired to start releasing dopamine in anticipation of a reward, even before the action takes place. This is a fundamental mechanism of habit loops, the neural circuits that automate repetitive actions to conserve energy. A non-habitual action that enables survival requires conscious effort and high energy expenditure, and dopamine is released only after the action is completed. A habitual action, on the other hand, has two junctures of dopamine release: first in anticipation of the reward, before the action takes place, which enables an automatic, energy-efficient action, and then again after the action is actually completed. The feeling of reward in a habit loop turns out to be the delta between the dopamine released at these two junctures. The anticipatory release is essentially a prediction based on past experience, and ideally it equals the post-action release, which keeps dopamine at its normal baseline level. In reality, however, the prediction is often inaccurate, producing either a positive or a negative delta that ultimately shapes how rewarding the action feels afterwards. This phenomenon is termed reward prediction error. A positive error occurs when the actual release of dopamine exceeds the anticipatory release: dopamine rises above baseline, resulting in a positive feeling of reward, an uplift in mood, and a strengthening of the corresponding habit loop. The reverse happens for a negative prediction error. What truly fascinated me here was learning that we could consciously control the release of dopamine at both junctures of the habit loop and sway the reward prediction error in the direction we want.
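To make the delta concrete, here is a minimal Python sketch. The numbers are arbitrary illustrative units, not measurements, and the function names `reward_prediction_error` and `describe` are hypothetical.

```python
def reward_prediction_error(actual, predicted):
    """Delta between post-action (actual) and anticipatory (predicted) release."""
    return actual - predicted


def describe(actual, predicted):
    """Label the error as positive, negative, or neutral."""
    delta = reward_prediction_error(actual, predicted)
    if delta > 0:
        return "positive"  # more reward than anticipated
    if delta < 0:
        return "negative"  # less reward than anticipated
    return "neutral"       # prediction matched the outcome


# Arbitrary example values: the same outcome feels different
# depending on what was anticipated beforehand.
print(describe(actual=1.0, predicted=0.6))
print(describe(actual=1.0, predicted=1.0))
print(describe(actual=1.0, predicted=1.5))
```

The outcome is held fixed in all three calls; only the anticipation changes, and with it the sign of the error.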
By consciously setting our expectations for the reward, we could control the anticipatory release of dopamine, and by managing our conscious thoughts after completing the action, we could control the post-action release as well. For example, consciously lowering expectations can reduce the anticipatory release, and conscious positive reinforcement after the action can increase the post-action release. Both lead to a positive reward prediction error and raise dopamine above baseline, whereas the exact same action repeated without any conscious intervention could have resulted in a neutral or even negative reward prediction error. Reward prediction error also helps explain the power of the intermittent reward schedules used by social media and casinos to reinforce certain habit loops. A fixed reward schedule leads to a low variance in the delta between anticipatory and post-action dopamine release: because the schedule is fixed, the predicted release is almost exactly equal to the actual release, so dopamine levels more or less remain at baseline. In an intermittent reward schedule, however, the unpredictable nature of the schedule produces a large variance in the delta between predicted and actual release. Intermittent reward schedules can be designed specifically to create a positive delta between the actual and predicted release, which raises dopamine above baseline and strengthens the corresponding habit loop in a way that a fixed reward schedule does not. This is ultimately a very interesting mechanism, one that can be, and is being, used to modify our habit loops and control intrinsic motivation.
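The contrast between fixed and intermittent schedules can be sketched with a small Python simulation. It rests on assumptions that are not in the text above: the prediction is taken to be the running average of past rewards, both schedules pay out the same amount on average, and the reward sizes, the 25% payout chance, and the name `prediction_errors` are illustrative choices.

```python
import random
import statistics


def prediction_errors(rewards):
    """Return actual - predicted for each reward.

    Assumption: the prediction is the average of all earlier rewards.
    On the first trial there is no history, so the error is taken as zero.
    """
    errors = []
    total = 0.0
    for i, reward in enumerate(rewards):
        predicted = total / i if i else reward
        errors.append(reward - predicted)
        total += reward
    return errors


rng = random.Random(0)  # seeded so repeated runs give the same sequence

# Fixed schedule: the same reward every time.
fixed = [1.0] * 200
# Intermittent schedule: usually nothing, occasionally a large reward
# (same long-run average as the fixed schedule).
intermittent = [4.0 if rng.random() < 0.25 else 0.0 for _ in range(200)]

fixed_errors = prediction_errors(fixed)
intermittent_errors = prediction_errors(intermittent)

print("variance of delta, fixed:       ", statistics.pvariance(fixed_errors))
print("variance of delta, intermittent:", statistics.pvariance(intermittent_errors))
print("largest positive delta, intermittent:", max(intermittent_errors))
```

Under these assumptions the fixed schedule produces no surprises at all, while the intermittent one keeps producing occasional large positive deltas even though the average payout is the same.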