
Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

PLoS Computational Biology | 2013

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
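The continuous TD formulation the abstract extends can be made concrete with a short sketch. Below is a minimal rate-based illustration of Doya's (2000) continuous-time TD error, delta(t) = r(t) - V(t)/tau_r + dV/dt, gating both critic and actor updates. It omits the paper's spiking-neuron machinery entirely, and the toy state, dynamics, features, and constants are assumptions chosen only to make the sketch runnable, not the authors' implementation.

```python
import numpy as np

# Minimal rate-based sketch of a continuous-time TD actor-critic in the
# style of Doya (2000). The paper implements this with spiking neurons;
# this toy uses a linear critic and a Gaussian actor on a scalar state.
# All names and constants below are illustrative assumptions.

rng = np.random.default_rng(0)
dt = 0.01                 # Euler time step (s)
tau_r = 1.0               # reward discount time constant (s)
eta_v, eta_a = 0.5, 0.1   # critic / actor learning rates (assumed)

centers = np.linspace(-1.0, 1.0, 21)

def features(x):
    """Gaussian radial-basis features of a scalar state (assumption)."""
    return np.exp(-(x - centers) ** 2 / (2 * 0.1 ** 2))

w_v = np.zeros_like(centers)   # critic weights: V(x) = w_v @ phi(x)
w_a = np.zeros_like(centers)   # actor weights: mean action = w_a @ phi(x)

x = -1.0
phi_prev, V_prev = features(x), 0.0
for step in range(50_000):
    # Gaussian exploration around the actor's preferred action
    mu = w_a @ phi_prev
    a = mu + 0.3 * rng.standard_normal()

    # Toy dynamics and reward: drift toward a goal at x = +1 (assumption)
    x = np.clip(x + a * dt, -1.0, 1.0)
    r = 1.0 if x > 0.95 else 0.0

    phi = features(x)
    V = w_v @ phi

    # Continuous-time TD error (Doya, 2000): delta = r - V/tau_r + dV/dt,
    # with dV/dt approximated by a backward finite difference.
    delta = r - V / tau_r + (V - V_prev) / dt

    # One scalar TD signal gates plasticity in both critic and actor,
    # mirroring the neuromodulatory broadcast described in the abstract.
    w_v += eta_v * delta * phi_prev * dt
    w_a += eta_a * delta * (a - mu) * phi_prev * dt

    phi_prev, V_prev = phi, V
```

The single scalar delta driving both weight updates plays the role the abstract assigns to the neuromodulatory TD signal delivered to critic and actor alike; in the paper this broadcast modulates spike-timing-dependent plasticity rather than rate-based weights.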

PubMed ID: 23592970

Research resources used in this publication

None found

Additional research tools detected in this publication

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Swiss National Science Foundation (tool)

RRID:SCR_011554

The Swiss National Science Foundation (SNSF) is Switzerland's leading provider of scientific research funding. The SNSF annually supports some 7200 researchers, almost 80 percent of whom are aged 35 years or younger. With its federal mandate, it supports basic research in all disciplines, from philosophy and biology to the nanosciences and medicine. It also invests in applied research in various scientific fields. The focus of its activities is the scientific endorsement of projects submitted by researchers. The best applicants are funded by the SNSF with an annual total amount equalling approximately CHF 700 million. Established in 1952 as a foundation under private law, the SNSF has the autonomy it needs to promote independent scientific research. The SNSF is committed to promoting young scientists and works to ensure that scientific research in Switzerland has the most favourable conditions for developing internationally. It also encourages dialogue between scientists and representatives in society, politics and the economy.
