<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">ARTICLE</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Wawrzynski_NN_2013/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Autonomous reinforcement learning with experience replay</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Wawrzyński, P.</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Tanwani, Ajay Kumar</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Actor–critic</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Autonomous learning</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">reinforcement learning</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Step-size estimation</subfield>
		</datafield>
		<datafield tag="773" ind1=" " ind2=" ">
			<subfield code="p">Neural Networks</subfield>
			<subfield code="v">41</subfield>
			<subfield code="n">0</subfield>
			<subfield code="c">156 - 167</subfield>
			<subfield code="x">0893-6080</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2013</subfield>
		</datafield>
		<datafield tag="500" ind1=" " ind2=" ">
			<subfield code="a">Special Issue on Autonomous Learning</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2=" ">
			<subfield code="u">http://www.sciencedirect.com/science/article/pii/S0893608012002936</subfield>
			<subfield code="z">URL</subfield>
		</datafield>
		<datafield tag="024" ind1="7" ind2=" ">
			<subfield code="a">http://dx.doi.org/10.1016/j.neunet.2012.11.007</subfield>
			<subfield code="2">doi</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor–critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within reasonably short time.</subfield>
		</datafield>
	</record>
</collection>