Continuous action-space reinforcement learning methods applied to the minimum-time swing-up of the acrobot

Nichols, Barry D. ORCID: https://orcid.org/0000-0002-6760-6037 (2015) Continuous action-space reinforcement learning methods applied to the minimum-time swing-up of the acrobot. In: Systems, Man and Cybernetics (SMC), 2015 IEEE International Conference on. Institute of Electrical and Electronics Engineers (IEEE), pp. 2084-2089. ISBN 9781479986965. (doi:10.1109/SMC.2015.364)

Full text is not in this repository.

Abstract

Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem. These are two approaches from the literature, CACLA and NM-SARSA, and a novel approach, which I refer to as NelderMead-SARSA. NelderMead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function.

All three methods achieved good results in terms of swing-up times, comparable to previous approaches from the literature. NelderMead-SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
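
The action-selection idea described in the abstract, choosing a continuous action by directly maximising the state-action value function with a derivative-free search, can be illustrated with a minimal sketch. This is not the paper's implementation: the names `nelder_mead_max`, `greedy_action` and `toy_q` are hypothetical, the value function is a made-up quadratic rather than a learned approximator, the simplex search is a simplified variant (inside contraction only), and the action bounds of [-1, 1] are an assumption. The SARSA value update itself is not shown.

```python
from typing import Callable, List, Sequence


def nelder_mead_max(f: Callable[[List[float]], float], x0: Sequence[float],
                    step: float = 0.5, lo: float = -1.0, hi: float = 1.0,
                    iters: int = 50) -> List[float]:
    """Maximise f over the box [lo, hi]^n with a basic Nelder-Mead search.

    Simplified variant (assumption): reflection, expansion, inside
    contraction and shrink only. No derivatives of f are used.
    """
    def clip(x):
        return [min(hi, max(lo, xi)) for xi in x]

    def g(x):
        # Minimise the negated value; clipping keeps candidates in bounds.
        return -f(clip(x))

    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=g)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def move(t):
            return [centroid[j] + t * (worst[j] - centroid[j]) for j in range(n)]

        reflected = move(-1.0)
        if g(reflected) < g(best):
            expanded = move(-2.0)
            simplex[-1] = expanded if g(expanded) < g(reflected) else reflected
        elif g(reflected) < g(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = move(0.5)
            if g(contracted) < g(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best vertex.
                simplex = [best] + [
                    [best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                    for p in simplex[1:]
                ]
    return clip(min(simplex, key=g))


def greedy_action(q: Callable[[Sequence[float], List[float]], float],
                  state: Sequence[float], action_dim: int = 1) -> List[float]:
    """Pick the action that (approximately) maximises q(state, action)."""
    return nelder_mead_max(lambda a: q(state, a), [0.0] * action_dim)


def toy_q(state: Sequence[float], action: List[float]) -> float:
    """Hypothetical value function whose best action is 0.3 * state[0]."""
    return -(action[0] - 0.3 * state[0]) ** 2


if __name__ == "__main__":
    print(greedy_action(toy_q, [1.0]))
```

In a full agent, `toy_q` would be replaced by the learned value-function approximator, and an exploration mechanism would perturb the selected action during training; both are outside the scope of this sketch.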

Item Type: Book Section
Research Areas: A. > School of Science and Technology > Computer Science
Item ID: 18767
Depositing User: Barry Nichols
Date Deposited: 18 Jan 2016 10:51
Last Modified: 30 May 2019 18:34
ISBN: 9781479986965
URI: https://eprints.mdx.ac.uk/id/eprint/18767
