Continuous action-space reinforcement learning methods applied to the minimum-time swing-up of the acrobot
Nichols, Barry D. ORCID: https://orcid.org/0000-0002-6760-6037 (2015) Continuous action-space reinforcement learning methods applied to the minimum-time swing-up of the acrobot. In: Systems, Man and Cybernetics (SMC), 2015 IEEE International Conference on. Institute of Electrical and Electronics Engineers (IEEE), pp. 2084-2089. ISBN 9781479986965. [Book Section] (doi:10.1109/SMC.2015.364)
Abstract
Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem. These comprise two approaches from the literature, CACLA and NM-SARSA, and a novel approach, which I refer to as NelderMead-SARSA. NelderMead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function.
All three methods achieved swing-up times comparable to previous approaches from the literature. NelderMead-SARSA in particular performed the swing-up in a shorter time than many of those approaches.
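The action-selection idea described in the abstract, choosing an action by maximising the state-action value function with a derivative-free simplex search, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the toy quadratic value function `toy_q`, the action bounds of [-1, 1], the starting point, and the iteration count are all assumptions made for illustration, and the learned value function and SARSA update are omitted entirely.

```python
from typing import Callable, List, Sequence


def nelder_mead_maximise(f: Callable[[Sequence[float]], float],
                         x0: Sequence[float], step: float = 0.5,
                         iters: int = 60, lo: float = -1.0,
                         hi: float = 1.0) -> List[float]:
    """Maximise f inside the box [lo, hi]^n with a basic Nelder-Mead search.

    Only function values are used; no derivatives of f are needed.
    """
    n = len(x0)

    def clip(x: Sequence[float]) -> List[float]:
        return [min(hi, max(lo, v)) for v in x]

    # Initial simplex: x0 plus one point offset along each axis.
    simplex = [clip(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(clip(p))

    for _ in range(iters):
        simplex.sort(key=f, reverse=True)  # best (highest value) first
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = clip([c + (c - w) for c, w in zip(centroid, worst)])
        if f(reflect) > f(best):
            # Reflection found a new best point, so try going further.
            expand = clip([c + 2.0 * (c - w) for c, w in zip(centroid, worst)])
            simplex[-1] = expand if f(expand) > f(reflect) else reflect
        elif f(reflect) > f(simplex[-2]):
            simplex[-1] = reflect
        else:
            # Contract the worst vertex towards the centroid.
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) > f(worst):
                simplex[-1] = contract
            else:
                # Shrink every other vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return max(simplex, key=f)


def greedy_action(q: Callable[[float, float], float], state: float,
                  lo: float = -1.0, hi: float = 1.0) -> float:
    """Pick a one-dimensional action by maximising q(state, action)."""
    best = nelder_mead_maximise(lambda a: q(state, a[0]), [0.0], lo=lo, hi=hi)
    return best[0]


def toy_q(state: float, action: float) -> float:
    """Hypothetical stand-in for a learned value function.

    By construction its maximum over the action is at 0.3 * state.
    """
    return -(action - 0.3 * state) ** 2


if __name__ == "__main__":
    print(greedy_action(toy_q, state=1.0))
```

Because the search only compares values of `q`, the same routine would work with any value-function approximator that can be evaluated at a state-action pair, which is the property the abstract highlights in contrast to a method that needs first and second partial derivatives.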
Item Type: | Book Section |
---|---|
Research Areas: | A. > School of Science and Technology > Computer Science |
Item ID: | 18767 |
Depositing User: | Barry Nichols |
Date Deposited: | 18 Jan 2016 10:51 |
Last Modified: | 30 May 2019 18:34 |
URI: | https://eprints.mdx.ac.uk/id/eprint/18767 |