This paper presents the application of reinforcement learning to improve the performance of highly dynamic single-legged locomotion with compliant series elastic actuators. The goal is to optimally exploit the capabilities of the hardware in terms of maximum jump height, jump distance, and energy efficiency of periodic hopping. These challenges are tackled with the reinforcement learning method Policy Improvement with Path Integrals (PI2) in a model-free approach that learns parameterized motor velocity trajectories as well as high-level control parameters. Combining simulation-based and hardware-based optimization makes it possible to efficiently obtain optimal control policies in a parameter space of up to ten dimensions. The robotic leg learns to temporarily store energy in the elastic elements of its joints in order to increase jump height and distance. In addition, we present a method for learning time-independent control policies and apply it to improve the energetic efficiency of periodic hopping.
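To make the optimization step concrete, the following is a minimal sketch of the core PI2 parameter update: sample noisy perturbations of the policy parameters, evaluate the cost of each rollout, and update the parameters with a probability-weighted average of the exploration noise. The cost function, dimensions, and all hyperparameter values here are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def pi2_update(theta, cost_fn, n_rollouts=20, sigma=0.1, temperature=10.0, rng=None):
    """One PI2-style update: explore with Gaussian noise, weight rollouts by
    exponentiated (normalized) negative cost, and average the noise."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))  # exploration noise
    costs = np.array([cost_fn(theta + e) for e in eps])          # one cost per rollout
    # Normalize costs to [0, 1] so the softmax temperature is scale-invariant
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    w = np.exp(-temperature * s)
    w /= w.sum()                                                 # softmax weights
    return theta + w @ eps                                       # weighted noise average

# Hypothetical toy cost: squared distance from a made-up "optimal" parameter vector
target = np.array([1.0, -2.0, 0.5])
cost = lambda th: float(np.sum((th - target) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    theta = pi2_update(theta, cost, rng=rng)
```

In a hopping task, `cost_fn` would correspond to executing one jump (in simulation or on hardware) and returning, e.g., negative jump height or energy consumption, while `theta` parameterizes the motor velocity trajectory and high-level control parameters.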
- Detailed record: https://infoscience.epfl.ch/record/198739?ln=en