Generalizing Movements with Information Theoretic Stochastic Optimal Control


Stochastic Optimal Control (SOC) is typically used to plan a movement for a specific situation. While most SOC methods cannot generalize this movement plan to a new situation without re-planning, we present an SOC method whose policy can be reused in a new situation because it is more robust to slight deviations from the initial movement plan. To improve the robustness of the policy, we employ information-theoretic policy updates that explicitly operate on trajectory distributions instead of single trajectories. To ensure a stable and smooth policy update, we limit the ‘distance’ between the trajectory distributions of the old and the new control policy. The introduced bound offers a closed-form solution for the resulting policy and extends results from recent developments in SOC. In contrast to many standard SOC algorithms, our approach can directly infer the system dynamics from data points and, hence, can also be used for model-based reinforcement learning.
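
The idea of limiting the 'distance' between the old and the new distribution can be illustrated with a minimal sketch. It is a generic example and not the algorithm of this paper: it assumes a finite set of sampled trajectories with scalar returns, measures the distance with the Kullback-Leibler (KL) divergence, and uses the well-known exponential reweighting that solves this kind of KL-constrained problem. The function name `kl_bounded_reweight` and the parameters `epsilon`, `q`, and `iters` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def kl_bounded_reweight(rewards, epsilon, q=None, iters=100):
    """Reweight samples to raise expected reward while keeping KL(p || q) <= epsilon.

    Illustrative assumption: trajectories are a finite sample set, q is the old
    distribution over them (uniform by default), and p is the new distribution.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    q = np.full(n, 1.0 / n) if q is None else np.asarray(q, dtype=float)
    adv = rewards - rewards.max()  # shift by the maximum for numerical stability

    def weights(eta):
        # Solution of the KL-constrained problem: p_i proportional to q_i * exp(R_i / eta)
        w = q * np.exp(adv / eta)
        return w / w.sum()

    def kl(p):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    # The KL divergence shrinks as the temperature eta grows, so bisect on
    # eta (geometrically) to find the smallest eta that respects the bound.
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        eta = np.sqrt(lo * hi)
        if kl(weights(eta)) > epsilon:
            lo = eta
        else:
            hi = eta
    return weights(hi), hi

# Usage: four sampled trajectories with increasing returns.
p, eta = kl_bounded_reweight([1.0, 2.0, 3.0, 4.0], epsilon=0.1)
```

A small `epsilon` keeps the new weights close to the old ones, which gives a cautious and smooth update. A large `epsilon` lets the weights concentrate on the best samples.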

Journal of Aerospace Information Systems