Artificial intelligence approaches have produced impressive results across a wide spectrum of fields and applications in recent years. These successes, combined with an increasing demand for assisted living, elderly care and local production, have raised the expectation of an imminent deployment of intelligent autonomous robots in our everyday life. These future agents will be expected to work in close interaction with non-expert users, in both general everyday situations and professional tasks. In either scenario, such intelligent agents will have to adapt to new or modified tasks in dynamic environments without relying on huge amounts of data, which non-experts cannot provide in the quantity or quality required by current approaches. Furthermore, state-of-the-art machine learning methods cannot represent the learned models, behaviors and features in a transparent and comprehensible way, resulting in processes with insufficient guarantees and additional complexity for human-machine collaboration. A new generation of intelligent robots will be required that is capable of communicating intent to non-expert users as well as understanding intent from the users' actions in a natural way. These robots will appear more intuitive to non-expert users and will be able to deduce valuable information through more natural interaction. This new generation of intelligent, intuitive robots will require
i) efficient learning of explainable and comprehensible skills and behaviors,
ii) skill and behavior improvement from weakly labeled, suboptimal demonstrations,
iii) efficient transfer and adaptation of skills and behaviors through intuitive interaction.

Machine Learning for Robotics

Recent advances in robotics forecast the advent of autonomous agents in our everyday life. Highly dynamic environments and increasingly sophisticated human-robot interactions render classical, hand-tuned controllers and preprogrammed behaviors obsolete. Instead, such systems will apply machine learning techniques to adapt to new environments and to learn new behaviors. I reformulated the stochastic optimal control problem as a constrained optimization and introduced a Kullback-Leibler bound on the state-action distribution, resulting in a novel policy update [ICRA 2014, JAIS 2015]. At the Intelligent Autonomous Systems group I focused my research on imitation learning and human-robot collaboration. In particular, the goal of my thesis was the development of the Imitation Learning Pipeline, a sequence of novel methods that learn explainable behaviors from unlabeled demonstrations, represented in structures comprehensible to non-expert users. First, a set of demonstrations of one or multiple tasks is collected, either in Cartesian space via motion capture or in joint space via kinesthetic teaching or tele-operation.
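The flavor of such a Kullback-Leibler-bounded update can be illustrated with a small sketch in the spirit of relative-entropy policy search: samples are reweighted so that the reweighted state-action distribution stays within a KL bound of the old one, with the temperature found by minimizing the convex dual. The function name, the crude grid search and the example values are illustrative assumptions, not the published method.

```python
import math

def kl_bounded_weights(advantages, epsilon=0.1):
    """Reweight samples so the new sample distribution stays within a
    Kullback-Leibler bound `epsilon` of the old (uniform) one, in the
    spirit of relative-entropy policy updates. The temperature eta is
    found by minimizing the convex dual over a crude log-spaced grid."""
    a_max = max(advantages)
    adv = [a - a_max for a in advantages]  # shift for numerical stability
    n = len(adv)

    def dual(eta):
        # Dual of the KL-constrained expected-advantage maximization.
        return eta * epsilon + eta * math.log(sum(math.exp(a / eta) for a in adv) / n)

    etas = [10.0 ** (k / 100.0) for k in range(-200, 401)]  # 1e-2 .. 1e4
    eta = min(etas, key=dual)
    w = [math.exp(a / eta) for a in adv]
    z = sum(w)
    return [wi / z for wi in w]
```

The reweighted samples would then be used to fit the next policy, for example by weighted maximum likelihood.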

Learning Movement Primitive Sequences from Unlabeled Demonstrations

These initially unlabeled demonstrations are segmented into sequences of movement primitives while simultaneously learning a library of primitives. Movement primitives represent policies that select actions given the current state. I developed an expectation-maximization-based method entitled Probabilistic Segmentation [HUMANOIDS 2015, IJRR 2017], where the expectation step computes new segmentations of the demonstrations given the current model and the maximization step learns a new model given the current segmentations. The model is defined as a mixture of primitives. In my past work I chose the Probabilistic Movement Primitive representation because of its probabilistic nature and the corresponding model properties. However, I have also worked with other representations, such as Dynamical Movement Primitives, and have discussed how a simple probabilistic extension allows them to be used in my segmentation approach.
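The alternation can be sketched in a deliberately simplified toy form, assuming primitives are reduced to mean windows with hard assignments and fixed-length segments; the actual method uses full Probabilistic Movement Primitives, soft responsibilities and variable-length segmentations.

```python
def em_segment(demos, n_primitives, window=2, iters=10):
    """Toy version of the described alternation: demonstrations are cut
    into fixed-length windows; the E-step assigns each window to its most
    likely primitive (here: nearest mean, i.e. a unit-variance Gaussian),
    the M-step refits each primitive to its assigned windows."""
    windows = [d[i:i + window] for d in demos
               for i in range(0, len(d) - window + 1, window)]

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Farthest-point initialization keeps the toy example deterministic.
    means = [list(windows[0])]
    while len(means) < n_primitives:
        means.append(list(max(windows,
                              key=lambda w: min(sqdist(w, m) for m in means))))

    assign = []
    for _ in range(iters):
        # E-step: assign each window to the closest primitive.
        assign = [min(range(n_primitives), key=lambda k: sqdist(w, means[k]))
                  for w in windows]
        # M-step: refit each primitive's mean from its assigned windows.
        for k in range(n_primitives):
            ws = [w for w, a in zip(windows, assign) if a == k]
            if ws:
                means[k] = [sum(col) / len(ws) for col in zip(*ws)]
    return means, assign
```

On two demonstrations that each contain a "low" and a "high" movement, the toy version recovers the two primitives and a consistent labeling of the segments.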

Learning Explainable Behavior from Demonstrations

The segmented demonstrations and the learned primitive library are used to induce a Probabilistic Context-Free Grammar [ICRA 2018, IJRR 2019]. The grammar structure is learned using a Markov chain Monte Carlo optimization, for which I introduced a novel prior based on three Poisson distributions, addressing several problems of commonly used grammar priors. The new prior favors intuitively structured grammars, and its three hyper-parameters have clear semantic meanings. The induced grammar serves as an easily comprehensible representation of the robot's capabilities with respect to the primitives learned from the demonstrations. Furthermore, the grammar is used to generate new sequences of primitives from the library.
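To make the generative use of such a grammar concrete, here is a minimal PCFG sampler; the grammar below is a hypothetical pick-and-place primitive library for illustration, not one induced by the published method.

```python
import random

def sample_pcfg(grammar, symbol="S", rng=None):
    """Sample a terminal sequence (a primitive sequence) from a PCFG.
    `grammar` maps each nonterminal to a list of (probability, rhs);
    symbols absent from `grammar` are terminals, i.e. primitive names."""
    rng = rng or random.Random()
    if symbol not in grammar:
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, rhs in grammar[symbol]:
        acc += prob
        if r <= acc:
            return [t for s in rhs for t in sample_pcfg(grammar, s, rng)]
    # Fall back to the last rule on floating-point round-off.
    return [t for s in grammar[symbol][-1][1] for t in sample_pcfg(grammar, s, rng)]

# Hypothetical primitive library and structure for a pick-and-place task.
grammar = {
    "S":     [(1.0, ["REACH", "MANIP"])],
    "REACH": [(1.0, ["approach"])],
    "MANIP": [(0.5, ["grasp", "lift", "place"]), (0.5, ["push"])],
}
```

Every sample is a valid primitive sequence, so the grammar doubles as a compact, readable summary of what the robot can do.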

Improving Comprehensibly Structured Behavior through Reinforcement Learning

Finally, the induced grammar is improved through reinforcement learning. Since the grammar was induced from a limited set of demonstrations, the learned structure is biased towards the demonstrated behavior. However, the learned primitives and the relations encoded in the grammar might be capable of solving previously unseen tasks. I formulated a natural policy gradient method for formal grammars. The resulting method efficiently improves the hierarchical grammar representation given an objective function. The method is not limited to context-free grammars but can also be applied to higher-order grammars. Further learning methods can be applied to strengthen, weaken, create or destroy relations, improving the grammar even further.
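The gradient-based improvement of rule probabilities can be illustrated with a plain likelihood-ratio (REINFORCE) update on a single nonterminal. This is a simplification for exposition: the published method uses a natural policy gradient over the full grammar, and the reward function here is an illustrative stand-in for a task objective.

```python
import math
import random

def reinforce_grammar(logits, reward_fn, episodes=500, lr=0.5, seed=0):
    """Vanilla REINFORCE on the rule probabilities of one nonterminal.
    `logits[i]` parametrizes rule i via a softmax; sampling a rule,
    observing a reward and following the log-likelihood-ratio gradient
    shifts probability mass towards high-reward productions."""
    rng = random.Random(seed)
    for _ in range(episodes):
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        # Sample a production rule and observe the resulting reward.
        i = rng.choices(range(len(probs)), weights=probs)[0]
        r = reward_fn(i)
        # Gradient of log softmax: one-hot(i) - probs.
        for k in range(len(logits)):
            logits[k] += lr * r * ((1.0 if k == i else 0.0) - probs[k])
    return logits
```

Rewarding sequences produced by one rule shifts the grammar's probability mass towards it while leaving the learned structure, i.e. the set of rules, intact.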