Previous research has shown that human actions can be detected and classified from their motion patterns. However, simply labelling motion patterns is not sufficient for a cognitive system that must reason about an agent's intentions and account for how the environmental setting (e.g. the presence of nearby objects) affects the way an action is performed. In this paper, we develop a graphical model that captures how the low-level movements that form a high-level intentional action (e.g. reaching for an object) vary depending on the situation. We then present statistical learning algorithms that learn characterisations of specific actions from video using this representation. Using object manipulation tasks, we illustrate how the system infers an agent's goals from visual information, and we compare the results with findings from psychological experiments. In particular, we show that we are able to reproduce a key result from the child development literature on action learning. This supports the view that our model has properties in common with action learning in humans. At the end of the paper, we argue that our action representation and learning model is also suitable as a framework for understanding and learning about affordances. An important element of our framework is that it allows affordances to be understood as indicative of possible intentional actions.
|Title of host publication||Towards Affordance-Based Robot Control, International Seminar, Dagstuhl Castle, Germany, June 5-9, 2006, Revised Papers|
|Publication status||Published - 1 Feb 2008|