Decoding joint action success through eye movements: A data-driven approach
Talk, 47th European Conference on Visual Perception, Mainz, Germany
Humans have coordinated with one another, with animals, and with machines for centuries, yet the mechanisms that enable seamless collaboration without extensive training remain poorly understood. Previous research on human-human and human-agent coordination, which has often relied on simplified paradigms, has identified variables such as action prediction, social traits, and action initiation as key contributors to successful coordination. However, how these factors interact and influence coordination success in ecologically valid settings remains unclear. In this study, we reverse-engineered the coordination process in a naturalistic, turn-taking table tennis task while controlling for individual skill levels. We found that well-calibrated internal models, reflected in individuals’ ability to predict their own actions, strongly predict coordination success, even without extensive prior training. Using multimodal tracking of eye and body movements combined with machine learning, we demonstrate that dyads with similarly accurate self-prediction abilities coordinated more effectively than those with lower or less similar predictive skills. These well-calibrated individuals were also better at anticipating the timing of their partners’ actions and relied less on visual feedback to initiate their own actions, enabling proactive rather than reactive responses. These findings support motor control theories, suggesting that the internal models used for individual actions can extend to social interactions. This study introduces a data-driven framework for understanding joint action, with practical implications for designing collaborative robots and training systems that promote proactive control. More broadly, our approach, which combines ecologically valid tasks with computational modeling, offers a blueprint for investigating complex social interactions in domains such as sports, robotics, and rehabilitation.
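The abstract describes decoding coordination success from eye and body movements with machine learning but does not specify the pipeline. Purely as an illustration of what such a data-driven decoding analysis could look like, the sketch below fits a cross-validated linear classifier to synthetic, dyad-level stand-in features; the feature names, model choice, and validation scheme are assumptions for illustration, not the authors’ method.

```python
# Illustrative sketch only: the study's actual features, model, and validation
# scheme are not given in the abstract. Synthetic stand-ins for gaze/body-derived
# dyad features show how coordination success might be decoded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_dyads = 60

# Hypothetical per-dyad features (placeholders, not the study's variables):
#   self_pred_similarity - how similar the partners' self-prediction accuracies are
#   anticipation_lead_ms - how far gaze leads the partner's action, in milliseconds
#   reactive_gaze_ratio  - proportion of turns where gaze lags the partner's action
X = np.column_stack([
    rng.normal(0.0, 1.0, n_dyads),     # self_pred_similarity (z-scored)
    rng.normal(120.0, 40.0, n_dyads),  # anticipation_lead_ms
    rng.uniform(0.0, 1.0, n_dyads),    # reactive_gaze_ratio
])
# Binary label: did the dyad exceed some joint-performance criterion? (synthetic here)
y = rng.integers(0, 2, n_dyads)

# Standardize features, then evaluate a simple linear decoder with 5-fold CV.
decoder = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(decoder, X, y, cv=5, scoring="accuracy")
print(f"Cross-validated decoding accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```

In an actual analysis, the features would be extracted from the multimodal eye- and body-tracking recordings and the labels from a behavioural measure of joint performance; the synthetic data here exist only so the sketch runs end to end.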