This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts with its environment through trial and error to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, reinforcement learning can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is limited to studies that include physical social robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to the survey, we categorize existing reinforcement learning approaches based on the method used and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The paper also covers the benefits and challenges of reinforcement learning in social robotics, the evaluation methods of the papers (whether or not they use subjective and algorithmic measures), a discussion of real-world reinforcement learning challenges and proposed solutions, and the points that remain to be explored, including the approaches that have thus far received less attention. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.
This article provides an extensive survey of the intersection between reinforcement learning (RL) and social robotics, highlighting RL's potential as a framework for enabling robots to learn adaptive social behaviors through interaction. Recognizing that social robots must navigate complex, unstructured human environments, the authors argue that RL, which relies on trial-and-error learning from environment feedback, is uniquely suited for this domain. The survey focuses exclusively on physically embodied social robots and real-world human-robot interaction studies, excluding virtual agents and industrial systems to ensure relevance to social HRI.
The authors propose a comprehensive taxonomy for categorizing existing research based on the type of RL algorithm (e.g., bandit-based, value-based, or deep RL) and the design of the reward mechanism. Three primary themes emerge: Interactive RL, where humans provide explicit or implicit guidance; Intrinsically Motivated methods, which focus on the robot's internal 'well-being' and needs; and Task Performance-driven methods, where rewards are tied to the successful completion of specific goals. This categorization helps researchers choose suitable architectures based on their specific application domain, such as eldercare, education, or entertainment.
A significant portion of the paper is dedicated to the communication channels used for reward formulation. The authors observe that while verbal and non-verbal cues (like smiles and gaze) are frequent, tactile communication remains an underutilized but powerful medium. They also analyze higher-level interaction dynamics such as user engagement and attention, which provide complex but valuable feedback for learning algorithms. The survey highlights that model-free approaches like Q-learning remain dominant, though Deep Reinforcement Learning (DRL) is gaining ground for handling high-dimensional audio-visual sensory data.
The paper identifies several persistent challenges in real-world RL deployment, notably the 'curse of goal specification' and the difficulty of learning from limited, noisy samples. To combat these, researchers have explored solutions such as reward shaping, human-in-the-loop guidance, and simulation-to-real transfer. However, the authors note that modeling human behavior in simulation remains a formidable hurdle. Transparency in robot learning, that is, showing the human teacher what the robot is thinking or intends to do, is identified as a critical factor for improving training efficiency and user trust.
Looking forward, the authors point toward multi-goal and multi-objective RL as the next frontier. They suggest that social robots of the future will need to manage a variety of simultaneous objectives, balancing task execution with human comfort and emotional satisfaction. By providing a structured overview of the current landscape and pointing out unexplored avenues, this work serves as a foundational reference for researchers aiming to develop more autonomous, adaptive, and socially intelligent robotic systems.
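To make the model-free Q-learning mentioned in the summary concrete, the following is a minimal tabular sketch, not a method taken from any surveyed paper. The action names, the two states, the learning rate, and the `simulated_human_feedback` function (a stand-in for an explicit human reward signal) are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict

# Hypothetical robot behaviors (assumption, not from the survey).
ACTIONS = ["greet", "wave", "stay_silent"]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the tabular Q-values."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def simulated_human_feedback(state, action):
    """Stand-in for a human reward signal (assumption):
    +1 when the robot greets a user who has just arrived, else 0."""
    return 1.0 if (state == "user_arrives" and action == "greet") else 0.0


def train(episodes=500, seed=0):
    """Run one-step episodes and return the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = "user_arrives"
        action = choose_action(q, state, epsilon=0.2, rng=rng)
        reward = simulated_human_feedback(state, action)
        q_update(q, state, action, reward, next_state="idle")
    return q
```

In a real interaction the reward would come from a person rather than a function, which is where the limited and noisy samples discussed above enter: each update costs one human response.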
1. Reinforcement Learning (RL) is well suited to social robotics because interaction is a key component of both reinforcement learning and social robotics.
2. The 'curse of goal specification' remains a primary challenge, making the design of the reward function the most crucial step in implementing RL for social robots.
3. RL in social robotics can be categorized into three main reward themes: interactive RL, intrinsically motivated methods, and task performance-driven methods.
The discussion section underscores the distinct potential of social robots as a testbed for RL in real-world scenarios, particularly through their ability to communicate internal states via social cues like facial expressions and gaze. It highlights that while Interactive RL (IRL) with implicit rewards is the most common approach, the inherent slowness and sparsity of social signals pose significant challenges for convergence. To address this, the authors advocate combining reward approaches, integrating intrinsic motivation and task-driven metrics, so that the robot receives constant feedback even in the absence of explicit human social cues. Future directions emphasize moving beyond single-task scenarios toward multi-goal and multi-objective RL. These frameworks would enable robots to handle diverse domestic tasks (e.g., medication reminders and caregiver alerts) while simultaneously optimizing for both operational efficiency and user satisfaction. Finally, the authors suggest that model-based RL, despite the complexity of capturing human dynamics, remains a critical area for reducing interaction time and hardware depreciation.
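One simple way to picture the combined reward described above is a weighted sum in which the social term drops out whenever no human cue was observed. This is only an illustrative sketch: the `RewardWeights` class, its default weights, and the `combined_reward` function are hypothetical, and a weighted sum is just one possible scalarization of multiple objectives, not the scheme the authors prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for the three reward themes (assumed values)."""
    social: float = 1.0      # interactive RL: human social cue
    intrinsic: float = 0.3   # intrinsically motivated: internal drive
    task: float = 0.5        # task performance-driven: goal progress


def combined_reward(
    social: Optional[float],
    intrinsic: float,
    task: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the three reward sources.

    `social` is None when no human cue was detected in this step, so the
    intrinsic and task terms still give the learner a non-empty signal.
    """
    social_term = 0.0 if social is None else weights.social * social
    return social_term + weights.intrinsic * intrinsic + weights.task * task
```

For example, `combined_reward(None, intrinsic=1.0, task=1.0)` still returns a positive value when the user gives no cue, whereas a purely social reward would be absent for that step.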