Candidate:
Miguel António Mourão de Abreu
Date, Time and Location:
July 19, 15:00, room Professor Joaquim Sarmento (G129), DEC, Faculdade de Engenharia da Universidade do Porto
President of the Jury:
Rui Filipe Lima Maranhão de Abreu, PhD, Full Professor, Faculdade de Engenharia, Universidade do Porto
Members:
Francisco António Chaves Saraiva de Melo, PhD, Associate Professor with Habilitation, Department of Computer Science and Engineering, Instituto Superior Técnico, Universidade de Lisboa;
Carlos Fernando da Silva Ramos, PhD, Full Professor, Department of Informatics Engineering, Instituto Superior de Engenharia do Porto, Instituto Politécnico do Porto;
Abbas Abdolmaleki, PhD, Senior Scientist at Google DeepMind;
Luís Paulo Gonçalves dos Reis, PhD, Associate Professor with Habilitation, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto (Supervisor);
Henrique Daniel de Avelar Lopes Cardoso, PhD, Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto;
Armando Jorge Miranda de Sousa, PhD, Associate Professor, Department of Electrical and Computer Engineering, Faculdade de Engenharia, Universidade do Porto.
The thesis was co-supervised by José Nuno Panelas Nunes Lau, PhD, Associate Professor in the Department of Electronics, Telecommunications and Informatics at the Universidade de Aveiro.
Abstract:
In the rapidly evolving field of robotics, reinforcement learning (RL) has become an essential tool. However, as tasks become more complex, traditional RL methods face challenges in terms of sample efficiency, inter-task coordination, stability, and overall solution quality. To address these challenges, we investigated various strategies. Initially, we explored ways of enriching the state space while learning skills from scratch with RL, resulting in excellent individual behaviors. However, integrating these behaviors proved challenging, as they often explored the vast action space in an unstructured manner. To address this, we shifted to a structured approach, starting by abstracting the robot's locomotion model with an analytical controller and improving the efficiency of the upper body.
Gradually, the learning component was extended to the entire robot, making the analytical controller a starting point in the learning process rather than a restriction. We studied realistic external perturbations and ways of leveraging the robot's symmetry to speed up the optimization. This led to an extension of PPO's objective function called Proximal Symmetry Loss, with which we created a fully functional omnidirectional walk with push-recovery abilities. Building on this knowledge, we devised a new symmetry-enriched learning framework based on Skill-Set-Primitives — a novel hierarchical structure that captures commonalities across different skills, easing transitions. This framework simplified the policy into a shallow neural network, significantly improving sample efficiency and stability.
Applying this framework, we completely redesigned our simulated soccer team, achieving cohesive high-quality behaviors that secured victory in the RoboCup World Championship in 2022 and 2023. This team included a new localization algorithm with unprecedented accuracy, along with custom algorithms for path planning, role management, teammate communication, and more. We released the codebase to the RoboCup community, offering a robust Python foundation for new teams. Our work received recognition in scientific challenges, earning awards for introducing the league's first running skill, pioneering an agile close control dribble, and developing the most accurate localization algorithm. The contributions extend beyond RoboCup with Adaptive Symmetry Learning, a method of leveraging symmetry to improve sample efficiency even in robots that are not perfectly symmetric by design or that have asymmetrical flaws. A natural next step is to assess how this approach could benefit real humanoid robots, which inherently have imperfections.
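The abstract describes Proximal Symmetry Loss only at a high level: an extra term in PPO's objective that exploits the robot's left-right symmetry. A common way to realize such a term — given here purely as a hedged illustration, not as the thesis's actual formulation — is a mirror-consistency penalty: the action the policy outputs for a mirrored state should equal the mirror of the action it outputs for the original state. The names `make_mirror` and `symmetry_loss`, and the specific swap/flip indices, are hypothetical.

```python
import numpy as np

def make_mirror(dim, swap_pairs, flip_idx):
    """Build a signed permutation matrix that mirrors a state or action
    vector: paired left/right components are swapped and lateral
    components have their sign flipped. Indices are illustrative."""
    M = np.eye(dim)
    for i, j in swap_pairs:
        M[[i, j]] = M[[j, i]]  # swap the left/right rows
    for k in flip_idx:
        M[k] *= -1.0           # negate lateral components
    return M

def symmetry_loss(policy, states, M_s, M_a):
    """Mean squared mismatch between pi(mirror(s)) and mirror(pi(s)).
    A weighted version of such a term could be added to the usual PPO
    surrogate objective, so that symmetric behavior is encouraged
    rather than hard-coded."""
    mirrored_states = states @ M_s.T
    a = policy(states)                  # actions for original states
    a_mirror = policy(mirrored_states)  # actions for mirrored states
    return np.mean((a_mirror - a @ M_a.T) ** 2)
```

A policy that is exactly equivariant under the mirror operators incurs zero penalty; any left-right bias in the policy shows up as a positive loss, giving the optimizer a gradient toward symmetric behavior without forbidding asymmetric corrections (e.g., during push recovery).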
Keywords: Reinforcement Learning; Humanoid Robots; Symmetry; Locomotion; Skill-Set-Primitives; Hierarchical Structures; Shallow Neural Networks; RoboCup; Robotic Soccer.