{"id":13291,"date":"2024-07-18T11:44:03","date_gmt":"2024-07-18T11:44:03","guid":{"rendered":"https:\/\/dei.fe.up.pt\/dev\/?p=13291"},"modified":"2024-07-23T08:13:59","modified_gmt":"2024-07-23T08:13:59","slug":"phd-defense-in-informatics-engineering-symmetry-hierarchical-structures-and-shallow-neural-networks-advancing-reinforcement-learning-for-humanoids","status":"publish","type":"post","link":"https:\/\/dei.fe.up.pt\/dev\/phd-defense-in-informatics-engineering-symmetry-hierarchical-structures-and-shallow-neural-networks-advancing-reinforcement-learning-for-humanoids\/","title":{"rendered":"PhD Defense in Informatics Engineering: &#8220;Symmetry, hierarchical structures and shallow neural networks: Advancing reinforcement learning for humanoids&#8221;"},"content":{"rendered":"<p><strong>Candidate:<\/strong><br \/>\nMiguel Ant\u00f3nio Mour\u00e3o de Abreu<\/p>\n<p><strong>Date, Time and Location<\/strong><br \/>\nJuly 19, 15:00, room Professor Joaquim Sarmento (G129), DEC, Faculdade de Engenharia da Universidade do Porto<\/p>\n<p><strong>President of the Jury:<\/strong><br \/>\nRui Filipe Lima Maranh\u00e3o de Abreu, PhD, Full Professor, Faculdade de Engenharia, Universidade do Porto<\/p>\n<p><strong>Members:<\/strong><br \/>\nFrancisco Ant\u00f3nio Chaves Saraiva de Melo, PhD, Associate Professor with Habilitation, Department of Computer Science and Engineering, Instituto Superior T\u00e9cnico, Universidade de Lisboa;<br \/>\nCarlos Fernando da Silva Ramos, PhD, Full Professor, Department of Informatics Engineering, Instituto Superior de Engenharia do Porto, Instituto Polit\u00e9cnico do Porto;<br \/>\nAbbas Abdolmaleki, PhD, Senior Scientist at Google DeepMind;<br \/>\nLu\u00eds Paulo Gon\u00e7alves dos Reis, PhD, Associate Professor with Habilitation, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto (Supervisor);<br \/>\nHenrique Daniel de Avelar Lopes Cardoso, PhD, Associate Professor, Department of Informatics 
Engineering, Faculdade de Engenharia, Universidade do Porto;<br \/>\nArmando Jorge Miranda de Sousa, PhD, Associate Professor, Department of Electrical and Computer Engineering, Faculdade de Engenharia, Universidade do Porto.<\/p>\n<p>The thesis was co-supervised by Jos\u00e9 Nuno Panelas Nunes Lau, PhD, Associate Professor in the Department of Electronics, Telecommunications and Informatics at the Universidade de Aveiro.<\/p>\n<p><strong>Abstract:<\/strong><br \/>\nIn the rapidly evolving field of robotics, reinforcement learning (RL) has become an essential tool. However, as tasks become more complex, traditional RL methods face challenges in terms of sample efficiency, inter-task coordination, stability, and overall solution quality. To address these challenges, we investigated various strategies. Initially, we explored ways of enriching the state space while learning skills from scratch with RL, resulting in excellent individual behaviors. However, integrating these behaviors proved challenging, as they often explored the vast action space in an unstructured manner. To address this, we shifted to a structured approach, starting by abstracting the robot\u2019s locomotion model with an analytical controller, and improving the upper body efficiency.<br \/>\nGradually, the learning component was extended to the entire robot, making the analytical controller a starting point in the learning process, rather than a restriction. We studied realistic external perturbations and ways of leveraging the robot\u2019s symmetry to speed up the optimization. This led to an extension to PPO\u2019s objective function called Proximal Symmetry Loss, with which we created a fully functional omnidirectional walk with push-recovery abilities. Building on this knowledge, we devised a new symmetry-enriched learning framework based on Skill-Set-Primitives, a novel hierarchical structure that captures commonalities across different skills, easing transitions. 
This framework simplified the policy into a shallow neural network, significantly improving sample efficiency and stability. Applying this framework, we completely redesigned our simulated soccer team, achieving cohesive high-quality behaviors that secured victory in the RoboCup World Championship in 2022 and 2023. This team included a new localization algorithm with unprecedented accuracy, custom algorithms for path planning, role management, teammate communication, and more. We released the codebase to the RoboCup community, offering a robust Python foundation for new teams. Our work received recognition in scientific challenges, earning awards for introducing the league\u2019s first running skill, pioneering an agile close-control dribble, and developing the most accurate localization algorithm. The contributions extend beyond RoboCup with Adaptive Symmetry Learning, a method of leveraging symmetry to improve sample efficiency, even in robots not perfectly symmetric by design or those with asymmetrical flaws. 
A natural next step is to assess how this approach could benefit real humanoid robots, which inherently have imperfections.<\/p>\n<p><strong>Keywords:<\/strong> Reinforcement Learning; Humanoid Robots; Symmetry; Locomotion; Skill-Set-Primitives; Hierarchical Structures; Shallow Neural Networks; RoboCup; Robotic Soccer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Candidate: Miguel Ant\u00f3nio Mour\u00e3o de Abreu Date, Time and Location July 19, 15:00, room Professor Joaquim Sarmento (G129), DEC, Faculdade de Engenharia da Universidade do Porto President of the Jury: Rui Filipe Lima Maranh\u00e3o de Abreu, PhD, Full Professor, Faculdade de Engenharia, Universidade do Porto Members: Francisco Ant\u00f3nio Chaves Saraiva de Melo, PhD, Associate Professor [&hellip;]<\/p>\n","protected":false},"author":60,"featured_media":13325,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[267,27,271],"tags":[],"class_list":["post-13291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-highlights","category-news","category-phd-defenses"],"_links":{"self":[{"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/posts\/13291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/users\/60"}],"replies":[{"embeddable":true,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/comments?post=13291"}],"version-history":[{"count":1,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/posts\/13291\/revisions"}],"predecessor-version":[{"id":13292,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/posts\/13291\/revisions\/13292"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dei.fe.up.pt\/dev\/wp
-json\/wp\/v2\/media\/13325"}],"wp:attachment":[{"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/media?parent=13291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/categories?post=13291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dei.fe.up.pt\/dev\/wp-json\/wp\/v2\/tags?post=13291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}