PhD Defense in Digital Media (PDMD): “Cultivando a empatia digital: o potencial da produção de narrativas áudio”

Candidate:
Ivone Manuela Neiva Santos

Date, Time and Location:
29 January 2026, 14:30, Sala de Atos da Faculdade de Engenharia da Universidade do Porto

President of the Jury:
António Fernando Vasconcelos Cunha Castro Coelho (PhD), Associate Professor with Habilitation, Faculdade de Engenharia, Universidade do Porto

Members:
Marisa Rodrigues Pinto Torres da Silva (PhD), Full Professor, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa;
Maria Madalena da Costa Oliveira (PhD), Associate Professor, Instituto de Ciências Sociais, Universidade do Minho;
Maria José Lisboa Brites de Azeredo (PhD), Associate Professor with Habilitation, Faculdade de Comunicação, Arquitetura, Artes e Tecnologias da Informação, Universidade Lusófona;
Ana Isabel Crispim Mendes Reis (PhD), Associate Professor, Department of Communication and Information Sciences, Faculdade de Letras, Universidade do Porto (Supervisor);
Ricardo José Pinheiro Fernandes Morais (PhD), Assistant Professor, Department of Communication and Information Sciences, Faculdade de Letras, Universidade do Porto.

The thesis was co-supervised by José Manuel Pereira Azevedo (PhD), Full Professor, Department of Communication and Information Sciences, Faculdade de Letras, Universidade do Porto.

Abstract:

Empathy, which is defined as the ability to understand and share the emotional state of others, is considered to be fundamental to both personal well-being and social cohesion. Its presence has been linked to pro-social behaviour, while its absence has been associated with a greater predisposition to aggressive behaviour. Empathy is a multidimensional construct integrating affective and cognitive components, and educational interventions appear to positively influence its development. While cognitive empathy is generally considered to be more sensitive to education, the affective component appears to benefit from emotional activities. Empathy is now considered a vital skill for the ‘digital citizen’, but research suggests that empathy displayed online is lower than empathy displayed offline. Tendencies such as inattention, desensitisation and disinhibition, which are stimulated by the internet, seem to make empathy more difficult to achieve. The fact that empathic capacity primarily develops towards the end of adolescence highlights the importance of exploring strategies to foster it throughout education, particularly in an era where digital technology is becoming increasingly prevalent in all areas of social life. This scenario highlights the need to deepen our understanding of digital empathy and consider strategies for promoting it in education. This is reflected in the objectives that guided this thesis. The research underlying this thesis includes a review of the literature on empathy and the methodologies employed to study and encourage it. It features a critical analysis of the role of screens in young people’s daily lives, as well as of the different approaches to the relationship between empathy and digital technology. 
Given that sound is a privileged vehicle for emotional connection with characteristics that make it resilient to digital environments, the review also explores the literature on the potential of auditory stimuli and audio narratives to promote empathy. Supported by this review, the empirical research comprises two studies: a descriptive study and a quasi-experimental study. These studies involved three educational institutions at different levels and students in the adolescent age group (10–24 years). A total of 279 students participated in the descriptive study and 228 in the quasi-experimental study, of whom 76 were in the experimental group. The descriptive study measured and compared participants’ empathy and digital empathy using a questionnaire based on self-report scales previously used in empathy research with this age group. The quasi-experimental study assessed the impact of an educational programme designed to explore the potential of sound and narrative. Based on the experience-based learning model, the programme combined technical and socio-emotional learning through Media Education. It is organised into two modules. The first module consists of a set of group dynamics exploring the theme of empathy and its relationship with digital environments and auditory stimuli. The second module considers the process of producing audio narratives with emotional content. The intervention’s impact was assessed both quantitatively, via pre- and post-test surveys, and qualitatively, through analysis of the narratives and other texts produced by participants throughout the programme. Overall, the results of the descriptive study indicate that digital empathy is lower than general empathy, with the affective component being lower than the cognitive component on both scales. The results also show that girls have higher levels of both empathy and digital empathy. Age appears to be a differentiating factor in empathy levels, but not in digital empathy. 
Results suggest that digital empathy does not increase significantly during adolescence, unlike general empathy. These results therefore support the need for educational interventions to stimulate empathy from early adolescence onwards, addressing its multidimensionality and various contexts. The quantitative impact of participation in the educational programme evaluated in the quasi-experimental study was not significant. Nevertheless, a qualitative analysis of the data suggests that participation in the programme provided an opportunity to experiment with different empathic practices. The programme can therefore be considered a tool that facilitates the reconciliation of technical learning with the development of empathy. This tool can be applied to different stages of adolescence, levels of education, and school contexts.
The findings of this research reiterate the concerns expressed in existing literature about the impact of digital environments on empathy development among young people. The findings suggest that programmes based on producing audio narratives with emotional content could promote empathy in educational contexts, addressing the constraints imposed by digital environments. Based on these findings, this research has produced a manual to disseminate the tested educational model and a validated instrument to measure empathy and digital empathy in Portuguese. To our knowledge, this is the first questionnaire of its kind to enhance sound stimuli.

Keywords: digital empathy; audio production; narrative; education.

PhD Defense in Digital Media (PDMD): “Hibridismo Urbano-Digital e Bem-Estar Social: Estratégias para Fortalecer a Conexão Social nas Cidades”

Candidate:
Acilon Himercírio Baptista Cavalcante

Date, Time and Location:
26 January 2026, 14:30, Room Professor Joaquim Sarmento (G129), Department of Civil and Georesources Engineering, Faculdade de Engenharia da Universidade do Porto

President of the Jury:
António Fernando Vasconcelos Cunha Castro Coelho (PhD), Associate Professor with Habilitation, Faculdade de Engenharia da Universidade do Porto

Members:
Isabel Alexandra Reis Gonçalves Ferreira (PhD), Researcher at the Centre for Social Studies, Universidade de Coimbra;
Ivone Marília Carinhas Ferreira (PhD), Assistant Professor, Department of Communication Sciences, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa;
Ana Isabel Barreto Furtado Franco de Albuquerque Veloso (PhD), Full Professor, Department of Communication and Art, Universidade de Aveiro;
José Manuel Pereira Azevedo (PhD), Full Professor, Department of Communication and Information Sciences, Faculdade de Letras, Universidade do Porto (Supervisor);
Maria Van Zeller de Macedo de Oliveira e Sousa (PhD), Invited Assistant Professor, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto and Researcher at the Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência (INESC TEC).

Abstract:

This thesis investigates the promotion of social well-being in cities through the concept of urban-digital hybridity, which considers social and spatial interactions—whether physical and/or digital—as inseparable in the urban context. Based on an integrative literature review, a set of indicators was identified and categorised to more comprehensively assess the effectiveness of public policies aimed at improving quality of life in technology-mediated urban environments.
The review of indicators combined traditional methodologies—such as those used in the World Happiness Report, published by the United Nations—with metrics related to physical and mental health, community participation, perception of safety, and cultural vitality, while also incorporating emerging variables derived from the use of digital media. The research methodology adapted the mapping and critical analysis of these indicators to Marichela Sepe’s Cartography of Happiness, applying it to contexts of urban-digital hybridity and combining it with empirical digital placemaking experiments.
Case studies and digital placemaking experiences were conducted in the cities of Porto and Póvoa de Varzim, involving local communities, religious institutions, and schools, exploring technological mediation as a catalyst for social bonds and the activation of public spaces. Heatmaps of interactions, together with qualitative field data, allowed the identification of correlations between patterns of urban activation, city morphology, and landscape.
As its main outcome, the research proposes three core metrics for assessing social well-being in hybrid cities: Sense of Belonging, Sense of Place, and Sense of Community, analysed in their urban, digital, and hybrid dimensions.
The thesis’ main contribution is an integrated model for assessing urban social well-being, combining physical and digital metrics to provide an operational framework for urban planning and public policy design, aiming to foster more inclusive, participatory, and well-being-oriented cities.

PhD Defense in Informatics Engineering (ProDEI): “Novel Computational Methodologies for Detailed Analysis of Human Motion from Image Sequences”

Candidate:
João Ferreira de Carvalho Castro Nunes

Date, Time and Location:
12th December 2025, at 14:00, in Sala de Atos of the Faculdade de Engenharia da Universidade do Porto

President of the Jury:
Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor at the Department of Informatics Engineering of the Faculdade de Engenharia da Universidade do Porto

Members:
Carlos Miguel Fernandes Quental (PhD), Assistant Professor at the Department of Mechanical Engineering, Instituto Superior Técnico, Universidade de Lisboa;
Hugo Pedro Martins Carriço Proença (PhD), Full Professor at the Department of Computer Science, Universidade da Beira Interior;
João Manuel Ribeiro da Silva Tavares (PhD), Full Professor at the Department of Mechanical Engineering, Faculdade de Engenharia, Universidade do Porto (Supervisor);
Luís Paulo Gonçalves dos Reis (PhD), Associate Professor with Habilitation at the Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto.

The thesis was co-supervised by Pedro Miguel do Vale Moreira (PhD), Full Professor at the Instituto Politécnico de Viana do Castelo.

Abstract:

Human gait analysis provides critical information on biomechanical function, clinical assessment, and biometric recognition, but achieving accurate and reproducible motion understanding under real-world variability remains a major challenge. Traditional motion capture techniques are dependent on expensive infrastructure and controlled environments, which limit scalability and real-world validity. This thesis addresses these limitations by developing computational methodologies that exploit both RGB and depth information to enable robust, efficient, and fully automatic gait analysis using consumer-grade sensors. The research followed a structured trajectory that encompasses dataset creation, representation design, and methodological innovation. First, an extensive review and comparative analysis of existing vision- and depth-based gait datasets identified gaps in modality diversity, annotation quality, and accessibility. To address these issues, the Gait Recognition Image and Depth Dataset (GRIDDS) was designed, acquired, and publicly released. GRIDDS provides synchronized RGB, depth, silhouette, and 3D skeletal data from 35 participants recorded under controlled conditions, establishing one of the first standardized multi-modal benchmarks for gait analysis and recognition. Building on this foundation, two novel computational gait representations were introduced that fuse two-dimensional appearance cues with three-dimensional skeletal structure to increase robustness to viewpoint, clothing, and carried-object variations. These Gait Skeleton Image (GSI) variants (joint-based and line-based) were integrated within deep learning frameworks and evaluated through extensive experiments, demonstrating competitive and, under certain circumstances, superior performance compared with established appearance-based methods across multiple datasets and covariate conditions.
Finally, new methods for gait silhouette interpolation were introduced, combining deterministic geometric reasoning (BRIEF) and bidirectional deep learning (BiSINet) to reconstruct missing frames and enhance temporal coherence. The proposed interpolation techniques significantly improved downstream recognition accuracy and demonstrated strong generalization across datasets and frame-rate conditions. Collectively, the contributions of this work, which span multi-modal data acquisition, robust gait representation learning, and temporal reconstruction, advance the scientific and technological frontiers of human gait analysis, promoting reproducibility, accessibility, and applicability in both clinical and computer vision domains.

PhD Defense in Informatics Engineering (ProDEI): “Educational Question Generation with Narrative and Difficulty Control: A Special Focus on Portuguese”

Candidate:
Bernardo José Coelho Leite

Date, Time and Location:
17th November 2025, 14:00, Sala de Atos of the Faculdade de Engenharia da Universidade do Porto

President of the Jury:

Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor in the Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto.

Members:

Hugo Ricardo Gonçalo Oliveira (PhD), Associate Professor in the Department of Informatics Engineering, Faculdade de Ciências e Tecnologia, Universidade de Coimbra;

Maria Luísa Torres Ribeiro Marques da Silva Coheur (PhD), Associate Professor in the Department of Informatics Engineering, Instituto Superior Técnico, Universidade de Lisboa;

Luís Paulo Gonçalves dos Reis (PhD), Associate Professor with Habilitation in the Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto;

Henrique Daniel de Avelar Lopes Cardoso (PhD), Associate Professor in the Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto (Supervisor).

Abstract:

Humans pose questions all the time, and efforts to create AI systems to do the same have been developed. This task, known as Question Generation (QG), is a subfield of natural language generation that aims to automatically produce relevant and grammatically correct questions from a given input, such as text. A key motivation for QG is to support time-consuming tasks like the manual creation of educational questions by teachers. While QG systems have significantly improved, grammatical accuracy alone does not ensure educational value. Consequently, the adoption of QG tools in educational contexts remains limited.

This thesis is driven by three key challenges in QG: (1) the trustworthiness of AI-generated questions; (2) limited controllability; and (3) restricted applicability in less-resourced languages. To address these challenges, we focus on generating open-ended and multiple-choice reading comprehension questions from narrative texts for elementary school students. For challenge 1, we analyze and report the quality of generated questions, identifying both successful and failed cases. For challenge 2, we enhance controllable generation mechanisms by incorporating multiple attributes, such as narrative elements, explicitness, and difficulty, into the generated questions. Challenge 3 is addressed through a special focus on Portuguese, a morphologically rich language that remains underrepresented in QG research.

Our methodology spans from early rule-based and neural approaches to more advanced controllable QG techniques, including fine-tuning, zero- and few-shot prompting with both small and large language models. This offers a comprehensive view of the evolution and performance of QG systems across different stages. We contribute by systematically applying and adapting current QG techniques. We develop case studies that explore controllability and educational relevance, providing comprehensive analyses of question quality, and releasing new QG models and datasets tailored to less-resourced languages such as Portuguese. Evaluation combines automatic metrics with human-centered assessments involving experts, teachers, and students, whose input provides critical insights into the usefulness and effectiveness of the generated questions.

The results show that it is possible to generate well-formulated and answerable questions with controllable attributes. Although machine-generated questions approach the quality of human-authored ones, semantic issues still arise. In addition, generating MCQs with answer options that are effective for students remains a challenge. These findings highlight the ongoing need for research in educational QG, especially in supporting less-resourced languages and enhancing the reliability of automated generation systems.
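
As an illustration of the controllable generation described in this abstract, a zero- or few-shot prompt can carry the control attributes (narrative element, explicitness, difficulty) as explicit fields. The sketch below is hypothetical: the names `QGControls` and `build_qg_prompt`, the example attribute values, and the prompt wording are assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class QGControls:
    """Control attributes named in the abstract; the allowed values are assumed."""
    narrative_element: str  # e.g. "character" or "setting" (hypothetical labels)
    explicitness: str       # e.g. "explicit" or "implicit" (hypothetical labels)
    difficulty: str         # e.g. "easy" or "hard" (hypothetical labels)


def build_qg_prompt(passage, controls, examples=()):
    """Assemble a plain-text prompt.

    `examples` holds (passage, question) pairs; leave it empty for zero-shot use.
    """
    lines = ["Generate one reading comprehension question for an elementary school student."]
    for ex_passage, ex_question in examples:
        lines.append(f"Passage: {ex_passage}")
        lines.append(f"Question: {ex_question}")
    lines.append(f"Narrative element: {controls.narrative_element}")
    lines.append(f"Explicitness: {controls.explicitness}")
    lines.append(f"Difficulty: {controls.difficulty}")
    lines.append(f"Passage: {passage}")
    lines.append("Question:")  # the language model is expected to continue from here
    return "\n".join(lines)
```

The resulting string would be passed to whichever language model is being evaluated. Keeping the controls as separate labelled lines makes it straightforward to vary one attribute at a time when comparing outputs.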

PhD Defense in Digital Media (PDMD): “Food Wide Web: a digital food and media literacy program addressed to adolescents”

Candidate:
Adriana Aguiar Aparício Fogel

Date, Time and Location:
20 October 2025, 14:30, Sala de Atos da Faculdade de Engenharia da Universidade do Porto

President of the Jury:

António Fernando Vasconcelos Cunha Castro Coelho (PhD), Associate Professor with Habilitation, Faculdade de Engenharia da Universidade do Porto.

Members:

Joana Alves Dias Martins de Sousa Ferreira (PhD), Assistant Professor, Faculdade de Medicina da Universidade de Lisboa;
Ivone Marília Carinhas Ferreira (PhD), Assistant Professor, Department of Communication Sciences, Faculdade de Ciências Sociais e Humanas da Universidade Nova de Lisboa;
Sara de Jesus Gomes Pereira (PhD), Associate Professor, Department of Communication Sciences, Instituto de Ciências Sociais da Universidade do Minho;
Ana Filipa Pereira Oliveira (PhD), Assistant Professor, Faculdade de Comunicação, Arquitetura, Artes e Tecnologias da Informação da Universidade Lusófona;
José Manuel Pereira Azevedo (PhD), Full Professor, Department of Communication and Information Sciences, Faculdade de Letras da Universidade do Porto (Supervisor);
Ricardo José Pinheiro Fernandes Morais (PhD), Assistant Professor, Department of Communication and Information Sciences, Faculdade de Letras da Universidade do Porto.

Abstract:

The current complex and saturated media environment has given rise to an “infodemic” — an excess of information, both accurate and misleading, with potential impacts on the health of populations.
In the field of nutrition, the widespread dissemination of biased or incorrect content can contribute to unhealthy eating behaviors and may help explain the high global prevalence of obesity. Adolescents are particularly susceptible to this phenomenon because their self-regulation processes are not fully developed and because they are more influenced by external stimuli during this phase. This context reinforces the importance of promoting integrated food and media literacy among young people, providing them with tools that allow them to critically interpret, question, and consciously deal with the influences of food marketing and misinformation about nutrition. This study was developed in this context and had three main objectives: (i) to develop and implement a school-based intervention program using an intertwined perspective between media and food literacy issues; (ii) to evaluate the effectiveness of this intervention on the levels of media and food literacy of adolescents; (iii) to contribute to characterizing the media and food literacy levels of teenagers in Portuguese schools. The intervention consisted of ten 45-minute sessions, addressing eight dimensions of the food system — production, processing, distribution, planning and management, selection, preparation and cooking, intake, and disposal — through the lens of core media literacy competencies: access, analysis, evaluation, and creation. The contents included media materials that encouraged reflection and debate on the global food system. The program was implemented between October 2022 and May 2023 in four schools in northern Portugal — two were part of the intervention group and two were part of the control group. The final sample consisted of 202 students between the ages of 13 and 16 (M = 13.6; SD = 0.75). 
Data was collected through a questionnaire covering five main thematic areas: (a) exposure to food advertising, (b) satisfaction with body weight, (c) opinions, attitudes, and knowledge about media and food, (d) dietary practices, and (e) literacy related to food and media content. The questionnaire, constructed from pre-existing instruments, included open-ended and closed-ended questions and was administered to both groups before and after the sessions. In the intervention group, the creative projects developed in the classroom were also analyzed. Quantitative data were statistically evaluated, and qualitative data were subjected to a hybrid thematic analysis (inductive/deductive), followed by content analysis. After the initial qualitative analysis, a scoring system was developed that assigned numerical values to the responses. In line with the project objectives, healthy and sustainable choices, as well as critical evaluations and creations that encouraged participation, were valued. This scoring system included both closed-ended and task-based questions, allowing for a comprehensive and quantifiable assessment of the impact of the intervention on students’ food and media literacy, as well as their associated behaviors. The Likert section, consisting of 15 questions on attitudes, opinions, and knowledge, was scored from 0 to 4 per item, with a maximum possible score of 60 points. The food consumption section was converted to a weekly pattern and included a dietary adequacy index, with positive scores attributed to healthy behaviors (e.g. consumption of fruit and vegetables) and negative scores to unhealthy behaviors (e.g. consumption of fast food), with an initial score between -15 and 38, later transformed into a scale starting at 0, to facilitate interpretation. 
Finally, the section on food media literacy assessed the understanding of food labels (0 to 6 possible points, based on correct answers) and advertising literacy (score up to 14 points), including critical analysis of advertisements (one printed and one video) and an open-ended creative activity. The responses were analyzed based on their complexity, considering the ability to interpret marketing strategies and express ideas critically and creatively. The conversion of qualitative data into numerical scales allowed statistical comparisons between moments (pre vs. post) and groups (control vs. intervention; male vs. female). The results demonstrated that the intervention developed was feasible and effective. Significant improvements were observed in the students’ advertising literacy (1.5 vs. 1.9; p = 0.009) and in their ability to interpret food labels (2.0 vs. 2.2; p = 0.039). Among the girls in the intervention group, a significant improvement was observed in the total scores regarding opinions, attitudes, and knowledge about media and food (36.8 vs. 38.1; p = 0.037). Concerning body satisfaction, significant differences between the girls in the intervention group and those in the control group at the pre-intervention moment became non-significant after the intervention (p = 0.015 vs. p = 0.402). The same occurred with the differences between the girls and boys in the intervention group, which were significant only before the program (p = 0.010 vs. p = 0.412). These data denote improvements in satisfaction with body image, particularly among the female participants, who reported a more balanced and healthy relationship with their bodies and eating habits after participating in the program. Regarding eating patterns, the male participants also showed improvements, but in specific habits, with an increase in the consumption of cereals and tubers standing out (6.2 vs. 8.2; p = 0.032).
However, a persistent concern related to body weight was identified: 43.5% of the girls expressed a desire to change their weight, although only 28.3% considered themselves to be outside the weight they would consider normal. Among the boys, 76.1% declared themselves to be of normal weight, but 35.8% reported the desire to change their weight, even after taking part in the intervention. In addition, gaps in knowledge about the Mediterranean dietary pattern were found. Considering the entire sample, the students revealed difficulties in responding adequately to questions related to this topic, reporting only moderate levels of adherence to the aforementioned dietary pattern. In this aspect, they obtained a score of 30.6 (SD = 7.4), out of a maximum of 53. This is an important aspect in the characterization of adolescents, as the Mediterranean diet is the basis of Portugal’s national dietary guide, known as the Food Wheel. The adolescents also reported habitual exposure to advertisements for foods rich in sugar, salt and fat, despite the existing regulatory measures. Only 6.7% stated that they had not seen advertising for these products in the 30 days prior to the survey. In conclusion, this thesis proposes an innovative framework that integrates food and media literacy. Supported by empirical evidence, it includes a well-organized lesson plan and detailed assessment tools, constituting a practical resource for educators in general. The support resources used in the sessions are potentially adaptable to different educational and geographical contexts. The results contribute to the growing body of evidence supporting comprehensive educational interventions and reinforce the importance of integrating food and media literacy into school curricula as a strategy to promote critical thinking and informed food choices. 
Finally, the data suggest that a collaborative effort is essential to prepare adolescents to navigate an increasingly complex food environment, promoting healthier and more conscious choices. In this sense, collaboration between political decision-makers, education professionals, and stakeholders from the sectors involved (advertisers, advertising agencies, media outlets, social media platforms) is essential. The actions taken today have a substantial impact on the health and well-being of this and future generations.

Keywords: media literacy; food literacy; digital media; school-based intervention; adolescents.
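
The scoring arithmetic described in the abstract above can be sketched in a few lines. The item count, per-item range, and raw dietary range come from the abstract; the function names are hypothetical, and treating the rescaling of the dietary score as a plain offset is an assumption, since the abstract only says the scale was transformed to start at 0.

```python
LIKERT_ITEMS = 15                      # number of Likert questions (from the abstract)
LIKERT_MAX_PER_ITEM = 4                # each item is scored from 0 to 4
DIET_RAW_MIN, DIET_RAW_MAX = -15, 38   # raw dietary score range (from the abstract)


def likert_total(item_scores):
    """Sum the 15 Likert items after checking that each lies in 0..4."""
    if len(item_scores) != LIKERT_ITEMS:
        raise ValueError("expected 15 item scores")
    if any(not 0 <= s <= LIKERT_MAX_PER_ITEM for s in item_scores):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(item_scores)


def shift_diet_score(raw):
    """Shift a raw dietary score so the scale starts at 0.

    Assumption: the transformation is a plain offset by the raw minimum.
    """
    if not DIET_RAW_MIN <= raw <= DIET_RAW_MAX:
        raise ValueError("raw score outside the stated range")
    return raw - DIET_RAW_MIN
```

Under this assumption the Likert total ranges from 0 to 60 and the shifted dietary score from 0 to 53.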

PhD Defense in Informatics Engineering (ProDEI): “Generative models for soccer”

Candidate:
Tiago Filipe Mendes Neves

Date, Time and Location:
16 September 2025, 15:30, Sala de Atos, Faculdade de Engenharia da Universidade do Porto

President of the Jury:
Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto

Members:
Keisuke Fujii (PhD), Associate Professor, Department of Intelligent Systems, Graduate School of Informatics, Nagoya University, Japan;
Jesse Jon Davis (PhD), Full Professor, Department of Computer Science, Faculty of Engineering Science, Katholieke Universiteit Leuven, Belgium;
Luís Paulo Gonçalves dos Reis (PhD), Associate Professor with Habilitation, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto;
João Pedro Carvalho Leal Mendes Moreira (PhD), Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto (Supervisor).

The thesis was co-supervised by Luís Jorge Machado da Cunha Meireles (PhD), Senior Psychologist & Data Scientist, FC Porto.

Abstract:

Self-supervised large models that disrupt domains such as language, vision, and biology are transforming the world. However, these generative models that learn the underlying data distribution do not perform at the same level on all tasks. For example, Large Language Models (LLMs) do not yet have concrete applicability in soccer analytics. The models lack reasoning capabilities to provide concrete and actionable insights that can compete with the wide range of case-specific metrics within soccer analytics. While there have been some studies exploring the applicability of generative models in soccer, no study aimed for the moonshot of building a complete self-supervised learning model for soccer event data. Let’s consider the individual events (each shot, pass, tackle, …) in a soccer match the “words” that describe what is happening. We can consider each possession a “sentence,” each game an “essay,” and event data as a whole a “language.” By working within this framework, we have all the tools to build a self-supervised model in the same image as LLMs. The goal of this thesis is to build a foundation self-supervised model for soccer event data – termed Large Events Model (LEM) – and demonstrate its real-world applicability and generality in solving a wide range of tasks, such as simulation and modeling, that would otherwise require multiple different approaches. We propose three approaches to building LEMs: a chain of classifiers, causal mask modeling, and sequential language modeling with transformers. First, the chain of classifiers provides the first generative model that models all aspects of event data without posing restrictions on event types, reaching a level of performance that allows large-scale simulation of soccer matches. Then, we investigate two alternative approaches to remove some of the constraints of the first approach. 
The causal mask modeling approach using multilayer perceptrons reaches state-of-the-art performance on several of our proposed benchmarks, providing a set of application-ready models to solve a wide range of soccer analytics tasks. We explore a wide range of applications, from automated strategy search with reinforcement learning to risk-reward behaviors of soccer players. More than a dozen use cases for LEMs are presented in this thesis. The implications of our work are far-reaching. LEMs have the potential to become the operating system for event data in soccer analytics. They will transform the way clubs work, with easier access to machine learning models that would otherwise require tremendous modeling effort. With LEMs, the barrier to entry will lower significantly as any club in the world can access a model capable of solving its most relevant problems.

Keywords: generative models; foundation models; sports analytics; deep learning applications; simulation; soccer.
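
The "events as words, possessions as sentences" framing in the abstract above can be illustrated with a toy preprocessing step. This is only a sketch under stated assumptions: the field names `possession_id` and `type` and both function names are hypothetical, real event data also carries positions, players, and timestamps, and the abstract does not describe how the thesis actually encodes events.

```python
from itertools import groupby


def events_to_sentences(events):
    """Group chronologically ordered event dicts into per-possession token lists.

    Each event is assumed to have a `possession_id` and a `type` field.
    Consecutive events with the same possession id form one "sentence".
    """
    sentences = []
    for _, group in groupby(events, key=lambda e: e["possession_id"]):
        sentences.append([e["type"] for e in group])
    return sentences


def build_vocabulary(sentences):
    """Map each distinct event token to an integer id, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


# A toy "essay" (one match fragment) made of two possessions.
match_events = [
    {"possession_id": 1, "type": "pass"},
    {"possession_id": 1, "type": "shot"},
    {"possession_id": 2, "type": "tackle"},
    {"possession_id": 2, "type": "pass"},
]
sentences = events_to_sentences(match_events)
vocab = build_vocabulary(sentences)
encoded = [[vocab[token] for token in sentence] for sentence in sentences]
```

Integer sequences of this kind are the usual input format for sequence models such as transformers, which is what makes the language analogy practical.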

PhD Defense in Informatics Engineering (ProDEI): “Text Information Retrieval in Tetun”

Candidate:
Gabriel de Jesus

Date, Time and Location:
1 September 2025, 14:30, Sala de Atos, Faculdade de Engenharia da Universidade do Porto

President of the Jury:
Rui Filipe Lima Maranhão de Abreu (PhD), Full Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto

Members:
Arjen P. de Vries (PhD), Full Professor at the Institute for Computing and Information Sciences of the Radboud Universiteit, Nijmegen, The Netherlands;
Bruno Emanuel da Graça Martins (PhD), Associate Professor, Department of Electrical and Computer Engineering, Instituto Superior Técnico da Universidade de Lisboa;
Henrique Daniel de Avelar Lopes Cardoso (PhD), Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto;
Sérgio Sobral Nunes (PhD), Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto (Supervisor).

Abstract:

Ensuring access to information in all languages is crucial for bridging disparities in communities’ participation in the digital age and fostering a more inclusive and equitable society, particularly for speakers of low-resource languages. However, enabling such access remains a significant challenge for many of these communities. Tetun, a language that transitioned from a dialect to one of Timor-Leste’s official languages when the country restored its independence in 2002, faces similar challenges. According to the 2015 census, Tetun is spoken by approximately 79% of the country’s 1.18 million population. Despite its official status, Tetun remains underserved in language technology. Specifically, information retrieval-based solutions for the language do not exist, making it challenging to find relevant information on the internet and digital platforms for text-based search in Tetun.
This work tackles these challenges by investigating retrieval strategies for text-based search that can enable the application of information retrieval techniques to develop search solutions for Tetun, with a specific focus on the ad-hoc text retrieval task. Given that language-specific algorithms, tools, and document collections for Tetun were previously unavailable, this work began by creating these foundational resources, which serve as contributions relevant to information retrieval and natural language processing domains. These resources include a tokenizer, a language identification model, a stemmer, a stopword list, a document collection, a test collection, baselines for the ad-hoc text retrieval task, and a search log dataset. The contributions to information retrieval for low-resource languages include: (1) A data collection pipeline tailored for low-resource languages to streamline the construction of textual data from the web; (2) A human-in-the-loop methodology for annotating, processing, and constructing a dataset well-suited for a variety of information retrieval and natural language processing tasks; (3) A novel network-based approach for stopword detection; (4) Methodologies for developing a stemmer, designed for a language heavily influenced by loanwords, and the construction of a ground truth set for evaluating stemmer performance; (5) A detailed approach for constructing a test collection to evaluate the effectiveness of retrieval systems; (6) A methodology for establishing a robust baseline for the ad-hoc text retrieval task; and (7) Document contextualization and dual-parameter tuning strategies for hybrid text retrieval. The results from this work contribute to the development of technologies associated with the computational processing of Tetun, address gaps in its linguistic resources, and achieve impactful outcomes that elevate Tetun’s status. These advancements open new opportunities for future research and innovation. 
Moreover, this work introduces promising methodologies that can be adapted to other languages facing similar challenges, thereby contributing to the broader advancement of information retrieval for low-resource languages.

PhD Defense in Informatics Engineering: “Onboard detection and guidance based on side scan sonar images for autonomous underwater vehicles”

Candidate: Martin Joseph Aubard

Date, time and location:
25 July 2025, 14:00, Sala de Atos DEEC – I-105, Faculty of Engineering, University of Porto

President of the Jury:
Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto

Members:
Bilal Wehbe (PhD), Senior Researcher at the German Research Center for Artificial Intelligence, Germany;
Catarina Helena Branco Simões da Silva (PhD), Associate Professor, Department of Computer Engineering, Faculty of Science and Technology, University of Coimbra;
Andry Maykol Gomes Pinto (PhD), Associate Professor, Department of Electrical and Computer Engineering, Faculty of Engineering, University of Porto;
Ana Maria Dias Madureira Pereira (PhD), Coordinating Professor with Aggregation, Department of Computer Engineering, Instituto Superior de Engenharia do Porto, Instituto Politécnico do Porto (Supervisor).

The thesis was co-supervised by Luís Filipe Pinto de Almeida Teixeira (PhD), Associate Professor in the Department of Informatics Engineering at the Faculty of Engineering of the University of Porto.

Abstract:

This thesis addresses the challenge of improving the onboard detection and interaction capabilities of Autonomous Underwater Vehicles (AUVs) using Side-Scan Sonar (SSS) data. Traditionally, underwater missions have relied on pre-defined plans, with data analyzed post-mission by operators or experts. This workflow is time-consuming, often requiring multiple missions to identify and localize underwater targets. The need for repeated missions increases operational costs and complexity, highlighting the inefficiency of current methodologies. Moreover, such approaches do not allow the AUV to interact with detected targets in real time, limiting the scope of mission adaptation and real-time decision-making. To overcome these limitations, this thesis presents a novel framework integrating deep learning models for object detection directly onboard AUVs. This integration enables the vehicle to detect, localize, and interact with underwater targets in real time, offering significant improvements over traditional post-mission analysis. The framework builds upon the LSTS toolchain, which is responsible for AUV motion control and communication, and introduces enhanced real-time data processing capabilities. However, deploying such a model on an embedded system is subject to computational limitations that affect the model’s performance. Knowledge distillation methods have therefore been implemented, allowing smaller, more efficient models to perform onboard detection without sacrificing accuracy. Additionally, to improve the model’s robustness against underwater noise, a novel adversarial retraining framework, ROSAR, is introduced, ensuring reliable operation even in noisy sonar environments. Following the onboard detection and localization enhancement, we focused on onboard interaction with the detected object.
This is realized by extending the previous onboard framework to enhance interaction with the detected objects, and by validating it first through a customized simulator and then through a pipeline inspection use case. Using behavior trees and safety-assessed models, this use case reduces mission time by combining sonar detection and camera data collection in a single mission. Given the lack of open-source sonar datasets in the field, this thesis contributes two novel publicly available side-scan sonar datasets, SWDD and Subpipe, which include field-collected data on walls and pipelines and are manually annotated for object detection. By shifting from post-mission analysis to real-time detection and interaction, this thesis significantly improves the operational efficiency of AUV missions. The proposed framework streamlines underwater operations and enhances AUVs’ autonomous behavior, relying on efficient, accurate, and robust object detection models for underwater exploration and monitoring applications.

PhD Defense in Informatics Engineering: “Uncertainty interpretations for the robustness of object detection in self-driving vehicles”

Candidate:
Filipa Marília Monteiro Ramos Ferreira

Date, time and location:
23 July 2025, 14:30, Sala de Atos, Faculty of Engineering, University of Porto

President of the Jury:
Carlos Miguel Ferraz Baquero-Moreno (PhD), Full Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto

Members:
Tiago Manuel Lourenço Azevedo (PhD), Associate Researcher, Department of Computer Science and Technology, University of Cambridge, United Kingdom;
Marco António Morais Veloso (PhD), Coordinating Professor, Department of Science and Technology, Oliveira do Hospital School of Technology and Management, Polytechnic Institute of Coimbra;
Luís Filipe Pinto de Almeida Teixeira (PhD), Associate Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto;
Rosaldo José Fernandes Rossetti (PhD), Full Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto (Supervisor).

Abstract:

Ensuring the reliability and robustness of deep learning remains a pressing challenge, particularly as neural networks gain traction in safety-critical applications. While extensive research has focused on improving accuracy across datasets, generalisation, interpretability and robustness in the deployment domain remain poorly understood. In fact, in real-world scenarios, models often underperform without clear explanations. Addressing these concerns, uncertainty quantification has emerged as a key research direction, offering deeper insight into neural networks and enhancing confidence, interpretability, and robustness. Among critical applications, self-driving vehicles stand out, where uncertainty-aware object detection can significantly improve perception and decision-making. This thesis explores interpretations of uncertainty tailored to object detection in the context of self-driving vehicles. To this end, two novel methods to estimate the aleatoric component and one approach to modelling the epistemic uncertainty are proposed. Through the utilisation of anchor distributions readily available in any anchor-based object detector, uncertainty is estimated holistically while avoiding costly sampling procedures. Further, the concept of existence is introduced, a probability measure that indicates whether an object truly exists in the real world, regardless of classification. Building upon these ideas, three applications of uncertainty and existence are explored, namely the Existence Map, the Uncertainty Map and the Existence Probability. Whilst the aforementioned maps encode the existence measure and the aleatoric uncertainty over the space of input samples, the Existence Probability merges the information provided by the Existence Map with the standard detections, supplementing model outputs.
Evaluation showcases the coherence of uncertainty estimates and demonstrates the usefulness of the Existence and Uncertainty Map in supporting the standard model, providing open-set capabilities and giving a degree of confidence to true positives, false positives and false negatives. The merging strategy of the Existence Probability reports a considerable improvement in the performance of the object detector both in validation and perturbation, while detecting all classes of the dataset despite being trained only on cars, pedestrians and cyclists. The second part of this thesis features a study of the underspecification distribution and its connection with the epistemic uncertainty. Underspecification, recently coined, greatly endangers deep learning deployment in safety-critical systems as it depicts the variability of predictors generated by a single architecture with increasingly diverging performance in the application domain. The analysis performed showcases that, if the uncertainty estimates are correctly calibrated, a single predictor is sufficient to predict the spread of the underspecification distribution, avoiding running repeated costly training sessions. All proposed methods are designed to be model-agnostic, real-time compatible, and seamlessly applicable to deployed models without requiring retraining, underscoring their significance for robust and interpretable object detection in autonomous driving.

PhD Defense in Informatics Engineering: “Aiding researchers making their computational experiments reproducible”

Candidate:
Lázaro Gabriel Barros da Costa

Date, Time and Location:
18 July 2025, 16:00, Sala de Atos, Faculty of Engineering, University of Porto

President of the Jury:
Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto.

Members:
Tanu Malik (PhD), Associate Professor, Department of Electrical Engineering and Computer Science, University of Missouri, U.S.A.;
Miguel Carlos Pacheco Afonso Goulão (PhD), Associate Professor, Department of Computer Science, Faculty of Science and Technology, New University of Lisbon;
Gabriel de Sousa Torcato David (PhD), Associate Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto;
Jácome Miguel Costa da Cunha (PhD), Associate Professor, Department of Informatics Engineering, Faculty of Engineering, University of Porto (Supervisor).

The thesis was co-supervised by Susana Alexandra Tavares Meneses Barbosa (PhD), Senior Researcher at INESC TEC, Porto.

Abstract:

Scientific reproducibility and replicability are essential pillars of credible research, especially as computational experiments become increasingly prevalent across diverse scientific disciplines such as chemistry, climate science, and biology. Despite strong advocacy for Open Science and adherence to FAIR (Findable, Accessible, Interoperable, and Reusable) principles, achieving true reproducibility remains a formidable challenge for many researchers. Key issues such as complex dependency management, inadequate metadata, and the often cumbersome access to necessary code and data severely hamper reproducibility efforts. Moreover, existing reproducibility tools frequently offer piecemeal solutions that fail to address the multifaceted needs of diverse and complex experimental setups, particularly those that span multiple programming languages and involve intricate data systems. This thesis addresses these challenges by presenting a comprehensive framework designed to enhance computational reproducibility across a variety of scientific fields. Our approach involved a detailed systematic review of existing reproducibility tools to identify prevailing gaps and limitations in their design and functionality. This review underscored the fragmented nature of these tools, each supporting only aspects of the reproducibility process but none providing a holistic solution, particularly for experiments that require robust data handling or support for many programming languages.
To bridge these gaps, we introduced SCIREP, an innovative framework that automates essential aspects of the reproducibility workflow such as dependency management, containerization, and cross-platform compatibility. This framework was rigorously evaluated using a curated dataset of computational experiments, achieving a reproducibility success rate of 94%.
Furthering the accessibility and usability of reproducible research, we developed SCICONV, a conversational interface that simplifies the configuration and execution of computational experiments using natural language processing. This interface significantly reduces the technical barriers traditionally associated with setting up reproducible studies, allowing researchers to interact with the system through simple, guided conversations. Evaluation results indicated that SCICONV successfully reproduced 83% of the experiments in our curated dataset with minimal user input, highlighting its potential to make reproducible research more accessible to a broader range of researchers.
Moreover, recognizing the critical role of user studies in evaluating tools, methodologies, and prototypes, particularly in software engineering and behavioral sciences, this thesis also extends into the realm of experimental tool evaluation. We conducted a thorough analysis of existing tools used for software engineering and behavioral science experiments, identifying and proposing specific features designed to enhance their functionality and ease of use for conducting user studies. These proposed features were validated through a survey involving the research community, confirming their relevance and the need for their integration into existing and future tools. The contributions of this thesis are manifold, encompassing the development of a classification framework for reproducibility tools, the creation of a standardized benchmark dataset for assessing tool efficacy, and the formulation of SCIREP and SCICONV to significantly advance the state of the art in computational reproducibility. Looking forward, the research will focus on expanding the capabilities of reproducibility tools to support more complex scientific workflows, further enhancing user interfaces, and integrating additional functionalities to fully support user studies. By doing so, this work aims to pave the way for a more robust, accessible, and efficient computational reproducibility ecosystem that can meet the evolving needs of the global research community.

Keywords: Reproducibility; Replicability; Reusability; Computational Experiments; Conversational User Interface; User Studies.