PhD Defence in Informatics Engineering: "Inmplode: A Framework to Interpret Multiple Related Rule-Based Models"

Candidate:

Pedro Rodrigo Caetano Strecht Ribeiro

Date, Time and Location:

13th of June 2025, 15:00, Sala de Atos, Faculdade de Engenharia, Universidade do Porto

President of the Jury:

Rui Filipe Lima Maranhão de Abreu, PhD, Full Professor, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto

Members:

Johannes Fürnkranz, PhD, Full Professor, Department of Computer Science of the Institute for Application-Oriented Knowledge Processing at the Johannes Kepler University Linz, Austria;

José María Alonso Moral, PhD, Full Professor, Department of Electronics and Computing, Escuela Técnica Superior de Ingeniería de la Universidad de Santiago de Compostela, Spain;

José Luís Cabral de Moura Borges, PhD, Associate Professor, Department of Industrial Engineering and Management, Faculdade de Engenharia, Universidade do Porto;

João Pedro Carvalho Leal Mendes Moreira, PhD, Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto (Supervisor).

The thesis was co-supervised by Carlos Manuel Milheiro de Oliveira Pinto Soares, PhD, Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia, Universidade do Porto.

Abstract:

This thesis investigates the challenges and opportunities presented by the increasing trend of using multiple specialized models, referred to as operational models, to address complex data analysis problems. While such an approach can enhance predictive performance for specific sub-problems, it often leads to fragmented knowledge and difficulty understanding overarching organizational phenomena. This research focuses on synthesizing the knowledge embedded within a collection of decision tree models, chosen for their inherent interpretability and suitability for knowledge extraction. Consider, for example, a company with chain stores or a university with diverse programs, each using dedicated prediction models (for sales or dropout, respectively). While these localized models are important, a global, organization-wide perspective is also valuable. However, managing many operational models, especially for analysis across programs or stores, can be overwhelming.

A methodology framed within a comprehensive framework is introduced to merge sets of operational models into consensus models. These consensus models are directed towards higher-level decision-makers, enhancing the interpretability of the knowledge generated by the operational models. The framework, named Inmplode, addresses common challenges in model merging and presents a highly customizable process. This process features a generic workflow and adaptable components, detailing alternative approaches for each subproblem encountered in the merging process.

The framework was applied to four public datasets from diverse business areas and a case study in education using data from the University of Porto. Different model merging approaches were explored in each case, illustrating various process instantiations. The model merging process revealed that the resulting consensus models are frequently incomplete, meaning they cannot cover the entire decision space, which can undermine their intended purpose. To address the issue of incompleteness, two novel methodologies are explored: one relies on the generation of synthetic datasets followed by decision tree training, while the other uses a specialized algorithm designed to construct a decision tree directly from aggregated (i.e., symbolic) data.
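
To make the notion of incompleteness concrete, the following minimal Python sketch may help. It is not taken from the thesis or from Inmplode: the rule set, the feature names x and y, the class labels, and the helper functions are all hypothetical, chosen only to show a merged rule set that leaves part of the decision space uncovered and how a synthetic labelled dataset could be drawn from the covered part.

```python
from itertools import product

# Hypothetical consensus rule set over two integer features in [0, 10).
# Each rule maps half-open intervals per feature to a predicted class.
RULES = [
    ({"x": (0, 5), "y": (0, 5)}, "pass"),
    ({"x": (5, 10), "y": (0, 10)}, "fail"),
]


def matches(bounds, point):
    """Return True if the point lies inside every interval of the rule."""
    return all(lo <= point[f] < hi for f, (lo, hi) in bounds.items())


def predict(rules, point):
    """Return the class of the first matching rule, or None if uncovered."""
    for bounds, label in rules:
        if matches(bounds, point):
            return label
    return None


def coverage(rules, grid):
    """Fraction of grid points that at least one rule covers."""
    covered = sum(predict(rules, p) is not None for p in grid)
    return covered / len(grid)


# Enumerate the whole (small, discrete) decision space.
GRID = [{"x": x, "y": y} for x, y in product(range(10), repeat=2)]

# Synthetic labelled dataset: only points the rule set can actually label.
SYNTHETIC = [(p, predict(RULES, p)) for p in GRID if predict(RULES, p) is not None]

if __name__ == "__main__":
    print("coverage:", coverage(RULES, GRID))
    print("uncovered example:", predict(RULES, {"x": 2, "y": 7}))
    print("synthetic rows:", len(SYNTHETIC))
```

In this toy setting the region with x below 5 and y at or above 5 has no rule, so the rule set is incomplete. Under the same assumptions, any standard decision tree learner trained on SYNTHETIC would yield a tree that partitions the full space and therefore assigns a class everywhere, which is the general idea behind the first of the two approaches mentioned above.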

The effectiveness of these methodologies in generating complete consensus models from incomplete rule sets is evaluated across the five datasets. Empirical results demonstrate the feasibility of overcoming the incompleteness issue, contributing to knowledge synthesis and decision tree modeling. However, tradeoffs were identified between completeness on the one hand and the interpretability, predictive performance, and fidelity of the consensus models on the other.

Overall, this research addresses a critical gap in the literature by providing a comprehensive framework for synthesizing knowledge from multiple decision tree models, focusing on overcoming the challenge of incompleteness. The conclusions have implications for organizations seeking to use specialized models while maintaining a holistic understanding of the analyzed phenomenon.

Keywords: interpretability; rule-based models; model merging framework; decision trees; completeness.