PhD Defense in Informatics Engineering: “Scaling-up organization of document sets to facilitate their analysis”

Candidate:
Rui Portocarrero Macedo de Morais Sarmento

Date, Time and Place:
July 24, 14:00, Sala de Atos DEGI (L202A), FEUP

President of the Jury:
Carlos Manuel Milheiro de Oliveira Pinto Soares, PhD, Associate Professor, Departamento de Engenharia Informática, Faculdade de Engenharia da Universidade do Porto.

Members:
José Fernando Ferreira Mendes, PhD, Full Professor, Departamento de Física, Universidade de Aveiro;
Bruno Emanuel da Graça Martins, PhD, Associate Professor, Departamento de Engenharia Electrotécnica e de Computadores, Instituto Superior Técnico da Universidade de Lisboa;
Pavel Bernard Brazdil, PhD, Emeritus Professor, Faculdade de Economia, Universidade do Porto (Co-Supervisor);
Henrique Daniel de Avelar Lopes Cardoso, PhD, Associate Professor, Departamento de Engenharia Informática, Faculdade de Engenharia da Universidade do Porto;
Sérgio Sobral Nunes, PhD, Associate Professor, Departamento de Engenharia Informática, Faculdade de Engenharia da Universidade do Porto.

The thesis was supervised by João Manuel Portela da Gama, PhD, Full Professor at Faculdade de Economia da Universidade do Porto.

Abstract:

“Summarizing and organizing an organization’s document production in an intuitive and scalable way, for massive amounts of data, is of great importance in supporting decision-making.

This thesis develops a theoretical and practical study to address these challenges. The work grew out of a static software prototype built to analyze text documents and a network of authors of scientific documentation, and to provide decision support from them. The prototype proved to have several advantages. Nonetheless, there were concerns regarding its ability to cope with higher-dimensional networks and with massive amounts of documents. The case study considers the affinity between authors on a large scale and in a constantly evolving setting. The first challenge is to scale the methods for representing the authors’ documents. The second challenge is to capture the temporal development of the organization. In this context, we developed and implemented streaming techniques to characterize each author and other sub-units of the organization, integrating them into affinity groups identified by keywords and by the relevance measures that characterize them. We concluded this work by testing several of the developed algorithms to mitigate the disadvantages of the original prototype, gathering a panoply of solutions for problems related to text streaming techniques under a large-scale approach to the corresponding analysis. Information Retrieval techniques were used, together with the analysis of social networks and streaming data. We solved several issues associated with the efficient analysis of text streams, using techniques ranging from pure stream analysis to evolving complex networks. These techniques served as the basis for more than ten new algorithms, which improved the prototype, solved the issues that initially motivated this work, and contribute to several related areas of text and stream analysis.”

Keywords: Streaming; Text Mining; Social Network Analysis; Social Network Visualization.
