Candidate:
Tiago Filipe Mendes Neves
Date, Time and Location:
16 September 2025, 15h30, Sala de Atos, Faculdade de Engenharia da Universidade do Porto
President of the Jury:
Pedro Nuno Ferreira da Rosa da Cruz Diniz (PhD), Full Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto
Members:
Keisuke Fujii (PhD), Associate Professor, Department of Intelligent Systems, Graduate School of Informatics, Nagoya University, Japan;
Jesse Jon Davis (PhD), Full Professor, Department of Computer Science, Faculty of Engineering Science, Katholieke Universiteit Leuven, Belgium;
Luís Paulo Gonçalves dos Reis (PhD), Associate Professor with Habilitation, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto;
João Pedro Carvalho Leal Mendes Moreira (PhD), Associate Professor, Department of Informatics Engineering, Faculdade de Engenharia da Universidade do Porto (Supervisor).
The thesis was co-supervised by Luís Jorge Machado da Cunha Meireles (PhD), Senior Psychologist & Data Scientist, FC Porto.
Abstract:
Self-supervised large models that disrupt domains such as language, vision, and biology are transforming the world. However, these generative models, which learn the underlying data distribution, do not perform at the same level on all tasks. For example, Large Language Models (LLMs) do not yet have concrete applicability in soccer analytics. The models lack the reasoning capabilities needed to provide concrete and actionable insights that can compete with the wide range of case-specific metrics within soccer analytics. While some studies have explored the applicability of generative models in soccer, no study has aimed for the moonshot of building a complete self-supervised learning model for soccer event data. Consider the individual events (each shot, pass, tackle, …) in a soccer match as the “words” that describe what is happening. We can then consider each possession a “sentence,” each game an “essay,” and event data as a whole a “language.” By working within this framework, we have all the tools to build a self-supervised model in the image of LLMs. The goal of this thesis is to build a foundation self-supervised model for soccer event data, termed the Large Events Model (LEM), and to demonstrate its real-world applicability and generality in solving a wide range of tasks, such as simulation and modeling, that would otherwise require multiple different approaches. We propose three approaches to building LEMs: a chain of classifiers, causal mask modeling, and sequential language modeling with transformers. First, the chain of classifiers provides the first generative model that models all aspects of event data without posing restrictions on event types, reaching a level of performance that allows large-scale simulation of soccer matches. Then, we investigate two alternative approaches to remove some of the constraints of the first approach.
The causal mask modeling approach using multilayer perceptrons reaches state-of-the-art performance on several of our proposed benchmarks, providing a set of application-ready models to solve a wide range of soccer analytics tasks. We explore a wide range of applications, from automated strategy search with reinforcement learning to risk-reward behaviors of soccer players. More than a dozen use cases for LEMs are presented in this thesis. The implications of our work are far-reaching. LEMs have the potential to become the operating system for event data in soccer analytics. They will transform the way clubs work, with easier access to machine learning models that would otherwise require tremendous modeling effort. With LEMs, the barrier to entry will lower significantly, as any club in the world can access a model capable of solving its most relevant problems.
Keywords: generative models; foundation models; sports analytics; deep learning applications; simulation; soccer.





