PhD Defense in Digital Media: “Towards Human-in-the-Loop Computational Rhythm Analysis in Challenging Musical Conditions”

António Humberto e Sá Pinto

Date, Time and Place:
September 8, 14:30, Sala de Atos FEUP

President of the Jury:
António Fernando Vasconcelos Cunha Castro Coelho, PhD, Associate Professor with Habilitation, Faculdade de Engenharia da Universidade do Porto;

Members:
Magdalena Fuentes, PhD, Assistant Professor, Music and Audio Research Lab (MARL) and Integrated Design & Media (IDM), New York University (NYU);
Jason Hockman, PhD, Associate Professor, School of Computing and Digital Technology (DMT), Birmingham City University (UK);
Matthew Edward Price Davies, PhD, Senior Scientist, SiriusXM/Pandora (USA) – (Supervisor);
Rui Pedro da Silva Nóbrega, PhD, Assistant Professor, Departamento de Informática, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa;
Aníbal João de Sousa Ferreira, Associate Professor, Departamento de Engenharia Eletrotécnica e de Computadores, Faculdade de Engenharia da Universidade do Porto.

The thesis was co-supervised by Prof Rui Luís Nogueira Penha, Coordinating Professor at ESMAE, and Prof Gilberto Bernardes de Almeida, Assistant Professor at FEUP.


“Music Information Retrieval (MIR) is an interdisciplinary field focused on the extraction, analysis, and processing of information from various musical representations.
Grounded on the automatic analysis of musical facets such as rhythm, melody, harmony, and timbre, MIR enables applications in areas like music recommendation, automated music transcription, and intelligent music composition tools. Rhythm, an integral element of music, provides a foundation for decoding music’s complex relational structures and layered depth. Computational rhythm analysis is thus central to MIR research. It encompasses a wide range of tasks, such as the pivotal beat tracking, which unlocks the use of musical time across many MIR systems. However, conventional beat-tracking methods have struggled when dealing with complex musical features, such as expressive timing or intricate rhythmic patterns. While specialised approaches demonstrate some degree of adaptation, they do not generalise to diverse scenarios. Deep learning methods, while promising in addressing these issues, depend heavily on the availability of substantial annotated data. In scenarios requiring adaptation to user subjectivity, or where acquiring annotated data is challenging, the efficacy of beat-tracking methods declines, thus leaving a gap in the applicability of computational rhythm analysis methods. This thesis investigates how user-provided information can enhance computational rhythm analysis in challenging musical conditions. It initiates the exploration of human-in-the-loop strategies with the aim of fostering adaptability of current MIR techniques. By focusing on beat tracking, due to its fundamental role in rhythm analysis, our goal is to develop streamlined solutions for cases where even the most advanced methods fall short. This is achieved by utilising both high-level and low-level user inputs (namely, the user’s judgement regarding the expressiveness of the musical piece and annotations of a brief excerpt) to adapt the state of the art to abstract particularly demanding signals.
In an exploratory study, we validate the shared perception of rhythmic complexity among users as a proxy for musical expressiveness, and consequently as a key performance enhancer for beat tracking. Building upon this, we examine how high-level user information can reparameterise a leading-edge beat tracker, improving its performance on highly expressive music. We then propose a transfer learning method that fine-tunes the current state of the art, hereafter referred to as the baseline, to a concise user-annotated region. This method exhibits versatility across varied musical styles and offers potential solutions to the inherent limitations of previous approaches. Incorporating both user-guided contextualisation and transfer learning into a human-in-the-loop workflow, we undertake a comprehensive evaluation of our adaptive techniques. This includes examining the key customisation options available to users and their effect on performance enhancement. Our approach outperforms the current state of the art, particularly in the challenging musical content of the SMC dataset, with an improvement over the baseline F-measure of almost 10 percentage points (corresponding to over 16%). However, these quantitative improvements require further interpretation due to the inherent differences between our file-specific, human-in-the-loop technique and traditional dataset-wide methods, which operate without prior exposure to specific file characteristics. With the aim of advancing towards a user-centric evaluation framework for beat tracking, we introduce two novel metrics: the E-Measure and Annotation Efficiency. These metrics account for the user perspective regarding the annotation and fine-tuning process. The E-Measure is a variant of the F-measure focused on the annotation correction workflow and includes a shifting operation over a larger tolerance window.
Annotation Efficiency (Ae) is defined as the relative (to the baseline) decrease in correction operations enabled by the fine-tuning process, normalised by the number of user annotations. Specifically, we probe the theoretical upper bound of beat tracking accuracy improvement over the SMC dataset. Our results show that the correct beat estimates provided by our approach surpass those of the state of the art by more than 20%. When considering the full length of the files, we can further frame this improvement in terms of gain per unit of user effort, quantifying the annotation efficiency of our approach. This is reflected in the substantial reduction of required corrections, with nearly 2/3 fewer corrections per user annotation compared to the baseline. In the final phase, we evaluate our human-in-the-loop strategy’s adaptability across a range of musical genres and instances presenting significant challenges. Our exploration extends to various rhythm tasks, including beat tracking, onset detection, and (indirectly) metre analysis. We apply this user-driven strategy to three unique genres with complex rhythm structures, such as polyrhythms, polymetres, and polytempi. Our approach exhibits swift adaptability, enabling efficient utilisation of the state-of-the-art method while bypassing the need for extensive retraining. This results in a balanced integration of data-driven and user-centric methods into a practical and streamlined solution.”
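
As a rough illustration of the quantities named in the abstract, the Python sketch below computes a conventional beat-tracking F-measure and one literal reading of the Annotation Efficiency definition. It is not the thesis's implementation. The 70 ms tolerance window, the greedy matching, the counting of corrections as insertions plus deletions, and all function names are assumptions made for this example, and the E-Measure's shifting operation is not modelled.

```python
from typing import Sequence


def count_hits(estimates: Sequence[float], annotations: Sequence[float],
               tolerance: float = 0.07) -> int:
    """Count annotated beats matched by a distinct estimate within +/- tolerance (seconds).

    Greedy nearest-neighbour matching; a simplification of optimal matching.
    """
    est = sorted(estimates)
    used = [False] * len(est)
    hits = 0
    for ann in sorted(annotations):
        best = None
        for i, e in enumerate(est):
            if used[i] or abs(e - ann) > tolerance:
                continue
            if best is None or abs(e - ann) < abs(est[best] - ann):
                best = i
        if best is not None:
            used[best] = True
            hits += 1
    return hits


def f_measure(estimates: Sequence[float], annotations: Sequence[float],
              tolerance: float = 0.07) -> float:
    """Harmonic mean of precision and recall over matched beats."""
    hits = count_hits(estimates, annotations, tolerance)
    if hits == 0:
        return 0.0
    precision = hits / len(estimates)
    recall = hits / len(annotations)
    return 2 * precision * recall / (precision + recall)


def correction_count(estimates: Sequence[float], annotations: Sequence[float],
                     tolerance: float = 0.07) -> int:
    """Assumed correction model: delete each false positive, insert each missed beat."""
    hits = count_hits(estimates, annotations, tolerance)
    return (len(estimates) - hits) + (len(annotations) - hits)


def annotation_efficiency(baseline_corrections: int, finetuned_corrections: int,
                          n_user_annotations: int) -> float:
    """Relative decrease in corrections versus the baseline, per user annotation.

    One literal reading of the abstract's wording, not a confirmed formula.
    """
    if baseline_corrections <= 0 or n_user_annotations <= 0:
        raise ValueError("baseline corrections and annotation count must be positive")
    relative_decrease = (baseline_corrections - finetuned_corrections) / baseline_corrections
    return relative_decrease / n_user_annotations
```

For example, with annotated beats at 0.5, 1.0, 1.5 and 2.0 s and estimates at 0.51, 1.02, 1.7 and 2.0 s, three beats are matched, so precision and recall are both 0.75 and two corrections remain (one deletion and one insertion).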

Keywords: Music Information Retrieval; User-centric; Transfer Learning; Beat Tracking.
