SCALE- AND RHYTHM-AWARE MUSICAL NOTE ESTIMATION FOR VOCAL F0 TRAJECTORIES BASED ON A SEMI-TATUM-SYNCHRONOUS HIERARCHICAL HIDDEN SEMI-MARKOV MODEL
Abstract
This paper presents a statistical method that estimates a musically natural sequence of musical notes from a vocal F0 trajectory. Since the onset times and F0s of sung notes deviate considerably from the tatums and pitches indicated in a musical score, a score model is crucial for improving the time-frequency quantization of F0s. We therefore propose a hierarchical hidden semi-Markov model (HSMM) that combines a score model, which represents the rhythms and pitches of musical notes under musical scales, with an F0 model, which represents the time-frequency deviations of F0s from the score. In the score model, musical scales are generated stochastically and note pitches are then generated according to the scales. To represent rhythms, note onsets are generated by a Markov process defined on the tatum grid. In the F0 model, onset time deviations, smooth inter-note F0 transitions, and F0 fluctuations are added to the score stochastically. Given an F0 trajectory, we estimate the most likely sequence of musical notes while placing more weight on the score model than on the F0 model. Experimental results showed that the proposed method outperformed an HMM-based method that has no models of scales and rhythms.
You can get the latest version of our paper from here. [PDF]
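The generative process summarized in the abstract can be illustrated with a toy simulation. The following is only a minimal sketch, not the model from the paper: the major-scale assumption, the onset transition rule, the frame rate, and all names and parameter values (`sample_score`, `render_f0`, `frames_per_tatum`, `max_shift`, `smooth`, `noise_sd`) are hypothetical choices made for illustration.

```python
import random

# Pitch classes of a major scale relative to its tonic (assumed scale type).
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def sample_score(num_notes, rng):
    """Toy score model: draw a scale, then in-scale pitches and tatum-grid onsets."""
    tonic = rng.randrange(12)
    scale = {(tonic + step) % 12 for step in MAJOR_SCALE}
    in_scale = [p for p in range(55, 80) if p % 12 in scale]  # MIDI note numbers
    notes = []
    onset = 0  # onset position in tatums
    for _ in range(num_notes):
        pitch = rng.choice(in_scale)
        # Toy Markov rule: the next onset depends only on the current
        # position within a 4-tatum beat (an assumption of this sketch).
        position = onset % 4
        choices = [1, 2, 4] if position == 0 else [1, 4 - position]
        duration = rng.choice(choices)
        notes.append((onset, duration, pitch))
        onset += duration
    return tonic, notes


def render_f0(notes, rng, frames_per_tatum=10, max_shift=2, smooth=0.5, noise_sd=0.2):
    """Toy F0 model: one F0 value per frame, in semitones on the MIDI scale."""
    total = (notes[-1][0] + notes[-1][1]) * frames_per_tatum
    # Onset time deviation: shift every onset except the first by a few frames.
    starts = [0] + [
        on * frames_per_tatum + rng.randint(-max_shift, max_shift)
        for on, _, _ in notes[1:]
    ]
    f0 = []
    current = float(notes[0][2])
    for i, (_, _, pitch) in enumerate(notes):
        end = starts[i + 1] if i + 1 < len(notes) else total
        for _ in range(starts[i], end):
            # Smooth inter-note transition: move part of the way to the target.
            current += smooth * (pitch - current)
            # F0 fluctuation: additive Gaussian noise on the output only.
            f0.append(current + rng.gauss(0.0, noise_sd))
    return f0


rng = random.Random(0)
tonic, notes = sample_score(8, rng)
f0 = render_f0(notes, rng)
```

Note estimation is the inverse problem: given only `f0` and the tatum grid, recover `notes`. The sketch shows why a score prior helps, since the onset shifts and pitch noise make frame-wise rounding ambiguous.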
Errata
Corrections to the submitted version are listed below.
Page | Location | Incorrect | Correct |
3 | Figure 4 | $k_{j-1}, k_{j}, s_{j-1}, s_{j}, h_{n-l_j}, h_{n}$ | $p_{j-1}, p_{j}, d_{j-1}, d_{j}, u_{n-l_j}, u_{n}$ |
1 | Figure 1 | Time deviation | Temporal deviation |
Demo
Example results show the musical notes estimated from ground-truth F0 trajectories by the proposed method and by its variant without scale and rhythm constraints. The input F0 trajectories and tatum times are obtained from the annotation data [1,2].
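For readers unfamiliar with time-frequency quantization, the sketch below shows a deliberately naive baseline that works from the same kind of input (an F0 trajectory and tatum times). It is neither the proposed method nor the variant shown in the demo: it simply takes the median F0 within each tatum interval, rounds it to the nearest semitone, and merges adjacent intervals with the same pitch. The function names and the convention that unvoiced frames have an F0 of 0 are assumptions.

```python
import math
from statistics import median


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def quantize(f0_hz, frame_times, tatum_times):
    """Return a list of (onset_time, offset_time, midi_pitch) notes."""
    notes = []
    for start, end in zip(tatum_times[:-1], tatum_times[1:]):
        # Collect voiced frames in this tatum interval (F0 of 0 means unvoiced).
        voiced = [
            hz_to_midi(f)
            for f, t in zip(f0_hz, frame_times)
            if start <= t < end and f > 0
        ]
        if not voiced:
            continue  # treat the interval as a rest
        pitch = round(median(voiced))
        if notes and notes[-1][2] == pitch and notes[-1][1] == start:
            # Same pitch as the previous contiguous interval: extend that note.
            notes[-1] = (notes[-1][0], end, pitch)
        else:
            notes.append((start, end, pitch))
    return notes


# Two steady tones (A4, then B4), one frame every 0.1 s, tatums every 0.5 s.
frame_times = [i / 10 for i in range(20)]
f0_hz = [440.0 if i < 10 else 493.88 for i in range(20)]
tatum_times = [0.0, 0.5, 1.0, 1.5, 2.0]
notes = quantize(f0_hz, frame_times, tatum_times)
```

Because this baseline has no model of scales or rhythms, repeated notes of the same pitch are merged and out-of-scale pitches are never penalized, which are limitations a score model is meant to address.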