A new fingerprint definition for effective song recognition

Serrano S.; Sahbudin M. A. B.; Chaouch C.; Scarpa M.
2022-01-01

Abstract

Music and song recognition is of wide interest to researchers and companies because of its intrinsic challenges and its potential economic returns. Although the basic song recognition algorithms are simple in principle, it is quite difficult to design an efficient and robust approach that can identify songs on the fly. This is reflected in the fact that very few companies in the world have their core business in this field, even though the potential market is very large. In this paper, we propose a new approach for generating fingerprints from song excerpts, which is the first step toward implementing a complete song recognition algorithm. Fingerprint generation is based on Welch’s method for spectral density estimation, a Mel filter bank, and an exponential adaptive threshold curve in the frequency domain. Although none of these techniques is new on its own, to the best of our knowledge they have never before been used together for fingerprint generation. Our main purpose is to show that the proposed approach achieves very high accuracy in recognizing song excerpts and their position within the song, and that it is robust to typical alterations of the audio signal. Specifically, the generated fingerprints are highly insensitive to noise and to lossy audio compression. Moreover, we believe that, with a small modification, the method could also generate pitch-insensitive fingerprints. Through experiments on a large database of songs, we show that the recognition accuracy obtained with our fingerprints is better than that of the landmark-based approach used by the well-known Shazam application. This is not a negligible result, because even a small improvement translates into a very large number of additional recognitions, with higher profit prospects in industrial applications.
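The abstract names three ingredients of fingerprint generation without giving their parameters. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. It assumes hypothetical 2048-sample frames, Welch segments of 256 samples with a Hann window and 50% overlap, and 32 triangular Mel bands. For the threshold it assumes a curve that decays exponentially across the bands and is scaled by the frame's own mean band energy. That form is adaptive per frame and exponential in frequency, but `scale` and `decay` are illustrative parameters. Each frame yields one bit per Mel band.

```python
import numpy as np

def welch_psd(frame, nperseg=256):
    """Welch's method: average periodograms of 50%-overlapping Hann-windowed segments."""
    step = nperseg // 2
    window = np.hanning(nperseg)
    starts = range(0, len(frame) - nperseg + 1, step)
    segments = np.stack([frame[s:s + nperseg] * window for s in starts])
    spectra = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    # Normalise by window energy; absolute scale does not affect the thresholding below.
    return spectra.mean(axis=0) / np.sum(window ** 2)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, nperseg, fs):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    n_bins = nperseg // 2 + 1
    bin_freqs = np.linspace(0.0, fs / 2.0, n_bins)  # rfft bin centre frequencies
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def fingerprint_frame(frame, bank, nperseg=256, scale=1.0, decay=2.0):
    """Bit k is 1 when Mel band k's energy exceeds the (hypothetical) threshold curve."""
    band_energy = bank @ welch_psd(frame, nperseg)
    k = np.arange(len(band_energy))
    # ASSUMPTION: exponential decay across bands, scaled by the frame's mean energy.
    threshold = scale * band_energy.mean() * np.exp(-decay * k / (len(k) - 1))
    return (band_energy > threshold).astype(np.uint8)

def fingerprint(signal, fs, frame_len=2048, n_filters=32, nperseg=256):
    """Return an (n_frames, n_filters) array of bits for a mono signal."""
    bank = mel_filter_bank(n_filters, nperseg, fs)
    n_frames = len(signal) // frame_len
    return np.stack([
        fingerprint_frame(signal[i * frame_len:(i + 1) * frame_len], bank, nperseg)
        for i in range(n_frames)
    ])
```

For example, `fingerprint(np.sin(2 * np.pi * 440.0 * np.arange(16000) / 8000), 8000)` returns a 7 × 32 array of bits. With this threshold shape, the band holding the most energy in a frame is always set to 1, because the curve never exceeds the frame's mean band energy.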
To focus on the fingerprint structure and its generation algorithm, we do not discuss any specific search algorithm, which is left to future work, and we use only a linear search in our experiments. In this way, we believe the quality of the fingerprint itself is more clearly demonstrated.
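A linear search of this kind can be sketched as an exhaustive comparison of the query fingerprint against every frame offset of every stored song, which recovers both the song and the excerpt's position. The paper does not state its matching score here. The sketch assumes the fraction of differing bits (a normalised Hamming distance) between binary fingerprints, and it uses random toy fingerprints rather than real data.

```python
import numpy as np

def linear_search(query, database):
    """Compare a query fingerprint with every frame offset of every stored song.

    query:    (m, B) array of bits for the recorded excerpt.
    database: dict mapping a song id to its (n, B) fingerprint array, n >= m.
    Returns (song_id, frame_offset, bit_error_rate) of the best match.
    ASSUMPTION: similarity is the fraction of differing bits (normalised Hamming distance).
    """
    best_id, best_offset, best_ber = None, -1, 1.0
    m = len(query)
    for song_id, ref in database.items():
        for offset in range(len(ref) - m + 1):
            ber = float(np.mean(ref[offset:offset + m] != query))
            if ber < best_ber:
                best_id, best_offset, best_ber = song_id, offset, ber
    return best_id, best_offset, best_ber

# Toy database of random fingerprints (illustrative data, not from the paper).
rng = np.random.default_rng(42)
db = {f"song{i}": rng.integers(0, 2, size=(200, 32), dtype=np.uint8) for i in range(5)}
query = db["song3"][50:60].copy()
query[0, :3] ^= 1  # flip three bits to mimic a distorted recording
print(linear_search(query, db))
```

Here the query should be matched to `song3` at frame offset 50 with a bit error rate of 3/320. Multiplying the offset by the frame duration (frame length divided by sampling rate) gives the excerpt's time position. The cost grows linearly with the total number of stored frames. Faster search structures fall outside the scope of the paper.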

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3236490