A spectrogram-based audio fingerprinting system for content-based copy detection

Chahid Ouali; Pierre Dumouchel; Vishwa Gupta

doi:10.1007/s11042-015-3081-8

École de technologie supérieure

Research output: Contribution to journal › Article published in a peer-reviewed journal › Peer-reviewed

8 Citations (Scopus)

Abstract

This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. The robustness is achieved by generating different versions of the spectrogram matrix of the audio signal, using a threshold based on the average of the spectral values to prune this matrix. We transform each version of this pruned spectrogram matrix into a 2-D binary image. The multiple versions of these 2-D images suppress noise to varying degrees, which improves the likelihood that one of the images matches a reference image. To speed up matching, we convert each image into an n-dimensional vector and perform a nearest neighbor search on this vector. We give results with two different feature parameters and their combination. We test this method on the TRECVID 2010 content-based copy detection evaluation dataset and validate its performance on the TRECVID 2009 dataset. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely the NN-based and Shazam systems. Our method outperforms the Shazam system by a wide margin for all audio transformations (or distortions) in terms of detection performance, number of missed queries and localization accuracy. Compared to the NN-based system, our approach reduces the minimal Normalized Detection Cost Rate (min NDCR) by 23% and improves localization accuracy by 24%.
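The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the threshold is varied, how a binary image becomes an n-dimensional vector, or which distance is used, so the mean-scaling factors, the tile-density descriptor, the Euclidean brute-force search, and every function and parameter name below are assumptions chosen only for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency bins x frames) from Hann-windowed FFTs."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def binary_images(spec, scales=(0.5, 1.0, 2.0)):
    """One pruned version per threshold; each threshold is a multiple of the
    mean spectral value (the scale factors are an assumption)."""
    mean = spec.mean()
    return [(spec > s * mean).astype(np.uint8) for s in scales]

def image_to_vector(image, grid=(4, 4)):
    """Hypothetical descriptor: fraction of set pixels in each tile of a grid."""
    rows = np.array_split(image, grid[0], axis=0)
    return np.array([tile.mean()
                     for row in rows
                     for tile in np.array_split(row, grid[1], axis=1)])

def fingerprints(signal):
    """Stack one vector per pruned version: shape (n_scales, n_dims)."""
    return np.stack([image_to_vector(img)
                     for img in binary_images(spectrogram(signal))])

def best_match(query_fp, reference_fps):
    """Brute-force nearest neighbour. reference_fps has shape
    (n_refs, n_scales, n_dims); a reference wins if any one of its
    versions is closest to the matching version of the query."""
    dists = np.linalg.norm(reference_fps - query_fp, axis=2)
    return int(np.argmin(dists.min(axis=1)))

# Toy data: three pure tones as references, a noisy copy of tone 1 as query.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
references = [np.sin(2 * np.pi * f * t) for f in (440.0, 1200.0, 2500.0)]
reference_fps = np.stack([fingerprints(r) for r in references])
query = references[1] + 0.05 * rng.standard_normal(t.size)
print(best_match(fingerprints(query), reference_fps))
```

Taking the minimum distance over the pruned versions mirrors the idea that only one of the images needs to match: a higher threshold removes more noise from the query, so at least one version tends to stay close to its reference.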

Original language	English
Pages (from-to)	9145-9165
Number of pages	21
Journal	Multimedia Tools and Applications
Volume	75
Issue number	15
DOIs	https://doi.org/10.1007/s11042-015-3081-8
Status	Published - 1 Aug 2016

Cite this

@article{dac0933c2dab4fbca86f888858de6b91,
  title = "A spectrogram-based audio fingerprinting system for content-based copy detection",
  abstract = "This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. The robustness is achieved by generating different versions of the spectrogram matrix of the audio signal by using a threshold based on the average of the spectral values to prune this matrix. We transform each version of this pruned spectrogram matrix into a 2-D binary image. Multiple versions of these 2-D images suppress noise to a varying degree. This varying degree of noise suppression improves likelihood of one of the images matching a reference image. To speed up matching, we convert each image into an n-dimensional vector, and perform a nearest neighbor search based on this n-dimensional vector. We give results with two different feature parameters and their combination. We test this method on TRECVID 2010 content-based copy detection evaluation dataset, and we validate the performance on TRECVID 2009 dataset also. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely NN-based and Shazam systems. Our method by far outperforms Shazam system for all audio transformations (or distortions) in terms of detection performance, number of missed queries and localization accuracy. Compared to NN-based system, our approach reduces minimal Normalized Detection Cost Rate (min NDCR) by 23 \% and improves localization accuracy by 24 \%.",
  keywords = "Audio fingerprints, Content-based copy detection, Feature parameters, Spectrogram, TRECVID",
  author = "Chahid Ouali and Pierre Dumouchel and Vishwa Gupta",
  note = "Publisher Copyright: {\textcopyright} 2015, Springer Science+Business Media New York.",
  year = "2016",
  month = aug,
  day = "1",
  doi = "10.1007/s11042-015-3081-8",
  language = "English",
  volume = "75",
  pages = "9145--9165",
  journal = "Multimedia Tools and Applications",
  issn = "1380-7501",
  publisher = "Springer",
  number = "15",
}

TY  - JOUR
T1  - A spectrogram-based audio fingerprinting system for content-based copy detection
AU  - Ouali, Chahid
AU  - Dumouchel, Pierre
AU  - Gupta, Vishwa
PY  - 2016/8/1
Y1  - 2016/8/1
N2  - This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. The robustness is achieved by generating different versions of the spectrogram matrix of the audio signal by using a threshold based on the average of the spectral values to prune this matrix. We transform each version of this pruned spectrogram matrix into a 2-D binary image. Multiple versions of these 2-D images suppress noise to a varying degree. This varying degree of noise suppression improves likelihood of one of the images matching a reference image. To speed up matching, we convert each image into an n-dimensional vector, and perform a nearest neighbor search based on this n-dimensional vector. We give results with two different feature parameters and their combination. We test this method on TRECVID 2010 content-based copy detection evaluation dataset, and we validate the performance on TRECVID 2009 dataset also. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely NN-based and Shazam systems. Our method by far outperforms Shazam system for all audio transformations (or distortions) in terms of detection performance, number of missed queries and localization accuracy. Compared to NN-based system, our approach reduces minimal Normalized Detection Cost Rate (min NDCR) by 23 % and improves localization accuracy by 24 %.
AB  - This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. The robustness is achieved by generating different versions of the spectrogram matrix of the audio signal by using a threshold based on the average of the spectral values to prune this matrix. We transform each version of this pruned spectrogram matrix into a 2-D binary image. Multiple versions of these 2-D images suppress noise to a varying degree. This varying degree of noise suppression improves likelihood of one of the images matching a reference image. To speed up matching, we convert each image into an n-dimensional vector, and perform a nearest neighbor search based on this n-dimensional vector. We give results with two different feature parameters and their combination. We test this method on TRECVID 2010 content-based copy detection evaluation dataset, and we validate the performance on TRECVID 2009 dataset also. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely NN-based and Shazam systems. Our method by far outperforms Shazam system for all audio transformations (or distortions) in terms of detection performance, number of missed queries and localization accuracy. Compared to NN-based system, our approach reduces minimal Normalized Detection Cost Rate (min NDCR) by 23 % and improves localization accuracy by 24 %.
KW  - Audio fingerprints
KW  - Content-based copy detection
KW  - Feature parameters
KW  - Spectrogram
KW  - TRECVID
UR  - https://www.scopus.com/pages/publications/84947707961
U2  - 10.1007/s11042-015-3081-8
DO  - 10.1007/s11042-015-3081-8
M3  - Journal Article
AN  - SCOPUS:84947707961
SN  - 1380-7501
VL  - 75
SP  - 9145
EP  - 9165
JO  - Multimedia Tools and Applications
JF  - Multimedia Tools and Applications
IS  - 15
ER  - 