Audio recognition techniques: signal processing approaches with secure cloud storage

Sahbudin, MURTADHA ARIF BIN

In this thesis, we address the task of audio identification through audio signal processing for fingerprint extraction, emphasizing the complex problem of designing a highly robust system for signals from Frequency Modulation (FM) radio broadcasting. We create a system capable of retrieving analog signals from the radio channel using the proposed Internet of Things (IoT) device and Application Programming Interface (API), in conjunction with database clustering models. To complement the large number of methods already proposed in this field, we design yet another fingerprinting method. The challenges and research problems of an audio recognition system in regular use span many different aspects. The most significant are near similarity between different original audio tracks, invariance to noise and to spectral or temporal distortions, the minimum length of song track needed for identification, retrieval speed, and computing load. The study proposes a novel, efficient, highly accurate, and precise fingerprinting technique based on the Short Time Power Spectral Density (ST-PSD) method. We match features using an efficient Hamming distance search on a binary fingerprint and then add a verification stage for match hypotheses to maintain high precision and specificity on challenging datasets. We gradually refine this method from its early concept by introducing new components such as a Mel frequency filter bank and a progressive probability evaluation score. Our proposed ST-PSD based fingerprint extraction technique and its improvements can recognize a piece of music with an accuracy close to 100%. Even with white noise at 5 dB, 10 dB, 15 dB, and 20 dB added to the sample query, it still outperforms other established methods. Moreover, an API integrated into a smartphone app is also included in the research.
To bridge a gap in evaluation methodology, we compare our method against commercial applications, such as those using the Landmark-based approach, to help the research community investigate and evaluate the particular strengths and weaknesses of the proposed systems. We use this evaluation dataset throughout the thesis to assess our method extensively. We also show the possibility of building a sequence detection program on top of the fingerprinter so that long query recordings can be monitored for either interactive analysis or fully automated reporting of results. Finally, this research introduces a database storage framework that addresses data confidentiality problems when multi-cloud storage services are used. The evaluation results show that the approach is practical in real-time usage. Furthermore, the framework is intended to be used for fingerprint collection, or even to provide a new platform on which users choose their preferred multi-cloud storage services, such as Google Drive, Dropbox, and OpenStack, to gain flexibility and security at the same time.
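The abstract describes matching binary fingerprints by Hamming distance and then verifying the match hypothesis. The following is a minimal Python sketch of that general idea, not the method developed in the thesis: it uses a plain linear scan rather than an efficient search, and the names `hamming_distance`, `fingerprint_distance`, and `best_match`, the 32-bit word size, and the 0.25 bit-error-rate threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary words differ."""
    return bin(a ^ b).count("1")


def fingerprint_distance(query: List[int], reference: List[int]) -> int:
    """Sum the bit errors between two equal-length sequences of binary words."""
    if len(query) != len(reference):
        raise ValueError("fingerprints must have the same number of words")
    return sum(hamming_distance(q, r) for q, r in zip(query, reference))


def best_match(
    query: List[int],
    database: Dict[str, List[int]],
    bits_per_word: int = 32,          # assumed word size, for illustration only
    max_bit_error_rate: float = 0.25,  # assumed acceptance threshold
) -> Optional[Tuple[str, float]]:
    """Return (track_id, bit_error_rate) of the closest reference, or None.

    The final threshold check is a simple stand-in for a verification stage:
    the closest candidate is rejected if its bit error rate is too high.
    """
    best: Optional[Tuple[str, float]] = None
    for track_id, reference in database.items():
        if len(reference) != len(query):
            continue  # this sketch only compares equal-length fingerprints
        errors = fingerprint_distance(query, reference)
        rate = errors / (len(query) * bits_per_word)
        if best is None or rate < best[1]:
            best = (track_id, rate)
    if best is not None and best[1] <= max_bit_error_rate:
        return best
    return None


# Hypothetical two-word fingerprints; the query differs from track_a by one bit.
database = {
    "track_a": [0xFFFF0000, 0x12345678],
    "track_b": [0x0F0F0F0F, 0xAAAAAAAA],
}
query = [0xFFFF0001, 0x12345678]
print(best_match(query, database))  # ('track_a', 0.015625)
```

Because the comparison is a bitwise XOR followed by a bit count, a query distorted by noise still lands close to its reference as long as most fingerprint bits survive, which is why a threshold on the bit error rate can separate true matches from unrelated tracks in this sketch.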