REA Press


3042-0180


https://doi.org/10.22105/scfa.v1i2.35

Research Article

Keywords: Deep learning, Support vector machine, Random forest, Artificial neural network, Convolutional neural network, Mel-frequency cepstral coefficients, Librosa.

Deep Audio Classifier: An Artificial Neural Network Approach

Abhishek Yadav, Abhishek Raj, Sankalp Anand, Vineet Kumar, Abhay Kumar

Kalinga Institute of Industrial Technology (KIIT) University, Bhubaneswar, Odisha, India.

06 2024

07 06 2024

Vol. 1, No. 2

2024

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This study develops a deep audio classifier by comparing several machine learning and deep learning algorithms: Support Vector Machines (SVMs), Random Forest (RF), Artificial Neural Networks (ANNs), and Convolutional Neural Networks (CNNs). All models were trained and evaluated on the UrbanSound8K dataset, with the goal of building robust models that effectively classify complex urban sound environments. The audio samples were preprocessed with noise reduction, normalization, and trimming to a consistent duration, and features were extracted as Mel-Frequency Cepstral Coefficients (MFCCs). The ANN model, which consists of dense layers for feature learning with softmax activation for multi-class classification, achieved a classification accuracy of 80.20%. The SVM and RF models, based on linear and ensemble methods respectively, achieved accuracies of 82.34% and 84.90%. The CNN model outperformed the others with an accuracy of 88.45%, reflecting its ability to capture spatial hierarchies and localized patterns in audio data. Performance varied by class, with high precision on specific sounds such as car horns and gunshots. The paper concludes with recommendations for future work: more sophisticated data augmentation, hybrid models, and more extensive hyperparameter tuning to improve classification accuracy and adaptability in practical urban settings.
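As an illustration of two steps named in the abstract, the following minimal sketch shows amplitude normalization, trimming or padding to a fixed length, and the softmax function used for multi-class output. It is not the authors' implementation: the abstract does not say which normalization method was used, so peak normalization is an assumption, and the function names `peak_normalize`, `fix_length`, and `softmax` are hypothetical. Plain Python lists stand in for audio arrays.

```python
import math


def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1.0.

    Assumption: peak normalization; the paper does not specify the method.
    All-zero (silent) input is returned unchanged to avoid division by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def fix_length(samples, target_len):
    """Trim or zero-pad a clip so it has exactly target_len samples."""
    clipped = list(samples[:target_len])
    return clipped + [0.0] * (target_len - len(clipped))


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1.

    Subtracting the maximum before exponentiating keeps exp() from overflowing.
    """
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A classifier's predicted class is then the index of the largest softmax probability, for example `probs.index(max(probs))` for `probs = softmax(scores)`.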
