Applications of Machine Learning in Audio Classification and Acoustic Scene Characterization

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (30 April 2022) | Viewed by 28496

Special Issue Editor


Dr. Sławomir K. Zieliński
Guest Editor
Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland
Interests: machine learning; audio engineering; psychoacoustics; spatial audio

Special Issue Information

Dear Colleagues,

The goal of “audio classification” (AC) is to automatically identify the origin and attributes of individual sounds, whereas the aim of “acoustic scene characterization” (ASC) is to computationally describe more complex acoustic scenarios consisting of many simultaneously sound-emitting sources. A further difference between the two tasks is that ASC also encompasses the identification and description of the acoustic environments in which the recordings took place. Hence, ASC portrays sonic events at a higher and more generic level than AC. Nevertheless, due to the considerable application and methodological overlap between ASC and AC, we decided to cover both areas of research within the scope of this Special Issue.

An important but still under-researched aspect of ASC is the “spatial” characterization of sound scenes. Most AC and ASC systems developed so far are limited to the identification of monaurally recorded audio sources or events, overlooking the importance of their spatial characteristics. We are therefore interested in research papers on topics that include, but are not limited to, the following:

  • Spatial audio scene characterization;
  • Localization of sound sources within complex audio scenes;
  • Automatic indexing, search or retrieval of spatial audio recordings;
  • Acoustic scene characterization in music information retrieval;
  • Data-efficient augmentation for deep learning-based audio classification algorithms;
  • Intelligent audio surveillance systems;
  • Detection of anomalous or emergency-related sounds;
  • Acoustics-based systems for early fault detection and prevention in industrial settings.

Dr. Sławomir K. Zieliński
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio classification
  • acoustic scene characterization
  • spatial audio
  • machine learning
  • deep learning

Published Papers (8 papers)

Research

23 pages, 3313 KiB  
Article
IoT System for Detecting the Condition of Rotating Machines Based on Acoustic Signals
by Milutin Radonjić, Sanja Vujnović, Aleksandra Krstić and Žarko Zečević
Appl. Sci. 2022, 12(9), 4385; https://0-doi-org.brum.beds.ac.uk/10.3390/app12094385 - 26 Apr 2022
Cited by 5 | Viewed by 2139
Abstract
Modern predictive maintenance techniques have been significantly improved with the development of Industrial Internet of Things solutions which have enabled easier collection and analysis of various data. Artificial intelligence-based algorithms in combination with modular interconnected architecture of sensors, devices and servers, have resulted in the development of intelligent maintenance systems which outperform most traditional machine maintenance approaches. In this paper, a novel acoustic-based IoT system for condition detection of rotating machines is proposed. The IoT device designed for this purpose is mobile and inexpensive and the algorithm developed for condition detection consists of a combination of discrete wavelet transform and neural networks, while a genetic algorithm is used to tune the necessary hyperparameters. The performance of this system has been tested in a real industrial setting, on different rotating machines, in an environment with strong acoustic pollution. The results show high accuracy of the algorithm, with an average F1 score of around 0.99 with tuned hyperparameters. Full article
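
To illustrate the kind of pipeline described above, here is a minimal Python sketch that extracts discrete-wavelet-transform sub-band energies from an acoustic frame and feeds them to a small neural network classifier. The wavelet family, decomposition depth, and classifier are illustrative assumptions; the paper's GA-tuned hyperparameters and IoT architecture are not reproduced.

```python
# Minimal sketch: DWT sub-band energy features for acoustic condition detection.
# Assumptions (not from the paper): 'db4' wavelet, 5 decomposition levels,
# and a generic scikit-learn MLP in place of the authors' GA-tuned network.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def dwt_band_energies(frame, wavelet="db4", level=5):
    """Log-energy of each DWT sub-band of a 1-D audio frame."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return np.array([np.log10(np.sum(c ** 2) + 1e-12) for c in coeffs])

# Toy data: 200 one-second frames at 16 kHz with two machine conditions.
rng = np.random.default_rng(0)
X = np.stack([dwt_band_energies(rng.standard_normal(16000)) for _ in range(200)])
y = rng.integers(0, 2, size=200)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

In a real deployment, the frames would come from the IoT device's microphone and the labels from known machine conditions.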

16 pages, 1023 KiB  
Article
You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
by Satvik Venkatesh, David Moffat and Eduardo Reck Miranda
Appl. Sci. 2022, 12(7), 3293; https://0-doi-org.brum.beds.ac.uk/10.3390/app12073293 - 24 Mar 2022
Cited by 13 | Viewed by 7127
Abstract
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. It is useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles adopt segmentation-by-classification. This technique divides audio into small frames and individually performs classification on these frames. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement for F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, the speed of inference is at least 6 times faster than segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, the post-processing and smoothing is about 7 times faster. Full article
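
The core idea above, replacing frame-wise classification with regression of event boundaries, can be illustrated with the following Python sketch, which encodes event annotations as per-bin (presence, start, end) targets. The class list, bin duration, and normalization are assumptions made for illustration, not the values used in the paper.

```python
# Sketch: encoding sound events as YOHO-style regression targets.
# Each output bin holds, per class, (presence, normalized start, normalized end).
import numpy as np

CLASSES = ["speech", "music"]          # hypothetical class list
BIN_DURATION = 0.5                     # seconds per output bin (assumed)
NUM_BINS = 16                          # covers an 8-second excerpt

def encode_events(events):
    """events: list of (class_name, onset_s, offset_s) tuples."""
    target = np.zeros((NUM_BINS, len(CLASSES), 3), dtype=np.float32)
    for name, onset, offset in events:
        c = CLASSES.index(name)
        first = int(onset // BIN_DURATION)
        last = min(int(np.ceil(offset / BIN_DURATION)), NUM_BINS) - 1
        for b in range(first, last + 1):
            bin_start = b * BIN_DURATION
            target[b, c, 0] = 1.0                                              # presence
            target[b, c, 1] = max(onset - bin_start, 0.0) / BIN_DURATION       # start within bin
            target[b, c, 2] = min(offset - bin_start, BIN_DURATION) / BIN_DURATION  # end within bin
    return target.reshape(NUM_BINS, -1)

print(encode_events([("speech", 0.2, 1.7), ("music", 3.0, 7.9)]).shape)  # (16, 6)
```

A network trained against such targets predicts, for each bin and class, whether the class is active and where it starts and ends within the bin.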

23 pages, 20456 KiB  
Article
Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings
by Sławomir K. Zieliński, Paweł Antoniuk and Hyunkook Lee
Appl. Sci. 2022, 12(3), 1569; https://0-doi-org.brum.beds.ac.uk/10.3390/app12031569 - 01 Feb 2022
Viewed by 1606
Abstract
The automatic localization of audio sources distributed symmetrically with respect to coronal or transverse planes using binaural signals still poses a challenging task, due to the front–back and up–down confusion effects. This paper demonstrates that the convolutional neural network (CNN) can be used to automatically localize music ensembles panned to the front, back, up, or down positions. The network was developed using the repository of the binaural excerpts obtained by the convolution of multi-track music recordings with the selected sets of head-related transfer functions (HRTFs). They were generated in such a way that a music ensemble (of circular shape in terms of its boundaries) was positioned in one of the following four locations with respect to the listener: front, back, up, and down. According to the obtained results, CNN identified the location of the ensembles with the average accuracy levels of 90.7% and 71.4% when tested under the HRTF-dependent and HRTF-independent conditions, respectively. For HRTF-dependent tests, the accuracy decreased monotonically with the increase in the ensemble size. A modified image occlusion sensitivity technique revealed selected frequency bands as being particularly important in terms of the localization process. These frequency bands are largely in accordance with the psychoacoustical literature. Full article
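
As a rough illustration of a binaural front end for this kind of task, the sketch below stacks left- and right-ear log-magnitude spectrograms into a two-channel input and passes it to a tiny CNN with four outputs (front, back, up, down). The feature choice and network are placeholders, not the architecture developed in the paper.

```python
# Sketch: two-channel (left/right) spectrogram input for a position classifier.
# The spectrogram parameters and the toy network below are assumptions.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import spectrogram

FS = 48000

def binaural_features(left, right, fs=FS):
    _, _, sl = spectrogram(left, fs=fs, nperseg=1024, noverlap=512)
    _, _, sr = spectrogram(right, fs=fs, nperseg=1024, noverlap=512)
    feats = np.log10(np.stack([sl, sr]) + 1e-10)         # shape: (2, freq, time)
    return torch.tensor(feats, dtype=torch.float32)

model = nn.Sequential(
    nn.Conv2d(2, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 4),                              # front / back / up / down
)

x = binaural_features(np.random.randn(FS), np.random.randn(FS)).unsqueeze(0)
print(model(x).shape)   # torch.Size([1, 4])
```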

14 pages, 2411 KiB  
Article
Sound Source Separation Mechanisms of Different Deep Networks Explained from the Perspective of Auditory Perception
by Han Li, Kean Chen, Lei Wang, Jianben Liu, Baoquan Wan and Bing Zhou
Appl. Sci. 2022, 12(2), 832; https://0-doi-org.brum.beds.ac.uk/10.3390/app12020832 - 14 Jan 2022
Cited by 6 | Viewed by 2583
Abstract
Thanks to the development of deep learning, various sound source separation networks have been proposed and made significant progress. However, the study on the underlying separation mechanisms is still in its infancy. In this study, deep networks are explained from the perspective of auditory perception mechanisms. For separating two arbitrary sound sources from monaural recordings, three different networks with different parameters are trained and achieve excellent performances. The networks’ output can obtain an average scale-invariant signal-to-distortion ratio improvement (SI-SDRi) higher than 10 dB, comparable with the human performance to separate natural sources. More importantly, the most intuitive principle—proximity—is explored through simultaneous and sequential organization experiments. Results show that regardless of network structures and parameters, the proximity principle is learned spontaneously by all networks. If components are proximate in frequency or time, they are not easily separated by networks. Moreover, the frequency resolution at low frequencies is better than at high frequencies. These behavior characteristics of all three networks are highly consistent with those of the human auditory system, which implies that the learned proximity principle is not accidental, but the optimal strategy selected by networks and humans when facing the same task. The emergence of the auditory-like separation mechanisms provides the possibility to develop a universal system that can be adapted to all sources and scenes. Full article
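
The separation quality quoted above is measured with the scale-invariant signal-to-distortion ratio improvement (SI-SDRi). A minimal Python implementation of the standard definition is sketched below; it is not code from the paper.

```python
# Sketch: scale-invariant SDR (SI-SDR) and its improvement over the mixture (SI-SDRi).
import numpy as np

def si_sdr(estimate, reference):
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference            # projection of the estimate onto the reference
    noise = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))

def si_sdr_improvement(estimate, reference, mixture):
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Toy check: a nearly clean estimate scores far higher than the mixture baseline.
rng = np.random.default_rng(1)
s1, s2 = rng.standard_normal(16000), rng.standard_normal(16000)
mix = s1 + s2
print(si_sdr_improvement(s1 + 0.01 * s2, s1, mix))
```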

20 pages, 7836 KiB  
Article
Characterization of Sonic Events Present in Natural-Urban Hybrid Habitats Using UMAP and SEDnet: The Case of the Urban Wetlands
by Víctor Poblete, Diego Espejo, Víctor Vargas, Felipe Otondo and Pablo Huijse
Appl. Sci. 2021, 11(17), 8175; https://0-doi-org.brum.beds.ac.uk/10.3390/app11178175 - 03 Sep 2021
Cited by 5 | Viewed by 2482
Abstract
We investigated whether the use of technological tools can effectively help in manipulating the increasing volume of audio data available through the use of long field recordings. We also explored whether, by using these recordings and tools, we can address audio data analysis and feature extraction, and determine predominant patterns in the data. Similarly, we explored whether we can visualize feature clusters in the data and automatically detect sonic events. Our focus was primarily on enhancing the importance of natural-urban hybrid habitats within cities, which benefit communities in various ways, specifically through the natural soundscapes of these habitats that evoke memories and reinforce a sense of belonging for inhabitants. The loss of sonic heritage can be a precursor to the extinction of biodiversity within these habitats. By quantifying changes in the soundscape of these habitats over long periods of time, we can collect relevant information linked to this eventual loss. In this respect, we developed two approaches. The first was the comparison among habitats that progressively changed from natural to urban. The second was the optimization of the field recordings’ labeling process. This was performed with labels corresponding to the annotations of classes of sonic events and their respective start and end times, including events temporally superimposed on one another. We compared three habitats over time by using their sonic characteristics collected in field conditions. Comparisons of sonic similarity or dissimilarity among patches were made based on the Jaccard coefficient and uniform manifold approximation and projection (UMAP). Our SEDnet model achieves an F1-score of 0.79, with an error rate of 0.377 and an area under the PSD-ROC curve of 71.0. In terms of computational efficiency, the model is able to detect sound events from an audio file in 14.49 s. With these results, we confirm the usefulness of the methods used in this work for the process of labeling field recordings. Full article
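
Two of the tools mentioned above, the Jaccard coefficient and UMAP, can be sketched in a few lines of Python. The habitat class sets and feature vectors below are placeholders chosen for illustration, not data from the study.

```python
# Sketch: Jaccard similarity between the sets of sonic-event classes detected in
# two habitats, plus a UMAP projection of per-recording feature vectors.
import numpy as np
import umap   # pip install umap-learn

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

habitat_a = {"bird_song", "frog_call", "wind", "traffic"}   # hypothetical class sets
habitat_b = {"bird_song", "traffic", "dog_bark"}
print("Jaccard similarity:", jaccard(habitat_a, habitat_b))

features = np.random.default_rng(0).standard_normal((300, 40))  # placeholder features
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(features)
print(embedding.shape)  # (300, 2)
```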

14 pages, 1938 KiB  
Article
Modelling the Microphone-Related Timbral Brightness of Recorded Signals
by Andy Pearce, Tim Brookes and Russell Mason
Appl. Sci. 2021, 11(14), 6461; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146461 - 13 Jul 2021
Cited by 1 | Viewed by 1523
Abstract
Brightness is one of the most common timbral descriptors used for searching audio databases, and is also the timbral attribute of recorded sound that is most affected by microphone choice, making a brightness prediction model desirable for automatic metadata generation. A model, sensitive to microphone-related as well as source-related brightness, was developed based on a novel combination of the spectral centroid and the ratio of the total magnitude of the signal above 500 Hz to that of the full signal. This model performed well on training data (r = 0.922). Validating it on new data showed a slight gradient error but good linear correlation across source types and overall (r = 0.955). On both training and validation data, the new model out-performed metrics previously used for brightness prediction. Full article
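
The two signal features named in the abstract (the spectral centroid and the ratio of spectral magnitude above 500 Hz to total magnitude) can be computed as in the Python sketch below. How the paper combines and calibrates them into a brightness score is not reproduced here.

```python
# Sketch: the two brightness-related features named in the abstract.
import numpy as np

def brightness_features(x, fs):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)          # spectral centroid (Hz)
    ratio_above_500 = np.sum(spectrum[freqs >= 500.0]) / np.sum(spectrum)
    return centroid, ratio_above_500

fs = 44100
t = np.arange(fs) / fs
dull = np.sin(2 * np.pi * 200 * t)
bright = dull + 0.5 * np.sin(2 * np.pi * 4000 * t)
print(brightness_features(dull, fs), brightness_features(bright, fs))
```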

18 pages, 1634 KiB  
Article
An Ensemble of Convolutional Neural Networks for Audio Classification
by Loris Nanni, Gianluca Maguolo, Sheryl Brahnam and Michelangelo Paci
Appl. Sci. 2021, 11(13), 5796; https://0-doi-org.brum.beds.ac.uk/10.3390/app11135796 - 22 Jun 2021
Cited by 48 | Viewed by 5482
Abstract
Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, ensembles of classifiers are combined that exploit six data augmentation techniques and four signal representations for retraining five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparatively to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained a 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performances competitive with the state of the art. Full article
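
As a simplified illustration of the ensembling idea, the Python sketch below fuses class-probability outputs from several models by averaging, alongside one toy waveform augmentation (additive noise at a chosen SNR). The actual study retrains five pre-trained CNNs with six augmentation techniques; none of that is reproduced here.

```python
# Sketch: score-level fusion of several classifiers plus a simple waveform
# augmentation. Both are simplified placeholders for the methods in the paper.
import numpy as np

rng = np.random.default_rng(1)

def augment_with_noise(x, snr_db=20.0):
    """Add white noise at the requested signal-to-noise ratio (dB)."""
    noise = rng.standard_normal(len(x))
    noise *= np.sqrt(np.sum(x ** 2) / (10 ** (snr_db / 10) * np.sum(noise ** 2)))
    return x + noise

def ensemble_predict(prob_matrices):
    """Average class-probability matrices from several models (sum rule)."""
    return np.mean(np.stack(prob_matrices), axis=0).argmax(axis=1)

clip = rng.standard_normal(22050)                 # placeholder 0.5 s clip at 44.1 kHz
noisy_clip = augment_with_noise(clip, snr_db=15.0)

# Placeholder: three 'models' scoring 5 clips over 50 classes (e.g., ESC-50).
probs = [rng.dirichlet(np.ones(50), size=5) for _ in range(3)]
print(ensemble_predict(probs))
```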

20 pages, 7807 KiB  
Article
A Biologically Inspired Sound Localisation System Using a Silicon Cochlea Pair
by Ying Xu, Saeed Afshar, Runchun Wang, Gregory Cohen, Chetan Singh Thakur, Tara Julia Hamilton and André van Schaik
Appl. Sci. 2021, 11(4), 1519; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041519 - 08 Feb 2021
Cited by 4 | Viewed by 2177
Abstract
We present a biologically inspired sound localisation system for reverberant environments using the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model. The system exploits a CAR-FAC pair to pre-process binaural signals that travel through the inherent delay line of the cascade structures, as each filter acts as a delay unit. Following the filtering, each cochlear channel is cross-correlated with all the channels of the other cochlea using a quantised instantaneous correlation function to form a 2-D instantaneous correlation matrix (correlogram). The correlogram contains both interaural time difference and spectral information. The generated correlograms are analysed using a regression neural network for localisation. We investigate the effect of the CAR-FAC nonlinearity on the system performance by comparing it with a CAR only version. To verify that the CAR/CAR-FAC and the quantised instantaneous correlation provide a suitable basis with which to perform sound localisation tasks, a linear regression, an extreme learning machine, and a convolutional neural network are trained to learn the azimuthal angle of the sound source from the correlogram. The system is evaluated using speech data recorded in a reverberant environment. We compare the performance of the linear CAR and nonlinear CAR-FAC models with current sound localisation systems as well as with human performance. Full article
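
The correlogram construction described above can be approximated with ordinary DSP building blocks, as in the Python sketch below: a Butterworth band-pass filterbank stands in for the CAR-FAC cochlear model, and plain normalized correlation replaces the quantised instantaneous correlation, so this is only a structural analogy rather than the system presented in the paper.

```python
# Sketch: a cross-channel correlogram between left and right filterbank outputs.
# Butterworth band-pass filters are an assumed stand-in for the CAR-FAC model.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000
CENTRES = np.geomspace(100, 4000, 16)   # assumed channel centre frequencies

def filterbank(x):
    out = []
    for fc in CENTRES:
        sos = butter(2, [fc * 0.8, fc * 1.2], btype="bandpass", fs=FS, output="sos")
        out.append(sosfilt(sos, x))
    return np.array(out)                 # (channels, samples)

def correlogram(left, right):
    L, R = filterbank(left), filterbank(right)
    L = (L - L.mean(axis=1, keepdims=True)) / L.std(axis=1, keepdims=True)
    R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    return L @ R.T / L.shape[1]          # (channels, channels) correlation matrix

rng = np.random.default_rng(0)
x = rng.standard_normal(FS)
print(correlogram(x, np.roll(x, 8)).shape)   # (16, 16)
```

A regression model trained on such matrices could then map them to source azimuth, which is the role the neural networks play in the paper.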
