
Audio Signal Processing for Sensing Technologies

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (15 June 2023) | Viewed by 15340

Special Issue Editor


Dr. Jose J. Lopez
Guest Editor
Polytechnic University of Valencia, 46022 Valencia, Spain
Interests: audio; acoustics; signal processing; machine learning; multimedia

Special Issue Information

Dear Colleagues,

Sound is a pressure wave created by a vibrating object. Through sound, it is possible to infer information about our environment, with applications in many scenarios of daily life, automation, quality control, and beyond. Audio signal processing has proven to be a powerful tool for audio sensing: features of various types are extracted from audio signals in both the time and frequency domains, and machine learning techniques are applied to recognize the different sound events. Real-world environments rarely present a good signal-to-noise ratio, and frequently more than one sound source is active at the same time, which complicates the necessary digital processing as well as the classification and detection stages. In recent years, the same deep learning techniques that have been applied to vision problems have been successfully adapted to sound classification, improving recognition in unconstrained acoustic environments and constituting an important line of research.
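As a concrete illustration of this pipeline, the minimal sketch below extracts MFCC features from labelled audio clips and trains a conventional classifier on them. It assumes the librosa and scikit-learn libraries are available; the file names and labels are placeholders, not a real dataset.

```python
# Minimal audio-sensing sketch: extract time/frequency features (MFCCs)
# and train a classifier to recognize sound events.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def extract_features(path, sr=22050, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Summarize the time axis with simple statistics (mean and std per coefficient).
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical dataset: a list of (wav_path, event_label) pairs.
dataset = [("door_knock.wav", "knock"), ("cough_01.wav", "cough")]
X = np.stack([extract_features(path) for path, _ in dataset])
y = [label for _, label in dataset]

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict(X[:1]))
```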

Throughout the last decade, the smartphone has become increasingly assimilated into our daily lives, making mobile audio sensing an active and promising area of interest. Many techniques and end-to-end applications have been developed, and innovative future ones are possible, exploiting the continually increasing computational power of smartphones.

The aim of this Special Issue is to bring together innovative developments and applications in audio sensing, and it is open to all researchers. Papers are solicited on the following themes:

Detection of daily activities: eating, driving, coughing;

Detection of user states: emotions, stress, happiness, sadness, anger, etc.;

Sound events in the home environment (Smart Home);

Ambient assisted living (quality of life of older and disabled people);

Detection of the environment (street, office, café, beach, nature, etc.);

Applications to health problems (pathological voice, type of cough, etc.);

Mobile applications for audio sensing;

Audio sensing in sports activities (quantitative measurements, performance evaluation);

Industrial uses (predictive maintenance).

Dr. Jose J. Lopez
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Audio sensing
  • Audio processing
  • Sound detection
  • Machine learning
  • Deep neural networks (DNN)
  • Feature extraction
  • Mobile applications

Published Papers (4 papers)


Research

17 pages, 2235 KiB  
Article
Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection
by Feng Li, Yujun Hu and Lingling Wang
Sensors 2023, 23(6), 3015; https://0-doi-org.brum.beds.ac.uk/10.3390/s23063015 - 10 Mar 2023
Viewed by 1952
Abstract
Singing-voice separation is a separation task that involves a singing voice and musical accompaniment. In this paper, we propose a novel, unsupervised methodology for extracting a singing voice from the background in a musical mixture. The method is a modification of robust principal component analysis (RPCA) that separates the singing voice by using weighting based on a gammatone filterbank and vocal activity detection. Although RPCA is a helpful method for separating voices from the music mixture, it fails when one singular value, such as that produced by drums, is much larger than the others (e.g., those of the accompanying instruments). As a result, the proposed approach takes advantage of the differing values of the low-rank (background) and sparse (singing voice) matrices. Additionally, we propose an expanded RPCA on the cochleagram by utilizing coalescent masking on the gammatone. Finally, we utilize vocal activity detection to enhance the separation outcomes by eliminating the lingering music signal. Evaluation results reveal that the proposed approach provides superior separation outcomes to RPCA on the ccMixter and DSD100 datasets.
(This article belongs to the Special Issue Audio Signal Processing for Sensing Technologies)
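To make the underlying decomposition concrete, here is a minimal sketch of plain RPCA via the inexact augmented Lagrange multiplier method, applied to a magnitude spectrogram M. It is a baseline only: it omits the authors' gammatone weighting and vocal activity detection, and the parameter defaults are common heuristics rather than the paper's settings.

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=200):
    """Decompose M into low-rank L (accompaniment) + sparse S (singing voice)."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))          # standard RPCA weight
    mu = mu or 0.25 * m * n / np.abs(M).sum()      # common step-size heuristic
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        # Singular-value thresholding yields the low-rank part.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(s, 1.0 / mu)) @ Vt
        # Elementwise soft thresholding yields the sparse part.
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S
```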

23 pages, 4857 KiB  
Article
3D Multiple Sound Source Localization by Proposed T-Shaped Circular Distributed Microphone Arrays in Combination with GEVD and Adaptive GCC-PHAT/ML Algorithms
by Ali Dehghan Firoozabadi, Pablo Irarrazaval, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar Azurdia-Meza
Sensors 2022, 22(3), 1011; https://0-doi-org.brum.beds.ac.uk/10.3390/s22031011 - 28 Jan 2022
Cited by 6 | Viewed by 2765
Abstract
Multiple simultaneous sound source localization (SSL) is one of the most important applications in speech signal processing. For multiple SSL, one-step algorithms offer low computational complexity (but low accuracy), while two-step methods offer high accuracy (but high computational complexity). In this article, a combination of a one-step method based on generalized eigenvalue decomposition (GEVD) and a two-step method based on adaptive generalized cross-correlation (GCC) with phase transform/maximum likelihood (PHAT/ML) filters, along with a novel T-shaped circular distributed microphone array (TCDMA), is proposed for 3D multiple simultaneous SSL. In addition, the low computational complexity of the GCC algorithm is combined with the high accuracy of the GEVD method by using the distributed microphone array to eliminate spatial aliasing and thus obtain more appropriate information. The proposed T-shaped circular distributed microphone array-based adaptive GEVD and GCC-PHAT/ML algorithm (TCDMA-AGGPM) is compared with hierarchical grid refinement (HiGRID), temporal extension of the multiple response model of sparse Bayesian learning with spherical harmonic (SH) extension (SH-TMSBL), sound field morphological component analysis (SF-MCA), and time-frequency mixture weight Bayesian nonparametric acoustical holography beamforming (TF-MW-BNP-AHB) methods based on the mean absolute estimation error (MAEE) criterion in noisy and reverberant environments on simulated and real data. The superiority of the proposed method is demonstrated by its high accuracy and low computational complexity for 3D multiple simultaneous SSL.
(This article belongs to the Special Issue Audio Signal Processing for Sensing Technologies)
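The GCC-PHAT stage used here as a building block is a standard technique; the sketch below estimates the time difference of arrival (TDOA) between two microphone signals with PHAT-weighted generalized cross-correlation. It does not reproduce the paper's adaptive filters, GEVD stage, or array geometry.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None, interp=1):
    """Estimate the TDOA (seconds) between signals x and y via GCC-PHAT."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:           # restrict the search to physically possible lags
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)
```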

20 pages, 1341 KiB  
Article
A Physics-Informed Neural Network Approach for Nearfield Acoustic Holography
by Marco Olivieri, Mirco Pezzoli, Fabio Antonacci and Augusto Sarti
Sensors 2021, 21(23), 7834; https://0-doi-org.brum.beds.ac.uk/10.3390/s21237834 - 25 Nov 2021
Cited by 14 | Viewed by 3323
Abstract
In this manuscript, we describe a novel methodology for nearfield acoustic holography (NAH). The proposed technique is based on convolutional neural networks with an autoencoder architecture, which reconstruct the pressure and velocity fields on the surface of the vibrating structure using the sampled pressure soundfield on the holographic plane as input. The loss function used for training the network combines two components. The first component is the error in the reconstructed velocity. The second component is the error between the sound pressure on the holographic plane and its estimate, obtained by forward propagating the pressure and velocity fields on the structure through the Kirchhoff–Helmholtz integral, thus bringing some knowledge about the physics of the process under study into the estimation algorithm. Due to the explicit presence of the Kirchhoff–Helmholtz integral in the loss function, we name the proposed technique the Kirchhoff–Helmholtz-based convolutional neural network (KHCNN). KHCNN has been tested on two large datasets of rectangular plates and violin shells. Results show that it attains very good accuracy, with a gain in the NMSE of the estimated velocity field of up to 10 dB with respect to state-of-the-art techniques. The same trend is observed when the normalized cross-correlation is used as a metric.
(This article belongs to the Special Issue Audio Signal Processing for Sensing Technologies)
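The physics-informed idea can be sketched as a two-term loss. The version below is an illustration only: it assumes the Kirchhoff–Helmholtz integral has been discretized into real-valued propagation matrices G_p and G_v (hypothetical names) that map surface pressure and velocity to pressure on the holographic plane; the paper's actual network and loss details differ.

```python
import torch

def khcnn_style_loss(v_pred, p_pred, v_true, p_holo, G_p, G_v, alpha=1.0):
    """Data term on the reconstructed surface velocity plus a
    Kirchhoff-Helmholtz consistency term on the holographic plane.
    Shapes: v_pred, p_pred, v_true are (batch, n_surface);
    p_holo is (batch, n_holo); G_p, G_v are (n_holo, n_surface)."""
    data_term = torch.mean((v_pred - v_true) ** 2)
    # Discretized forward propagation standing in for the KH integral.
    p_forward = p_pred @ G_p.T + v_pred @ G_v.T
    physics_term = torch.mean((p_forward - p_holo) ** 2)
    return data_term + alpha * physics_term
```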

18 pages, 6293 KiB  
Article
Deep Neural Network-Based Respiratory Pathology Classification Using Cough Sounds
by B T Balamurali, Hwan Ing Hee, Saumitra Kapoor, Oon Hoe Teoh, Sung Shin Teng, Khai Pin Lee, Dorien Herremans and Jer Ming Chen
Sensors 2021, 21(16), 5555; https://0-doi-org.brum.beds.ac.uk/10.3390/s21165555 - 18 Aug 2021
Cited by 8 | Viewed by 3752
Abstract
Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy coughs and those with pathological coughs caused by asthma, upper respiratory tract infection (URTI), or lower respiratory tract infection (LRTI). To train a deep neural network model, we collected a new dataset of cough sounds labelled with a clinician's diagnosis. The chosen model is a bidirectional long short-term memory (BiLSTM) network based on Mel-frequency cepstral coefficient (MFCC) features. When trained to classify two classes of coughs (healthy, or pathological in general or belonging to a specific respiratory pathology), the model reaches an accuracy exceeding 84% against the label provided by the physicians' diagnosis. To classify the subject's respiratory pathology condition, the results of multiple cough epochs per subject were combined. The resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to discriminate among four classes of coughs, the overall accuracy drops: one class of pathological coughs is often misclassified as another. If one instead considers a healthy cough classified as healthy and a pathological cough classified as having some kind of pathology, then the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space, comparing pathological and recovered coughs collected from the same subjects, revealed that pathological coughs, irrespective of the underlying conditions, occupy the same feature space, making them harder to differentiate using MFCC features alone.
(This article belongs to the Special Issue Audio Signal Processing for Sensing Technologies)
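A minimal sketch of the model family named in the abstract, a BiLSTM classifier over MFCC frame sequences, is shown below. The layer sizes, sequence length, and two-class head are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CoughBiLSTM(nn.Module):
    """Sketch of a BiLSTM over MFCC frames; hyperparameters are illustrative."""
    def __init__(self, n_mfcc=13, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):             # x: (batch, frames, n_mfcc)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the final time step

model = CoughBiLSTM()
logits = model(torch.randn(4, 200, 13))  # 4 dummy cough clips, 200 MFCC frames
```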
