Indoor Sound Source Localization via Inverse Element-Free Simulation Based on Joint Sparse Recovery

Wang, Haitao; He, Qunyi; Peng, Shiwei; Zeng, Xiangyang

doi:10.3390/electronics13010069


School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(1), 69; https://doi.org/10.3390/electronics13010069

Submission received: 9 November 2023 / Revised: 18 December 2023 / Accepted: 20 December 2023 / Published: 22 December 2023

(This article belongs to the Section Computer Science & Engineering)


Abstract


Indoor sound source localization is a key technique in many engineering applications, and an inverse element-free method based on joint sparse recovery in a Bayesian framework is proposed for reverberant environments. In this method, a discrete wave model is constructed to represent the relationships between the sampled sound pressure and the source intensity distribution, and localization in the reverberant environment is realized via inversion from the wave model. By constructing a compact supporting domain, the source intensity can be sparsely represented in subdomains, and the sparse Bayesian framework is used to recover the source intensity. In particular, joint sparse recovery in the frequency domain is exploited to improve the recovery performance. Numerical and experimental verifications show that, compared with another state-of-the-art method, the proposed method achieves high source-localization accuracy and low sidelobes with low computational complexity in highly reverberant environments.

Keywords:

indoor sound source localization; inverse element-free simulation; joint sparse Bayesian recovery

1. Introduction

Accurate localization of indoor sound sources is highly desirable and would have a considerable potential impact on many applications, such as noise-source identification, security surveillance, and virtual reality. This topic has attracted sustained focus in the areas of acoustics and signal processing, and new methods are constantly being proposed to obtain localization that is more precise and robust.

A basic assumption in most commonly used sound source localization (SSL) methods, such as beamforming [1] and spectral estimation [2], is that sound propagates in free space, or that the enclosed space is large enough to be treated as free space. This assumption is the key problem these methods face indoors, where sound propagation is subject to the multipath effect caused by reflections from the walls. In indoor environments, conventional methods suffer from problems such as low robustness in the reverberant field and ambiguity in the time of arrival, which substantially degrade their performance and applicability [3]. Another frequent assumption is that the sound sources are in the far field, meaning that the wavefront generated by a source arrives at all microphones from the same direction. This assumption is also often invalid indoors because the enclosure is usually too small for the far-field condition to hold.

Improving the classical beamforming [4,5,6,7] and Multiple Signal Classification (MUSIC) [8,9] methods is a common approach to precise indoor SSL. These methods estimate the direction of arrival of the wave from a source by means of signal component analysis or noise reduction. A series of supporting techniques has also been developed to reduce the influence of reverberation, including (i) background-noise reduction based on eigenvalue identification [10], (ii) single-source point enhancement [11,12], (iii) extraction of the direct signal from the early response [13,14], and (iv) processing of the sampled data via weighted clusters [15] and machine learning [16,17,18,19], to name a few. In particular, sparse recovery has in recent years become a basic framework for solving ill-posed problems and is now widely used in many types of localization methods.

With proper modifications, the vulnerability of the conventional methods can be markedly alleviated. However, these methods still face challenges in reverberant localization problems: (i) localization can usually be realized only in specifically shaped spaces or in the local space near the array, (ii) a large number of measurement points is required, (iii) the early direct sound is absent when there is no line of sight from the source to the sampling points, and (iv) only the direction of arrival is obtained, not the exact position of the sound source.

Theoretically, the aforementioned methods can be regarded as methods that do not use prior knowledge of the room: the acoustic propagation model determined by the physical properties of the room is rarely used to support the localization. In fact, however, the acoustic model of an indoor environment plays a basic and key role in forming the interior sound field, so much better localization can be expected if the indoor acoustic model is used properly. Recently, model-driven localization methods have been reported frequently and have attracted growing attention [20]. In these methods, the geometrical and physical (e.g., absorption and scattering) information of the room is used as prior knowledge to construct the transfer function between the spatial sound pressure and the initial source intensity distribution. By sampling the sound pressure properly in the space, the source information, including its position and intensity, can then be recovered from the transfer function. This processing is essentially the inverse of a sound field simulation. Compared with classical localization methods, the model-driven method requires that the room information, especially its geometry, be known. Nowadays, however, computer-aided design (CAD) is a basic step in designing buildings and other enclosures, and it can provide sufficient geometrical and physical information, so the prior knowledge needed for model building is no longer a problem.

The image source method is commonly used in sound field simulation and has been adopted in some localization methods. In these methods, image sensors corresponding to the real sensors are set up, so that the room is expanded into a large free space. This space is then discretized into subdomains, and the channel responses between the sensors and each subdomain are simulated. Solving the resulting inverse problem yields the distribution of the source intensity over these subdomains, and the largest element of the recovered intensity vector gives the position of the source. The method achieves good localization performance in strongly reverberant rooms. In addition, with only a few sensors it can yield low sidelobes, which allows the locations of multiple sources to be identified more clearly [21,22,23]. However, a basic requirement of the image method is that the order of the image sources be large enough for the enclosure to be treated as free space. Consequently, the number of image sensors increases sharply as more room walls are involved, which substantially degrades the computational accuracy and efficiency.

Other commonly used sound field simulation methods in room acoustics are based on wave theory, such as the finite element method (FEM) [24] and the boundary element method [25,26]. These methods divide the enclosure into smaller, simpler parts and obtain responses by solving a system equation derived from the Helmholtz equation. Localization methods built on them can potentially give robust results in reverberant environments. It has been reported that a method based on inverse FEM analysis is effective for SSL in strongly reverberant environments [27]. Its procedure is similar to that of the image method: the transfer functions between the subdomains and the sensors are constructed from the Helmholtz equation, and the source intensity distributed on the nodes is then recovered to obtain the position of the source. Because the influence of the boundaries on wave propagation is included in the model, the method can give correct results in a reverberant environment. However, in classical FEM analysis the propagation model is constructed from elements, and adjacent elements overlap at their shared nodes. As a result, post-processing is needed to smooth the nodal values, which increases the computational complexity.

Inspired by these model-based insights, we propose a model-driven indoor SSL method based on inverse element-free simulation. The element-free method is a wave-based numerical method that discretizes the space with nodes instead of elements [28]. Its main difference from classical FEM is that it constructs the shape function from nodes spread over the problem domain rather than over a local element, which allows the shape function to be smooth over the whole domain. Owing to this property, the position of the source is indicated explicitly by the maximum value of the recovered source intensity.

This paper is organized as follows. The proposed method is derived in Section 2, the numerical verification and performance evaluations are given in Section 3, and the experimental verification is given in Section 4.

2. Sound Source Localization Using Inverse Element-Free Simulation

2.1. Element-Free Simulation Model

The assumed task is localizing sound sources in a known room. The propagation of sound from a source to a receiver in the room is complicated by reverberation, so to realize precise localization, we begin by building a numerical indoor acoustic model.

Assume that a room contains a sound source located at $s$ that injects a mass $ρ_{0} q(s, t)$ of the medium per unit time. The well-known inhomogeneous acoustic wave equation is then

\nabla^{2} p - \frac{1}{c_{0}^{2}} \frac{\partial^{2} p}{\partial t^{2}} = - ρ_{0} \frac{\partial q}{\partial t}

(1)

where $p$ is the sound pressure; $t$ is time; $c_{0}$ is the speed of sound in air; $ρ_{0}$ is the equilibrium density of air; and $q$ is the volume velocity, also known as the source intensity.

For a single frequency $f = ω / 2π$, the problem can be analyzed as a time-harmonic field. Converting Equation (1) into the frequency domain then yields the Helmholtz equation and the boundary condition

\begin{cases} \nabla^{2} p_{ω} (x) + k^{2} p_{ω} (x) + j ρ_{0} ω q_{ω} = 0, & x \in Ω \\ \frac{\partial p_{ω} (x)}{\partial n} = - j ρ_{0} ω \frac{p_{ω} (x)}{Z_{s}}, & x \in Γ \end{cases}

(2)

where $k = ω / c_{0}$ is the wave number;
$p_{ω}(x)$ is the sound pressure at location $x$ in the frequency domain;
$q_{ω}$ is the source intensity of the sound source located at position $s$;
$Ω$ and $Γ$ denote the inner domain and the boundary of the space, respectively;
$n$ denotes the outward normal at the boundary;
and $Z_{s}$ is the acoustic impedance of the boundary.
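The step from Equation (1) to Equation (2) can be written out explicitly. This derivation assumes an $e^{jωt}$ time dependence and the linearized Euler equation at a locally reacting wall with $Z_{s} = p_{ω}/v_{n}$; these conventions are not stated in the text, but under them both lines of Equation (2) are recovered:

```latex
\begin{aligned}
&\frac{\partial}{\partial t} \to j\omega, \qquad \frac{\partial^{2}}{\partial t^{2}} \to -\omega^{2} \\
&\nabla^{2} p_{\omega} + \frac{\omega^{2}}{c_{0}^{2}}\, p_{\omega} = -j\rho_{0}\omega\, q_{\omega}
  \;\Longrightarrow\; \nabla^{2} p_{\omega} + k^{2} p_{\omega} + j\rho_{0}\omega\, q_{\omega} = 0,
  \qquad k = \omega / c_{0} \\
&j\omega\rho_{0}\, v_{n} = -\frac{\partial p_{\omega}}{\partial n}, \quad v_{n} = \frac{p_{\omega}}{Z_{s}}
  \;\Longrightarrow\; \frac{\partial p_{\omega}}{\partial n} = -j\rho_{0}\omega\, \frac{p_{\omega}}{Z_{s}}
\end{aligned}
```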

Equation (2) usually cannot be solved analytically when the room has a complex shape, in which case a numerical method (e.g., element-free simulation) is used to obtain approximate solutions. Like FEM, the element-free method subdivides a large problem domain into smaller, simpler parts; unlike FEM, however, it represents the subdomains with nodes rather than elements. This is an effective strategy for avoiding the negative influence of poor mesh quality on the simulation. Based on these predefined subdomains, the differential form of the Helmholtz equation is converted into an integral form using the weighted residual method. The method then yields approximate values of the unknowns at the discrete nodes over the domain.

Assuming that the room has been subdivided into a set of subdomains with n nodes, the system equation for calculating the field pressure can be expressed as [29]

(K - k^{2} M + j k C) p_{ω} = F

(3)

where $p_{ω} \in ℂ^{n \times 1}$ is the unknown vector of the sound pressure at all nodes, and $K$, $M$, $C$, and $F$ are the system matrices defined by

\begin{cases} K = \int_{Ω} [\nabla N (x)] {[\nabla N (x)]}^{T} d x \in ℂ^{n \times n} \\ M = \int_{Ω} [N (x)] {[N (x)]}^{T} d x \in ℂ^{n \times n} \\ C = \frac{ρ_{0} c_{0}}{Z_{s}} \int_{Γ} [N (x)] {[N (x)]}^{T} d x \in ℂ^{n \times n} \\ F = - j ρ_{0} ω N_{s} q_{ω} \in ℂ^{n \times 1} \end{cases}

(4)

In these expressions, $N(x) \in ℂ^{n \times 1}$ is the interpolating function expressing the relationship between the predefined nodes and a target point $x$, and $n$ is the number of nodes. Since $N(x)$ is determined only by the geometry of the room and the node layout, it is also called the shape function. $N_{s} = N(s)$ is the shape function evaluated at the source position, which spreads the source intensity into the load vector.

By integrating over the whole domain, the system matrices contain rich information about how the room affects the formation of its interior sound field. Analogously to harmonic vibration systems, these matrices are usually called the stiffness matrix, the mass matrix, the damping matrix, and the load vector, respectively, thereby showing that they are closely related to the room’s intrinsic features.

Solving Equation (3) gives the sound pressure on pre-defined nodes, whereupon the sound pressure at any field point x_r in the room can be obtained as

p_{x_{r}} = N_{x_{r}}^{T} p_{ω}

(5)

2.2. Localization Based on Inverse Element-Free Simulation

As illustrated in Figure 1, to simulate the sound field in the room, the intensity of the source is decomposed and spread over the nodes to form an initial distribution of the intensity in the element-free method.

It should be noted that the initial distribution of the intensity on the nodes is not the sound field formed in the room at some time instant. It actually describes the degree to which the source influences each node, which, by the construction of the shape function $N_{s}$, is essentially determined by the distance between them. Therefore, the node closest to the sound source has the largest initial intensity, whereas this rule does not hold for the distribution of the sound field because of reverberation.

From an inverse perspective, SSL can be considered as the problem of recovering the initially distributed intensity via an inverse procedure of the aforementioned sound field simulation. We assume that there are l sensors in the room sampling the sound signals, with the ith sensor at position r_i. Combining Equations (3) and (5) gives the signal sampled by the ith sensor as

p_{r_{i}} = N_{r_{i}}^{T} {(K - k^{2} M + j k C)}^{- 1} F

(6)

where $p_{r_{i}}$ is the sampled sound pressure at a single frequency. Combining all $l$ sampled signals, Equation (6) can be rewritten in general form as

y = D x

(7)

where $y = p_{r} = {[p_{r_{1}}, p_{r_{2}}, \dots, p_{r_{l}}]}^{T}$ is the vector of the $l$ sampled signals;
$D = - j ρ_{0} ω N_{r}^{T} {(K - k^{2} M + j k C)}^{- 1} \in ℂ^{l \times n}$, with $N_{r} = [N_{r_{1}}, N_{r_{2}}, \dots, N_{r_{l}}] \in ℂ^{n \times l}$, is the governing matrix;
and $x = N_{s} q_{ω}$ is the vector to be solved.

Solving Equation (7) gives $x$, and the position of the largest-magnitude entry of $x$ indicates the location of the sound source. Note that the position obtained in this way is that of the node closest to the source, not the exact position of the sound source.
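Equations (6) and (7) can be sketched in NumPy as below, reading $D$ as $-jρ_{0}ω N_{r}^{T}(K - k^{2}M + jkC)^{-1}$ with the sensor shape functions $N_{r_{i}}$ stacked column-wise in $N_{r}$. The function names `governing_matrix` and `localize` are hypothetical, and the system matrices are assumed to be already assembled. The sketch solves a linear system instead of forming the explicit inverse, which is the usual numerically safer choice.

```python
import numpy as np


def governing_matrix(K, M, C, N_r, k, omega, rho0):
    """Sketch of D = -j*rho0*omega * N_r^T (K - k^2 M + j*k*C)^{-1} (Eq. (7)).

    K, M, C : (n, n) system matrices of Eq. (4), assumed already assembled.
    N_r     : (n, l) matrix whose i-th column is the shape function N(r_i).
    Returns D with shape (l, n), so that y = D @ x with x = N_s * q.
    """
    A = K - k**2 * M + 1j * k * C        # system matrix of Eq. (3)
    # N_r^T A^{-1} = (A^{-T} N_r)^T: solve with A^T instead of inverting A.
    Z = np.linalg.solve(A.T, N_r)
    return -1j * rho0 * omega * Z.T


def localize(x_hat):
    """Index of the node with the largest recovered |x| (the node nearest the source)."""
    return int(np.argmax(np.abs(x_hat)))
```

In a full pipeline, `K`, `M`, and `C` would come from the element-free assembly of Equation (4), and `x_hat` from the sparse solver of Section 2.4.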

2.3. Construction of Global and Sparse Shape Function

Since the initial distribution of the intensity is governed by the shape function, as shown in Equation (4), choosing a shape function with proper properties is very important for the localization performance. On the one hand, to realize localization over the whole indoor space, the shape function should be global, so that it contains a value for each subdomain. On the other hand, $x$ in Equation (7) is usually a large vector, which would require a large number of sensors to make Equation (7) solvable. To reduce the measurement burden, recovering $x$ by a sparse method is an effective strategy, which requires the shape function to be sparse as well.

To meet these two requirements, a global and sparse shape function is constructed herein using the moving least squares (MLS) method [28], in which a function $u^{h}(x)$ approximating the exact value $u(x)$ at an arbitrary field point $x$ is defined as the sum of the products of a set of basis functions and their coefficients:

u (x) \approx u^{h} (x) = \sum_{i = 1}^{g} b_{i} (x) a_{i} (x) = b^{T} (x) a (x)

(8)

where $b(x) = {[b_{1}(x), b_{2}(x), \dots, b_{g}(x)]}^{T}$ is the basis function vector, $a(x) = {[a_{1}(x), a_{2}(x), \dots, a_{g}(x)]}^{T}$ is the vector of unknown coefficients, and $g$ is the number of basis functions.

Several types of basis functions can be used in MLS, such as monomial, trigonometric, and wavelet ones [28]. Herein, the basis function is the monomial function that is defined in three dimensions as

\begin{cases} b (x) = {[1, x, y, z]}^{T}, & g = 4 \ (\text{linear basis}) \\ b (x) = {[1, x, y, z, x^{2}, x y, y^{2}, y z, z^{2}, x z]}^{T}, & g = 10 \ (\text{quadratic basis}) \end{cases}

(9)

where $(x, y, z)$ are the spatial coordinates of the position $x$.
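Equation (9) transcribes directly into a small helper; the function name `monomial_basis` and its `order` switch are illustrative only.

```python
import numpy as np


def monomial_basis(point, order=1):
    """Monomial basis b(x) of Eq. (9) evaluated at a 3-D point (x, y, z)."""
    x, y, z = point
    if order == 1:                      # linear basis, g = 4
        return np.array([1.0, x, y, z])
    if order == 2:                      # quadratic basis, g = 10
        return np.array([1.0, x, y, z,
                         x * x, x * y, y * y, y * z, z * z, x * z])
    raise ValueError("order must be 1 (linear) or 2 (quadratic)")
```

The ordering of the quadratic terms follows Equation (9); any fixed ordering works as long as it is used consistently when building $A(x)$ and $B(x)$.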

To obtain the unknown coefficients $a(x)$, $n$ interpolation points $[{\bar{x}}_{1}, {\bar{x}}_{2}, \dots, {\bar{x}}_{n}]$ are set in the neighborhood of $x$, from which a discrete weighted $L_{2}$ norm is constructed as

J = \sum_{I = 1}^{n} w_{I} (|x - {\bar{x}}_{I}|) {[b^{T} ({\bar{x}}_{I}) a (x) - u ({\bar{x}}_{I})]}^{2} = \sum_{I = 1}^{n} w_{I} (|x - {\bar{x}}_{I}|) {[\sum_{i = 1}^{g} b_{i} ({\bar{x}}_{I}) a_{i} (x) - u ({\bar{x}}_{I})]}^{2}

(10)

where $u({\bar{x}}_{I})$ is the nodal value at ${\bar{x}}_{I}$, and $w_{I}(|x - {\bar{x}}_{I}|)$ is the weight function determined by the distance between the target point and the $I$th interpolation point.

In the element-free method, the nodes for subdividing the space can be considered as the interpolation points in Equation (10). To construct a global shape function, all nodes must be involved in the calculation, but this will lead to the shape function having too many non-zero elements, thereby requiring a huge number of sensors.

To construct a sparse shape function, a compact support domain, as illustrated in Figure 2, is defined for the weight function. In three-dimensional problems, the compact support domain is usually a sphere centered at the target point. The weight function is set to zero for nodes located outside this domain, i.e., for $|x - {\bar{x}}_{I}| > d$, where $d$ is the radius of the compact support domain. By adjusting this radius so that only a small number of nodes lie inside the domain, a sparse shape function can be constructed.
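To see how the compact support produces sparsity, the sketch below counts the nodes with non-zero weight around a target point. The cubic-spline weight is a common choice in element-free Galerkin methods but is an assumption here, as are the support radius $d = 0.4$ m, the uniform grid with nodes on the walls, and the 1.6 m × 1.4 m × 1.2 m room (the dimensions quoted for Equation (21)); the function names are hypothetical.

```python
import numpy as np


def cubic_spline_weight(r, d):
    """Cubic-spline weight w(r) with compact support radius d (w = 0 for r >= d)."""
    s = np.asarray(r, dtype=float) / d
    inner = 2 / 3 - 4 * s**2 + 4 * s**3                    # 0 <= s <= 1/2
    outer = 4 / 3 - 4 * s + 4 * s**2 - (4 / 3) * s**3      # 1/2 < s <= 1, equals (4/3)(1 - s)^3
    return np.where(s <= 0.5, inner, np.where(s <= 1.0, outer, 0.0))


def grid_nodes(lengths, counts):
    """Uniform node grid in a box room, with nodes placed on the walls as well."""
    axes = [np.linspace(0.0, L, m) for L, m in zip(lengths, counts)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])


nodes = grid_nodes((1.6, 1.4, 1.2), (6, 6, 6))   # 216 nodes (assumed room size)
target = np.array([0.60, 0.55, 0.48])             # source position used in Section 3
d = 0.4                                           # support radius in metres (assumed)
w = cubic_spline_weight(np.linalg.norm(nodes - target, axis=1), d)
print(len(nodes), np.count_nonzero(w))            # 216 14
```

Only 14 of the 216 nodes fall inside the support sphere, so $N(x)$ evaluated at this point has at most 14 non-zero entries; enlarging $d$ trades sparsity for smoothness.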

Minimizing $J$ with respect to $a(x)$ requires

\frac{\partial J}{\partial a_{j} (x)} = 2 \sum_{I = 1}^{n} w_{I} (|x - {\bar{x}}_{I}|) [\sum_{i = 1}^{g} b_{i} ({\bar{x}}_{I}) a_{i} (x) - u ({\bar{x}}_{I})] b_{j} ({\bar{x}}_{I}) = 0, \quad j = 1, 2, \dots, g

(11)

Writing these $g$ equations in matrix form gives

A (x) a (x) = B (x) u

(12)

where A(x), B(x) are matrices defined as

\begin{cases} A (x) = \sum_{I = 1}^{n} w_{I} (|x - {\bar{x}}_{I}|) b ({\bar{x}}_{I}) b^{T} ({\bar{x}}_{I}) \\ B (x) = [w_{1} (|x - {\bar{x}}_{1}|) b ({\bar{x}}_{1}), w_{2} (|x - {\bar{x}}_{2}|) b ({\bar{x}}_{2}), \dots, w_{n} (|x - {\bar{x}}_{n}|) b ({\bar{x}}_{n})] \end{cases}

(13)

Solving Equation (12) for $a(x)$ and substituting the result into Equation (8) gives the following expression, from which the shape function $N(x)$ is identified:

u^{h} (x) = \overbrace{b^{T} (x)}^{1 \times g} \, \overbrace{A^{- 1} (x)}^{g \times g} \, \overbrace{B (x)}^{g \times n} \, \overbrace{u}^{n \times 1} = N^{T} (x) u

(14)
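Equations (10) to (14) can be sketched as below for the linear basis. The cubic-spline weight, the support radius, and the function names are assumptions for illustration (the text does not specify its weight function). With a linear basis, the resulting $N(x)$ should reproduce constant and linear fields exactly, which gives a convenient sanity check.

```python
import numpy as np


def cubic_spline_weight(r, d):
    """Compactly supported cubic-spline weight (assumed choice); zero for r >= d."""
    s = np.asarray(r, dtype=float) / d
    return np.where(s <= 0.5, 2 / 3 - 4 * s**2 + 4 * s**3,
                    np.where(s <= 1.0, (4 / 3) * (1 - s) ** 3, 0.0))


def linear_basis(p):
    """Linear monomial basis b(x) = [1, x, y, z]^T from Eq. (9)."""
    return np.array([1.0, p[0], p[1], p[2]])


def mls_shape_function(x, nodes, d):
    """MLS shape function N(x) of Eq. (14): N^T(x) = b^T(x) A^{-1}(x) B(x).

    x: target point (3,); nodes: (n, 3) node coordinates; d: support radius.
    Returns an (n,) vector that is zero for nodes outside the support sphere.
    """
    w = cubic_spline_weight(np.linalg.norm(nodes - x, axis=1), d)   # w_I in Eq. (10)
    P = np.array([linear_basis(node) for node in nodes])            # row I is b^T(x_I), (n, g)
    A = (P * w[:, None]).T @ P            # Eq. (13): sum_I w_I b(x_I) b^T(x_I), (g, g)
    B = P.T * w[None, :]                  # Eq. (13): column I is w_I b(x_I), (g, n)
    # A is symmetric, so b^T A^{-1} B = (A^{-1} b)^T B.
    return np.linalg.solve(A, linear_basis(x)) @ B
```

With a quadratic basis the same structure applies after swapping in the ten-term $b(x)$; the support radius must then enclose at least ten well-spread nodes so that $A(x)$ remains invertible.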

2.4. Joint Sparse Bayesian Recovery of Sound Source

To ensure that the sound source can be recovered correctly and economically when solving Equation (7), sparse recovery, which has been widely adopted as an efficient tool in acoustic problems, is used here. Sparsity theory states that if a signal is sparse or compressible and the measurements are highly incoherent with the dictionary, the signal can be reconstructed from a limited number of measurements by solving an underdetermined inverse problem.

In Equation (7), the unknown vector to be determined is x, which contains the position of the sound source, and the aforementioned theory indicates that x is actually governed by the shape function of the sound source. Moreover, based on the compactly supporting domain for constructing the shape function, x has only a tiny number of non-zero elements. Therefore, x can be considered as a sparse vector, so the sparse recovery theory can be suitably used here.

If measurement noise is present, Equation (7) can be reformulated as

y = D x + n

(15)

To ensure that the residual $y - D x$ is small and that $x$ is sufficiently sparse, Equation (15) is usually solved in an unconstrained form, with the solution

\tilde{x} = \underset{x \in ℂ^{n \times 1}}{\arg\min} \; \frac{1}{2} {‖y - D x‖}_{2}^{2} + η {‖x‖}_{1}

(16)

where $η$ is the regularization parameter controlling the tradeoff between the sparsity of the solution and the residual norm.
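Equation (16) can be made concrete with a plain proximal-gradient (ISTA) solver, which also shows why η has to be chosen by hand. This is a swapped-in baseline for illustration, not the solver used in the paper (which moves to a Bayesian formulation); `soft_threshold` and `ista` are hypothetical names.

```python
import numpy as np


def soft_threshold(z, t):
    """Complex soft-thresholding: shrink each |z_i| by t and keep its phase."""
    mag = np.abs(z)
    return np.where(mag > t, (1.0 - t / np.maximum(mag, 1e-300)) * z, 0.0)


def ista(D, y, eta, n_iter=500):
    """Iterative shrinkage-thresholding for Eq. (16):
    min_x ||y - D x||_2^2 / 2 + eta * ||x||_1 over complex x."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(D.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = D.conj().T @ (D @ x - y)          # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * eta)
    return x
```

A larger η gives a sparser estimate but shrinks its amplitude, and a poor choice either leaves spurious sidelobes or suppresses the source, which is the tuning burden discussed next.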

Since the dictionary $D$ physically represents the geometrical influence of the room on the sound field, its atoms are highly correlated with one another. Equation (15) is therefore formulated within a sparse Bayesian learning framework, which has been shown both empirically and theoretically to outperform conventional greedy and regularized convex optimization methods for highly coherent dictionaries [30,31,32]. In addition, the labor-intensive tuning of the regularization parameter $η$ is avoided.

In particular, a characteristic of the source localization problem is that the maximum of the sparse signal $x$ should occur at the same position at every frequency, so the joint sparsity of $x$ across frequencies is exploited to enhance the sparse recovery performance.

In sparse Bayesian learning, by assuming $p(y | x)$ to be Gaussian with noise variance $α_{0}^{-1}$ for all signals, the likelihood function of the signal in Equation (15) is given as

p (y | x, α_{0}) = \mathcal{CN} (y | D x, α_{0}^{- 1} I)

(17)

where $I$ is an identity matrix. For a tractable update of $α_{0}$, a Gamma prior with parameters $a$ and $b$ is imposed:

p (α_{0}) = Γ (α_{0} | a, b)

(18)

where a and b are the shape parameter and the scale parameter, respectively.

Since the sparse signals $x$ at all consecutive frequencies share the same support, joint sparsity can be imposed via a hierarchical prior. The vector $x_{f}$ at each frequency $f$ in a narrow band $f \in [f_{1}, f_{F}]$ is assumed to follow a zero-mean Gaussian prior with a common precision vector:

p (x_{f} | λ) = \mathcal{CN} (x_{f} | 0, Λ^{- 1})

(19)

where $λ = {[λ_{1}, λ_{2}, \dots, λ_{N}]}^{T}$ is the vector of precisions (reciprocal variances), with one entry per node so that $N$ equals the length of $x$, and $Λ = \mathrm{diag}(λ)$ is a diagonal matrix with the elements of $λ$ on its main diagonal.

To update $λ$ tractably, we assume that the elements of $λ$ follow independent Gamma distributions:

p (λ) = \prod_{n = 1}^{N} Γ (λ_{n} | c, d)

(20)

where $c$ and $d$ are hyperparameters.

The probabilistic graphical model in Figure 3 illustrates the modeling of the variables described above, from which $x$ can be inferred.
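The update rules for this model are not listed here, so the following is a hedged sketch of one standard way to perform inference in it: expectation-maximization updates for sparse Bayesian learning with a precision vector λ shared by all frequencies in the band. Assumptions: complex-Gaussian densities throughout, Gamma priors in shape/rate form (the text above calls $b$ a scale parameter), nearly flat default hyperparameters, and the hypothetical function name `joint_sbl`. The Woodbury identity keeps each per-frequency solve at $l \times l$, which matters because the number of sensors is much smaller than the number of nodes.

```python
import numpy as np


def joint_sbl(Ds, ys, n_iter=200, a=1.0, b=1e-10, c=1.0, d=1e-10):
    """EM sketch of joint sparse Bayesian learning (Eqs. (17) to (20)).

    Ds: list of F dictionaries D_f with shape (l, N), one per frequency.
    ys: list of F measurement vectors y_f with shape (l,).
    All x_f share one precision vector lambda; gamma = 1/lambda is returned
    with the posterior means mu_f. Gamma priors use shape/rate form (assumed).
    """
    F = len(Ds)
    l, N = Ds[0].shape
    gamma = np.ones(N)                                                # gamma_n = 1 / lambda_n
    alpha0 = 100.0 / np.mean([np.mean(np.abs(y) ** 2) for y in ys])  # initial noise precision
    for _ in range(n_iter):
        moment = np.zeros(N)      # sum_f E[|x_fn|^2]
        resid = 0.0               # sum_f E[||y_f - D_f x_f||^2]
        mus = []
        for D, y in zip(Ds, ys):
            # Woodbury form: only an (l x l) system per frequency.
            S = np.eye(l) / alpha0 + (D * gamma) @ D.conj().T
            S_inv_D = np.linalg.solve(S, D)
            mu = gamma * (D.conj().T @ np.linalg.solve(S, y))   # posterior mean
            q = np.real(np.sum(D.conj() * S_inv_D, axis=0))      # diag(D^H S^{-1} D)
            sigma = gamma - gamma**2 * q                         # posterior variances
            moment += np.abs(mu) ** 2 + sigma
            resid += np.linalg.norm(y - D @ mu) ** 2 + np.sum(gamma * q) / alpha0
            mus.append(mu)
        gamma = (moment + d) / (F + c - 1.0)                     # M-step for lambda
        alpha0 = (l * F + a - 1.0) / (resid + b)                 # M-step for alpha0
    return gamma, mus
```

The node with the largest shared variance $γ_{n} = 1/λ_{n}$ then indicates the source, in the same way as the maximum of $x$ in Section 2.2; sharing λ across frequencies is what enforces the common support.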

The above derivation gives the basic theory of the proposed SSL method for indoor environments. Its core idea is to use the geometrical information of the room to construct a wave-based model that enables accurate localization under strong reverberation. In addition, the combination of a global shape function and sparse recovery allows the method to perform the reconstruction over the whole problem domain economically.

3. Verification and Performance Analysis

To evaluate the accuracy and performance of the proposed method, we begin by reporting numerical verifications performed in a rectangular space as illustrated in Figure 4.

In the basic accuracy verification, the space is divided into $6 \times 6 \times 6 = 216$ nodes to construct the wave-based model. The sensor array, consisting of 28 sensors, is illustrated in Figure 4. The center of the array is located at [1.5, 1.1, 0.5] m and the sound source at [0.60, 0.55, 0.48] m. The sound pressure sampled by the sensors is simulated using the modal superposition method based on normal modes. The density of air is $ρ_{0} = 1.21$ kg/m³ and the speed of sound is $c_{0} = 340$ m/s. To evaluate the performance of the proposed method under different reverberation levels, the specific acoustic impedance of the inner walls is set to $10 ρ_{0} c_{0}$, $20 ρ_{0} c_{0}$, and $100 ρ_{0} c_{0}$, corresponding to approximate absorption coefficients of 0.19, 0.1, and 0.02, respectively. Each result is computed jointly over a narrow 10 Hz band. For convenience, a band is denoted by its starting frequency; e.g., 50 Hz represents the band of 50–59 Hz.
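For orientation, the normal modes used in the simulation can be enumerated for the idealized rigid-wall case. This sketch assumes the simulated room is the 1.6 m × 1.4 m × 1.2 m box quoted for Equation (21) and ignores the finite wall impedance, which damps and shifts the modes; `rigid_box_modes` is a hypothetical name.

```python
import itertools

import numpy as np


def rigid_box_modes(Lx, Ly, Lz, f_max, c0=340.0):
    """Eigenfrequencies (Hz) of a rigid-walled box room up to f_max, with mode indices:
    f = (c0 / 2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)."""
    n_max = [int(2.0 * f_max * L / c0) + 1 for L in (Lx, Ly, Lz)]
    modes = []
    for nx, ny, nz in itertools.product(*(range(m + 1) for m in n_max)):
        f = 0.5 * c0 * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
        if 0.0 < f <= f_max:
            modes.append((float(f), (nx, ny, nz)))
    return sorted(modes)


modes = rigid_box_modes(1.6, 1.4, 1.2, f_max=500.0)
print(len(modes), modes[:3])
```

Under these assumptions the first non-trivial mode, (1, 0, 0), lies at 340 / (2 × 1.6) = 106.25 Hz, and the number of modes grows quickly toward 500 Hz, so the sound field becomes spatially more complex in the upper test bands.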

The proposed method is compared with the method of [21], which is also physics-driven but based on image source theory. The recovered normalized intensity distributions on all nodes for a specific acoustic impedance of $10 ρ_{0} c_{0}$ at 50 Hz, 100 Hz, and 150 Hz are illustrated in Figure 5.

Figure 5 shows the recovered initial intensity distributions at the three frequencies. The ground truth shows that most of the source intensity is distributed on node 87, the node nearest to the source position. The results recovered by the proposed method are very similar to the ground truth and clearly show that the source is located at node 87. The reference method also gives correct results at 50 Hz and 100 Hz but has a small error at 150 Hz. Moreover, the proposed method gives consistently low sidelobes at all three frequencies, whereas the reference method gives low sidelobes at 100 Hz but not at 50 Hz and 150 Hz.

Figure 6 shows the localization results of the proposed and reference methods under the three reverberation levels from 30 Hz to 500 Hz, from which two observations can be made.

First, Figure 6 demonstrates that the proposed approach yields precise and consistent results at low frequencies but is less accurate at high frequencies; the reference approach behaves similarly. Both techniques are fundamentally based on discrete spatial modeling, and here the space is divided into only 216 nodes, a relatively coarse discretization. Figure 7 indicates that, at low frequencies, the sound field varies little across different regions of the space, so a coarse discretization suffices to describe the whole sound field. At higher frequencies, however, the sound pressure fluctuates rapidly with spatial location, and a finer discretization is needed so that the sound field does not vary substantially within each subdomain. Localization tests were therefore performed with both methods using $9 \times 9 \times 9 = 729$ nodes and $Z_{s} = 100 ρ_{0} c_{0}$; the results, shown in Figure 7, indicate that increasing the level of discretization is an effective way to improve the accuracy at high frequencies for both methods. The proposed method gives correct results up to 600 Hz, a higher limit than that of the reference method.

Second, Figure 6 shows that both methods perform better at a low reverberation level. For example, both methods give correct results up to 400 Hz for $Z_{s} = 10 ρ_{0} c_{0}$, whereas this limit is only 220 Hz for $Z_{s} = 100 ρ_{0} c_{0}$. According to the theory of sound waves in rooms [33], increasing the damping, i.e., decreasing the wall impedance, lowers the eigenfrequencies, which is equivalent to performing the localization at a lower frequency. This is why both methods have a higher upper frequency limit at a low reverberation level.

The normalized recovered intensity distributions from 30 Hz to 500 Hz are shown in Figure 8. It illustrates that the proposed method gives a robust and exact localization result below the upper frequency limit, as well as yielding very low sidelobes. As can be seen, the reference method fails to give such a result and yields low sidelobes only at a limited number of frequencies.

The computation times of both techniques are summarized here. Using a model with 216 nodes and a maximum of 200 iterations, the proposed approach completed the calculation for each joint frequency band in 1.74 s on average, whereas the reference method took 76.26 s. This large difference stems from the theories underlying the two methods. In the reference technique, the space is expanded evenly across its surfaces to accommodate the image sources, producing a much larger space. For example, when the space is divided into 216 nodes (125 spatial grids), the expanded space contains 27 times as many grids as the original space, i.e., 3375 grids. Retrieving such a large vector with only 28 sensors is challenging, which also explains the high sidelobes of the reference method. Overall, the proposed method performs better in terms of both recovery robustness and calculation speed.

Next, the influence of sampling factors on the performance of the proposed method is evaluated. To analyze the error quantitatively, a relative distance error is defined as

ε = \frac{d_{r}}{d_{m}} \times 100\%

(21)

where $d_{r}$ is the distance between the recovered and real positions, and $d_{m}$ is the maximum distance in the space; for example, in a $1.6 \text{ m} \times 1.4 \text{ m} \times 1.2 \text{ m}$ room, $d_{m} = 2.4413$ m.
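Equation (21) and the quoted value of $d_{m}$ can be checked in a few lines. Taking $d_{m}$ as the space diagonal is an assumption that holds for box-shaped rooms such as the example; the function name is hypothetical.

```python
import numpy as np


def relative_distance_error(recovered, true, room_dims):
    """Eq. (21): epsilon = d_r / d_m * 100 %, taking d_m as the space diagonal of a box room."""
    d_r = np.linalg.norm(np.asarray(recovered, float) - np.asarray(true, float))
    d_m = np.linalg.norm(np.asarray(room_dims, float))
    return 100.0 * d_r / d_m


print(round(float(np.linalg.norm([1.6, 1.4, 1.2])), 4))   # 2.4413
```

For instance, on an assumed uniform 6 × 6 × 6 grid with nodes on the walls, the node nearest to the simulated source is at [0.64, 0.56, 0.48] m, about 0.041 m from [0.60, 0.55, 0.48] m, so node spacing alone contributes ε ≈ 1.7% even when the nearest node is identified correctly.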

To examine the impact of sensor positioning on performance, the proposed method was tested with two different sensor arrays, each comprising 28 sensors. The results, illustrated in Figure 9, indicate that random sensor placement yields better results than a planar array with the same number of sensors, as random positions provide more distinct samples across sensors. Based on these findings, the performance for different numbers of sensors is also compared in Figure 9. The results, averaged over 50 independent trials with random sensor positions, show that increasing the number of sensors effectively enhances the performance of the proposed approach, as evidenced by a higher upper frequency limit and a lower relative distance error.

To evaluate the performance of the proposed method in a real environment, a localization experiment was performed in a rectangular enclosure, as illustrated in Figure 10. The enclosure measured $1.0 \text{ m} \times 0.9 \text{ m} \times 0.6 \text{ m}$, was made of acrylic plate, and had an absorption coefficient of approximately 0.01. A speaker placed at [0.2, 0.2, 0] m played white noise, and 32 sensors were placed near the top surface, as illustrated in Figure 10. In this experimental verification, the space was divided into $6 \times 6 \times 6 = 216$ nodes. Because of the limitations of the devices, localization was tested only from 100 Hz to 500 Hz, where the played noise and the sampled signals could be trusted. The localization results are shown in Figure 11.

The spectrum in Figure 11 shows that the sampled signal contained obvious noise; under this condition, the proposed method was less stable than in the numerical verifications. Nevertheless, it gave a correct result at most frequencies below 340 Hz, and the incorrect results obtained at some frequencies were still located on a node adjacent to the correct one. The recovered intensity distribution shows that the proposed method yielded low sidelobes at most frequencies, which indicates that the method is effective in a real environment.

4. Conclusions

An efficient indoor source localization method based on wave-based modeling within a sparse Bayesian framework has been proposed. In the proposed algorithm, the indoor space is divided into a set of nodes, and a wave model is first built on these nodes according to element-free theory. This model expresses the transfer relation between the sampled sound pressure and the source intensity distribution in the space. By inverting the wave model, the method can localize the sound source in a reverberant environment. A joint sparse Bayesian framework is used to recover the source intensity during this inversion, which reduces the number of required sampling points and makes the method more feasible in practice.
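The joint recovery step can be illustrated with the EM updates of sparse Bayesian learning in Tipping's framework, extended so that one prior variance per node is shared across all frequencies. This is a minimal sketch, not the authors' exact hierarchical model from Figure 3. The noise variance is assumed known, the transfer matrices are random stand-ins for the element-free wave model, and `joint_sbl` is a hypothetical name. Each frequency keeps its own transfer matrix, while the shared variances γ enforce a common support. The node with the largest γ is read out as the source, in the same way that the maximum of the recovered intensity marks the source in Figure 5.

```python
import numpy as np


def joint_sbl(A_list, y_list, noise_var, n_iter=300, tol=1e-6):
    """Joint sparse Bayesian learning over several frequencies (EM updates).

    A_list[f]: (M, N) complex transfer matrix at frequency f (sensors x nodes).
    y_list[f]: (M,) complex sampled pressures at frequency f.
    All frequencies share one prior variance gamma[n] per node, so a node that
    is silent at every frequency is driven towards zero jointly.
    noise_var is assumed known (a simplification).
    """
    n_nodes = A_list[0].shape[1]
    gamma = np.ones(n_nodes)
    mus = []
    for _ in range(n_iter):
        acc = np.zeros(n_nodes)
        mus = []
        for A, y in zip(A_list, y_list):
            G = A * gamma                                    # A @ diag(gamma)
            S_y = noise_var * np.eye(A.shape[0]) + G @ A.conj().T
            K = np.linalg.solve(S_y, G).conj().T             # diag(gamma) A^H S_y^-1
            mu = K @ y                                       # posterior mean
            var = gamma - np.real(np.sum(K * G.T, axis=1))   # posterior variances
            acc += np.abs(mu) ** 2 + var
            mus.append(mu)
        new_gamma = np.maximum(acc / len(A_list), 0.0)       # EM update of shared gamma
        converged = np.max(np.abs(new_gamma - gamma)) < tol * np.max(new_gamma)
        gamma = new_gamma
        if converged:
            break
    return gamma, mus


# Synthetic check with random stand-in matrices (not the element-free model).
rng = np.random.default_rng(1)
M, N, F, k_true = 12, 40, 5, 17
A_list, y_list = [], []
for _ in range(F):
    A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    x = np.zeros(N, dtype=complex)
    x[k_true] = rng.standard_normal() + 1j * rng.standard_normal()  # strength varies with frequency
    noise = 1e-2 * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    A_list.append(A)
    y_list.append(A @ x + noise)

gamma, mus = joint_sbl(A_list, y_list, noise_var=1e-4)
print("estimated source node:", int(np.argmax(gamma)))
```

The Woodbury form keeps each iteration to an M × M solve, which matters when there are far fewer sensors than nodes.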

The numerical verifications show that the method gives precise results at low frequencies in an indoor environment. It also yields lower sidelobes and higher computational efficiency than the reference indoor sound source localization method. The performance of the method depends strongly on the wall impedance: accuracy tends to decrease as the wall impedance increases, because more rigid walls make the sound field more diffuse, which makes it difficult to capture signals with sufficient contrast. Because of the frequency limitation inherent in wave-based simulation, where a given node spacing can only resolve sufficiently long wavelengths, the method is more accurate at low frequencies. Increasing the discretization level effectively improves accuracy at higher frequencies, at the cost of longer computation time. The sampling of the sound pressure also affects accuracy: the numerical verification showed that sampling with higher contrast between sensors, for example through random sensor placement or more sensors, leads to better results. The experimental verification demonstrated that the method maintains satisfactory accuracy even in the presence of measurement noise. The proposed method establishes a new framework for localizing sound sources in reverberant environments and has the potential to be an effective alternative in engineering applications involving indoor sound source localization.
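The frequency limit of the discretization can be made concrete with a rough count of node spacings per wavelength. The sketch assumes c0 = 343 m/s and n equally spaced nodes per axis including the walls, and it treats the largest spacing as the limiting one. `nodes_per_wavelength` is a hypothetical helper. For the 1.6 m × 1.4 m × 1.2 m room of Figure 4 at 400 Hz, refining the grid from 216 to 729 nodes raises the count from about 2.7 to about 4.3 spacings per wavelength. This is consistent with the need for finer discretization noted for Figure 7, although the threshold that the element-free model actually requires is not stated here.

```python
C0 = 343.0  # assumed speed of sound in air (m/s)


def nodes_per_wavelength(dims, n, freq, c0=C0):
    """Wavelength divided by the largest node spacing, assuming n equally
    spaced nodes per axis with nodes on both walls."""
    h_max = max(length / (n - 1) for length in dims)
    return (c0 / freq) / h_max


room = (1.6, 1.4, 1.2)  # enclosure of Figure 4 (m)
for n in (6, 9):
    counts = {f: round(nodes_per_wavelength(room, n, f), 2) for f in (100, 400, 500)}
    print(f"{n ** 3} nodes:", counts)
```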

Author Contributions

H.W.: Writing—original draft, Methodology, and Validation; Q.H.: Software; S.P.: Visualization; X.Z.: Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (grant no. 12074317).

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chiariotti, P.; Martarelli, M.; Castellini, P. Acoustic beamforming for noise source localization – Reviews, methodology and applications. Mech. Syst. Signal Process. 2019, 120, 422–448.
Hurt, N.E. Maximum likelihood estimation and MUSIC in array localization signal processing: A review. Multidimens. Syst. Signal Process. 1990, 1, 279–325.
Zhong, X.L.; Yang, Z.H.; Yu, S.F.; Song, H.; Gu, Z.H. Comparison of sound location variations in free and reverberant fields: An event-related potential study. J. Acoust. Soc. Am. 2020, 148, EL14–EL19.
Huang, F.; Sheng, W.; Ma, X. Modified projection approach for robust adaptive array beamforming. Signal Process. 2012, 92, 1758–1763.
Mathews, J.; Braasch, J. Sparse iterative beamforming using spherical microphone arrays for low-latency direction of arrival estimation in reverberant environments. J. Audio Eng. Soc. 2021, 69, 967–977.
Fischer, J.; Doolan, C. Improving acoustic beamforming maps in a reverberant environment by modifying the cross-correlation matrix. J. Sound Vib. 2017, 411, 129–147.
SongGong, K.K.; Chen, H.W.; Wang, W.W. Indoor multi-speaker localization based on Bayesian nonparametrics in the circular harmonic domain. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 1864–1880.
Sewtz, M.; Bodenmuller, T.; Triebel, R. Robust MUSIC-based sound source localization in reverberant and echoic environments. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 2474–2480.
Jia, M.S.; Wu, Y.X.; Bao, C.C.; Ritz, C. Multi-source DOA estimation in reverberant environments by jointing detection and modeling of time-frequency points. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 379–392.
Fischer, J.; Doolan, C. An improved eigenvalue background noise reduction method for acoustic beamforming. Mech. Syst. Signal Process. 2020, 140, 106702.
Pavlidi, D.; Griffin, A.; Puigt, M.; Mouchtaris, A. Real-time multiple sound source localization and counting using a circular microphone array. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 2193–2206.
Jia, M.S.; Jia, Y.T.; Gao, S.; Wang, J.; Wang, S.S. Multi-source DOA estimation in reverberant environments using potential single-source points enhancement. Appl. Acoust. 2021, 174, 107782.
Shlomo, T.; Rafaely, B. Blind localization of early room reflections using phase aligned spatial correlation. IEEE Trans. Signal Process. 2021, 69, 1213–1225.
Achdjian, H.; Moulin, E.; Benmeddour, E.; Assaad, J.; Chehami, L. Source Localisation in a Reverberant Plate Using Average Coda Properties and Early Signal Strength. Acta Acust. United Acust. 2014, 100, 834–841.
Kuhne, M.; Togneri, R.; Nordholm, S. Robust source localization in reverberant environments based on weighted fuzzy clustering. IEEE Signal Process. Lett. 2009, 16, 85–88.
Fahim, A.; Samarasinghe, P.N.; Abhayapala, T.D. Multi-source DOA estimation through pattern recognition of the modal coherence of a reverberant soundfield. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 605–618.
Woodward, S.F.; Reiss, D.; Magnasco, M.O. Learning to localize sounds in a highly reverberant environment: Machine-learning tracking of dolphin whistle-like sounds in a pool. PLoS ONE 2020, 15, e0235155.
Liu, N.; Chen, H.W.; Songgong, K.K.; Li, Y.W. Deep learning assisted sound source localization using two orthogonal first-order differential microphone arrays. J. Acoust. Soc. Am. 2021, 149, 1069–1084.
Vargas, E.; Hopgood, J.R.; Brown, K.; Subr, K. On improved training of CNN for acoustic source localisation. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 720–732.
Gombots, S.; Kaltenbacher, M.; Kaltenbacher, B. Capabilities of inverse scheme for acoustic source localization at low frequencies. Acta Acust. 2021, 5, 44.
Wang, L.; Liu, Y.S.; Zhao, L.F.; Wang, Q.; Zeng, X.Y.; Chen, K.A. Acoustic source localization in strong reverberant environment by parametric Bayesian dictionary learning. Signal Process. 2018, 143, 232–240.
Yang, J.L.; Zhong, X.H.; Chen, W.G.; Wang, W.W. Multiple acoustic source localization in microphone array networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 334–347.
Liu, Y.S.; Wang, L.; Zeng, X.Y.; Wang, H.T. Sound source localization in reverberant environments based on structural sparse Bayesian learning. Acta Acust. United Acust. 2018, 104, 528–541.
Thompson, L.L. A review of finite-element methods for time-harmonic acoustics. J. Acoust. Soc. Am. 2006, 119, 1315–1330.
Piscoya, R.; Ochmann, M. Acoustical Boundary Elements: Theory and Virtual Experiments. Arch. Acoust. 2014, 39, 453–465.
Abawi, A.T. Finite element and boundary methods in structural acoustics and vibration. J. Acoust. Soc. Am. 2017, 141, 4300.
Dokmanic, I.; Vetterli, M. Room helps: Acoustic localization with finite elements. In Proceedings of the ICASSP-IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, 25–30 March 2012.
Belytschko, T.; Lu, Y.Y.; Gu, L. Element-free Galerkin methods. Int. J. Numer. Methods Eng. 1994, 37, 229–256.
Wang, H.T.; Zeng, X.Y. Calculation of sound fields in small enclosures using a meshless model. Appl. Acoust. 2013, 74, 459–466.
Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244.
Ping, G.L.; Fernandez-Grande, E.; Gerstoft, P.; Chu, Z.G. Three-dimensional source localization using sparse Bayesian learning on a spherical microphone array. J. Acoust. Soc. Am. 2020, 147, 3895–3904.
Xenaki, A.; Fernandez-Grande, E.; Gerstoft, P. Block-sparse beamforming for spatially extended sources in a Bayesian formulation. J. Acoust. Soc. Am. 2016, 140, 1828–1838.
Kuttruff, H. Room Acoustics, 5th ed.; Spon: London, UK, 2009.

Figure 1. Schematic of the source intensity distributed over the nodes. In the element-free simulation, the source intensity is decomposed and spread over the nodes according to the shape function, from which the load vector is obtained.

Figure 2. Schematic of the shape function constructed on a compact support domain. Because of the compact support, the source contributes to only a few nodes when the load vector is built. The initial intensity distribution is therefore sparse, and a sparse recovery method can be used to reconstruct it, which reduces the number of measurement points required.

Figure 3. Probabilistic graphical model of the joint sparse Bayesian recovery algorithm.

Figure 4. Schematic of the enclosed space, with a size of $1.6\ \mathrm{m} \times 1.4\ \mathrm{m} \times 1.2\ \mathrm{m}$. It is divided into $6 \times 6 \times 6 = 216$ nodes to construct the wave-based model. The sound source, shown as a red dot, is located at [0.60, 0.55, 0.48] m, and the planar sensor array consists of 28 sensors.


Figure 5. Recovered initial intensity distributions on the nodes at 50 Hz, 100 Hz, and 150 Hz. The recovered intensity distribution represents the initial pattern of the load vector, and the position of its maximum indicates the source location.

Figure 6. Localization results from 30 Hz to 500 Hz for wall acoustic impedances of $Z_{s} = 10\rho_{0} c_{0}$, $20\rho_{0} c_{0}$, and $100\rho_{0} c_{0}$, respectively. The black line indicates the actual source position, which is represented by node 87. The blue and red dots mark the node numbers estimated by the proposed method and the reference method, respectively.


Figure 7. Sound fields at 100 Hz and 400 Hz, and localization results with an increased discretization level. (a) Sound field in the plane z = 0.4 m at 100 Hz, for which a correct result can be obtained with a relatively coarse discretization; (b) sound field in the plane z = 0.4 m at 400 Hz, which varies spatially more rapidly than at the lower frequency, so a finer discretization is needed to ensure that the sound field does not change too rapidly within a single subdomain; (c) localization results with a discretization of $9 \times 9 \times 9 = 729$ nodes.


Figure 8. Recovered initial intensity distributions from 30 Hz to 500 Hz for wall acoustic impedances of $Z_{s} = 10\rho_{0} c_{0}$, $20\rho_{0} c_{0}$, and $100\rho_{0} c_{0}$, respectively.


Figure 9. Error analyses regarding sampling factors. (a) Comparison of results obtained using a planar array and a random array in which the sensors are distributed randomly in the space; (b) comparison of results obtained using 28, 56, and 84 randomly distributed sensors, where each result is averaged over 50 independent trials with random sensor positions.

Figure 10. Measurement setup comprising a speaker and 32 sensors in a rectangular enclosure.

Figure 11. Results of the experimental verification. (a) Sampled signal of a sensor in the frequency domain, containing obvious noise; (b) recovered initial intensity distributions; (c) localization results where the black line indicates the source position and green lines indicate the neighboring nodal positions of the source; and (d) relative distance error.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, H.; He, Q.; Peng, S.; Zeng, X. Indoor Sound Source Localization via Inverse Element-Free Simulation Based on Joint Sparse Recovery. Electronics 2024, 13, 69. https://0-doi-org.brum.beds.ac.uk/10.3390/electronics13010069

