Psychoacoustic Engineering and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 March 2019) | Viewed by 70045

Special Issue Editor

Dr. Hyunkook Lee
Applied Psychoacoustics Lab, University of Huddersfield, Huddersfield HD1 3DH, UK
Interests: psychoacoustics; spatial audio; 3D audio; virtual acoustics; extended reality

Special Issue Information

Dear Colleagues, 

Psychoacoustics is an important research field not only for understanding the fundamental mechanisms of human auditory perception, but also for developing methods of capturing, processing, and measuring audio signals for practical applications. This Special Issue aims to introduce the state of the art of psychoacoustic engineering in various application areas and to explore new directions for psychoacoustics research in the age of technological convergence and artificial intelligence. The Special Issue aims to collect 10 to 20 papers and will also be published as a book collection.

Topics of interest include, but are not limited to, the following:

  • Audio coding
  • Audio for virtual and augmented reality
  • Audio using artificial intelligence
  • Hearing aids
  • Interactive audio
  • Building/Environmental acoustics
  • Soundscapes
  • Spatial audio (multichannel and binaural)
  • Sound recording and mixing techniques
  • Sound quality prediction models

Dr. Hyunkook Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio coding
  • interactive audio
  • artificial intelligence
  • virtual/augmented reality
  • building/environmental acoustics
  • hearing aid
  • soundscape
  • sound recording
  • sound quality
  • spatial audio

Published Papers (13 papers)


Research


11 pages, 1510 KiB  
Article
Localisation of Vertical Auditory Phantom Image with Band-limited Reductions of Vertical Interchannel Crosstalk
by Rory Wallis and Hyunkook Lee
Appl. Sci. 2020, 10(4), 1490; https://0-doi-org.brum.beds.ac.uk/10.3390/app10041490 - 21 Feb 2020
Cited by 1 | Viewed by 2601
Abstract
Direct sound that is captured by the upper layer of a three-dimensional (3D) microphone array is typically regarded as vertical interchannel crosstalk (VIC), since it tends to produce an undesired effect of the sound source image being elevated from the ear-level loudspeaker layer position (0°) in reproduction. The present study examined the effectiveness of band-limited VIC attenuation methods in preventing this vertical image shift. In a subjective experiment, five natural sound sources were presented as vertically-oriented phantom images using two stereophonic loudspeaker pairs elevated at 0° and 30° in front of the listener. The upper layer signal (i.e., VIC) was attenuated in various octave-band-dependent conditions based on vertical localisation thresholds obtained from previous studies. The results showed that the phantom image could be panned to the same height as the image produced by the main loudspeaker layer by attenuating only a single octave band with a centre frequency of 4 kHz or 8 kHz, or multiple bands at 1 kHz and above. This has a useful practical implication for 3D sound recording and mixing where a vertically oriented phantom image is rendered. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
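As an illustrative aside to this entry, the band-limited attenuation idea can be sketched in a few lines of Python. This is a minimal sketch assuming a zero-phase Butterworth band extraction and a 6 dB cut; the actual filtering and attenuation depths used in the study are not reproduced here.

```python
# Illustrative only: attenuate a single octave band (centred at 4 kHz) in the
# height-layer feed before reproduction. The filter design and the 6 dB depth
# are assumptions for this sketch, not the attenuation conditions of the paper.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def attenuate_octave_band(height_signal, fs, fc=4000.0, depth_db=6.0):
    """Reduce one octave band (fc/sqrt(2) .. fc*sqrt(2)) by roughly depth_db."""
    lo, hi = fc / np.sqrt(2.0), fc * np.sqrt(2.0)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, height_signal)        # zero-phase band extraction
    gain = 1.0 - 10.0 ** (-depth_db / 20.0)       # fraction of the band to remove
    return height_signal - gain * band

# Example: process the upper-layer channel of a 0°/30° two-layer reproduction
# fs = 48000; upper = attenuate_octave_band(upper, fs, fc=4000.0, depth_db=6.0)
```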

19 pages, 1502 KiB  
Article
Effects of Background Sounds on Annoyance Reaction to Foreground Sounds in Psychoacoustic Experiments in the Laboratory: Limits and Consequences
by Armin Taghipour and Eduardo Pelizzari
Appl. Sci. 2019, 9(9), 1872; https://0-doi-org.brum.beds.ac.uk/10.3390/app9091872 - 07 May 2019
Cited by 5 | Viewed by 2971
Abstract
In a variety of applications, e.g., psychoacoustic experiments, virtual sound propagation demonstrations, or synthesized noise production, noise samples are played back in laboratories. To simulate realistic scenes or to mask unwanted background sounds, it is sometimes preferable to add background ambient sounds to the noise. However, this can influence noise perception. It should be ensured either that background sounds do not affect, e.g., annoyance from the foreground noise, or that possible effects can be quantified. Two laboratory experiments are reported in which the effects of mixing background sounds with foreground helicopter samples were investigated. By means of partially balanced incomplete block designs, possible effects of three independent variables, i.e., the helicopter’s sound exposure level, background type, and background sound pressure level, were tested on the dependent variable annoyance, rated on the ICBEN 11-point numerical scale. The main predictor of annoyance was the helicopter’s sound exposure level. Stimuli with eventful background sounds were found to be more annoying than those with less eventful background sounds. Furthermore, background type and level interacted significantly. For the major part of the background sound level range, increasing the background level was associated with increased or decreased annoyance for stimuli with eventful and less eventful background sounds, respectively. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
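As a small illustration of the kind of stimulus construction described above, the sketch below mixes a background ambience at a chosen level relative to a foreground sample. It uses plain RMS levels for brevity; the experiments themselves used calibrated, A-weighted exposure and pressure levels, so this is not the authors' procedure.

```python
# Illustrative only: mix a background ambience a fixed number of dB below a
# foreground (e.g., helicopter) sample. Plain RMS is used here to keep the
# sketch short; real stimuli would be set by calibrated, A-weighted levels.
import numpy as np

def mix_at_relative_level(foreground, background, background_rel_db=-10.0):
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = 10.0 ** (background_rel_db / 20.0) * rms(foreground) / rms(background)
    n = min(len(foreground), len(background))
    return foreground[:n] + gain * background[:n]
```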

22 pages, 3100 KiB  
Article
Automatic Spatial Audio Scene Classification in Binaural Recordings of Music
by Sławomir K. Zieliński and Hyunkook Lee
Appl. Sci. 2019, 9(9), 1724; https://0-doi-org.brum.beds.ac.uk/10.3390/app9091724 - 26 Apr 2019
Cited by 10 | Viewed by 3462
Abstract
The aim of the study was to develop a method for the automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener, in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits a satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
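To make the feature-plus-classifier pipeline concrete, here is a hypothetical sketch combining MFCCs (via librosa) with two crude interaural features and an L1-regularised (lasso-type) classifier from scikit-learn; the paper's actual feature set, labels, and model configuration differ.

```python
# Illustrative only: classify binaural excerpts into spatial-scene classes from
# MFCC and simple interaural features. All feature and model choices below are
# assumptions for the sketch, not the configuration used in the paper.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def features(left, right, sr):
    mono = 0.5 * (left + right)
    mfcc = librosa.feature.mfcc(y=mono, sr=sr, n_mfcc=13).mean(axis=1)
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))    # broadband ILD
    iacc = np.max(np.correlate(left, right, mode="full")) / (
        np.linalg.norm(left) * np.linalg.norm(right) + 1e-12)        # crude IACC
    return np.concatenate([mfcc, [ild, iacc]])

# X = np.vstack([features(l, r, sr) for (l, r) in excerpts]); y = scene_labels
# clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
```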

17 pages, 2911 KiB  
Article
Timbre Preferences in the Context of Mixing Music
by Felix A. Dobrowohl, Andrew J. Milne and Roger T. Dean
Appl. Sci. 2019, 9(8), 1695; https://0-doi-org.brum.beds.ac.uk/10.3390/app9081695 - 24 Apr 2019
Cited by 5 | Viewed by 3500
Abstract
Mixing music is a highly complex task. This is exacerbated by the fact that timbre perception is still poorly understood. As a result, few studies have been able to pinpoint listeners’ preferences in terms of timbre. In order to investigate timbre preference in a music production context, we let participants mix multiple individual parts of musical pieces (bassline, harmony, and arpeggio parts, all sounded with a synthesizer) by adjusting four specific timbral attributes of the synthesizer (lowpass, sawtooth/square wave oscillation blend, distortion, and inharmonicity). After participants had mixed all parts of a musical piece, they were asked to rate multiple mixes of the same piece. Listeners showed preferences for their own mixes over random, fixed sawtooth, or expert mixes. However, participants were unable to identify their own mixes. Despite this, participants consistently preferred the mix they thought to be their own, regardless of whether or not this mix was indeed their own. Correlations and cluster analysis of the participants’ mixing settings show that most participants behaved independently in their mixing approaches, with one moderate-sized cluster of participants who were rather similar. Relative to the starting settings, participants applied the biggest changes to the sound with the inharmonicity manipulation (measured as perceptual distance), despite often mentioning that they did not find this manipulation particularly useful. The results show that listeners have a consistent, yet individual, timbre preference and are able to reliably evoke changes in timbre towards their own preferences. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
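For readers who want to experiment with the four timbral controls named above, the following toy synthesiser sketch exposes lowpass cutoff, sawtooth/square blend, distortion, and inharmonicity as parameters. The mappings are assumptions for illustration, not the synthesiser used in the study.

```python
# Illustrative only: a toy additive synthesiser with four timbral controls
# (saw/square blend, inharmonicity, lowpass cutoff, distortion drive).
import numpy as np
from scipy.signal import butter, sosfilt

def synth(f0=110.0, dur=1.0, fs=48000, blend=0.5, inharm=0.0,
          cutoff=4000.0, drive=1.0, n_partials=30):
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        # saw has all harmonics ~1/k, square only odd harmonics ~1/k
        a_saw, a_sq = 1.0 / k, (1.0 / k if k % 2 == 1 else 0.0)
        amp = (1.0 - blend) * a_saw + blend * a_sq
        fk = f0 * k * (1.0 + inharm * k)           # simple inharmonic stretch
        if fk < fs / 2:
            x += amp * np.sin(2 * np.pi * fk * t)
    x = np.tanh(drive * x / np.max(np.abs(x)))      # soft-clip "distortion"
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    return sosfilt(sos, x)
```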

14 pages, 3878 KiB  
Article
Impact of Structural Parameters on the Auditory Perception of Musical Sounds in Closed Spaces: An Experimental Study
by Lei Wang, Xiyue Ma, Rong Li and Xiangyang Zeng
Appl. Sci. 2019, 9(7), 1416; https://0-doi-org.brum.beds.ac.uk/10.3390/app9071416 - 04 Apr 2019
Viewed by 2295
Abstract
This study investigates the impact of structural parameters (volume, shape, and the wall absorption coefficient) of a closed space on the auditory perception of three different types of musical sound. Using binaural auralization technology and room impulse response (RIR) measurement, this paper first verifies the reliability of using ODEON software to simulate simplified closed-space auditory scenes. Then, 96 binaural music signals produced in eight simulated closed spaces with different structural parameters are synthesized. Finally, an auditory perception experiment is conducted on the synthesized binaural signals using the pair comparison method, and an analysis of variance is performed on the experimental results. It is concluded that (1) a hemispherical cabin with a small volume and a large wall sound absorption coefficient is most suitable for playing a single instrument, such as the flute or violin, and (2) a cabin with a large volume is suitable for playing music with multiple instruments, such as a symphony, but the walls should not be totally reflective. The experimental scheme and results of the current study provide guidance for designing the inner structure of concert halls to achieve preferable auditory perception in practice. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)

21 pages, 12545 KiB  
Article
Interaural Level Difference Optimization of Binaural Ambisonic Rendering
by Thomas McKenzie, Damian T. Murphy and Gavin Kearney
Appl. Sci. 2019, 9(6), 1226; https://0-doi-org.brum.beds.ac.uk/10.3390/app9061226 - 23 Mar 2019
Cited by 6 | Viewed by 3936
Abstract
Ambisonics is a spatial audio technique appropriate for dynamic binaural rendering due to its sound field rotation and transformation capabilities, which has made it popular for virtual reality applications. An issue with low-order Ambisonics is that interaural level differences (ILDs) are often reproduced with lower values when compared to head-related impulse responses (HRIRs), which reduces lateralization and spaciousness. This paper introduces a method of Ambisonic ILD Optimization (AIO), a pre-processing technique to bring the ILDs produced by virtual loudspeaker binaural Ambisonic rendering closer to those of HRIRs. AIO is evaluated objectively for Ambisonic orders up to fifth order versus a reference dataset of HRIRs for all locations on the sphere via estimated ILD and spectral difference, and perceptually through listening tests using both simple and complex scenes. Results show that AIO produces an overall improvement for all tested orders of Ambisonics, though the benefits are greatest at first and second order. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
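As a minimal illustration of the objective metric involved, the sketch below computes a broadband ILD from a left/right impulse-response pair, which can then be compared between an Ambisonic-rendered pair and a reference HRIR for the same direction. The paper estimates ILDs with a more refined approach; the plain energy ratio here is a simplification.

```python
# Illustrative only: broadband ILD (dB) of a binaural impulse-response pair.
import numpy as np

def ild_db(left_ir, right_ir):
    return 10.0 * np.log10(np.sum(left_ir ** 2) / np.sum(right_ir ** 2))

# error = ild_db(*ambisonic_binaural_pair) - ild_db(*reference_hrir_pair)
```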

17 pages, 9733 KiB  
Article
A Study on Affective Dimensions to Engine Acceleration Sound Quality Using Acoustic Parameters
by Soyoun Moon, Sunghwan Park, Donggun Park, Wonjoon Kim, Myung Hwan Yun and Dongchul Park
Appl. Sci. 2019, 9(3), 604; https://0-doi-org.brum.beds.ac.uk/10.3390/app9030604 - 12 Feb 2019
Cited by 17 | Viewed by 4913
Abstract
The technical performance of recent automobiles has progressed greatly and become standardized across different manufacturers. This study seeks to derive a semantic space of engine acceleration sound quality for end users and to identify its relation to sound characteristics. Two affective attributes, ‘refined’ and ‘powerful’, and eight acoustic parameters considered as a function of revolutions per minute were used, and correlation coefficients were determined for the affective attributes. In the experiment, a total of 35 automobiles were selected. Each third-gear wide-open-throttle sound was recorded and evaluated by 42 adult subjects with normal hearing and a driving license. Their subjective evaluations were analyzed using factor analysis, independent t-tests, correlation analysis, and regression analysis. The prediction models for the affective dimensions show distinct differences across revolutions per minute. From the experiment, it was confirmed that customers’ affective responses can be predicted from the acoustic parameters. In addition, it was found that the initial revolutions per minute in the accelerated condition had the greatest influence on the affective response. This study can serve as a useful guideline for designing engine acceleration sounds that satisfy customers’ affective experience. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)

18 pages, 674 KiB  
Article
Modelling Timbral Hardness
by Andy Pearce, Tim Brookes and Russell Mason
Appl. Sci. 2019, 9(3), 466; https://0-doi-org.brum.beds.ac.uk/10.3390/app9030466 - 30 Jan 2019
Cited by 7 | Viewed by 3205
Abstract
Hardness is the most commonly searched timbral attribute within freesound.org, a commonly used online sound effect repository. A perceptual model of hardness was developed to enable the automatic generation of metadata to facilitate hardness-based filtering or sorting of search results. A training dataset was collected of 202 stimuli with 32 sound source types, and perceived hardness was assessed by a panel of listeners. A multilinear regression model was developed on six features: maximum bandwidth, attack centroid, midband level, percussive-to-harmonic ratio, onset strength, and log attack time. This model predicted the hardness of the training data with R² = 0.76. It predicted hardness within a new dataset with R² = 0.57, and predicted the rank order of individual sources perfectly, after accounting for the subjective variance of the ratings. Its performance exceeded that of human listeners. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
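A multilinear regression of the kind described can be sketched with scikit-learn as follows; the feature names are taken from the abstract, while the data handling and train/test split are placeholders rather than the paper's dataset.

```python
# Illustrative only: fit a multilinear hardness model on the six features named
# in the abstract. Feature extraction and the data split are assumed to be done
# elsewhere; this is not the authors' implementation.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

FEATURES = ["max_bandwidth", "attack_centroid", "midband_level",
            "percussive_to_harmonic_ratio", "onset_strength", "log_attack_time"]

def fit_hardness_model(X_train, y_train, X_test, y_test):
    # X arrays have one column per entry of FEATURES, in that order
    model = LinearRegression().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))
```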

16 pages, 3743 KiB  
Article
The Role of Reverberation and Magnitude Spectra of Direct Parts in Contralateral and Ipsilateral Ear Signals on Perceived Externalization
by Song Li, Roman Schlieper and Jürgen Peissig
Appl. Sci. 2019, 9(3), 460; https://0-doi-org.brum.beds.ac.uk/10.3390/app9030460 - 29 Jan 2019
Cited by 7 | Viewed by 4110
Abstract
Several studies show that the reverberation and spectral details in direct sounds are two essential cues for perceived externalization of virtual sound sources in reverberant environments. The present study investigated the role of these two cues in contralateral and ipsilateral ear signals on perceived externalization of headphone-reproduced binaural sound images at different azimuth angles. For this purpose, seven pairs of non-individual binaural room impulse responses (BRIRs) were measured at azimuth angles of −90°, −60°, −30°, 0°, 30°, 60°, and 90° in a listening room. The magnitude spectra of direct parts were smoothed, and the reverberation was removed, either in left or right ear BRIRs. Such modified BRIRs were convolved with a speech signal, and the resulting binaural sounds were presented over headphones. Subjects were asked to assess the degree of perceived externalization for the presented stimuli. The result of the subjective listening experiment revealed that the magnitude spectra of direct parts in ipsilateral ear signals and the reverberation in contralateral ear signals are important for perceived externalization of virtual lateral sound sources. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
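The BRIR manipulation described above (splitting direct and reverberant parts and optionally removing the reverberation in one ear before convolution with speech) can be sketched as follows. The 2.5 ms direct-sound window and the onset detection are assumptions, not the segmentation used in the study.

```python
# Illustrative only: split a BRIR into direct and reverberant parts with a fixed
# window after the direct-sound peak, then render speech for headphone playback.
import numpy as np
from scipy.signal import fftconvolve

def split_brir(brir, fs, direct_ms=2.5):
    onset = np.argmax(np.abs(brir))
    cut = onset + int(direct_ms * 1e-3 * fs)
    direct = brir[:cut]
    reverb = np.r_[np.zeros(cut), brir[cut:]]
    return direct, reverb

def render(speech, brir_l, brir_r, fs, keep_reverb=(True, True)):
    out = []
    for brir, keep in zip((brir_l, brir_r), keep_reverb):
        direct, reverb = split_brir(brir, fs)
        ir = np.r_[direct, np.zeros(len(brir) - len(direct))] + (reverb if keep else 0.0)
        out.append(fftconvolve(speech, ir))
    return np.stack(out, axis=-1)   # two-channel headphone signal
```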

24 pages, 6018 KiB  
Article
Influence of Contextual Factors on Soundscape in Urban Open Spaces
by Xiaolong Zhao, Shilun Zhang, Qi Meng and Jian Kang
Appl. Sci. 2018, 8(12), 2524; https://0-doi-org.brum.beds.ac.uk/10.3390/app8122524 - 06 Dec 2018
Cited by 14 | Viewed by 3655
Abstract
The acoustic environment in urban open spaces plays a key role for users. This study analyzed the different effects of contextual factors, including shop openness, season, and commercial function, on the soundscape in two typical commercial pedestrian streets. The following observations were based on a series of measurements, including crowd measurements, acoustic environment measurements, and a questionnaire survey. First, the number of talkers in Central Avenue was greater than the number of talkers in Kuan Alley in cases with the same crowd density, while there was no significant difference in the sound pressure level. When the crowd density increased, acoustic comfort trended downward in Kuan Alley, while the value of acoustic comfort in Central Avenue took a parabolic shape. Second, there was no significant difference between the number of talkers in summer and the number of talkers in winter; however, when crowd density increased by 0.1 persons/m², the sound pressure level increased by 1.3 dBA in winter and 2.2 dBA in summer. Acoustic comfort took a parabolic shape that first increased and then decreased in both winter and summer. Regarding commercial function, as the crowd density increased, the number of talkers and the sound pressure level both increased, while acoustic comfort decreased in three zones with different commercial functions. In addition, a cross-tab analysis was used to examine the relationship between the number of talkers and the sound pressure level, and it was found to be positive. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)

21 pages, 33140 KiB  
Article
A Perceptual Evaluation of Individual and Non-Individual HRTFs: A Case Study of the SADIE II Database
by Cal Armstrong, Lewis Thresh, Damian Murphy and Gavin Kearney
Appl. Sci. 2018, 8(11), 2029; https://0-doi-org.brum.beds.ac.uk/10.3390/app8112029 - 23 Oct 2018
Cited by 68 | Viewed by 11296
Abstract
As binaural audio continues to permeate immersive technologies, it is vital to develop a detailed understanding of the perceptual relevance of HRTFs. Previous research has explored the benefit of individual HRTFs with respect to localisation. However, localisation is only one metric with which it is possible to rate spatial audio. This paper evaluates the perceived timbral and spatial characteristics of both individual and non-individual HRTFs and compares the results to overall preference. To that end, the measurement and evaluation of a high-resolution multi-environment binaural impulse response database is presented for 20 subjects, including the KU100 and KEMAR binaural mannequins. Post-processing techniques, including low-frequency compensation and diffuse-field equalisation, are discussed in relation to the 8802 unique HRTFs measured for each mannequin and the 2818/2114 HRTFs measured for each human. Listening test results indicate that particular HRTF sets are more generally preferred by subjects over their own individual measurements. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)

17 pages, 9545 KiB  
Article
Diffuse-Field Equalisation of Binaural Ambisonic Rendering
by Thomas McKenzie, Damian T. Murphy and Gavin Kearney
Appl. Sci. 2018, 8(10), 1956; https://0-doi-org.brum.beds.ac.uk/10.3390/app8101956 - 17 Oct 2018
Cited by 15 | Viewed by 5020
Abstract
Ambisonics has enjoyed a recent resurgence in popularity due to virtual reality applications. Low-order Ambisonic reproduction is inherently inaccurate at high frequencies, which causes poor timbre and height localisation. Diffuse-Field Equalisation (DFE), the theory of removing the direction-independent frequency response, is applied to binaural (over headphones) Ambisonic rendering to address high-frequency reproduction. DFE of Ambisonics is evaluated by comparing binaural Ambisonic rendering to direct convolution via head-related impulse responses (HRIRs) in three ways: spectral difference, predicted sagittal plane localisation, and perceptual listening tests on timbre. Results show that DFE successfully improves the frequency reproduction of binaural Ambisonic rendering for the majority of sound source locations, highlight the limitations of the technique, and set the basis for further research in the field. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
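As an illustrative sketch of diffuse-field equalisation, the code below averages the magnitude responses of binaural Ambisonic impulse responses over all rendered directions (RMS) and inverts the result with a simple regularisation floor; the direction weighting, smoothing, and inversion details in the paper are more sophisticated.

```python
# Illustrative only: a diffuse-field equalisation filter for one ear, computed
# as the regularised inverse of the RMS-average magnitude over all directions.
import numpy as np

def dfe_filter(irs, n_fft=4096, floor_db=-24.0):
    """irs: array (n_directions, ir_length) of binaural Ambisonic IRs for one ear."""
    mags = np.abs(np.fft.rfft(irs, n=n_fft, axis=1))
    diffuse = np.sqrt(np.mean(mags ** 2, axis=0))                 # RMS over directions
    diffuse /= np.max(diffuse)
    inv = 1.0 / np.maximum(diffuse, 10.0 ** (floor_db / 20.0))    # regularised inverse
    h = np.fft.irfft(inv, n=n_fft)                                # zero-phase magnitude EQ
    return np.roll(h, n_fft // 2)                                 # centre the impulse response
```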

Review


22 pages, 1553 KiB  
Review
Psychoacoustic Models for Perceptual Audio Coding—A Tutorial Review
by Jürgen Herre and Sascha Dick
Appl. Sci. 2019, 9(14), 2854; https://0-doi-org.brum.beds.ac.uk/10.3390/app9142854 - 17 Jul 2019
Cited by 12 | Viewed by 17037
Abstract
Psychoacoustic models of human auditory perception have found an important application in the realm of perceptual audio coding, where exploiting the limitations of perception and removing irrelevance are key to achieving a significant reduction in bitrate while preserving subjective audio quality. To this end, psychoacoustic models do not need to be perfect to serve their purpose, and in fact the commonly employed models only represent a small subset of the known properties and abilities of the human auditory system. This paper provides a tutorial introduction to the most commonly used psychoacoustic models for low-bitrate perceptual audio coding. Full article
(This article belongs to the Special Issue Psychoacoustic Engineering and Applications)
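To give a flavour of the models reviewed, here is a deliberately simplified masking-threshold sketch: band energies on a Zwicker-style Bark scale, a triangular spreading function, and a fixed masker-to-threshold offset. Real codec models (e.g., the MPEG psychoacoustic models) are far more elaborate; every constant below is an assumption for illustration.

```python
# Illustrative only: a toy per-band masking estimate for one audio frame.
import numpy as np

def masking_threshold_db(frame, fs, n_bands=24, offset_db=16.0):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # Zwicker & Terhardt approximation of the Bark scale
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    band = np.minimum(bark.astype(int), n_bands - 1)
    energy = np.array([spec[band == b].sum() + 1e-12 for b in range(n_bands)])
    e_db = 10.0 * np.log10(energy)
    # spread each band's energy into neighbours (triangular slope in dB per Bark)
    spread = np.array([np.max(e_db - 15.0 * np.abs(np.arange(n_bands) - b))
                       for b in range(n_bands)])
    return spread - offset_db   # components below this level are likely masked
```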
