Sound and Music Computing -- Music and Interaction

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 January 2020) | Viewed by 54734

Special Issue Editors


Prof. Dr. Stefania Serafin
Guest Editor
Multisensory Experience Lab, Department of Architecture, Design and Media Technology, Aalborg University, 2450 Copenhagen SV, Denmark
Interests: sonic interaction design; sound for virtual and augmented reality; audio-haptic interaction; sound synthesis by physical models; multimodal interfaces; multimodal perception and cognition; virtual and augmented reality

Prof. Dr. Federico Avanzini
Guest Editor
Lab of Music Informatics, Department of Computer Science, University of Milano, Via Celoria 18, 20133 Milano, Italy

Prof. Dr. Isabel Barbancho
Guest Editor
Application of Information and Communication Technologies (ATIC) Research Group, ETSI Telecomunicación, Campus Universitario de Teatinos s/n, 29071 Malaga, Spain
Interests: music information retrieval; audio signal processing; machine learning; musical acoustics; serious games; EEG signal processing; multimedia applications

Prof. Dr. Lorenzo J. Tardón
Guest Editor
ATIC Research Group, Universidad de Málaga, 29007 Málaga, Spain
Interests: serious games; digital audio and image processing; pattern analysis and recognition; applications of signal processing techniques and methods

Special Issue Information

Dear colleagues,

Sound and Music Computing is a highly multidisciplinary research field. It combines scientific, technological, and artistic methods to produce, model, and understand audio and sonic arts with the help of computers. Sound and music computing borrows methods from computer science, electrical engineering, mathematics, musicology, psychology, etc.

In this Special Issue, we invite papers covering a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc. In particular, the following topics will be considered:

  • Acoustics and psychoacoustics;
  • AI and music performance;
  • Analysis/synthesis of the singing voice;
  • Applications in audio and music;
  • Architectural acoustics modeling and auralization;
  • Assistive technologies;
  • Audio and music for AR/VR;
  • Audio and music for games;
  • Audio interactions;
  • Audio recognition and birdsong;
  • Auditory display;
  • Automatic music generation/accompaniment systems;
  • Bioacoustic modeling;
  • Biomusic and sound installations;
  • Computational archeomusicology;
  • Computational musicology;
  • Computational ethnomusicology;
  • Computational ornithomusicology;
  • Computer-aided real-time composition;
  • Computer music software and programming languages;
  • Data sonification;
  • Digital signal processing;
  • Digital tuning systems;
  • Ethics of sound and new technologies;
  • Gesture, motion, and music;
  • History and aesthetics of electroacoustic music;
  • Immersive audio/soundscape environments;
  • Interaction and improvisation;
  • Interactive environments for voice training;
  • Interactive performance systems;
  • Jazz performance and machine learning;
  • Mathematical music theory;
  • Music and robotics;
  • Music games and music for games;
  • Music information retrieval;
  • Music technology in education;
  • Music therapy and technology for special needs;
  • New interfaces for musical expression;
  • New musical instruments;
  • Perception and cognition of sound and music;
  • Recording and mastering automation techniques;
  • Sonification;
  • Sound/music and the neurosciences;
  • Spatial sound and spatialization techniques;
  • Physical models for sound synthesis;
  • VR applications and technologies for sound and music.

Submissions are invited for both original research and review articles. Additionally, invited papers based on excellent contributions to the 2019 Sound and Music Computing Conference (SMC 2019) will be included. We hope that this collection of papers will serve as an inspiration for those interested in sound and music computing.

Prof. Dr. Stefania Serafin
Prof. Dr. Federico Avanzini
Prof. Dr. Isabel Barbancho
Prof. Dr. Lorenzo J. Tardón
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, navigate to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Audio signal processing
  • Computer music
  • Multimedia
  • Music cognition
  • Music information retrieval
  • Music technology
  • Sonic interaction design
  • Virtual reality
  • Interaction with music
  • Serious games for music

Published Papers (14 papers)


Research


16 pages, 2065 KiB  
Article
Efficient Melody Extraction Based on Extreme Learning Machine
by Weiwei Zhang, Qiaoling Zhang, Sheng Bi, Shaojun Fang and Jinliang Dai
Appl. Sci. 2020, 10(7), 2213; https://0-doi-org.brum.beds.ac.uk/10.3390/app10072213 - 25 Mar 2020
Viewed by 2028
Abstract
Melody extraction is an important task in the music information retrieval community, and it remains unresolved due to the complex nature of real-world recordings. In this paper, the melody extraction problem is addressed in the extreme learning machine (ELM) framework. More specifically, the input musical signal is first pre-processed to mimic the human auditory system. The music features are then constructed by constant-Q transform (CQT), and a concentration strategy is introduced to make use of contextual information. Afterwards, the rough melody pitches are determined by the ELM network, according to its pre-trained parameters. Finally, the rough melody pitches are fine-tuned by the spectral peaks around the frame-wise rough pitches. The proposed method can extract melody from polyphonic music efficiently and effectively, where pitch estimation and voicing detection are conducted jointly. Experiments conducted on three publicly available datasets show that the proposed method achieves higher overall accuracies at very fast speed.
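For readers unfamiliar with extreme learning machines, the following minimal Python sketch illustrates the core idea of fixed random hidden weights with a closed-form output layer. The feature matrix, pitch targets, and layer sizes are placeholder assumptions, not the authors' configuration.

```python
# Minimal ELM sketch: random hidden layer, least-squares output layer.
# X and y are random placeholders standing in for CQT-based features and pitch labels.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_features, n_hidden, n_pitch_bins = 1000, 84, 500, 61
X = rng.standard_normal((n_frames, n_features))                      # stand-in for CQT frames
y = np.eye(n_pitch_bins)[rng.integers(0, n_pitch_bins, n_frames)]    # one-hot pitch targets

# 1) Random input weights and biases are fixed and never trained.
W_in = rng.standard_normal((n_features, n_hidden))
b = rng.standard_normal(n_hidden)

# 2) Hidden activations.
H = np.tanh(X @ W_in + b)

# 3) Output weights via regularized least squares (the only "training" step).
lam = 1e-3
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

# Frame-wise rough pitch estimate = arg-max over pitch bins.
rough_pitch_bins = np.argmax(H @ W_out, axis=1)
print(rough_pitch_bins[:10])
```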

14 pages, 425 KiB  
Article
Mining Characteristic Patterns for Comparative Music Corpus Analysis
by Kerstin Neubarth and Darrell Conklin
Appl. Sci. 2020, 10(6), 1991; https://0-doi-org.brum.beds.ac.uk/10.3390/app10061991 - 14 Mar 2020
Viewed by 2435
Abstract
A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes, both in top-k pattern mining and in individual top-ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.
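As a rough illustration of ranking patterns by a measure that trades off coverage against discriminability, the sketch below uses weighted relative accuracy on a toy corpus; the measure and the data are illustrative assumptions, not the measures or collection evaluated in the paper.

```python
# Toy corpus: each song is a set of hypothetical patterns plus a class label.
corpus = [
    ({"p1", "p2"}, "A"), ({"p1"}, "A"), ({"p1", "p3"}, "A"),
    ({"p2"}, "B"), ({"p2", "p3"}, "B"), ({"p3"}, "B"),
]

def weighted_relative_accuracy(pattern, cls):
    """Coverage x discriminability: support of the pattern times its class lift."""
    n = len(corpus)
    covered = [c for songs, c in corpus if pattern <= songs]
    p_pattern = len(covered) / n                         # coverage of the pattern
    p_cls = sum(1 for _, c in corpus if c == cls) / n    # class prior
    p_cls_given_pattern = covered.count(cls) / len(covered) if covered else 0.0
    return p_pattern * (p_cls_given_pattern - p_cls)

candidates = [frozenset({"p1"}), frozenset({"p2"}), frozenset({"p3"})]
ranked = sorted(candidates, key=lambda p: weighted_relative_accuracy(p, "A"), reverse=True)
print([(set(p), round(weighted_relative_accuracy(p, "A"), 3)) for p in ranked])
```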

14 pages, 1255 KiB  
Article
Automatic Tuning of High Piano Tones
by Sneha Shah and Vesa Välimäki
Appl. Sci. 2020, 10(6), 1983; https://0-doi-org.brum.beds.ac.uk/10.3390/app10061983 - 13 Mar 2020
Cited by 2 | Viewed by 3368
Abstract
Piano tuning is known to be difficult because the stiffness of piano strings causes the tones produced to be inharmonic. Aural tuning is time-consuming and requires the help of a professional. This motivates the question of whether the process can be automated. Attempts at automatic tuning are usually assessed by comparing the Railsback curve of the results with the curve of a professional tuner. In this paper, we determine a simple and reliable rule for tuning the high tones of a piano with the help of a listening test. This rule consists of matching the two tones in an octave interval so that the first partial frequency of the upper tone becomes exactly the same as the second partial frequency of the lower tone. This rule was rated best among the four tuning rules compared in the test. The results are explained using a beat-based analysis and are consistent with some previous studies. They are also tested against the existing method of using Railsback curves, and it is shown that comparison using Railsback curves is an unreliable way of assessing different tunings. The findings from this paper can be used to create a complete automatic tuner that could make the process of piano tuning quick and inexpensive.
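The octave-matching rule can be made concrete with the standard stiff-string partial model f_n = n · f0 · sqrt(1 + B · n^2). A worked example in Python follows; the fundamental and the inharmonicity coefficients are assumed values for illustration, not those measured in the paper.

```python
# Tune the upper tone of an octave so its 1st partial equals the 2nd partial of the lower tone.
import math

def partial(f0, n, B):
    """Frequency of the n-th partial of a stiff string with inharmonicity coefficient B."""
    return n * f0 * math.sqrt(1 + B * n * n)

f0_lower = 1760.0                 # nominal A6, Hz (illustrative)
B_lower, B_upper = 1e-3, 2e-3     # assumed inharmonicity coefficients

target = partial(f0_lower, 2, B_lower)          # 2nd partial of the lower tone
f0_upper = target / math.sqrt(1 + B_upper)      # solve partial(f0_upper, 1, B_upper) == target

stretch_cents = 1200 * math.log2(f0_upper / (2 * f0_lower))
print(f"upper tone fundamental: {f0_upper:.2f} Hz ({stretch_cents:+.1f} cents vs. a pure 2:1 octave)")
```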

21 pages, 1976 KiB  
Article
Emotional Training and Modification of Disruptive Behaviors through Computer-Game-Based Music Therapy in Secondary Education
by Rocío Chao-Fernández, Vicenta Gisbert-Caudeli and Rubén Vázquez-Sánchez
Appl. Sci. 2020, 10(5), 1796; https://0-doi-org.brum.beds.ac.uk/10.3390/app10051796 - 05 Mar 2020
Cited by 7 | Viewed by 4241
Abstract
Music education research has shown interest in music therapy for integral development of the person, both in their performance and academic knowledge and in their personality. This project aims to analyze the benefits of music therapy in the comprehensive training of students with disruptive behaviors (n = 6). Tests designed by Gallego, Alonso, Cruz, and Lizama (1999) were conducted to assess emotional intelligence, which showed very low results. A series of activities were designed based on the use of the music videogame Musichao, the curricular content of which was adapted for this pilot experience. Subsequently, the emotional intelligence tests were applied again to determine the effectiveness of the teaching experience. The results indicate that, with the use of this videogame, significant improvements were obtained, both in the development of multiple intelligences and in self-motivation, self-awareness, self-control, and, more specifically, in social skills, with the students minimizing behaviors classified as inappropriate and/or aggressive and becoming more skilled in their interactions with the surrounding environment.

13 pages, 3475 KiB  
Article
Gaussian Process Synthesis of Artificial Sounds
by Aristotelis Hadjakos
Appl. Sci. 2020, 10(5), 1781; https://0-doi-org.brum.beds.ac.uk/10.3390/app10051781 - 05 Mar 2020
Viewed by 2901
Abstract
In this paper, we propose Gaussian Process (GP) sound synthesis. A GP is used to sample random continuous functions, which are then used for wavetable or waveshaping synthesis. The shape of the sampled functions is controlled with the kernel function of the GP. Sampling multiple times from the same GP generates perceptually similar but non-identical sounds. Since there are many ways to choose the kernel function and its parameters, an interface aids the user in sound selection. The interface is based on a two-dimensional visualization of the sounds grouped by their similarity as judged by a t-SNE analysis of their Mel Frequency Cepstral Coefficient (MFCC) representations.
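A minimal sketch of the GP-wavetable idea, assuming a periodic (exp-sine-squared) kernel and cyclic table lookup; the kernel choice, length scale, table size, and pitch are illustrative assumptions, not the parameters of the proposed system.

```python
# Sample a random loopable function from a GP prior and read it as a wavetable.
import numpy as np

rng = np.random.default_rng(1)

table_size = 512
x = np.linspace(0.0, 1.0, table_size, endpoint=False)

# Periodic kernel so the sampled function loops seamlessly; smaller length scales
# give rougher, brighter waveforms.
length_scale = 0.15
d = x[:, None] - x[None, :]
K = np.exp(-2.0 * np.sin(np.pi * d) ** 2 / length_scale ** 2) + 1e-8 * np.eye(table_size)

wavetable = rng.multivariate_normal(np.zeros(table_size), K)
wavetable /= np.max(np.abs(wavetable))

# Render one second at 220 Hz by cyclic table lookup.
sr, f0 = 44100, 220.0
phase = (np.arange(sr) * f0 / sr) % 1.0
signal = np.interp(phase, x, wavetable, period=1.0)
print(signal.shape, float(signal.min()), float(signal.max()))
```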

18 pages, 11056 KiB  
Article
Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents
by Woon-Haeng Heo, Hyemi Kim and Oh-Wook Kwon
Appl. Sci. 2020, 10(5), 1727; https://0-doi-org.brum.beds.ac.uk/10.3390/app10051727 - 03 Mar 2020
Cited by 12 | Viewed by 3923
Abstract
We propose a source separation architecture using dilated time-frequency DenseNet for background music identification of broadcast content. We apply source separation techniques to the mixed signals of music and speech. For the source separation purpose, we propose a new architecture that adds a time-frequency dilated convolution to the conventional DenseNet in order to effectively increase the receptive field in the source separation scheme. In addition, we apply different convolutions to each frequency band of the spectrogram in order to reflect the different frequency characteristics of the low- and high-frequency bands. To verify the performance of the proposed architecture, we perform singing-voice separation and music-identification experiments. As a result, we confirm that the proposed architecture produces the best performance in both experiments because it uses the dilated convolution to reflect wide contextual information.
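The sketch below (Python/PyTorch) shows what a densely connected block with time-frequency dilated convolutions can look like; the layer count, channel sizes, and dilation rates are assumptions for illustration, not the paper's architecture.

```python
# Dense block whose layers dilate along both frequency and time to grow the receptive field.
import torch
import torch.nn as nn

class DilatedTFDenseBlock(nn.Module):
    def __init__(self, in_ch=1, growth=8, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for i in range(n_layers):
            dilation = (2 ** i, 2 ** i)            # larger dilation in deeper layers
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=dilation, dilation=dilation),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth                            # dense connectivity: features are concatenated

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

# One mixture spectrogram: (batch, channels, freq_bins, time_frames).
spec = torch.randn(1, 1, 513, 128)
print(DilatedTFDenseBlock()(spec).shape)
```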

16 pages, 5101 KiB  
Article
Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head
by Markus Zaunschirm, Matthias Frank and Franz Zotter
Appl. Sci. 2020, 10(5), 1631; https://0-doi-org.brum.beds.ac.uk/10.3390/app10051631 - 29 Feb 2020
Cited by 19 | Viewed by 4714
Abstract
To improve the limited degree of immersion of static binaural rendering for headphones, an increased measurement effort to obtain multiple-orientation binaural room impulse responses (MOBRIRs) is reasonable and enables dynamic, variable-orientation rendering. We investigate the perceptual characteristics of dynamic rendering from MOBRIRs and test for the required angular resolution. Our first listening experiment shows that a resolution between 15° and 30° is sufficient to accomplish binaural rendering of high quality with regard to timbre, spatial mapping, and continuity. A more versatile alternative considers the separation of the room-dependent (RIR) from the listener-dependent head-related (HRIR) parts, and an efficient implementation thereof involves the measurement of a first-order Ambisonic RIR (ARIR) with a tetrahedral microphone. A resolution-enhanced ARIR can be obtained by an Ambisonic spatial decomposition method (ASDM) utilizing instantaneous direction-of-arrival estimation. ASDM permits dynamic rendering in higher-order Ambisonics, with the flexibility to render using either dummy-head or individualized HRIRs. Our comparative second listening experiment shows that 5th-order ASDM outperforms the MOBRIR rendering with resolutions coarser than 30° for all tested perceptual aspects. Both listening experiments are based on BRIRs and ARIRs measured in a studio environment.
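A simplified sketch of variable-orientation rendering from MOBRIRs: pick the BRIR pair measured at the orientation nearest the current head azimuth and convolve. The 30° grid and the random impulse responses are placeholders; a practical renderer would interpolate or crossfade between orientations and run block-wise in real time.

```python
import numpy as np

rng = np.random.default_rng(2)
sr = 48000
orientations = np.arange(0, 360, 30)                    # measured orientation grid, degrees
# Fake BRIR set: {orientation: (left IR, right IR)} standing in for measured responses.
brirs = {az: (rng.standard_normal(sr // 2) * 0.01,
              rng.standard_normal(sr // 2) * 0.01) for az in orientations}

def render(mono, head_azimuth_deg):
    # Nearest measured orientation on the circle.
    diffs = np.abs(((orientations - head_azimuth_deg + 180) % 360) - 180)
    nearest = orientations[np.argmin(diffs)]
    ir_l, ir_r = brirs[nearest]
    return np.stack([np.convolve(mono, ir_l), np.convolve(mono, ir_r)], axis=0)

source = rng.standard_normal(sr)                        # 1 s of test signal
binaural = render(source, head_azimuth_deg=47.0)
print(binaural.shape)
```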

23 pages, 7589 KiB  
Article
Tempo and Metrical Analysis by Tracking Multiple Metrical Levels Using Autocorrelation
by Olivier Lartillot and Didier Grandjean
Appl. Sci. 2019, 9(23), 5121; https://0-doi-org.brum.beds.ac.uk/10.3390/app9235121 - 26 Nov 2019
Cited by 6 | Viewed by 3346
Abstract
We present a method for tempo estimation from audio recordings based on signal processing and peak tracking, which does not depend on training on ground-truth data. First, an accentuation curve, emphasizing the temporal location and accentuation of notes, is derived from a detection of bursts of energy localized in time and frequency. This enables the detection of notes in dense polyphonic textures while ignoring spectral fluctuation produced by vibrato and tremolo. Periodicities in the accentuation curve are detected using an improved version of the autocorrelation function. Hierarchical metrical structures, composed of a large set of periodicities in pairwise harmonic relationships, are tracked over time. In this way, the metrical structure can be tracked even if the rhythmical emphasis switches from one metrical level to another. Compared to all other participants in the Music Information Retrieval Evaluation eXchange (MIREX) Audio Tempo Extraction competition from 2006 to 2018, this approach is the third best among those that can track tempo variations. While the two best methods are based on machine learning, our method suggests a way to track tempo founded on signal processing and heuristics-based peak tracking. Moreover, the approach offers, for the first time, a detailed representation of the dynamic evolution of the metrical structure. The method is integrated into MIRtoolbox, a freely available Matlab toolbox.
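The core periodicity-detection step can be illustrated in a few lines: autocorrelate an accentuation (onset-strength) curve and convert the strongest lag within a plausible range to BPM. The synthetic curve, frame rate, and search range below are assumptions; the paper additionally tracks several metrical levels jointly over time.

```python
import numpy as np

frame_rate = 100.0                      # accentuation frames per second
true_bpm = 120.0

# Synthetic accentuation curve: a pulse train at 120 BPM plus noise.
n = int(30 * frame_rate)
accent = np.random.default_rng(3).random(n) * 0.2
accent[::int(frame_rate * 60 / true_bpm)] += 1.0

# Autocorrelation of the (mean-removed) accentuation curve.
ac = np.correlate(accent - accent.mean(), accent - accent.mean(), mode="full")[n - 1:]
lags = np.arange(n) / frame_rate        # lag in seconds

valid = (lags >= 60 / 200.0) & (lags <= 60 / 40.0)      # 40-200 BPM search range
best_lag = lags[valid][np.argmax(ac[valid])]
print(f"estimated tempo: {60.0 / best_lag:.1f} BPM")
```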

14 pages, 1548 KiB  
Article
AR Graphic Representation of Musical Notes for Self-Learning on Guitar
by Marta Sylvia Del Rio-Guerra, Jorge Martin-Gutierrez, Vicente A. Lopez-Chao, Rodolfo Flores Parra and Mario A. Ramirez Sosa
Appl. Sci. 2019, 9(21), 4527; https://0-doi-org.brum.beds.ac.uk/10.3390/app9214527 - 25 Oct 2019
Cited by 24 | Viewed by 6353
Abstract
Despite being one of the most commonly self-taught instruments, and despite the ready availability of significant amounts of didactic material, the guitar is a challenging instrument to learn. This paper proposes an application based on augmented reality (AR) that is designed to teach beginner students basic musical chords on the guitar, and provides details of the experimental study performed to determine whether the AR methodology produced faster results than traditional one-on-one training with a music teacher. Participants were divided into two groups of the same size. Group 1 consisted of 32 participants who used the AR app to teach themselves guitar, while Group 2, with a further 32 participants, received formal instruction from a music teacher. Results found no differences in learning times between the two groups based on the variables of method and gender. However, participant feedback suggested that there are advantages to the self-taught approach using AR that are worth considering. A System Usability Scale (SUS) questionnaire was used to measure the usability of the application, obtaining a score of 82.5, higher than the average of 68 above which an application is considered good from a user experience point of view, indicating that the application satisfied the purpose for which it was created.
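For reference, the SUS score quoted above is computed from ten 1–5 Likert items with the standard scoring rule sketched below; the response set is made up for illustration.

```python
def sus_score(responses):
    """responses: ten answers on a 1-5 scale; items alternate positive/negative wording."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses):
        # Odd-numbered items (1, 3, ...) score r - 1; even-numbered items score 5 - r.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5                  # scale the 0-40 raw sum to 0-100

print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 5, 1]))   # -> 87.5 for this made-up example
```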

13 pages, 5676 KiB  
Article
A User-Specific Approach for Comfortable Application of Advanced 3D CAD/CAM Technique in Dental Environments Using the Harmonic Series Noise Model
by Eun-Sung Song, Young-Jun Lim and Bongju Kim
Appl. Sci. 2019, 9(20), 4307; https://0-doi-org.brum.beds.ac.uk/10.3390/app9204307 - 14 Oct 2019
Cited by 3 | Viewed by 2322
Abstract
Recently, there has been a focus on improving the user’s emotional state by providing high-quality sound, beyond merely reducing industrial product noise. Three-dimensional computer-aided design and computer-aided manufacturing (3D CAD/CAM) dental milling machines are a major source of industrial product noise in the dental environment. Here, we propose a noise-control method to improve the sound quality in the dental environment. Our main goals are to analyze the acoustic characteristics of the sounds generated by the dental milling machine, to control the noise by active noise control, and to improve the sound quality of the residual noise with a newly synthesized sound. In our previous study, we demonstrated noise reduction in dental milling machines through tactile transducers. To improve the sound quality of the residual noise, we performed frequency analysis and synthesized sound resembling that of musical instruments using the harmonic series noise model. Our data suggest that noise improvement through synthesis may prove to be a useful tool in the development of dental devices.
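A minimal sketch of a harmonic-series tone of the kind such a model produces: partials at integer multiples of a fundamental, here assumed to sit near a prominent component of the residual milling noise. The fundamental, partial amplitudes, and decay are illustrative assumptions, not the values derived in the paper.

```python
import numpy as np

sr, dur = 44100, 2.0
t = np.arange(int(sr * dur)) / sr

f0 = 220.0                                    # assumed fundamental near a dominant noise peak
n_partials = 10
amps = 1.0 / np.arange(1, n_partials + 1)     # 1/n amplitude roll-off, instrument-like timbre

# Sum of harmonically related sinusoids with a gentle decay envelope.
tone = sum(a * np.sin(2 * np.pi * f0 * n * t) for n, a in enumerate(amps, start=1))
tone *= np.exp(-0.5 * t)
tone /= np.max(np.abs(tone))
print(tone.shape)
```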

13 pages, 784 KiB  
Article
Generation of Melodies for the Lost Chant of the Mozarabic Rite
by Darrell Conklin and Geert Maessen
Appl. Sci. 2019, 9(20), 4285; https://0-doi-org.brum.beds.ac.uk/10.3390/app9204285 - 12 Oct 2019
Cited by 4 | Viewed by 4411
Abstract
Prior to the establishment of the Roman rite with its Gregorian chant, the Mozarabic rite, with its own tradition of chant, was dominant in the Iberian Peninsula and Southern France from the sixth until the eleventh century. Few of these chants are preserved in pitch-readable notation, and thousands exist only in manuscripts using adiastematic neumes, which specify only melodic contour relations and not exact intervals. Though their precise melodies appear to be forever lost, it is possible to use computational machine learning and statistical sequence generation methods to produce plausible realizations. Pieces from the León antiphoner, dating from the early tenth century, were encoded into templates and then instantiated by sampling from a statistical model trained on pitch-readable Gregorian chants. A concert of ten Mozarabic chant realizations was performed at a music festival in the Netherlands. This study shows that it is possible to construct realizations for incomplete ancient cultural remnants using only partial information compiled into templates, combined with statistical models learned from extant pieces to fill the templates.
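A toy sketch of the template-filling idea: sample a pitch sequence from a first-order model trained on pitch-readable melodies while respecting an up/down/same contour template of the kind a neume transcription provides. The training melodies, pitch alphabet, and template are made-up placeholders, not the León antiphoner data or the authors' statistical model.

```python
import random
from collections import defaultdict

random.seed(4)

# Placeholder "pitch-readable" training melodies (MIDI note numbers).
training = [[60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60],
            [62, 64, 65, 64, 62, 60, 62, 64, 62, 60, 59, 60]]

# First-order transition model: possible continuations for each pitch.
trans = defaultdict(list)
for mel in training:
    for a, b in zip(mel, mel[1:]):
        trans[a].append(b)

def realize(template, start=60, tries=200):
    """template: 'U'/'D'/'S' contour symbols between successive notes."""
    for _ in range(tries):
        seq, ok = [start], True
        for sym in template:
            cands = [p for p in trans[seq[-1]]
                     if (sym == "U" and p > seq[-1])
                     or (sym == "D" and p < seq[-1])
                     or (sym == "S" and p == seq[-1])]
            if not cands:
                ok = False
                break
            seq.append(random.choice(cands))
        if ok:
            return seq
    return None

print(realize(["U", "U", "D", "D", "U", "D"]))
```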

14 pages, 7786 KiB  
Article
State-of-the-Art Model for Music Object Recognition with Deep Learning
by Zhiqing Huang, Xiang Jia and Yifan Guo
Appl. Sci. 2019, 9(13), 2645; https://0-doi-org.brum.beds.ac.uk/10.3390/app9132645 - 29 Jun 2019
Cited by 32 | Viewed by 5930
Abstract
Optical music recognition (OMR) is an area of music information retrieval. Music object detection is a key part of the OMR pipeline. Notes are used to record pitch and duration and carry semantic information. Therefore, note recognition is the core of music score recognition. This paper proposes an end-to-end detection model based on a deep convolutional neural network and feature fusion. The model is able to directly process the entire image and then output the symbol categories as well as the pitch and duration of notes. We present a state-of-the-art recognition model for general music symbols which achieves 0.92 duration accuracy and 0.96 pitch accuracy.

23 pages, 3511 KiB  
Article
Adaptive Refinements of Pitch Tracking and HNR Estimation within a Vocoder for Statistical Parametric Speech Synthesis
by Mohammed Salah Al-Radhi, Tamás Gábor Csapó and Géza Németh
Appl. Sci. 2019, 9(12), 2460; https://0-doi-org.brum.beds.ac.uk/10.3390/app9122460 - 16 Jun 2019
Cited by 2 | Viewed by 3974
Abstract
Recent studies in text-to-speech synthesis have shown the benefit of using a continuous pitch estimate: one that interpolates the fundamental frequency (F0) even when voicing is not present. However, continuous F0 is still sensitive to additive noise in speech signals and suffers from short-term errors (when it changes rather quickly over time). To alleviate these issues, three adaptive techniques are developed in this article for achieving a robust and accurate F0: (1) we weight the pitch estimates with the state noise covariance using an adaptive Kalman-filter framework, (2) we iteratively apply a time-axis warping to the input frame signal, and (3) we optimize all F0 candidates using an instantaneous-frequency-based approach. Additionally, the second goal of this study is to introduce an extension of a novel continuous-based speech synthesis system (i.e., one in which all parameters are continuous). We propose adding a new excitation parameter, named the Harmonic-to-Noise Ratio (HNR), to the voiced and unvoiced components to indicate the degree of voicing in the excitation and to reduce the buzziness caused by the vocoder. Results based on objective and perceptual tests demonstrate that the voice built with the proposed framework gives state-of-the-art speech synthesis performance while outperforming the previous baseline.
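The first refinement can be illustrated with a scalar Kalman filter applied to a noisy continuous F0 track under a random-walk state model; the noise variances and the synthetic contour below are assumptions, not the adaptive scheme used in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
true_f0 = 120 + 20 * np.sin(np.linspace(0, 4 * np.pi, 300))    # synthetic F0 contour, Hz
observed = true_f0 + rng.standard_normal(300) * 5.0             # noisy frame-wise estimates

q, r = 0.5, 25.0            # assumed process and measurement noise variances
x, p = observed[0], 1.0
smoothed = []
for z in observed:
    p += q                  # predict (random-walk state model)
    k = p / (p + r)         # Kalman gain
    x += k * (z - x)        # update with the new observation
    p *= (1 - k)
    smoothed.append(x)

print("mean abs error, filtered vs. raw:",
      float(np.mean(np.abs(np.array(smoothed) - true_f0))),
      float(np.mean(np.abs(observed - true_f0))))
```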

Other


32 pages, 301 KiB  
Meeting Report
16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)
by Lorenzo J. Tardón, Isabel Barbancho, Ana M. Barbancho, Alberto Peinado, Stefania Serafin and Federico Avanzini
Appl. Sci. 2019, 9(12), 2492; https://0-doi-org.brum.beds.ac.uk/10.3390/app9122492 - 19 Jun 2019
Cited by 1 | Viewed by 3453
Abstract
The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28–31 May 2019, and was organized by the Application of Information and Communication Technologies (ATIC) Research Group of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25–28 May 2019, and the First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest covered a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.