Advances in Processing and Understanding of Music Signals

A special issue of Signals (ISSN 2624-6120).

Deadline for manuscript submissions: closed (15 January 2022) | Viewed by 18208

Special Issue Editors

Dr. Toshihisa Tanaka
Department of Electronic and Information Engineering, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan
Interests: biomedical signal processing and machine learning for brain-computer interfaces; epilepsy; neuromusicology
Dr. Shinnosuke Takamichi
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8654, Japan
Interests: speech synthesis; speech signal processing
Prof. Dr. Jordi Solé-Casals
Department of Engineering, University of Vic - Central University of Catalonia, 08500 Vic, Barcelona, Spain
Interests: biomedical signal processing; machine learning; deep learning; signal processing theory and methods; neurosciences
Prof. Dr. Kazuyoshi Yoshii
Department of Intelligence Science and Technology (IST), Kyoto University, Kyoto 606-8501, Japan
Interests: statistical music analysis; statistical audio analysis; computational linguistics; statistical machine learning
Prof. Dr. Egon L. van den Broek
Department of Information and Computing Sciences, Utrecht University, 3584 CS Utrecht, The Netherlands
Interests: biomedical signal processing; statistical signal processing; machine learning; multimodal signal processing; multimedia signal processing; multi-sensor signal processing; signal processing for human-computer interaction; body-computer interface
Dr. Sertan Şentürk
Kobalt Music Group, London EC4R 3TE, UK

Special Issue Information

Dear Colleagues, 

The primary purpose of this Special Issue is to provide an interdisciplinary forum for discussion on music as it relates to signal processing, computer science, physics, engineering, psychology, musicology, and neuroscience. So far, these topics have been presented and discussed separately by different communities. We welcome articles that unite multiple perspectives and/or views from distinct disciplines. As such, this Special Issue will cover the analysis (e.g., decomposition), the generation (e.g., speech-to-singing), and the experience (e.g., cognition) of music through manual, semi-automatic, and automated techniques (e.g., via machine learning).

Scope: 

Topics of interest include, but are not limited to, the following: 

  • traditional musical dimensions such as rhythm, dynamics, melody, harmony, timbre, texture, and form;
  • subsymbolic and symbolic representations of music;
  • physically grounded computational models of music transmission;
  • analysis of psychological, affective, and cultural processes that influence the music experience;
  • music similarity calculation, including music retrieval and genre detection;
  • physiological approaches to understanding music cognition.

Moreover, we welcome contributions that describe work supporting research on music signal processing, such as dedicated software repositories and annotated databases.

Dr. Toshihisa Tanaka
Dr. Shinnosuke Takamichi
Prof. Dr. Jordi Solé-Casals
Prof. Dr. Kazuyoshi Yoshii
Prof. Dr. Egon L. van den Broek
Dr. Sertan Şentürk
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Signals is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)

Research

25 pages, 3811 KiB  
Article
Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients
by Sören Schulze, Johannes Leuschner and Emily J. King
Signals 2021, 2(4), 637-661; https://doi.org/10.3390/signals2040039 - 07 Oct 2021
Cited by 3 | Viewed by 2284
Abstract
We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary. Full article
(This article belongs to the Special Issue Advances in Processing and Understanding of Music Signals)
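The abstract above mentions replacing an unusable backpropagation gradient with a policy gradient. The following is only a minimal sketch of that general idea (the score-function, or REINFORCE, estimator) and not the authors' implementation: the discrete parameter, the toy `loss` function, the target value, and all hyperparameters are hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(choice, target=3):
    """Hypothetical non-differentiable loss of a discrete choice."""
    return abs(choice - target)


def train(num_choices=6, steps=5000, lr=0.05, seed=0):
    """Learn a categorical distribution over choices with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * num_choices
    baseline = 0.0  # running mean of the loss, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(num_choices), weights=probs)[0]
        observed = loss(choice)
        advantage = observed - baseline
        # d/d logit_k of log p(choice) = 1[k == choice] - probs[k]
        for k in range(num_choices):
            grad_log_p = (1.0 if k == choice else 0.0) - probs[k]
            logits[k] -= lr * advantage * grad_log_p  # descend the expected loss
        baseline += 0.05 * (observed - baseline)
    return softmax(logits)


final_probs = train()
```

The sampled choice never needs a gradient of its own; only the log-probability of the sample is differentiated, which is why the estimator applies to parameters that enter the loss in a non-differentiable way.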

19 pages, 10088 KiB  
Article
Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms
by Ryoto Ishizuka, Ryo Nishikimi and Kazuyoshi Yoshii
Signals 2021, 2(3), 508-526; https://doi.org/10.3390/signals2030031 - 13 Aug 2021
Cited by 3 | Viewed by 2393
Abstract
This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained from an extensive collection of drum scores. The experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available so that the non-regularized model underperformed the RNN-based model. Full article
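As a rough illustration of why self-attention can pick up repetitive structure regardless of distance, here is a minimal sketch of scaled dot-product self-attention. It is an assumption-laden toy, not the model from the paper: the query, key, and value projections are taken to be the identity, no positional encoding is applied, and the `tatum_features` vectors are made-up stand-ins for pooled latent features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity projections.

    Returns the attended sequence and the attention weight matrix.
    """
    dim = len(seq[0])
    outputs, weights = [], []
    for query in seq:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * seq[j][i] for j in range(len(seq))) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical tatum-level features with a pattern that repeats every two tatums.
tatum_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
attended, attention = self_attention(tatum_features)
```

In this toy input, the first tatum attends equally to itself and to the third tatum, which carries the same feature vector, and less to the second and fourth. Each position is compared with every other position directly, so a repetition several tatums away is weighted the same as an adjacent one.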

17 pages, 1601 KiB  
Article
Efficient Retrieval of Music Recordings Using Graph-Based Index Structures
by Frank Zalkow, Julian Brandner and Meinard Müller
Signals 2021, 2(2), 336-352; https://doi.org/10.3390/signals2020021 - 17 May 2021
Cited by 1 | Viewed by 7801
Abstract
Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications. Full article
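For context, the search problem that both the K-d tree and the HNSW graph are meant to accelerate can be written as an exact linear scan. The sketch below shows only that brute-force baseline, not either index structure and not the authors' system. The feature vectors, the recording identifiers, and the choice of Euclidean distance are illustrative assumptions.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest_neighbors(query, items, k=2):
    """Return the k (distance, item_id) pairs closest to the query, nearest first.

    Every item is compared with the query, so the cost grows linearly
    with the collection size.
    """
    scored = ((euclidean(query, vec), item_id) for item_id, vec in items.items())
    return heapq.nsmallest(k, scored)


# Hypothetical feature vectors standing in for indexed audio snippets.
collection = {
    "recording_a": [0.0, 1.0, 0.0],
    "recording_b": [0.9, 0.1, 0.0],
    "recording_c": [0.0, 0.0, 2.0],
}
query = [1.0, 0.0, 0.0]
result = exact_nearest_neighbors(query, collection, k=2)
```

An index structure trades some build time and storage for avoiding most of these comparisons at query time. An approximate index such as HNSW additionally accepts that the returned neighbors may occasionally differ from the exact ones.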

Review

41 pages, 4626 KiB  
Review
An Educational Guide through the FMP Notebooks for Teaching and Learning Fundamentals of Music Processing
by Meinard Müller
Signals 2021, 2(2), 245-285; https://doi.org/10.3390/signals2020018 - 30 Apr 2021
Cited by 7 | Viewed by 4283
Abstract
This paper provides a guide through the FMP notebooks, a comprehensive collection of educational material for teaching and learning fundamentals of music processing (FMP) with a particular focus on the audio domain. Organized in nine parts that consist of more than 100 individual notebooks, this collection discusses well-established topics in music information retrieval (MIR) such as beat tracking, chord recognition, music synchronization, audio fingerprinting, music segmentation, and source separation, to name a few. These MIR tasks provide motivating and tangible examples that students can hold onto when studying technical aspects in signal processing, information retrieval, or pattern analysis. The FMP notebooks comprise detailed textbook-like explanations of central techniques and algorithms combined with Python code examples that illustrate how to implement the methods. All components, including the introductions of MIR scenarios, illustrations, sound examples, technical concepts, mathematical details, and code examples, are integrated into a unified framework based on Jupyter notebooks. Providing a platform with many baseline implementations, the FMP notebooks are suited for conducting experiments and generating educational material for lectures, thus addressing students, teachers, and researchers. While giving a guide through the notebooks, this paper’s objective is to yield concrete examples on how to use the FMP notebooks to create an enriching, interactive, and interdisciplinary supplement for studies in science, technology, engineering, and mathematics. The FMP notebooks (including HTML exports) are publicly accessible under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Full article