Artificial Intelligence in Audio and Music

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 January 2023) | Viewed by 4193

Special Issue Editor

Prof. Dr. Sylvain Marchand
Laboratory for Informatics, Image and Interaction, Faculté des Sciences et Technologies, La Rochelle University, Bâtiment Pascal, Avenue Michel Crépeau, 17042 La Rochelle CEDEX 1, France
Interests: analysis, modeling, and spectral synthesis of signals (images and sounds) for cultural heritage and creation (archeology and music); computer music; computer graphics

Special Issue Information

Dear Colleagues,

As you know, using a computer to analyze or synthesize musical content or sound signals is a vast multidisciplinary topic, one that has involved scientists and artists, researchers and musicians, programmers and composers for decades.

Nowadays, artificial intelligence seems to be the new umbrella term for computer science applied to human concerns. Music is one of them, and the ever-increasing power of computers enables tremendous applications, many of which are now used in industry.

For example, it is now possible to extract musical information (e.g., sound sources, musical structure) in real time from music scores or audio signals in order to classify, identify, or even transform existing content. Moreover, it is possible to generate new musical content from existing corpora after learning from them.

The aims can be scientific or artistic, academic or industrial, and the approaches can rely on classical algorithms, machine learning, or deep learning.

For this Special Issue, we encourage all researchers in the computer music area to submit recent developments where artificial intelligence plays a key role in the analysis, transformation, or synthesis of musical sound content.

Prof. Dr. Sylvain Marchand
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer music
  • sound processing
  • music information retrieval
  • machine learning
  • deep learning

Published Papers (2 papers)


Research

11 pages, 710 KiB  
Article
Locally Activated Gated Neural Network for Automatic Music Genre Classification
by Zhiwei Liu, Ting Bian and Minglai Yang
Appl. Sci. 2023, 13(8), 5010; https://doi.org/10.3390/app13085010 - 17 Apr 2023
Cited by 4 | Viewed by 1130
Abstract
Automatic music genre classification is a prevalent pattern recognition task, and many algorithms have been proposed for accurate classification. Since a musical genre is a very broad concept, even pieces within the same genre can differ significantly, and existing methods have not accounted for these large intra-class differences. This paper presents a novel approach to address this issue: a locally activated gated neural network (LGNet). By incorporating multiple locally activated multi-layer perceptrons and a gated routing network, LGNet adaptively employs different network layers as multi-learners to learn from music signals with diverse characteristics. Our experimental results demonstrate that LGNet significantly outperforms existing methods for music genre classification, achieving superior performance on the filtered GTZAN dataset.
(This article belongs to the Special Issue Artificial Intelligence in Audio and Music)
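The abstract describes a network built from multiple locally activated multi-layer perceptrons combined through a gated routing network. The paper's exact architecture (layer sizes, number of experts, gating mechanism, input features) is not given here, so the following PyTorch sketch only illustrates the general gated multi-expert idea; every concrete detail below is an assumption.

import torch
import torch.nn as nn

class GatedMultiExpertClassifier(nn.Module):
    """Illustrative sketch of a gated mixture of MLP experts, in the
    spirit of the LGNet idea outlined in the abstract (not the paper's
    actual implementation)."""

    def __init__(self, in_dim: int, hidden_dim: int, n_genres: int, n_experts: int = 4):
        super().__init__()
        # Each expert is a small MLP; the router decides how strongly each
        # expert is activated for a given input (soft local activation).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, n_genres),
            )
            for _ in range(n_experts)
        ])
        self.router = nn.Sequential(nn.Linear(in_dim, n_experts), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = self.router(x)                                   # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, n_genres)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)           # (batch, n_genres)

# Toy usage on random "audio feature" vectors (e.g., pooled mel statistics);
# the feature dimension and the 10 GTZAN genres are assumptions.
model = GatedMultiExpertClassifier(in_dim=128, hidden_dim=64, n_genres=10)
logits = model(torch.randn(8, 128))
print(logits.shape)  # torch.Size([8, 10])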

26 pages, 1245 KiB  
Article
A Linear Memory CTC-Based Algorithm for Text-to-Voice Alignment of Very Long Audio Recordings
by Guillaume Doras, Yann Teytaut and Axel Roebel
Appl. Sci. 2023, 13(3), 1854; https://doi.org/10.3390/app13031854 - 31 Jan 2023
Cited by 2 | Viewed by 1831
Abstract
Synchronisation of a voice recording with the corresponding text is a common task in speech and music processing, used in many practical applications (automatic subtitling, audio indexing, etc.). A common approach derives a mid-level feature from the audio and aligns it to the text by maximizing a similarity measure via Dynamic Time Warping (DTW). Recently, a Connectionist Temporal Classification (CTC) approach was proposed that directly emits character probabilities and uses them to find the optimal text-to-voice alignment. While this method yields promising results, the memory complexity of the optimal alignment search remains quadratic in the input lengths, limiting its application to relatively short recordings. In this work, we describe how recent improvements to the textbook DTW algorithm can be adapted to the CTC context to achieve linear memory complexity. We then detail our overall solution and demonstrate that it can align text to several hours of audio with a mean alignment error of 50 ms for speech and 120 ms for singing voice, corresponding to a median alignment error below 50 ms for both voice types. Finally, we evaluate its robustness to transcription errors and different languages.
(This article belongs to the Special Issue Artificial Intelligence in Audio and Music)
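The key claim in the abstract is reducing the memory cost of the optimal alignment search from quadratic to linear in the input lengths. As a rough illustration of the underlying principle (not the paper's CTC-specific algorithm), the Python sketch below computes a DTW cost while storing only two rows of the accumulated-cost matrix, i.e., memory linear in one input length. Recovering the optimal path itself in linear memory needs an extra device such as Hirschberg-style divide and conquer; whether the paper uses exactly that device is not stated in the abstract.

import numpy as np

def dtw_cost_linear_memory(a: np.ndarray, b: np.ndarray) -> float:
    """Accumulated DTW cost between two 1-D sequences using O(len(b))
    memory (swap the arguments so that b is the shorter sequence)."""
    n, m = len(a), len(b)
    prev = np.full(m + 1, np.inf)  # row i-1 of the accumulated-cost matrix
    prev[0] = 0.0                  # D(0, 0) = 0; D(0, j > 0) = inf
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)  # D(i, 0) = inf for i > 0
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Standard DTW recursion over the three predecessor cells.
            curr[j] = d + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr  # only two rows are ever kept in memory
    return float(prev[m])

# Toy usage: aligning two short feature sequences costs 1.0 here
# (the middle sample 1.0 must pair with either 0.0 or 2.0).
print(dtw_cost_linear_memory(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0])))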
