Machine Learning for Language and Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 November 2022) | Viewed by 18490

Special Issue Editors


Prof. Dr. Yuen-Hsien Tseng
Guest Editor
Graduate Institute of Library & Information Science, National Taiwan Normal University, Taipei 10617, Taiwan
Interests: natural language processing; text mining; knowledge discovery; machine learning; information retrieval; automatic information organization and subject analysis; scientometrics; e-Learning; IT techniques for educational applications

Prof. Dr. Lung-Hao Lee
Guest Editor
Department of Electrical Engineering, National Central University, Taoyuan 32001, Taiwan
Interests: natural language processing; biomedical and health informatics; artificial intelligence; machine learning; web information retrieval

Special Issue Information

Dear Colleagues,

Machine learning is a branch of artificial intelligence that focuses on the use of data and algorithms to simulate the way that humans learn. Through the use of statistical methods, learning models are trained to make classifications or predictions, subsequently driving the decision-making process within specific applications. Therefore, this Special Issue aims to present new ideas, original research results, and practical development experiences from all language and signal research areas, including natural language processing and digital signal processing. Relevant topics for this Special Issue include, but are not limited to, the following:

  • Audio and acoustic signal processing;
  • Biomedical signal processing;
  • Deep learning methods and applications;
  • Information retrieval and extraction;
  • Language resources and evaluation;
  • Multidimensional signal processing;
  • Multimedia signal processing;
  • NLP and DSP applications;
  • Phonology, morphology, syntax, and semantics;
  • Question-answering, summarization, and machine translation;
  • Sentiment analysis, stylistic analysis, and argument mining;
  • Signal-processing theories and methods;
  • Speech perception, production, and acquisition;
  • Speech recognition, synthesis, and generation;
  • Spoken language retrieval, translation, and summarization.

Prof. Dr. Yuen-Hsien Tseng
Prof. Dr. Lung-Hao Lee
Prof. Dr. Po-Lei Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • neural networks
  • text mining
  • natural language processing
  • digital signal processing

Published Papers (8 papers)


Research

16 pages, 1047 KiB  
Article
Image-Based Radical Identification in Chinese Characters
by Yu Tzu Wu, Eric Fujiwara and Carlos Kenichi Suzuki
Appl. Sci. 2023, 13(4), 2163; https://doi.org/10.3390/app13042163 - 08 Feb 2023
Cited by 1 | Viewed by 4228
Abstract
The Chinese writing system, known as hanzi or Han characters, is fundamentally pictographic, composed of clusters of strokes. Nowadays, there are over 85,000 individual characters, making it difficult even for a native speaker to recognize the precise meaning of everything one reads. However, specific clusters of strokes known as indexing radicals convey the semantic information of the whole character, or even of an entire family of characters; they are key features for entry indexing in dictionaries and are essential for learning Chinese as a first or second language. Therefore, this work aims to identify the indexing radical of a hanzi from a picture using a convolutional neural network model with two layers and 15 classes. The model was validated on three calligraphy styles and achieved an average F-score of ∼95.7% in classifying 15 radicals within the known styles. For unknown fonts, the F-score varied with the overall calligraphy size, thickness, and stroke nature, reaching ∼83.0% in the best scenario. Subsequently, the model was evaluated on five ancient Chinese poems with a random set of hanzi, yielding average F-scores of ∼86.0% and ∼61.4% when disregarding and regarding unknown indexing radicals, respectively.
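As a rough illustration of the kind of classifier the abstract describes (not the authors' implementation), the sketch below builds a two-convolutional-layer network with 15 output classes in PyTorch; the input resolution, channel widths, and kernel sizes are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a two-convolutional-layer CNN that maps a
# grayscale glyph image to one of 15 indexing-radical classes. Input resolution,
# channel widths, and kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RadicalCNN(nn.Module):
    def __init__(self, num_classes: int = 15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # (B, 32, 16, 16)
        return self.classifier(x.flatten(1))

model = RadicalCNN()
logits = model(torch.randn(8, 1, 64, 64))     # batch of 8 glyph images
print(logits.shape)                           # torch.Size([8, 15])
```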

17 pages, 7526 KiB  
Article
Numerical Simulation of Adaptive Radial Basis NN-Based Non-Singular Fast Terminal Sliding Mode Control with Time Delay Estimator for Precise Control of Dual-Axis Manipulator
by Jim-Wei Wu, Wen-Shan Cen and Cheng-Chang Ho
Appl. Sci. 2022, 12(19), 9605; https://doi.org/10.3390/app12199605 - 24 Sep 2022
Cited by 1 | Viewed by 1104
Abstract
Robotic manipulators can reduce production costs and improve productivity; however, controlling a manipulator to follow a desired trajectory is a thorny problem. In this study, we introduced various forms of interference to facilitate the modeling of a dual-axis manipulator. The interference associated with the payload is handled by an adaptive radial basis neural network (ARBNN) controller, while other interference is estimated by a time delay estimator (TDE). The control signal is output by a non-singular fast terminal sliding mode controller (NFTSMC) to minimize further interference. Because the proposed controller accounts for the payload, system uncertainties, external disturbances, friction, and backlash, it achieves better tracking accuracy and stability than conventional control methods.
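The sketch below is a loose, single-joint illustration of the control structure named in the abstract: an RBF-network disturbance estimate, a time-delay-estimation term, and a non-singular fast terminal sliding surface combined into one control signal. All gains, basis centers, and the reaching law are illustrative assumptions, not the authors' design.

```python
# Illustrative single-joint sketch (not the authors' controller): combines a radial-basis
# network estimate of payload disturbance, a time-delay estimate of remaining dynamics,
# and a non-singular fast terminal sliding surface. Gains and basis centers are arbitrary.
import numpy as np

def rbf_disturbance(q_err, centers=np.linspace(-1, 1, 5), width=0.5,
                    weights=np.zeros(5)):
    """Radial-basis approximation of the payload-related disturbance."""
    phi = np.exp(-((q_err - centers) ** 2) / (2 * width ** 2))
    return weights @ phi

def nftsm_surface(e, e_dot, alpha=2.0, beta=1.0, p=5, q=3):
    """Non-singular fast terminal sliding surface for one joint (1 < p/q < 2)."""
    return e_dot + alpha * e + beta * np.sign(e) * np.abs(e) ** (p / q)

def control(e, e_dot, tde_term, k=10.0):
    s = nftsm_surface(e, e_dot)
    d_hat = rbf_disturbance(e)
    # Reaching law drives s -> 0; RBF and TDE terms compensate the estimated interference.
    return -k * np.sign(s) - d_hat - tde_term

print(control(e=0.1, e_dot=-0.05, tde_term=0.02))
```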

17 pages, 1754 KiB  
Article
Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
by Badriyya B. Al-onazi, Muhammad Asif Nauman, Rashid Jahangir, Muhmmad Mohsin Malik, Eman H. Alkhammash and Ahmed M. Elshewey
Appl. Sci. 2022, 12(18), 9188; https://doi.org/10.3390/app12189188 - 14 Sep 2022
Cited by 19 | Viewed by 3258
Abstract
In recent years, data science has been applied in a variety of real-life applications such as human-computer interaction, computer gaming, mobile services, and emotion evaluation. Among this wide range of applications, speech emotion recognition (SER) is an emerging and challenging research topic. Earlier SER studies used handcrafted features that provided strong results but failed to maintain accuracy when applied in complex scenarios. Later, deep learning techniques that automatically learn features from speech signals were adopted for SER. Deep-learning-based SER techniques overcome the accuracy issues, yet there are still significant gaps in the reported methods; in particular, studies using lightweight CNNs failed to learn optimal features from composite acoustic signals. This study proposes a novel SER model to overcome these limitations. We focused on Arabic vocal emotions in particular because they have received relatively little research attention. The proposed model performs data augmentation before feature extraction, and the 273 derived features are fed as input to a transformer model for emotion recognition. The model is applied to four datasets, namely BAVED, EMO-DB, SAVEE, and EMOVO. The experimental findings demonstrated the robust performance of the proposed model compared to existing techniques: the proposed SER model achieved 95.2%, 93.4%, 85.1%, and 91.7% accuracy on the BAVED, EMO-DB, SAVEE, and EMOVO datasets, respectively. The highest accuracy was obtained on the BAVED dataset, indicating that the proposed model is well suited to Arabic vocal emotions.
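A minimal sketch of the pipeline the abstract outlines, a fixed-length acoustic feature vector fed to a Transformer encoder for emotion classification, is given below. The 273-dimensional feature composition, sequence length, and number of emotion classes are assumptions, not details taken from the paper.

```python
# Sketch only: feeds 273-dimensional acoustic feature vectors (composition assumed, e.g.,
# MFCC/chroma/mel statistics) into a small Transformer encoder for emotion classification.
import torch
import torch.nn as nn

class SERTransformer(nn.Module):
    def __init__(self, feat_dim=273, d_model=128, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)   # n_classes is illustrative

    def forward(self, x):                  # x: (batch, seq_len, 273)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))    # pool over time, then classify

logits = SERTransformer()(torch.randn(4, 10, 273))
print(logits.shape)  # torch.Size([4, 7])
```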

15 pages, 2344 KiB  
Article
A Dual-Adaptive Approach Based on Discrete Cosine Transform for Removal of ECG Baseline Wander
by Chun-Chieh Lin, Pei-Chann Chang and Ping-Heng Tsai
Appl. Sci. 2022, 12(17), 8839; https://doi.org/10.3390/app12178839 - 02 Sep 2022
Cited by 2 | Viewed by 1366
Abstract
Removal of baseline wander (BW) is an important preprocessing step before manually or automatically interpreting electrocardiogram (ECG) records. Fully removing BW while preserving the original clinical information is challenging because BW is usually mingled with low-frequency ECG components. A dual-adaptive approach based on the discrete cosine transform (DCT) is presented in this study. First, the cardiac fundamental frequency (CFF) of the ECG is accurately calculated through DCT-domain analysis. Second, the DCT coefficients of the ECG whose frequencies lie below the CFF are used to construct an amplitude vector in which the optimal cut-point between BW and the ECG is distinctly reflected. Finally, a new DCT-based filtering technique is exploited to suppress BW, with its cutoff frequency adjusted to the optimal cut-point. The proposed method is applied to both real ECG records and simulated ECGs, and its results are compared with those of three previous methods published in the literature. The experimental results show that substantial improvements in performance can be achieved with this dual-adaptive approach.
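The core filtering step the abstract describes can be illustrated with a short DCT sketch: transform the ECG, zero the coefficients below a cutoff, and invert. The paper's contribution is choosing that cutoff adaptively from the cardiac fundamental frequency; the fixed 0.67 Hz value below is only a placeholder for illustration.

```python
# Minimal sketch of DCT-domain baseline-wander suppression (not the paper's adaptive
# cut-point selection): zero the low-frequency DCT coefficients below a chosen cutoff
# and invert. The 0.67 Hz cutoff here is an illustrative fixed value.
import numpy as np
from scipy.fft import dct, idct

def remove_baseline_wander(ecg, fs, cutoff_hz=0.67):
    n = len(ecg)
    coeffs = dct(ecg, norm="ortho")
    # DCT-II bin k corresponds roughly to frequency k * fs / (2 * n)
    k_cut = int(np.ceil(2 * n * cutoff_hz / fs))
    coeffs[:k_cut] = 0.0
    return idct(coeffs, norm="ortho")

fs = 360
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)  # toy ECG + drift
clean = remove_baseline_wander(ecg, fs)
print(clean.shape)
```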

12 pages, 950 KiB  
Article
Estimation of the Underlying F0 Range of a Speaker from the Spectral Features of a Brief Speech Input
by Wei Zhang, Yanlu Xie, Binghuai Lin, Liyuan Wang and Jinsong Zhang
Appl. Sci. 2022, 12(13), 6494; https://doi.org/10.3390/app12136494 - 27 Jun 2022
Viewed by 1269
Abstract
From a very brief speech sample, human listeners can estimate the pitch range of the speaker and normalize pitch perception. Spectral features, which inherently involve both articulatory and phonatory characteristics, have been speculated to play a role in this process, but few have been reported to correlate directly with a speaker's F0 range. To mimic this human auditory capability and validate the speculation, in a preliminary study we proposed an LSTM-based method to estimate a speaker's F0 range from a 300 ms speech input, which outperformed the conventional method. Through two further experiments, this study improved the method and verified its validity in estimating the speaker-specific underlying F0 range. After incorporating a novel measurement of F0 range and a multi-task training approach, Experiment 1 showed that the refined model gave more accurate estimates than the initial model. Using a Japanese-Chinese bilingual parallel speech corpus, Experiment 2 found that the F0 ranges estimated with the model from the Chinese speech and from the Japanese speech produced by the same set of speakers showed no significant difference, whereas those obtained with the conventional method differed significantly. The results indicate that the proposed spectrum-based method captures the speaker-specific underlying F0 range, which is independent of the linguistic content.
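A minimal sketch of a spectrum-to-F0-range regressor in the spirit of the abstract is shown below: an LSTM consumes a short sequence of spectral frames and outputs two numbers describing the speaker's F0 range. The mel-spectrogram input, frame count, and two-value output parameterization are assumptions rather than the paper's exact design.

```python
# Sketch (assumptions marked): an LSTM regressor that maps a short sequence of spectral
# frames (~300 ms of mel-spectrogram) to two values describing the speaker's F0 range.
# Frame dimension, sequence length, and output parameterization are illustrative.
import torch
import torch.nn as nn

class F0RangeEstimator(nn.Module):
    def __init__(self, n_mels=80, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)    # e.g., (lower bound, upper bound) in log-Hz

    def forward(self, frames):             # frames: (batch, time, n_mels)
        _, (h, _) = self.lstm(frames)
        return self.out(h[-1])

est = F0RangeEstimator()(torch.randn(2, 30, 80))  # ~300 ms at a 10 ms frame shift
print(est.shape)  # torch.Size([2, 2])
```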

19 pages, 2557 KiB  
Article
Classifying the Severity of Cyberbullying Incidents by Using a Hierarchical Squashing-Attention Network
by Jheng-Long Wu and Chiao-Yu Tang
Appl. Sci. 2022, 12(7), 3502; https://doi.org/10.3390/app12073502 - 30 Mar 2022
Cited by 6 | Viewed by 2795
Abstract
Cyberbullying has become more prevalent on online social media platforms. Natural language processing and machine learning techniques have been employed to develop automatic cyberbullying detection models, but these are designed only for binary classification and can detect only whether a text contains cyberbullying content. Cyberbullying severity is a critical factor that can provide organizations with valuable information for developing cyberbullying prevention strategies. This paper proposes a hierarchical squashing-attention network (HSAN) for classifying the severity of cyberbullying incidents. The study aimed to (1) establish a Chinese-language cyberbullying severity dataset annotated with three severity ratings (slight, medium, and serious) and (2) develop a new squashing-attention mechanism (SAM) for HSAN based on the squashing function, which uses vector length to estimate attention weights. Experiments indicated that the SAM could sufficiently analyze sentences to determine cyberbullying severity. The proposed HSAN model outperformed other machine-learning-based and deep-learning-based models in determining the severity of cyberbullying incidents.
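One plausible reading of the squashing-attention idea (length-based weights derived from the capsule-style squashing function) is sketched below; it is not the authors' HSAN code, and the sentence-level aggregation shown is an assumption.

```python
# Illustrative sketch of squashing-based attention weighting (my reading of the abstract,
# not the authors' exact HSAN code): the capsule-style squashing function bounds each
# vector's length in (0, 1), and the bounded lengths are normalized into attention weights.
import torch

def squash(v, dim=-1, eps=1e-8):
    norm = v.norm(dim=dim, keepdim=True)
    return (norm ** 2 / (1 + norm ** 2)) * v / (norm + eps)

def squashing_attention(token_vecs):       # token_vecs: (batch, seq_len, dim)
    squashed = squash(token_vecs)
    lengths = squashed.norm(dim=-1)        # (batch, seq_len), each in (0, 1)
    weights = lengths / lengths.sum(dim=-1, keepdim=True)
    return (weights.unsqueeze(-1) * token_vecs).sum(dim=1)  # weighted sentence vector

sent = squashing_attention(torch.randn(2, 12, 64))
print(sent.shape)  # torch.Size([2, 64])
```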

17 pages, 640 KiB  
Article
Exploiting Diverse Information in Pre-Trained Language Model for Multi-Choice Machine Reading Comprehension
by Ziwei Bai, Junpeng Liu, Meiqi Wang, Caixia Yuan and Xiaojie Wang
Appl. Sci. 2022, 12(6), 3072; https://doi.org/10.3390/app12063072 - 17 Mar 2022
Cited by 2 | Viewed by 1375
Abstract
Answering different multi-choice machine reading comprehension (MRC) questions generally requires different information owing to the abundant diversity of the questions, options, and passages. Recently, pre-trained language models, which provide rich information, have been widely used to address MRC tasks. Most existing work focuses only on the output representation at the top layer of the model; the subtle and beneficial information provided by the intermediate layers is ignored. This paper therefore proposes a multi-decision-based transformer model that builds multiple decision modules on the outputs of different layers to handle the various questions and passages. To prevent the information diversity across layers from being damaged during fine-tuning, we also propose a learning-rate decay method that controls the updating speed of the parameters in different blocks. Experimental results on multiple publicly available datasets show that our model can answer different questions by utilizing the representations of different layers and can speed up the inference procedure with considerable accuracy.
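The layer-wise learning-rate decay the abstract mentions can be sketched as parameter groups whose learning rates shrink toward the lower blocks, as below. The stand-in blocks, base learning rate, and decay factor are illustrative; the authors' exact schedule may differ.

```python
# Sketch of layer-wise learning-rate decay (the general technique the abstract describes,
# not the authors' exact schedule): lower layers get smaller learning rates, so the
# information they carry changes more slowly during fine-tuning.
import torch
import torch.nn as nn

encoder_layers = nn.ModuleList([nn.Linear(128, 128) for _ in range(12)])  # stand-in blocks
base_lr, decay = 3e-5, 0.9

param_groups = [
    {"params": layer.parameters(),
     "lr": base_lr * (decay ** (len(encoder_layers) - 1 - i))}
    for i, layer in enumerate(encoder_layers)
]
optimizer = torch.optim.AdamW(param_groups)
for g in optimizer.param_groups:
    print(round(g["lr"], 7))   # smallest learning rate for the lowest block
```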

26 pages, 2255 KiB  
Article
AidIR: An Interactive Dialog System to Aid Disease Information Retrieval
by Da-Jinn Wang, Tsong-Yi Chen and Chia-Yi Su
Appl. Sci. 2022, 12(4), 1875; https://doi.org/10.3390/app12041875 - 11 Feb 2022
Viewed by 1816
Abstract
This paper proposes an interactive dialog system, called AidIR, to aid disease information retrieval. AidIR allows users to retrieve information on diseases caused by coronaviruses and diseases transmitted by vector mosquitoes through natural language interaction over the Line chat platform. In a subjective evaluation, we asked 20 users to rate the intuitiveness, usability, and user experience of AidIR on a scale from −2 to 2. We also asked these users to answer yes–no questions to evaluate AidIR and to provide feedback. The average scores for intuitiveness, usability, and user experience were 0.8, 0.8, and 1.05, respectively. The yes–no questions indicated that AidIR is preferred over mobile graphical-user-interface systems and single-turn dialog systems, and according to the user feedback, AidIR is more convenient for information retrieval. Moreover, we designed a new loss function to jointly train a BERT model for domain classification and sequence labeling tasks; the accuracy of both tasks is 92%. Finally, we trained the dialog policy network with supervised learning and deployed a reinforcement learning algorithm to allow AidIR to continue learning the dialog policy.
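A minimal sketch of joint training for domain classification plus sequence labeling is given below; the loss weighting, head shapes, and use of a pooled first token are assumptions, not the paper's new loss function.

```python
# Sketch of joint training for domain classification plus sequence labeling (the weighting
# and architecture are assumptions, not the paper's exact loss): one encoder feeds two
# heads, and their cross-entropy losses are combined into a single objective.
import torch
import torch.nn as nn

hidden, n_domains, n_tags = 768, 4, 9
domain_head = nn.Linear(hidden, n_domains)
tag_head = nn.Linear(hidden, n_tags)
ce = nn.CrossEntropyLoss()

enc = torch.randn(2, 16, hidden)            # stand-in for BERT token outputs (B, T, H)
domain_logits = domain_head(enc[:, 0])      # [CLS]-style pooled token
tag_logits = tag_head(enc)                  # per-token label logits

domain_y = torch.randint(0, n_domains, (2,))
tag_y = torch.randint(0, n_tags, (2, 16))

# Joint objective: domain loss plus a weighted sequence-labeling loss (weight assumed).
loss = ce(domain_logits, domain_y) + 0.5 * ce(tag_logits.reshape(-1, n_tags),
                                              tag_y.reshape(-1))
print(float(loss))
```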
