Machine Learning for Language and Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 November 2022) | Viewed by 18490

Special Issue Editors


Prof. Dr. Yuen-Hsien Tseng
Guest Editor
Graduate Institute of Library & Information Science, National Taiwan Normal University, Taipei 10617, Taiwan
Interests: natural language processing; text mining; knowledge discovery; machine learning; information retrieval; automatic information organization and subject analysis; scientometrics; e-Learning; IT techniques for educational applications

Prof. Dr. Lung-Hao Lee
Guest Editor
Department of Electrical Engineering, National Central University, Taoyuan 32001, Taiwan
Interests: natural language processing; biomedical and health informatics; artificial intelligence; machine learning; web information retrieval

Special Issue Information

Dear Colleagues,

Machine learning is a branch of artificial intelligence that focuses on the use of data and algorithms to simulate the way that humans learn. Through the use of statistical methods, learning models are trained to make classifications or predictions, subsequently driving the decision-making process within specific applications. Therefore, this Special Issue aims to present new ideas, original research results, and practical development experiences from all language and signal research areas, including natural language processing and digital signal processing. Relevant topics for this Special Issue include, but are not limited to, the following:

  • Audio and acoustic signal processing;
  • Biomedical signal processing;
  • Deep learning methods and applications;
  • Information retrieval and extraction;
  • Language resources and evaluation;
  • Multidimensional signal processing;
  • Multimedia signal processing;
  • NLP and DSP applications;
  • Phonology, morphology, syntax, and semantics;
  • Question-answering, summarization, and machine translation;
  • Sentiment analysis, stylistic analysis, and argument mining;
  • Signal-processing theories and methods;
  • Speech perception, production, and acquisition;
  • Speech recognition, synthesis, and generation;
  • Spoken language retrieval, translation, and summarization.

Prof. Dr. Yuen-Hsien Tseng
Prof. Dr. Lung-Hao Lee
Prof. Dr. Po-Lei Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • neural networks
  • text mining
  • natural language processing
  • digital signal processing

Published Papers (8 papers)


Research

16 pages, 1047 KiB  
Article
Image-Based Radical Identification in Chinese Characters
by Yu Tzu Wu, Eric Fujiwara and Carlos Kenichi Suzuki
Appl. Sci. 2023, 13(4), 2163; https://doi.org/10.3390/app13042163 - 08 Feb 2023
Cited by 1 | Viewed by 4228
Abstract
The Chinese writing system, known as hanzi or Han characters, is fundamentally pictographic, composed of clusters of strokes. Nowadays, there are over 85,000 individual characters, making it difficult even for a native speaker to recognize the precise meaning of everything one reads. However, specific clusters of strokes known as indexing radicals convey the semantic information of the whole character, or even of an entire family of characters; they are key features for entry indexing in dictionaries and are essential for learning Chinese as a first or second language. Therefore, this work aims to identify the indexing radical of a hanzi from a picture using a convolutional neural network model with two layers and 15 classes. The model was validated on three calligraphy styles and achieved an average F-score of ∼95.7% in classifying 15 radicals within the known styles. For unknown fonts, the F-score varied with the overall calligraphy size, thickness, and stroke nature, reaching ∼83.0% in the best scenario. Subsequently, the model was evaluated on five ancient Chinese poems with a random set of hanzi, yielding average F-scores of ∼86.0% and ∼61.4% when disregarding and regarding unknown indexing radicals, respectively.
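As a rough illustration of the kind of classifier the abstract describes (not the authors' implementation), the sketch below builds a two-convolutional-layer network with 15 output classes in PyTorch; the input resolution, channel widths, and kernel sizes are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a two-convolutional-layer CNN that maps a
# grayscale glyph image to one of 15 indexing-radical classes. Input resolution,
# channel widths, and kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RadicalCNN(nn.Module):
    def __init__(self, num_classes: int = 15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # (B, 32, 16, 16)
        return self.classifier(x.flatten(1))

model = RadicalCNN()
logits = model(torch.randn(8, 1, 64, 64))     # batch of 8 glyph images
print(logits.shape)                           # torch.Size([8, 15])
```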

17 pages, 7526 KiB  
Article
Numerical Simulation of Adaptive Radial Basis NN-Based Non-Singular Fast Terminal Sliding Mode Control with Time Delay Estimator for Precise Control of Dual-Axis Manipulator
by Jim-Wei Wu, Wen-Shan Cen and Cheng-Chang Ho
Appl. Sci. 2022, 12(19), 9605; https://doi.org/10.3390/app12199605 - 24 Sep 2022
Cited by 1 | Viewed by 1104
Abstract
Robotic manipulators can reduce production costs and improve productivity; however, controlling a manipulator to follow a desired trajectory is a thorny problem. In this study, we introduced various forms of interference to facilitate the modeling of a dual-axis manipulator. The interference associated with the payload is handled by an adaptive radial basis neural network (ARBNN) controller, while other interference is estimated by a time delay estimator (TDE). The control signal is output by a non-singular fast terminal sliding mode controller (NFTSMC) to minimize further interference. Because the proposed controller accounts for the payload, system uncertainties, external disturbances, friction, and backlash, it achieves better tracking accuracy and stability than conventional control methods.
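The sketch below is a loose, single-joint illustration of the control structure named in the abstract: an RBF-network disturbance estimate, a time-delay-estimation term, and a non-singular fast terminal sliding surface combined into one control signal. All gains, basis centers, and the reaching law are illustrative assumptions, not the authors' design.

```python
# Illustrative single-joint sketch (not the authors' controller): combines a radial-basis
# network estimate of payload disturbance, a time-delay estimate of remaining dynamics,
# and a non-singular fast terminal sliding surface. Gains and basis centers are arbitrary.
import numpy as np

def rbf_disturbance(q_err, centers=np.linspace(-1, 1, 5), width=0.5,
                    weights=np.zeros(5)):
    """Radial-basis approximation of the payload-related disturbance."""
    phi = np.exp(-((q_err - centers) ** 2) / (2 * width ** 2))
    return weights @ phi

def nftsm_surface(e, e_dot, alpha=2.0, beta=1.0, p=5, q=3):
    """Non-singular fast terminal sliding surface for one joint (1 < p/q < 2)."""
    return e_dot + alpha * e + beta * np.sign(e) * np.abs(e) ** (p / q)

def control(e, e_dot, tde_term, k=10.0):
    s = nftsm_surface(e, e_dot)
    d_hat = rbf_disturbance(e)
    # Reaching law drives s -> 0; RBF and TDE terms compensate the estimated interference.
    return -k * np.sign(s) - d_hat - tde_term

print(control(e=0.1, e_dot=-0.05, tde_term=0.02))
```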

17 pages, 1754 KiB  
Article
Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
by Badriyya B. Al-onazi, Muhammad Asif Nauman, Rashid Jahangir, Muhmmad Mohsin Malik, Eman H. Alkhammash and Ahmed M. Elshewey
Appl. Sci. 2022, 12(18), 9188; https://doi.org/10.3390/app12189188 - 14 Sep 2022
Cited by 19 | Viewed by 3258
Abstract
In recent years, data science has been applied in a variety of real-life applications such as human-computer interaction, computer gaming, mobile services, and emotion evaluation. Among this wide range of applications, speech emotion recognition (SER) is an emerging and challenging research topic. Earlier SER studies used handcrafted features that provided strong results but failed to maintain accuracy when applied in complex scenarios. Later, deep learning techniques that automatically learn features from speech signals were adopted for SER. Deep-learning-based SER techniques overcome the accuracy issues, yet there are still significant gaps in the reported methods; in particular, studies using lightweight CNNs failed to learn optimal features from composite acoustic signals. This study proposes a novel SER model to overcome these limitations. We focused on Arabic vocal emotions in particular because they have received relatively little research attention. The proposed model performs data augmentation before feature extraction, and the 273 derived features are fed as input to a transformer model for emotion recognition. The model is applied to four datasets, namely BAVED, EMO-DB, SAVEE, and EMOVO. The experimental findings demonstrated the robust performance of the proposed model compared to existing techniques: the proposed SER model achieved 95.2%, 93.4%, 85.1%, and 91.7% accuracy on the BAVED, EMO-DB, SAVEE, and EMOVO datasets, respectively. The highest accuracy was obtained on the BAVED dataset, indicating that the proposed model is well suited to Arabic vocal emotions.
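A minimal sketch of the pipeline the abstract outlines, a fixed-length acoustic feature vector fed to a Transformer encoder for emotion classification, is given below. The 273-dimensional feature composition, sequence length, and number of emotion classes are assumptions, not details taken from the paper.

```python
# Sketch only: feeds 273-dimensional acoustic feature vectors (composition assumed, e.g.,
# MFCC/chroma/mel statistics) into a small Transformer encoder for emotion classification.
import torch
import torch.nn as nn

class SERTransformer(nn.Module):
    def __init__(self, feat_dim=273, d_model=128, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)   # n_classes is illustrative

    def forward(self, x):                  # x: (batch, seq_len, 273)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))    # pool over time, then classify

logits = SERTransformer()(torch.randn(4, 10, 273))
print(logits.shape)  # torch.Size([4, 7])
```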

15 pages, 2344 KiB  
Article
A Dual-Adaptive Approach Based on Discrete Cosine Transform for Removal of ECG Baseline Wander
by Chun-Chieh Lin, Pei-Chann Chang and Ping-Heng Tsai
Appl. Sci. 2022, 12(17), 8839; https://doi.org/10.3390/app12178839 - 02 Sep 2022
Cited by 2 | Viewed by 1366
Abstract
Removal of baseline wander (BW) is an important preprocessing step before manually or automatically interpreting electrocardiogram (ECG) records. Fully removing BW while preserving the original clinical information is challenging because BW is usually mingled with low-frequency ECG components. A dual-adaptive approach based on the discrete cosine transform (DCT) is presented in this study. First, the cardiac fundamental frequency (CFF) of the ECG is accurately calculated through DCT-domain analysis. Second, the DCT coefficients of the ECG whose frequencies lie below the CFF are used to construct an amplitude vector in which the optimal cut-point between BW and the ECG is distinctly reflected. Finally, a new DCT-based filtering technique is exploited to suppress BW, with its cutoff frequency adjusted to the optimal cut-point. The proposed method is applied to both real ECG records and simulated ECGs, and its results are compared with those of three previous methods published in the literature. The experimental results show that substantial improvements in performance can be achieved with this dual-adaptive approach.
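The core filtering step the abstract describes can be illustrated with a short DCT sketch: transform the ECG, zero the coefficients below a cutoff, and invert. The paper's contribution is choosing that cutoff adaptively from the cardiac fundamental frequency; the fixed 0.67 Hz value below is only a placeholder for illustration.

```python
# Minimal sketch of DCT-domain baseline-wander suppression (not the paper's adaptive
# cut-point selection): zero the low-frequency DCT coefficients below a chosen cutoff
# and invert. The 0.67 Hz cutoff here is an illustrative fixed value.
import numpy as np
from scipy.fft import dct, idct

def remove_baseline_wander(ecg, fs, cutoff_hz=0.67):
    n = len(ecg)
    coeffs = dct(ecg, norm="ortho")
    # DCT-II bin k corresponds roughly to frequency k * fs / (2 * n)
    k_cut = int(np.ceil(2 * n * cutoff_hz / fs))
    coeffs[:k_cut] = 0.0
    return idct(coeffs, norm="ortho")

fs = 360
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)  # toy ECG + drift
clean = remove_baseline_wander(ecg, fs)
print(clean.shape)
```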

12 pages, 950 KiB  
Article
Estimation of the Underlying F0 Range of a Speaker from the Spectral Features of a Brief Speech Input
by Wei Zhang, Yanlu Xie, Binghuai Lin, Liyuan Wang and Jinsong Zhang
Appl. Sci. 2022, 12(13), 6494; https://doi.org/10.3390/app12136494 - 27 Jun 2022
Viewed by 1269
Abstract
From a very brief speech sample, human listeners can estimate the pitch range of the speaker and normalize pitch perception. Spectral features, which inherently involve both articulatory and phonatory characteristics, have been speculated to play a role in this process, but few have been reported to correlate directly with a speaker's F0 range. To mimic this human auditory capability and validate the speculation, in a preliminary study we proposed an LSTM-based method to estimate a speaker's F0 range from a 300 ms speech input, which outperformed the conventional method. Through two further experiments, this study improved the method and verified its validity in estimating the speaker-specific underlying F0 range. After incorporating a novel measurement of F0 range and a multi-task training approach, Experiment 1 showed that the refined model gave more accurate estimates than the initial model. Using a Japanese-Chinese bilingual parallel speech corpus, Experiment 2 found that the F0 ranges estimated with the model from the Chinese speech and from the Japanese speech produced by the same set of speakers showed no significant difference, whereas those obtained with the conventional method differed significantly. The results indicate that the proposed spectrum-based method captures the speaker-specific underlying F0 range, which is independent of the linguistic content.
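A minimal sketch of a spectrum-to-F0-range regressor in the spirit of the abstract is shown below: an LSTM consumes a short sequence of spectral frames and outputs two numbers describing the speaker's F0 range. The mel-spectrogram input, frame count, and two-value output parameterization are assumptions rather than the paper's exact design.

```python
# Sketch (assumptions marked): an LSTM regressor that maps a short sequence of spectral
# frames (~300 ms of mel-spectrogram) to two values describing the speaker's F0 range.
# Frame dimension, sequence length, and output parameterization are illustrative.
import torch
import torch.nn as nn

class F0RangeEstimator(nn.Module):
    def __init__(self, n_mels=80, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)    # e.g., (lower bound, upper bound) in log-Hz

    def forward(self, frames):             # frames: (batch, time, n_mels)
        _, (h, _) = self.lstm(frames)
        return self.out(h[-1])

est = F0RangeEstimator()(torch.randn(2, 30, 80))  # ~300 ms at a 10 ms frame shift
print(est.shape)  # torch.Size([2, 2])
```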

19 pages, 2557 KiB  
Article
Classifying the Severity of Cyberbullying Incidents by Using a Hierarchical Squashing-Attention Network
by Jheng-Long Wu and Chiao-Yu Tang
Appl. Sci. 2022, 12(7), 3502; https://doi.org/10.3390/app12073502 - 30 Mar 2022
Cited by 6 | Viewed by 2795
Abstract
Cyberbullying has become more prevalent on online social media platforms. Natural language processing and machine learning techniques have been employed to develop automatic cyberbullying detection models, but these are designed only for binary classification and can detect only whether a text contains cyberbullying content. Cyberbullying severity is a critical factor that can provide organizations with valuable information for developing cyberbullying prevention strategies. This paper proposes a hierarchical squashing-attention network (HSAN) for classifying the severity of cyberbullying incidents. The study aimed to (1) establish a Chinese-language cyberbullying severity dataset annotated with three severity ratings (slight, medium, and serious) and (2) develop a new squashing-attention mechanism (SAM) for HSAN based on the squashing function, which uses vector length to estimate attention weights. Experiments indicated that the SAM could sufficiently analyze sentences to determine cyberbullying severity. The proposed HSAN model outperformed other machine-learning-based and deep-learning-based models in determining the severity of cyberbullying incidents.
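One plausible reading of the squashing-attention idea (length-based weights derived from the capsule-style squashing function) is sketched below; it is not the authors' HSAN code, and the sentence-level aggregation shown is an assumption.

```python
# Illustrative sketch of squashing-based attention weighting (my reading of the abstract,
# not the authors' exact HSAN code): the capsule-style squashing function bounds each
# vector's length in (0, 1), and the bounded lengths are normalized into attention weights.
import torch

def squash(v, dim=-1, eps=1e-8):
    norm = v.norm(dim=dim, keepdim=True)
    return (norm ** 2 / (1 + norm ** 2)) * v / (norm + eps)

def squashing_attention(token_vecs):       # token_vecs: (batch, seq_len, dim)
    squashed = squash(token_vecs)
    lengths = squashed.norm(dim=-1)        # (batch, seq_len), each in (0, 1)
    weights = lengths / lengths.sum(dim=-1, keepdim=True)
    return (weights.unsqueeze(-1) * token_vecs).sum(dim=1)  # weighted sentence vector

sent = squashing_attention(torch.randn(2, 12, 64))
print(sent.shape)  # torch.Size([2, 64])
```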

17 pages, 640 KiB  
Article
Exploiting Diverse Information in Pre-Trained Language Model for Multi-Choice Machine Reading Comprehension
by Ziwei Bai, Junpeng Liu, Meiqi Wang, Caixia Yuan and Xiaojie Wang
Appl. Sci. 2022, 12(6), 3072; https://doi.org/10.3390/app12063072 - 17 Mar 2022
Cited by 2 | Viewed by 1375
Abstract
Answering different multi-choice machine reading comprehension (MRC) questions generally requires different information owing to the abundant diversity of the questions, options, and passages. Recently, pre-trained language models, which provide rich information, have been widely used to address MRC tasks. Most existing work focuses only on the output representation at the top layer of the model; the subtle and beneficial information provided by the intermediate layers is ignored. This paper therefore proposes a multi-decision-based transformer model that builds multiple decision modules on the outputs of different layers to handle the various questions and passages. To prevent the information diversity across layers from being damaged during fine-tuning, we also propose a learning-rate decay method that controls the updating speed of the parameters in different blocks. Experimental results on multiple publicly available datasets show that our model can answer different questions by utilizing the representations of different layers and can speed up the inference procedure with considerable accuracy.
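The layer-wise learning-rate decay the abstract mentions can be sketched as parameter groups whose learning rates shrink toward the lower blocks, as below. The stand-in blocks, base learning rate, and decay factor are illustrative; the authors' exact schedule may differ.

```python
# Sketch of layer-wise learning-rate decay (the general technique the abstract describes,
# not the authors' exact schedule): lower layers get smaller learning rates, so the
# information they carry changes more slowly during fine-tuning.
import torch
import torch.nn as nn

encoder_layers = nn.ModuleList([nn.Linear(128, 128) for _ in range(12)])  # stand-in blocks
base_lr, decay = 3e-5, 0.9

param_groups = [
    {"params": layer.parameters(),
     "lr": base_lr * (decay ** (len(encoder_layers) - 1 - i))}
    for i, layer in enumerate(encoder_layers)
]
optimizer = torch.optim.AdamW(param_groups)
for g in optimizer.param_groups:
    print(round(g["lr"], 7))   # smallest learning rate for the lowest block
```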

26 pages, 2255 KiB  
Article
AidIR: An Interactive Dialog System to Aid Disease Information Retrieval
by Da-Jinn Wang, Tsong-Yi Chen and Chia-Yi Su
Appl. Sci. 2022, 12(4), 1875; https://doi.org/10.3390/app12041875 - 11 Feb 2022
Viewed by 1816
Abstract
This paper proposes an interactive dialog system, called AidIR, to aid disease information retrieval. AidIR allows users to retrieve information on diseases caused by coronaviruses and diseases transmitted by vector mosquitoes through natural language interaction over the Line chat platform. In a subjective evaluation, we asked 20 users to rate the intuitiveness, usability, and user experience of AidIR on a scale from −2 to 2. We also asked these users to answer yes–no questions to evaluate AidIR and to provide feedback. The average scores for intuitiveness, usability, and user experience were 0.8, 0.8, and 1.05, respectively. The yes–no questions indicated that AidIR is preferred over mobile graphical-user-interface systems and single-turn dialog systems, and according to the user feedback, AidIR is more convenient for information retrieval. Moreover, we designed a new loss function to jointly train a BERT model for domain classification and sequence labeling tasks; the accuracy of both tasks is 92%. Finally, we trained the dialog policy network with supervised learning and deployed a reinforcement learning algorithm to allow AidIR to continue learning the dialog policy.
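A minimal sketch of joint training for domain classification plus sequence labeling is given below; the loss weighting, head shapes, and use of a pooled first token are assumptions, not the paper's new loss function.

```python
# Sketch of joint training for domain classification plus sequence labeling (the weighting
# and architecture are assumptions, not the paper's exact loss): one encoder feeds two
# heads, and their cross-entropy losses are combined into a single objective.
import torch
import torch.nn as nn

hidden, n_domains, n_tags = 768, 4, 9
domain_head = nn.Linear(hidden, n_domains)
tag_head = nn.Linear(hidden, n_tags)
ce = nn.CrossEntropyLoss()

enc = torch.randn(2, 16, hidden)            # stand-in for BERT token outputs (B, T, H)
domain_logits = domain_head(enc[:, 0])      # [CLS]-style pooled token
tag_logits = tag_head(enc)                  # per-token label logits

domain_y = torch.randint(0, n_domains, (2,))
tag_y = torch.randint(0, n_tags, (2, 16))

# Joint objective: domain loss plus a weighted sequence-labeling loss (weight assumed).
loss = ce(domain_logits, domain_y) + 0.5 * ce(tag_logits.reshape(-1, n_tags),
                                              tag_y.reshape(-1))
print(float(loss))
```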
