This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Buk-gu, Gwangju 61005, Republic of Korea
* Author to whom correspondence should be addressed.
Submission received: 3 June 2024 / Revised: 19 June 2024 / Accepted: 24 June 2024 / Published: 25 June 2024
(This article belongs to the Special Issue Sensors Applications on Emotion Recognition)

Abstract

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain emotional factors different from those expressed in the training databases. Regularization approaches and metric losses have been studied to construct SER models that are robust to unseen corpora. In this paper, we propose an SER method that incorporates the relative difficulty and the labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label the emotion based on emotional expressions that reside in the conversational context or in other modalities but are not apparent in the given speech utterance. Consequently, some of the emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. We therefore propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that adopting the proposed loss function with label smoothing on the misclassified data improved SER performance on unseen corpora.
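The abstract names the mechanism but not the exact formula. As a minimal sketch, assuming the emotion classes act as proxies and utterance embeddings are compared with them by cosine similarity, the Python below implements the original Proxy-Anchor loss that the proposed loss is said to be inspired by; it is not the authors' loss itself. For each positive sample, the gradient of the positive term is its exponential score divided by a sum over all positives of that proxy in the minibatch, which is why the weighting depends on difficulty relative to the other samples. The `smoothed_targets` helper illustrates the second idea under further assumptions: targets are smoothed uniformly with a hypothetical factor `eps`, and only for samples whose label disagrees with a pre-trained model's prediction. All function names and the values `alpha=32`, `delta=0.1`, and `eps=0.1` are illustrative choices.

```python
import math
from typing import List, Sequence


def proxy_anchor_loss(sims: Sequence[Sequence[float]], labels: Sequence[int],
                      alpha: float = 32.0, delta: float = 0.1) -> float:
    """Original Proxy-Anchor loss on precomputed cosine similarities.

    sims[i][c]: cosine similarity between the embedding of utterance i and
    the (assumed) proxy of emotion class c. labels[i]: hard label of i.
    """
    num_classes = len(sims[0])
    pos_term, neg_term, num_pos_proxies = 0.0, 0.0, 0
    for c in range(num_classes):
        pos = [row[c] for row, y in zip(sims, labels) if y == c]
        neg = [row[c] for row, y in zip(sims, labels) if y != c]
        if pos:  # only proxies with at least one positive in the minibatch
            num_pos_proxies += 1
            pos_term += math.log1p(sum(math.exp(-alpha * (s - delta)) for s in pos))
        # push negatives away from every proxy (empty sum gives log1p(0) = 0)
        neg_term += math.log1p(sum(math.exp(alpha * (s + delta)) for s in neg))
    return pos_term / max(num_pos_proxies, 1) + neg_term / num_classes


def positive_gradient_weights(pos_sims: Sequence[float], alpha: float = 32.0,
                              delta: float = 0.1) -> List[float]:
    """|dL_pos/ds_i| for the positives of one proxy (up to the 1/|P+| factor).

    Each weight is normalized by a sum over the whole minibatch, so a sample
    gets a large gradient only if it is hard relative to the other positives.
    """
    exps = [math.exp(-alpha * (s - delta)) for s in pos_sims]
    denom = 1.0 + sum(exps)
    return [alpha * e / denom for e in exps]


def smoothed_targets(labels: Sequence[int], pretrained_preds: Sequence[int],
                     num_classes: int, eps: float = 0.1) -> List[List[float]]:
    """One-hot targets, smoothed only where a pre-trained SER model disagrees.

    eps is a hypothetical smoothing factor; uniform smoothing is an assumption.
    """
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        if y == y_hat:  # label agrees with the pre-trained model: keep it hard
            t = [0.0] * num_classes
            t[y] = 1.0
        else:  # possibly unreliable label: spread eps uniformly over classes
            t = [eps / num_classes] * num_classes
            t[y] += 1.0 - eps
        targets.append(t)
    return targets


if __name__ == "__main__":
    sims = [[0.9, 0.1], [0.2, 0.3], [0.1, 0.8]]
    labels = [0, 0, 1]
    print(proxy_anchor_loss(sims, labels))
    # Positives of proxy 0: an easy sample (0.9) and a hard one (0.2).
    print(positive_gradient_weights([0.9, 0.2]))
    print(smoothed_targets([0, 1], [0, 0], num_classes=4))
```

Running the script shows the hard positive (similarity 0.2) receiving a much larger gradient weight than the easy one (0.9). How the smoothed targets are combined with a Proxy-Anchor-style term is not stated in the abstract, so the sketch stops at producing them.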
Keywords: speech emotion recognition; out-of-corpus; generalization; relative difficulty; labeling reliability

Share and Cite

MDPI and ACS Style

Ahn, Y.; Han, S.; Lee, S.; Shin, J.W. Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability. Sensors 2024, 24, 4111. https://0-doi-org.brum.beds.ac.uk/10.3390/s24134111

AMA Style

Ahn Y, Han S, Lee S, Shin JW. Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability. Sensors. 2024; 24(13):4111. https://0-doi-org.brum.beds.ac.uk/10.3390/s24134111

Chicago/Turabian Style

Ahn, Youngdo, Sangwook Han, Seonggyu Lee, and Jong Won Shin. 2024. "Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability." Sensors 24, no. 13: 4111. https://0-doi-org.brum.beds.ac.uk/10.3390/s24134111

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
