Multimodal Emotion Recognition in Artificial Intelligence

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (4 February 2022) | Viewed by 20944

Special Issue Editors


Prof. Dr. Valentina Franzoni
Guest Editor
Department of Mathematics and Computer Science, University of Perugia, 06123 Perugia, Italy
Interests: artificial intelligence; emotion recognition; learner behaviour modeling; semantic proximity measures; link prediction; deep learning algorithms

Dr. Giulio Biondi
Guest Editor
Dipartimento di Matematica e Informatica (DiMaI), University of Florence, Florence, Italy
Interests: artificial intelligence; e-learning; link prediction; complex networks

Prof. Dr. Alfredo Milani
Guest Editor
Department of Mathematics and Computer Science, University of Perugia, 06123 Perugia, Italy
Interests: online evolutionary algorithms; metaheuristics for combinatorial optimization; discrete differential evolution; semantic proximity measures; planning agents and complex network dynamics; emotion recognition

Prof. Dr. Jordi Vallverdú
Guest Editor
Philosophy Department, Universitat Autònoma de Barcelona, 08193 Bellaterra (BCN), Spain
Interests: robot emotions; affective computing; computational cognitive science; human–robot interaction; philosophy of technology; Bayesian probability; blended cognition

Special Issue Information

Dear Colleagues,

Advances in artificial intelligence demand multidisciplinary development across the whole field of affective computing and emotion recognition, which is becoming a key component of human–machine interaction, data mining systems, medical self-care, social network analysis, and the social influence of multimodal communication.

This Special Issue aims to bring together researchers and practitioners to stimulate cooperation and cross-fertilization between the different communities focused on the research, development, and application of emotion recognition, both through the use of emotional data and through data that arouse different types and levels of emotions.

Critical innovations are paving the way for new applications across the broad landscape of sensors, ranging from personal data sources (e.g., wearable devices, crowd-sound data, speech, images, brain–computer interfaces) to professional sensors used in laboratories (e.g., eye tracking, medical data, MRI, balance boards).

Prof. Dr. Valentina Franzoni
Dr. Giulio Biondi
Prof. Dr. Alfredo Milani
Prof. Dr. Jordi Vallverdú
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • affective computing
  • emotion recognition
  • artificial intelligence
  • wearable sensors
  • brain–computer devices
  • crowd-sound emotions
  • speech emotions
  • text emotions
  • face recognition
  • social robots

Published Papers (5 papers)


Research

18 pages, 519 KiB  
Article
Robust Multi-Scenario Speech-Based Emotion Recognition System
by Fangfang Zhu-Zhou, Roberto Gil-Pita, Joaquín García-Gómez and Manuel Rosa-Zurera
Sensors 2022, 22(6), 2343; https://doi.org/10.3390/s22062343 - 18 Mar 2022
Cited by 8 | Viewed by 1949
Abstract
Every human being experiences emotions daily, e.g., joy, sadness, fear, anger. These may be revealed through speech, since words are often accompanied by our emotional states when we talk. Several acoustic emotional databases are freely available for the Emotional Speech Recognition (ESR) task. Unfortunately, many of them were generated under non-real-world conditions: actors played the emotions, and the recordings were made under fictitious circumstances with no noise. Another weakness in the design of emotion recognition systems is the scarcity of patterns in the available databases, which causes generalization problems and leads to overfitting. This paper examines how different elements of the recording environment impact system performance, using a simple logistic regression algorithm. Specifically, we conducted experiments simulating different scenarios with different levels of Gaussian white noise, real-world noise, and reverberation. The results show a performance deterioration in all scenarios, with the error probability increasing from 25.57% to 79.13% in the worst case. Additionally, a virtual enlargement method and a robust multi-scenario speech-based emotion recognition system are proposed. Our system's average error probability of 34.57% is comparable to the best-case scenario's 31.55%. The findings support the prediction that simulated emotional speech databases are not sufficiently close to real scenarios.
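
The scenario simulation described in this abstract lends itself to a short illustration. The Python sketch below contaminates toy signals with Gaussian white noise at a chosen SNR and trains a plain logistic-regression classifier, mirroring the paper's setup in miniature; the synthetic utterances and energy-based features are stand-ins, not the authors' corpora or feature set.

# Minimal sketch: degrade clean signals with white noise at a target SNR,
# then train/evaluate a simple logistic-regression emotion classifier.
# Data and features here are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def add_noise(signal, snr_db, rng):
    """Mix Gaussian white noise into `signal` at the requested SNR (dB)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

def features(signal):
    """Stand-in acoustic features (frame-energy statistics); real systems
    use MFCCs, pitch, spectral descriptors, etc."""
    frames = signal[: len(signal) // 100 * 100].reshape(100, -1)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-9)
    return np.array([energy.mean(), energy.std(), energy.min(), energy.max()])

rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):  # two toy "emotion" classes with different dynamics
    for _ in range(200):
        t = np.linspace(0, 1, 8000)
        clean = np.abs(np.sin(2 * np.pi * (1 + 3 * label) * t)) * rng.normal(size=t.size)
        degraded = add_noise(clean, snr_db=5.0, rng=rng)  # simulate a noisy scenario
        X.append(features(degraded))
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("error probability:", 1 - clf.score(X_te, y_te))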

17 pages, 3072 KiB  
Article
Emotional Speech Recognition Method Based on Word Transcription
by Gulmira Bekmanova, Banu Yergesh, Altynbek Sharipbay and Assel Mukanova
Sensors 2022, 22(5), 1937; https://doi.org/10.3390/s22051937 - 02 Mar 2022
Cited by 11 | Viewed by 3188
Abstract
The emotional speech recognition method presented in this article was applied to recognize the emotions of students during online exams in distance learning necessitated by COVID-19. The method recognizes emotions in spoken language through a knowledge base of emotionally charged words stored as a code book, analyzing human speech for the presence of emotions. To assess its quality, an experiment was conducted on 420 audio recordings; the accuracy of the proposed method is 79.7% for the Kazakh language. The method can be used for different languages and consists of the following tasks: capturing a signal, detecting speech in it, recognizing speech words in a simplified transcription, determining word boundaries, comparing the simplified transcription with the code book, and constructing a hypothesis about the degree of speech emotionality. If emotions are present, the words are fully recognized and the emotions in the speech are identified. A key advantage of this method is that it is undemanding of computational resources, which makes widespread use possible. It can be applied wherever positive and negative emotions need to be recognized in a crowd, in public transport, schools, universities, etc. The experiment demonstrated the effectiveness of the method. The results will make it possible in the future to develop devices that record and recognize a speech signal, for example, upon detecting negative emotions in sounding speech, and, if necessary, transmit a message about potential threats or riots.
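
The code-book step can be illustrated compactly. The following Python sketch matches simplified word transcriptions against a small dictionary of emotionally charged entries and forms a hypothesis about the utterance's emotionality; the code book, the simplification rule, and the threshold are hypothetical placeholders (the paper works with phonetic transcriptions of Kazakh).

# Minimal sketch of the code-book comparison: map simplified transcriptions
# to emotion labels and hypothesize about the utterance's emotionality.
from collections import Counter

# Hypothetical code book: simplified transcription -> emotion label.
CODE_BOOK = {
    "awful": "negative",
    "hate": "negative",
    "wonderful": "positive",
    "happy": "positive",
}

def simplify(word):
    """Stand-in for the simplified-transcription step; here we just
    lower-case and strip punctuation."""
    return word.lower().strip(".,!?")

def emotion_hypothesis(words, threshold=0.15):
    """Flag the utterance as emotional if enough words hit the code book."""
    hits = Counter(CODE_BOOK[w] for w in map(simplify, words) if w in CODE_BOOK)
    ratio = sum(hits.values()) / max(len(words), 1)
    if ratio < threshold:
        return "neutral", hits
    # If emotions are detected, a full recognition pass would refine them.
    return hits.most_common(1)[0][0], hits

print(emotion_hypothesis("I hate this awful exam".split()))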

13 pages, 958 KiB  
Article
Lie to Me: Shield Your Emotions from Prying Software
by Alina Elena Baia, Giulio Biondi, Valentina Franzoni, Alfredo Milani and Valentina Poggioni
Sensors 2022, 22(3), 967; https://doi.org/10.3390/s22030967 - 26 Jan 2022
Cited by 6 | Viewed by 2558
Abstract
Deep learning approaches to facial Emotion Recognition (ER) achieve high accuracy on basic models, e.g., Ekman's, in the specific domain of facial emotional expressions. Facial tracking of users' emotions could thus easily be used against the right to privacy or for manipulative purposes. Since recent studies have shown that deep learning models are susceptible to adversarial examples (images intentionally modified to fool a machine learning classifier), we propose to use them to preserve users' privacy against ER. In this paper, we present a technique for generating Emotion Adversarial Attacks (EAAs). EAAs are performed by applying well-known Instagram-inspired image filters, with a multi-objective evolutionary algorithm determining the best attacking combination of filters for each image. Experimental results on the well-known AffectNet dataset of facial expressions show that our approach successfully attacks emotion classifiers to protect user privacy, while the quality of the images, from the standpoint of human perception, is maintained. Several experiments with different sequences of filters show that the Attack Success Rate is very high, above 90% for every test.
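
The filter-combination attack can be sketched as follows. This Python fragment applies Instagram-style filter sequences to an image and keeps a sequence once it flips a classifier's prediction; the classifier is a stub, and random search stands in for the paper's multi-objective evolutionary algorithm.

# Minimal sketch: search for a sequence of image filters that changes a
# classifier's predicted emotion. Stub classifier and toy image only.
import random
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

FILTERS = [
    lambda im: ImageEnhance.Color(im).enhance(1.4),       # saturate
    lambda im: ImageEnhance.Brightness(im).enhance(1.2),  # brighten
    lambda im: ImageEnhance.Contrast(im).enhance(0.8),    # soften contrast
    lambda im: im.filter(ImageFilter.GaussianBlur(1)),    # slight blur
]

def classify(im):
    """Placeholder emotion classifier; substitute a real ER model here."""
    return "happy" if np.asarray(im, dtype=float).mean() > 120 else "sad"

def attack(im, trials=50, max_len=3):
    """Random search over filter sequences until the prediction flips."""
    original = classify(im)
    for _ in range(trials):
        seq = random.choices(range(len(FILTERS)), k=random.randint(1, max_len))
        candidate = im
        for i in seq:
            candidate = FILTERS[i](candidate)
        if classify(candidate) != original:  # prediction flipped: success
            return seq, candidate
    return None, im

img = Image.fromarray(np.full((64, 64, 3), 118, dtype=np.uint8))  # toy "face"
seq, adv = attack(img)
print("successful filter sequence:", seq)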

29 pages, 1759 KiB  
Article
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
by Cristina Luna-Jiménez, David Griol, Zoraida Callejas, Ricardo Kleinlein, Juan M. Montero and Fernando Fernández-Martínez
Sensors 2021, 21(22), 7665; https://doi.org/10.3390/s21227665 - 18 Nov 2021
Cited by 45 | Viewed by 7831
Abstract
Emotion Recognition is attracting the attention of the research community due to the many areas where it can be applied, such as healthcare and road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved by fine-tuning the CNN-14 of the PANNs framework, confirming that training is more robust when it does not start from scratch and the tasks are similar. For facial emotion recognition, we propose a framework consisting of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. Error analysis showed that frame-based systems can run into problems when used directly on a video-based task, despite domain adaptation; this opens a new line of research into ways to correct the mismatch and exploit the embedded knowledge of these pre-trained models. Finally, by combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset under subject-wise 5-CV evaluation, classifying eight emotions. The results reveal that both modalities carry relevant information for detecting users' emotional state, and that their combination improves system performance.
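
The late-fusion step is simple enough to show directly. In the Python sketch below, each modality is assumed to produce a posterior distribution over the eight RAVDESS emotions, and the fused decision is a weighted average; the Dirichlet-sampled posteriors and the weight are illustrative, not the paper's trained recognizers.

# Minimal sketch of late fusion: combine per-modality posteriors with a
# weighted average and pick the arg-max emotion.
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]  # RAVDESS classes

def late_fusion(p_speech, p_face, w=0.5):
    """Weighted average of the two posteriors; w weights the speech model."""
    fused = w * p_speech + (1 - w) * p_face
    return EMOTIONS[int(np.argmax(fused))]

rng = np.random.default_rng(1)
p_speech = rng.dirichlet(np.ones(8))  # stand-in speech-model posteriors
p_face = rng.dirichlet(np.ones(8))    # stand-in facial-model posteriors
print(late_fusion(p_speech, p_face, w=0.6))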

20 pages, 1274 KiB  
Article
The Extensive Usage of the Facial Image Threshing Machine for Facial Emotion Recognition Performance
by Jung Hwan Kim, Alwin Poulose and Dong Seog Han
Sensors 2021, 21(6), 2026; https://doi.org/10.3390/s21062026 - 12 Mar 2021
Cited by 44 | Viewed by 4069
Abstract
Facial emotion recognition (FER) systems play a significant role in identifying driver emotions, and accurate recognition of driver emotions in autonomous vehicles can reduce road rage. However, training even an advanced FER model without proper datasets leads to poor performance in real-time testing: FER system performance is affected more by the quality of the datasets than by the quality of the algorithms. To improve FER performance for autonomous vehicles, we propose a facial image threshing (FIT) machine that uses advanced features of pre-trained facial recognition together with training on the Xception algorithm. The FIT machine removes irrelevant facial images, collects facial images, corrects misplaced face data, and merges original datasets on a massive scale, in addition to applying data augmentation. The proposed method improved validation accuracy by 16.95% over the conventional approach with the FER 2013 dataset. A confusion matrix evaluation on an unseen private dataset shows a 5% improvement over the original approach with the FER 2013 dataset, confirming the real-time testing results.
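
The threshing idea can be approximated in a few lines. The Python sketch below scans a directory of candidate images, discards those in which no face is detected, and crops the largest face to a fixed size; an OpenCV Haar cascade stands in for the pre-trained facial recognition the FIT machine actually builds on, and the directory names are hypothetical.

# Minimal sketch: keep only images containing a detectable face, cropped
# and resized so only usable samples remain in the cleaned dataset.
import os
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def thresh_image(path, size=48):
    """Return a cropped grayscale face chip, or None to discard the image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return None                      # unreadable file: discard
    faces = detector.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face found: irrelevant image
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
    return cv2.resize(img[y:y + h, x:x + w], (size, size))

def thresh_dataset(src_dir, dst_dir):
    """Copy the face chips of usable images from src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    kept = 0
    for name in os.listdir(src_dir):
        chip = thresh_image(os.path.join(src_dir, name))
        if chip is not None:
            cv2.imwrite(os.path.join(dst_dir, name), chip)
            kept += 1
    print(f"kept {kept} face images in {dst_dir}")

# Example (hypothetical paths): thresh_dataset("raw_faces", "clean_faces")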
