
Sensing Systems for Sign Language Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Optical Sensors".

Deadline for manuscript submissions: closed (31 May 2022) | Viewed by 16178

Special Issue Editors


Dr. José Luis Alba Castro
Guest Editor
AtlanTTic research center, University of Vigo, Campus Universitario, 36310 Vigo, Spain
Interests: computer vision; machine learning; deep learning; applied research

Dr. Sergio Escalera
Guest Editor
Universitat de Barcelona and Computer Vision Center, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
Interests: computer vision; machine learning; deep learning; human behavior understanding

Prof. Jun Wan
Guest Editor
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China
Interests: computer vision; machine learning

Special Issue Information

Dear Colleagues,

The automatic understanding of sign languages is essential to ease social integration for millions of deaf people around the world. The last two decades have witnessed increasing research efforts to solve this problem. Many early proposals relied on somewhat intrusive sensors (data gloves, colored gloves, motion capture, ultrasound, etc.) to capture the rapid 3D movements of arms, hands, and fingers, but over the past ten years RGB and depth sensors have become the mainstream solution for simultaneously capturing the communicative channels of hand movements, facial expressions, and whole upper-body movements. The latest advances in human activity recognition from visual cues, rooted in highly efficient deep learning models, have also pushed research in Sign Language Recognition as a closely related application. This is an exciting moment for advancing the socially needed applications that reduce communication barriers.

This Special Issue seeks to bring together innovative research and development solutions in the area of Sign Language Recognition, using any kind of sensing device. Comparative studies of different sensing devices are also very welcome. Authors are invited to submit original articles across the full development stack (hardware, system, and software), including architectures, techniques, and tools for sensing and modeling the complex movement details of signing and the proper decoding of sign sequences. This may include, but is not limited to, sensing modalities, innovative solutions for data collection, strategies for data augmentation, sensor fusion, spatio-temporal representation, computational reduction, model optimization for mobile devices, real-world applications, etc.

This topic fits the scope of the journal well: proper acquisition sensors provide the trustworthy and discriminative information that is critical for sign language representation and, in turn, recognition. The journal also covers the algorithmic processing of sensed signals, an area that is evolving rapidly for sign language recognition (SLR).

Dr. José Luis Alba Castro
Dr. Sergio Escalera
Prof. Jun Wan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sign language recognition
  • sign language production
  • action recognition
  • gesture recognition
  • RGB
  • depth
  • time-of-flight
  • multi-modality (RGB-D, RGB and IR, etc.)
  • data gloves
  • human pose estimation
  • deep learning
  • CNN
  • GCN
  • transformer
  • video datasets

Published Papers (4 papers)


Research

17 pages, 1119 KiB  
Article
One Model is Not Enough: Ensembles for Isolated Sign Language Recognition
by Marek Hrúz, Ivan Gruber, Jakub Kanis, Matyáš Boháček, Miroslav Hlaváč and Zdeněk Krňoul
Sensors 2022, 22(13), 5043; https://doi.org/10.3390/s22135043 - 04 Jul 2022
Cited by 14 | Viewed by 2379
Abstract
In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.
(This article belongs to the Special Issue Sensing Systems for Sign Language Recognition)
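
As a concrete illustration of the CMA-ES-based weight search mentioned in the abstract, the sketch below fuses pre-computed per-model softmax outputs on a validation set and searches for the mixing weights that maximize validation accuracy. This is not the authors' code: the function names, the weight normalization, and the use of the pycma package are illustrative assumptions.

```python
import numpy as np
import cma  # pycma package (pip install cma)

def ensemble_accuracy(weights, probs, labels):
    # probs: (n_models, n_samples, n_classes) softmax outputs per model
    w = np.clip(weights, 0.0, None)
    w = w / (w.sum() + 1e-12)                 # convex combination of models
    fused = np.tensordot(w, probs, axes=1)    # (n_samples, n_classes)
    return float((fused.argmax(-1) == labels).mean())

def fit_ensemble_weights(probs, labels, sigma0=0.3):
    n_models = probs.shape[0]
    x0 = np.full(n_models, 1.0 / n_models)    # start from uniform weighting
    es = cma.CMAEvolutionStrategy(x0, sigma0)
    # CMA-ES minimizes, so negate the validation accuracy
    es.optimize(lambda w: -ensemble_accuracy(w, probs, labels))
    return es.result.xbest
```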

12 pages, 5107 KiB  
Article
Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition
by Huizhou Chen, Yunan Li, Huijuan Fang, Wentian Xin, Zixiang Lu and Qiguang Miao
Sensors 2022, 22(6), 2405; https://doi.org/10.3390/s22062405 - 21 Mar 2022
Cited by 13 | Viewed by 2775
Abstract
Gesture recognition is an important direction in computer vision research. Information from the hands is crucial in this task. However, current methods typically attend to hand regions based on estimated keypoints, which significantly increases both time and complexity and may lose hand position information due to erroneous keypoint estimation. Moreover, for dynamic gesture recognition, it is not enough to consider attention in the spatial dimension alone. This paper proposes a multi-scale attention 3D convolutional network for gesture recognition, with a fusion of multimodal data. The proposed network applies attention mechanisms both locally and globally. The local attention leverages the hand information extracted by the hand detector to focus on the hand region and reduce the interference of gesture-irrelevant factors. Global attention is achieved in both the human-posture context and the channel context through a dual spatiotemporal attention module. Furthermore, to make full use of the differences between modalities, we designed a multimodal fusion scheme to fuse the features of RGB and depth data. The proposed method is evaluated on the ChaLearn LAP Isolated Gesture Dataset and the Briareo Dataset. Experiments on these two datasets prove the effectiveness of our network and show that it outperforms many state-of-the-art methods.
(This article belongs to the Special Issue Sensing Systems for Sign Language Recognition)
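
For readers unfamiliar with multimodal fusion, a minimal late-fusion pattern for RGB and depth streams can be sketched in PyTorch as below. This is a generic illustration, not the paper's architecture (which also includes local and global attention modules); the backbone placeholders and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Late fusion of RGB and depth clip features (illustrative).
    rgb_net / depth_net stand in for any 3D-CNN backbone mapping a
    (B, C, T, H, W) clip to a (B, feat_dim) embedding."""
    def __init__(self, rgb_net, depth_net, feat_dim, n_classes):
        super().__init__()
        self.rgb_net, self.depth_net = rgb_net, depth_net
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, rgb_clip, depth_clip):
        # Concatenate per-stream embeddings, then classify jointly
        feats = torch.cat([self.rgb_net(rgb_clip),
                           self.depth_net(depth_clip)], dim=1)
        return self.head(feats)
```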

28 pages, 9552 KiB  
Article
American Sign Language Words Recognition of Skeletal Videos Using Processed Video Driven Multi-Stacked Deep LSTM
by Sunusi Bala Abdullahi and Kosin Chamnongthai
Sensors 2022, 22(4), 1406; https://doi.org/10.3390/s22041406 - 11 Feb 2022
Cited by 19 | Viewed by 2651
Abstract
Complex hand gesture interactions among dynamic sign words may lead to misclassification, which affects the recognition accuracy of a ubiquitous sign language recognition system. This paper proposes to augment the feature vector of dynamic sign words with knowledge of hand dynamics as a proxy, and to classify dynamic sign words using motion patterns based on the extracted feature vector. In this method, some double-hand dynamic sign words have ambiguous or similar features across a hand motion trajectory, which leads to classification errors. Thus, the similar/ambiguous hand motion trajectory is determined based on the approximation of a probability density function over a time frame. The extracted features are then enhanced by transformation using maximal information correlation. These enhanced features of 3D skeletal videos captured by a Leap Motion controller are fed as a state transition pattern to a classifier for sign word classification. To evaluate the performance of the proposed method, an experiment was performed with 10 participants on 40 double-hand dynamic ASL words, yielding 97.98% accuracy. The method was further evaluated on the challenging ASL, SHREC, and LMDHG datasets, where it outperforms conventional methods by 1.47%, 1.56%, and 0.37%, respectively.
(This article belongs to the Special Issue Sensing Systems for Sign Language Recognition)
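
A multi-stacked LSTM classifier over per-frame skeletal features can be sketched as follows; this is a generic PyTorch illustration rather than the authors' network, and the hidden size, layer count, and flattened Leap Motion joint-coordinate input are assumptions.

```python
import torch
import torch.nn as nn

class StackedLSTMClassifier(nn.Module):
    # Input x: (batch, frames, in_dim), where in_dim flattens the per-frame
    # skeletal features (e.g., Leap Motion joint coordinates)
    def __init__(self, in_dim, hidden=128, layers=3, n_words=40):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden, n_words)

    def forward(self, x):
        _, (h, _) = self.lstm(x)   # h: (layers, batch, hidden)
        return self.fc(h[-1])      # classify from the top layer's final state
```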

18 pages, 5180 KiB  
Article
Bangla Sign Language (BdSL) Alphabets and Numerals Classification Using a Deep Learning Model
by Kanchon Kanti Podder, Muhammad E. H. Chowdhury, Anas M. Tahir, Zaid Bin Mahbub, Amith Khandakar, Md Shafayet Hossain and Muhammad Abdul Kadir
Sensors 2022, 22(2), 574; https://doi.org/10.3390/s22020574 - 12 Jan 2022
Cited by 27 | Viewed by 7113
Abstract
A real-time Bangla Sign Language interpreter can help more than 200,000 hearing- and speech-impaired people join the mainstream workforce in Bangladesh. Bangla Sign Language (BdSL) recognition and detection is a challenging topic in computer vision and deep learning research because recognition accuracy may vary with skin tone, hand orientation, and background. This research used deep learning models for accurate and reliable recognition of BdSL alphabets and numerals using two well-suited and robust datasets. The dataset prepared in this study comprises the largest image database for BdSL alphabets and numerals, designed to reduce inter-class similarity while covering diverse backgrounds and skin tones. The paper compared classification with and without background images to determine the best-performing model for BdSL alphabet and numeral interpretation. The CNN model trained on images with backgrounds was found to be more effective than the one trained without. In the segmentation approach, hand detection must be more accurate to boost overall sign recognition accuracy. ResNet18 performed best, with 99.99% accuracy, precision, F1 score, and sensitivity, and 100% specificity, outperforming previous work on BdSL alphabet and numeral recognition. The dataset is made publicly available to support and encourage further research on Bangla Sign Language interpretation so that hearing- and speech-impaired individuals can benefit from this research.
(This article belongs to the Special Issue Sensing Systems for Sign Language Recognition)
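
A ResNet18 fine-tuning setup of the kind reported here is a standard torchvision recipe; the sketch below is a generic starting point, not the authors' training code, and the class count (BdSL alphabets plus numerals) is a dataset-specific placeholder.

```python
import torch.nn as nn
from torchvision import models

def build_bdsl_classifier(n_classes):
    # n_classes = number of BdSL alphabet + numeral classes (dataset-specific)
    net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, n_classes)  # replace ImageNet head
    return net
```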
