Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 December 2021) | Viewed by 28886

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editor


Guest Editor
Department of Computer Engineering, Sejong University, Seoul 05006, Korea
Interests: multimedia security; multimedia signal processing; image compression; watermark technology; steganography; multimedia database

Special Issue Information

Dear Colleagues,

Data mining uses algorithms to identify and predict useful patterns in data. Although it has found success in many areas, the results of multimedia mining have not yet been satisfactory. Multimedia data mining extracts relevant data from multimedia files, such as audio, video, and still images, to perform similarity searches, identify associations, and carry out entity identification and classification. Deep learning technology emerged as a breakthrough in the fields of data mining and AI, and has proven useful in both data analysis and applications. In addition, deep learning has made great progress in the area of multimedia. Deep learning is a field of machine learning that is applied in smartphones for face recognition and voice commands. Meanwhile, deep learning technology contributes to the development of algorithms for the safety and security of multimedia data and to the development of new applications.

This Special Issue will share the achievements of key researchers and practitioners in academia as well as industry, dealing with a wide range of theoretical and applied problems in the field of multimedia. Authors are encouraged to submit contributions in any related area.

Prof. Dr. Cheonshik Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video application
  • speech recognition
  • information security
  • pattern recognition
  • human interaction
  • biometric recognition

Published Papers (11 papers)


Editorial


3 pages, 162 KiB  
Editorial
Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods
by Cheonshik Kim
Appl. Sci. 2022, 12(13), 6426; https://0-doi-org.brum.beds.ac.uk/10.3390/app12136426 - 24 Jun 2022
Viewed by 919
Abstract
Machine learning (ML) uses algorithms to identify and predict useful patterns from data [...] Full article

Research


17 pages, 6046 KiB  
Article
A Pipeline Approach to Context-Aware Handwritten Text Recognition
by Yee Fan Tan, Tee Connie, Michael Kah Ong Goh and Andrew Beng Jin Teoh
Appl. Sci. 2022, 12(4), 1870; https://0-doi-org.brum.beds.ac.uk/10.3390/app12041870 - 11 Feb 2022
Cited by 8 | Viewed by 5615
Abstract
Despite concerted efforts towards handwritten text recognition, the automatic location and transcription of handwritten text remain a challenging task. Text detection and segmentation methods are often prone to errors, affecting the accuracy of the subsequent recognition procedure. In this paper, a pipeline is proposed that locates texts on a page and recognizes the text types, as well as the context of the texts within the detected region. Clinical receipts are used as the subject of study. The proposed model comprises an object detection neural network that extracts text sequences present on the page regardless of size, orientation, and type (handwritten text, printed text, or non-text). The text sequences are then fed to a Residual Network with a Transformer (ResNet-101T) model to perform transcription. Next, the transcribed text sequences are analyzed using a Named Entity Recognition (NER) model to classify them into their corresponding contexts (e.g., name, address, prescription, and bill amount). In the proposed pipeline, all the processes are implicitly learned from data. Experiments performed on 500 self-collected clinical receipts containing 15,297 text segments reported a character error rate (CER) of 7.77% and a word error rate (WER) of 10.77%.
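The final stage of the pipeline maps each transcribed text sequence to a context label. As a minimal illustration of that interface only, the sketch below classifies sequences with hand-written regular-expression rules; the paper uses a learned NER model, and the field names and example strings here are hypothetical.

```python
import re

# Hypothetical rule-based stand-in for the pipeline's learned NER stage:
# each transcribed text sequence is assigned one receipt-field label.
def classify_sequence(text):
    """Assign a transcribed text sequence to a receipt field."""
    if re.search(r"\bRM?\s*\d+(\.\d{2})?\b|\$\s*\d+(\.\d{2})?", text):
        return "bill_amount"
    if re.search(r"\b\d+\s*(mg|ml|tablet|capsule)s?\b", text, re.I):
        return "prescription"
    if re.search(r"\b(street|st\.|road|rd\.|jalan)\b", text, re.I):
        return "address"
    return "name"

# A detected-and-transcribed receipt, as the upstream stages might emit it.
sequences = ["Tan Ah Kow", "12 Jalan Bukit", "Amoxicillin 500 mg", "RM 45.00"]
labels = [classify_sequence(s) for s in sequences]
print(labels)  # one context label per text sequence
```

A learned model replaces these brittle rules precisely because real receipts do not follow fixed patterns.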

14 pages, 3519 KiB  
Article
Empirical Evaluation on Utilizing CNN-Features for Seismic Patch Classification
by Chunxia Zhang, Xiaoli Wei and Sang-Woon Kim
Appl. Sci. 2022, 12(1), 197; https://0-doi-org.brum.beds.ac.uk/10.3390/app12010197 - 25 Dec 2021
Cited by 2 | Viewed by 1814
Abstract
This paper empirically evaluates two kinds of features, extracted with traditional statistical methods and with convolutional neural networks (CNNs), respectively, in order to improve the performance of seismic patch image classification. In the latter case, feature vectors, named "CNN-features", were extracted from a trained CNN model and then used to train existing classifiers, such as support vector machines. To train the CNN model, transfer learning was applied, using synthetic seismic patch data in the source domain and real-world patch data in the target domain. The experimental results show that CNN-features lead to improvements in classification performance. An analysis of data complexity measures shows that the CNN-features have the strongest discriminant capabilities. Furthermore, the transfer learning technique alleviates the problems of long processing times and the lack of training data.
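The second stage of this approach feeds fixed feature vectors into a conventional classifier. As a minimal stand-in, the sketch below uses made-up "CNN-feature" vectors and a nearest-centroid rule in place of the support vector machines used in the paper.

```python
# Minimal sketch of the second stage: feature vectors extracted by a trained
# CNN ("CNN-features") are fed to a conventional classifier. The feature
# values are hypothetical, and nearest-centroid stands in for an SVM.
def centroid(vectors):
    return [sum(x) / len(vectors) for x in zip(*vectors)]

def nearest_centroid_predict(sample, centroids):
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(sample, centroids[label]))

# Hypothetical CNN-features for two seismic patch classes.
train = {
    "fault":    [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9]],
    "no_fault": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
prediction = nearest_centroid_predict([0.85, 0.15, 0.8], centroids)
print(prediction)
```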

23 pages, 3538 KiB  
Article
A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
by Abigail Copiaco, Christian Ritz, Nidhal Abdulaziz and Stefano Fasciani
Appl. Sci. 2021, 11(11), 4880; https://0-doi-org.brum.beds.ac.uk/10.3390/app11114880 - 26 May 2021
Cited by 11 | Viewed by 3704
Abstract
Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years and has proven to provide an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing system classification approaches. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on bigger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most apparent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance against various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in Matlab, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology achieved a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB; the original AlexNet network returned 86.24% at a size of 222.71 MB.
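The headline metric here is a weighted F1-score, i.e., per-class F1 averaged with class supports as weights. The sketch below computes it from per-class counts; the class names and counts are invented for illustration.

```python
# Weighted F1: per-class F1 scores averaged, weighted by class support.
# The per-class true-positive/false-positive/false-negative counts below
# are made up for illustration.
def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def weighted_f1(per_class):
    """per_class: {label: (tp, fp, fn, support)}"""
    total = sum(support for *_, support in per_class.values())
    return sum(f1(tp, fp, fn) * support
               for tp, fp, fn, support in per_class.values()) / total

counts = {"alarm": (45, 5, 5, 50), "speech": (90, 10, 10, 100)}
print(round(weighted_f1(counts), 4))
```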

24 pages, 5076 KiB  
Article
Power Allocation for Secrecy-Capacity-Optimization-Artificial-Noise Secure MIMO Precoding Systems under Perfect and Imperfect Channel State Information
by Yebo Gu, Bowen Huang and Zhilu Wu
Appl. Sci. 2021, 11(10), 4558; https://0-doi-org.brum.beds.ac.uk/10.3390/app11104558 - 17 May 2021
Cited by 2 | Viewed by 1537
Abstract
In this paper, we consider the physical layer security problem of wireless communication systems. For the multiple-input, multiple-output (MIMO) wireless communication system, secrecy capacity optimization artificial noise (SCO-AN) is introduced and studied. Unlike its traditional counterpart, SCO-AN is an artificial noise located in the range space of the channel state information space, and thus results in a significant increase in the secrecy capacity. Because transmission power is limited, allocating it rationally is crucial to effectively increasing the secrecy capacity. Hence, in this paper, the objective function of transmission power allocation is constructed. We also consider imperfect channel estimation in the power allocation problem. In previous AN research, no expression for the effect of imperfect channel estimation was derived, yet the impact of channel estimation error on the accuracy of the computed secrecy capacity is not negligible. We derive the expression of the channel estimation error for least squares (LS) and minimum mean squared error (MMSE) channel estimation. The objective function for transmission power allocation is non-convex, so the traditional gradient method cannot be used to solve this power allocation problem. An improved sequential quadratic programming (ISQP) algorithm is therefore applied to solve the optimization problem. The numerical results show that ISQP outperforms other algorithms and that the power allocation it derives significantly increases the secrecy capacity.
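The core trade-off, splitting a fixed power budget between the information signal and artificial noise, can be sketched with a scalar toy model. The channel gains, noise power, and the assumption that AN degrades only the eavesdropper are all illustrative; the paper works with MIMO channels, imperfect CSI, and solves the non-convex allocation with ISQP rather than the grid search used here.

```python
import math

# Toy power split between signal and artificial noise for one link.
# All constants below are made-up scalars for illustration only.
P, sigma2 = 1.0, 0.1
h2, g2 = 1.0, 0.6   # legitimate / eavesdropper channel gains (|h|^2, |g|^2)

def secrecy_rate(phi):
    """Secrecy rate when a fraction phi of power carries the signal."""
    legit = math.log2(1 + phi * P * h2 / sigma2)
    # AN power (1 - phi) * P is assumed to degrade only the eavesdropper.
    eave = math.log2(1 + phi * P * g2 / ((1 - phi) * P * g2 + sigma2))
    return max(legit - eave, 0.0)

best_phi = max((i / 100 for i in range(1, 100)), key=secrecy_rate)
print(round(best_phi, 2), round(secrecy_rate(best_phi), 3))
```

Even in this toy, neither extreme (all power to signal or to noise) is optimal, which is why the allocation problem is worth solving.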

14 pages, 1834 KiB  
Article
You Only Look Once, But Compute Twice: Service Function Chaining for Low-Latency Object Detection in Softwarized Networks
by Zuo Xiang, Patrick Seeling and Frank H. P. Fitzek
Appl. Sci. 2021, 11(5), 2177; https://0-doi-org.brum.beds.ac.uk/10.3390/app11052177 - 02 Mar 2021
Cited by 10 | Viewed by 2666
Abstract
With increasing numbers of computer vision and object detection application scenarios, those requiring ultra-low service latency have become increasingly prominent, e.g., autonomous and connected vehicles or smart city applications. Incorporating machine learning through trained models in these scenarios can pose a computational challenge. The softwarization of networks provides opportunities to incorporate computing into the network, increasing flexibility by distributing workloads through offloading from client and edge nodes over in-network nodes to servers. In this article, we present an example of splitting the inference component of the YOLOv2 trained machine learning model between client, network, and service side processing to reduce the overall service latency. Assuming a client has 20% of the server's computational resources, we observe a more than 12-fold reduction in service latency with our service split compared to on-client processing, and a speed increase of more than 25% compared to performing everything on the server. Our approach is not only applicable to object detection, but can also be applied in a broad variety of machine learning-based applications and services.
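The intuition behind split inference can be shown with a toy latency model: once transfer times are counted, running part of the model on the client (and shipping compact intermediate features instead of raw frames) can beat both extremes. All numbers below are invented; the paper measures a real YOLOv2 split rather than this abstraction.

```python
# Toy latency model for splitting inference between client and server.
# All constants are illustrative, not measurements from the paper.
SERVER_SPEED = 1.0   # normalized compute throughput
CLIENT_SPEED = 0.2   # client has 20% of the server's resources
TOTAL_WORK = 1.0     # whole-model inference cost (server-seconds)
RAW_FRAME_TX = 4.0   # time to ship a raw frame to the server
FEATURE_TX = 0.5     # time to ship compact intermediate features

def latency(split):
    """split = fraction of inference done on the client before offloading."""
    client = split * TOTAL_WORK / CLIENT_SPEED
    transfer = RAW_FRAME_TX if split == 0 else FEATURE_TX if split < 1 else 0.0
    server = (1 - split) * TOTAL_WORK / SERVER_SPEED
    return client + transfer + server

all_client, all_server, split = latency(1.0), latency(0.0), latency(0.2)
print(all_client, all_server, split)  # the split placement wins
```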

19 pages, 3159 KiB  
Article
Foreground Objects Detection by U-Net with Multiple Difference Images
by Jae-Yeul Kim and Jong-Eun Ha
Appl. Sci. 2021, 11(4), 1807; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041807 - 18 Feb 2021
Cited by 6 | Viewed by 2897
Abstract
In video surveillance, robust detection of foreground objects is usually done by subtracting a background model from the current image. Most traditional approaches use a statistical method to model the background image. Recently, deep learning has also been widely used to detect foreground objects in video surveillance, and it shows dramatic improvement over traditional approaches. However, it is trained through supervised learning, which requires training samples with pixel-level labels; producing such samples takes a huge amount of time and is costly, whereas traditional algorithms operate unsupervised and require no training samples. Additionally, deep learning-based algorithms lack generalization power: they operate well on scenes similar to the training conditions but poorly on scenes that deviate from them. In this paper, we present a new method to detect foreground objects in video surveillance using multiple difference images as the input of a convolutional neural network, which provides improved generalization power compared to current deep learning-based methods. First, we adjust U-Net to use multiple difference images as input. Second, we show that training on all scenes in the CDnet 2014 dataset can improve the generalization power. Hyper-parameters such as the number of difference images and the interval between images in difference image computation are chosen by analyzing experimental results. We demonstrate that the proposed algorithm achieves improved performance in scenes not used in training compared to state-of-the-art deep learning and traditional unsupervised algorithms. Diverse experiments using various open datasets and real images show the feasibility of the proposed method.
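The network input described above, several difference images computed at different temporal intervals, can be sketched directly. The tiny frames and the intervals (1, 3, 5) below are arbitrary; the paper tunes the number of difference images and their spacing experimentally.

```python
# Sketch of the "multiple difference images" input: the difference between
# the current frame and several past frames (at chosen intervals) is
# stacked and fed to the network. Frame contents here are arbitrary.
def difference_image(frame_a, frame_b):
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def stacked_input(frames, intervals=(1, 3, 5)):
    """Return one difference image per interval, all against the last frame."""
    current = frames[-1]
    return [difference_image(current, frames[-1 - k]) for k in intervals]

# Six tiny 2x2 "frames": a static background, then a bright foreground pixel.
frames = [[[10, 10], [10, 10]] for _ in range(5)] + [[[10, 90], [10, 10]]]
stack = stacked_input(frames)
print(len(stack), stack[0])  # 3 difference images; the foreground pops out
```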

14 pages, 1438 KiB  
Article
Constrained Backtracking Matching Pursuit Algorithm for Image Reconstruction in Compressed Sensing
by Xue Bi, Lu Leng, Cheonshik Kim, Xinwen Liu, Yajun Du and Feng Liu
Appl. Sci. 2021, 11(4), 1435; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041435 - 05 Feb 2021
Cited by 13 | Viewed by 1807
Abstract
Image reconstruction based on sparse constraints is an important research topic in compressed sensing. Sparsity adaptive matching pursuit (SAMP) is a greedy pursuit reconstruction algorithm which reconstructs signals without prior information on the sparsity level and potentially offers better reconstruction performance than other greedy pursuit algorithms. However, SAMP remains sensitive to the step size selection at high sub-sampling ratios. To solve this problem, this paper proposes a constrained backtracking matching pursuit (CBMP) algorithm for image reconstruction. The composite strategy, including two kinds of constraints, effectively controls the increment of the estimated sparsity level at different stages and accurately estimates the true support set of images. Based on an analysis of the relationship between the signal and the measurement, an energy criterion is proposed as one constraint, and the four-to-one rule is improved as an extra constraint. Comprehensive experimental results demonstrate that the proposed CBMP yields better performance and greater stability than other greedy pursuit algorithms for image reconstruction.
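For readers unfamiliar with the family of algorithms, the sketch below shows the bare greedy loop that SAMP-style methods build on: repeatedly pick the dictionary atom most correlated with the residual and peel it off. CBMP's actual contributions (the energy criterion and the improved four-to-one rule constraining the estimated sparsity level) are not reproduced here, and the toy dictionary is orthonormal for simplicity.

```python
# Bare-bones (non-orthogonal) matching pursuit over a tiny dictionary.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, steps=2):
    """Greedy pursuit: atoms are assumed unit-norm, so correlation equals
    projection length. Returns selected atoms, coefficients, and residual."""
    residual = list(signal)
    support, coeffs = [], {}
    for _ in range(steps):
        best = max(range(len(atoms)),
                   key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[best])
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
        support.append(best)
        coeffs[best] = coeffs.get(best, 0.0) + c
    return support, coeffs, residual

# Orthonormal toy dictionary; the signal is 3*atom0 + 1*atom2.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [3.0, 0.0, 1.0]
support, coeffs, residual = matching_pursuit(signal, atoms)
print(sorted(support), coeffs)
```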

21 pages, 4080 KiB  
Article
Self-Embedding Fragile Watermarking Scheme to Detect Image Tampering Using AMBTC and OPAP Approaches
by Cheonshik Kim and Ching-Nung Yang
Appl. Sci. 2021, 11(3), 1146; https://0-doi-org.brum.beds.ac.uk/10.3390/app11031146 - 27 Jan 2021
Cited by 25 | Viewed by 2242
Abstract
Research on self-embedding watermarks is being actively conducted to solve personal privacy and copyright problems caused by image attacks. In this paper, we propose a self-embedding watermarking technique based on Absolute Moment Block Truncation Coding (AMBTC) for reconstructing images tampered with by cropping attacks and forgery. AMBTC is suitable as the recovery bits (watermark) for the tampered image because it offers excellent compression performance and image quality. Moreover, to improve the quality of the marked image, the Optimal Pixel Adjustment Process (OPAP) method is used when hiding the AMBTC codes in the cover image. To locate a damaged block in a marked image, authentication data must be hidden in each block along with the watermark; we employ a checksum for authentication. The watermark is embedded in the pixels of the cover image using the 3LSB and 2LSB, and the checksum is hidden in the LSB. Through the recovery procedure, the original marked image can be recovered from the tampered marked image. When the tampering ratio was 45%, the image (Lena) could be recovered at 36 dB. The proposed self-embedding method was verified through experiments, and the recovered images showed superior perceptual quality compared to previous methods.
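AMBTC, the compression scheme used here as the recovery watermark, reduces each block to a bitmap plus two quantization levels (the means of the pixels above and below the block mean). The sketch below encodes and decodes one 4x4 block with arbitrary pixel values; the embedding side (3LSB/2LSB hiding, OPAP, checksums) is not shown.

```python
# AMBTC compression of one 4x4 block: bitmap + two quantization levels.
def ambtc_encode(block):
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    high = [p for p in pixels if p >= mean]
    low = [p for p in pixels if p < mean]
    bitmap = [[1 if p >= mean else 0 for p in row] for row in block]
    return bitmap, round(sum(high) / len(high)), round(sum(low) / len(low))

def ambtc_decode(bitmap, high, low):
    # Each pixel is reconstructed as one of the two quantization levels.
    return [[high if bit else low for bit in row] for row in bitmap]

block = [[90, 92, 40, 42],
         [88, 91, 41, 43],
         [89, 90, 40, 44],
         [92, 89, 42, 41]]
bitmap, high, low = ambtc_encode(block)
print(high, low)                       # the two quantization levels
print(ambtc_decode(bitmap, high, low)[0])
```

The very small code (1 bit per pixel plus two levels per block) is what makes AMBTC practical as a hidden recovery payload.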

14 pages, 336 KiB  
Article
Practical Inner Product Encryption with Constant Private Key
by Yi-Fan Tseng, Zi-Yuan Liu and Raylin Tso
Appl. Sci. 2020, 10(23), 8669; https://0-doi-org.brum.beds.ac.uk/10.3390/app10238669 - 03 Dec 2020
Cited by 5 | Viewed by 1979
Abstract
Inner product encryption, first introduced by Katz et al., is a type of predicate encryption in which a ciphertext and a private key correspond to an attribute vector and a predicate vector, respectively. Decryption in this scheme is correct only if the attribute and predicate vectors satisfy the inner product predicate. In addition, the ability to use inner product encryption as an underlying building block to construct other useful cryptographic primitives has been demonstrated in the context of anonymous identity-based encryption and hidden vector encryption. However, the computation and communication costs of performing inner product encryption are currently very high. To resolve this problem, we introduce an efficient inner product encryption approach in this work. Specifically, the private key consists of only one G element and one Zp element, and decryption requires only one pairing computation. A formal security proof and implementation results are also presented. Compared with other state-of-the-art schemes, our scheme is the most efficient in terms of the number of pairing computations for decryption and the private key length.
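The predicate itself is simple to state: decryption succeeds exactly when the attribute vector (bound to the ciphertext) and the predicate vector (bound to the key) have inner product zero modulo p. The sketch below models only that functional check; none of the pairing-based cryptography is reproduced, and p and the vectors are toy values.

```python
# Functional model of the inner product predicate only (no cryptography).
p = 13  # toy modulus; real schemes use a large prime group order

def inner_product_holds(attribute, predicate):
    """True iff <attribute, predicate> == 0 (mod p), i.e. the key decrypts."""
    return sum(a * b for a, b in zip(attribute, predicate)) % p == 0

# A predicate vector orthogonal (mod p) to the first attribute vector.
predicate = [1, 12]          # i.e. (1, -1) mod 13
ok = inner_product_holds([5, 5], predicate)    # key can decrypt
fail = inner_product_holds([5, 7], predicate)  # decryption fails
print(ok, fail)
```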
18 pages, 2582 KiB  
Article
Hybrid Data Hiding Based on AMBTC Using Enhanced Hamming Code
by Cheonshik Kim, Dong-Kyoo Shin, Ching-Nung Yang and Lu Leng
Appl. Sci. 2020, 10(15), 5336; https://0-doi-org.brum.beds.ac.uk/10.3390/app10155336 - 02 Aug 2020
Cited by 25 | Viewed by 2436
Abstract
The image-based data hiding method is a technology used to transmit confidential information secretly. Since images (e.g., grayscale images) usually contain sufficient redundant information, they are a very suitable medium for hiding data. Absolute Moment Block Truncation Coding (AMBTC) is a compression method well suited to embedding data due to its very low complexity and acceptable distortion. However, because a compressed image contains far less redundant data than a grayscale image, embedding data in it is very challenging; this is the motivation for this research. Hamming codes, block codes that can detect up to two simultaneous bit errors and correct single-bit errors, are used to embed the secret bits. In this paper, we propose an effective data hiding method for the two quantization levels of each AMBTC block using Hamming codes. Bai and Chang introduced a method of applying the Hamming (7,4) code to the two quantization levels; however, their scheme is ineffective, and the image distortion error is relatively large. To reduce the image distortion errors, this paper introduces a way of optimizing codewords and reducing pixel distortion by utilizing the Hamming (7,4) code and lookup tables. In the experiments, when concealing 150,000 bits in the Lena image, the averages of the Normalized Cross-Correlation (NCC) and Mean-Squared Error (MSE) of our proposed method were 0.9952 and 37.9460, respectively, the best among the compared methods. Extensive experiments confirmed that the performance of the proposed method is satisfactory in terms of embedding capacity and image quality.
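The standard way Hamming (7,4) is used for data hiding is "matrix embedding": 3 secret bits are hidden in the LSBs of 7 cover values by flipping at most one LSB so that the parity-check syndrome equals the message. The sketch below shows only that mechanism; the paper's codeword optimization over the two AMBTC quantization levels via lookup tables is not attempted, and the pixel LSBs and message are arbitrary.

```python
# Matrix embedding with the Hamming (7,4) parity-check matrix: hide a
# 3-bit message in 7 LSBs by flipping at most one of them.
H = [[1, 0, 1, 0, 1, 0, 1],   # column i is the binary expansion of i+1,
     [0, 1, 1, 0, 0, 1, 1],   # least significant bit in the first row
     [0, 0, 0, 1, 1, 1, 1]]

def syndrome(bits):
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def embed(lsbs, message):
    """Flip at most one of 7 LSBs so their syndrome equals the message."""
    diff = [s ^ m for s, m in zip(syndrome(lsbs), message)]
    out = list(lsbs)
    if any(diff):
        # diff, read as a binary number, is the 1-based position to flip,
        # because flipping position k adds column k to the syndrome.
        pos = diff[0] * 1 + diff[1] * 2 + diff[2] * 4
        out[pos - 1] ^= 1
    return out

lsbs = [1, 0, 1, 1, 0, 0, 1]
message = [1, 0, 1]
stego = embed(lsbs, message)
changes = sum(a != b for a, b in zip(lsbs, stego))
print(syndrome(stego), changes)  # recovered message, LSBs flipped
```

The extractor needs only the 7 stego LSBs: recomputing the syndrome yields the message, with at most one cover value disturbed per 3 bits hidden.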
