Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 December 2021) | Viewed by 28886

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editor


Guest Editor
Department of Computer Engineering, Sejong University, Seoul 05006, Korea
Interests: multimedia security; multimedia signal processing; image compression; watermark technology; steganography; multimedia database

Special Issue Information

Dear Colleagues,

Data mining uses algorithms to identify and predict useful patterns in data. Although it has found success in many areas, the results of multimedia mining have not yet been satisfactory. Multimedia data mining extracts relevant data from multimedia files, such as audio, video, and still images, to perform similarity searches, identify associations, and carry out entity identification and classification. Deep learning technology emerged as a breakthrough in the fields of data mining and AI, and has proven useful in both data analysis and applications. In addition, deep learning has made great progress in the area of multimedia. Deep learning is a field of machine learning that is applied in smartphones for face recognition and voice commands. Meanwhile, deep learning technology contributes to the development of algorithms for the safety and security of multimedia data and to the development of new applications.

This Special Issue will share the achievements of key researchers and practitioners in academia as well as industry, dealing with a wide range of theoretical and applied problems in the field of multimedia. Authors are encouraged to submit contributions in any related area.

Prof. Dr. Cheonshik Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video application
  • speech recognition
  • information security
  • pattern recognition
  • human interaction
  • biometric recognition

Published Papers (11 papers)


Editorial


3 pages, 162 KiB  
Editorial
Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods
by Cheonshik Kim
Appl. Sci. 2022, 12(13), 6426; https://0-doi-org.brum.beds.ac.uk/10.3390/app12136426 - 24 Jun 2022
Viewed by 919
Abstract
Machine learning (ML) uses algorithms to identify and predict useful patterns from data [...] Full article

Research


17 pages, 6046 KiB  
Article
A Pipeline Approach to Context-Aware Handwritten Text Recognition
by Yee Fan Tan, Tee Connie, Michael Kah Ong Goh and Andrew Beng Jin Teoh
Appl. Sci. 2022, 12(4), 1870; https://0-doi-org.brum.beds.ac.uk/10.3390/app12041870 - 11 Feb 2022
Cited by 8 | Viewed by 5615
Abstract
Despite concerted efforts towards handwritten text recognition, the automatic location and transcription of handwritten text remain a challenging task. Text detection and segmentation methods are often prone to errors, affecting the accuracy of the subsequent recognition procedure. In this paper, a pipeline is proposed that locates texts on a page and recognizes the text types, as well as the context of the texts within the detected region. Clinical receipts are used as the subject of study. The proposed model comprises an object detection neural network that extracts text sequences present on the page regardless of size, orientation, and type (handwritten text, printed text, or non-text). The text sequences are then fed to a Residual Network with a Transformer (ResNet-101T) model to perform transcription. Next, the transcribed text sequences are analyzed using a Named Entity Recognition (NER) model to classify them into their corresponding contexts (e.g., name, address, prescription, and bill amount). In the proposed pipeline, all the processes are implicitly learned from data. Experiments performed on 500 self-collected clinical receipts containing 15,297 text segments reported a character error rate (CER) of 7.77% and a word error rate (WER) of 10.77%.
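The final stage of the pipeline maps each transcribed text sequence to a context label. As a minimal illustration of that interface only, the sketch below classifies sequences with hand-written regular-expression rules; the paper uses a learned NER model, and the field names and example strings here are hypothetical.

```python
import re

# Hypothetical rule-based stand-in for the pipeline's learned NER stage:
# each transcribed text sequence is assigned one receipt-field label.
def classify_sequence(text):
    """Assign a transcribed text sequence to a receipt field."""
    if re.search(r"\bRM?\s*\d+(\.\d{2})?\b|\$\s*\d+(\.\d{2})?", text):
        return "bill_amount"
    if re.search(r"\b\d+\s*(mg|ml|tablet|capsule)s?\b", text, re.I):
        return "prescription"
    if re.search(r"\b(street|st\.|road|rd\.|jalan)\b", text, re.I):
        return "address"
    return "name"

# A detected-and-transcribed receipt, as the upstream stages might emit it.
sequences = ["Tan Ah Kow", "12 Jalan Bukit", "Amoxicillin 500 mg", "RM 45.00"]
labels = [classify_sequence(s) for s in sequences]
print(labels)  # one context label per text sequence
```

A learned model replaces these brittle rules precisely because real receipts do not follow fixed patterns.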

14 pages, 3519 KiB  
Article
Empirical Evaluation on Utilizing CNN-Features for Seismic Patch Classification
by Chunxia Zhang, Xiaoli Wei and Sang-Woon Kim
Appl. Sci. 2022, 12(1), 197; https://0-doi-org.brum.beds.ac.uk/10.3390/app12010197 - 25 Dec 2021
Cited by 2 | Viewed by 1814
Abstract
This paper empirically evaluates two kinds of features, extracted with traditional statistical methods and with convolutional neural networks (CNNs), respectively, in order to improve the performance of seismic patch image classification. In the latter case, feature vectors, named "CNN-features", were extracted from a trained CNN model and then used to train existing classifiers, such as support vector machines. To train the CNN model, transfer learning was applied, using synthetic seismic patch data in the source domain and real-world patch data in the target domain. The experimental results show that CNN-features lead to improvements in classification performance. An analysis of data complexity measures shows that the CNN-features have the strongest discriminant capabilities. Furthermore, the transfer learning technique alleviates the problems of long processing times and the lack of training data.
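The second stage of this approach feeds fixed feature vectors into a conventional classifier. As a minimal stand-in, the sketch below uses made-up "CNN-feature" vectors and a nearest-centroid rule in place of the support vector machines used in the paper.

```python
# Minimal sketch of the second stage: feature vectors extracted by a trained
# CNN ("CNN-features") are fed to a conventional classifier. The feature
# values are hypothetical, and nearest-centroid stands in for an SVM.
def centroid(vectors):
    return [sum(x) / len(vectors) for x in zip(*vectors)]

def nearest_centroid_predict(sample, centroids):
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(sample, centroids[label]))

# Hypothetical CNN-features for two seismic patch classes.
train = {
    "fault":    [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9]],
    "no_fault": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
prediction = nearest_centroid_predict([0.85, 0.15, 0.8], centroids)
print(prediction)
```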

23 pages, 3538 KiB  
Article
A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
by Abigail Copiaco, Christian Ritz, Nidhal Abdulaziz and Stefano Fasciani
Appl. Sci. 2021, 11(11), 4880; https://0-doi-org.brum.beds.ac.uk/10.3390/app11114880 - 26 May 2021
Cited by 11 | Viewed by 3704
Abstract
Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years and has proven to provide an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing system classification approaches. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on bigger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most apparent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance against various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in Matlab, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology achieved a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB; the original AlexNet network returned 86.24% at a size of 222.71 MB.
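The headline metric here is a weighted F1-score, i.e., per-class F1 averaged with class supports as weights. The sketch below computes it from per-class counts; the class names and counts are invented for illustration.

```python
# Weighted F1: per-class F1 scores averaged, weighted by class support.
# The per-class true-positive/false-positive/false-negative counts below
# are made up for illustration.
def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def weighted_f1(per_class):
    """per_class: {label: (tp, fp, fn, support)}"""
    total = sum(support for *_, support in per_class.values())
    return sum(f1(tp, fp, fn) * support
               for tp, fp, fn, support in per_class.values()) / total

counts = {"alarm": (45, 5, 5, 50), "speech": (90, 10, 10, 100)}
print(round(weighted_f1(counts), 4))
```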

24 pages, 5076 KiB  
Article
Power Allocation for Secrecy-Capacity-Optimization-Artificial-Noise Secure MIMO Precoding Systems under Perfect and Imperfect Channel State Information
by Yebo Gu, Bowen Huang and Zhilu Wu
Appl. Sci. 2021, 11(10), 4558; https://0-doi-org.brum.beds.ac.uk/10.3390/app11104558 - 17 May 2021
Cited by 2 | Viewed by 1537
Abstract
In this paper, we consider the physical layer security problem of wireless communication systems. For the multiple-input, multiple-output (MIMO) wireless communication system, secrecy capacity optimization artificial noise (SCO-AN) is introduced and studied. Unlike its traditional counterpart, SCO-AN is an artificial noise located in the range space of the channel state information space, and thus results in a significant increase in the secrecy capacity. Because transmission power is limited, allocating it rationally is crucial to effectively increasing the secrecy capacity. Hence, in this paper, the objective function of transmission power allocation is constructed. We also consider imperfect channel estimation in the power allocation problem. In previous AN research, no expression for the effect of imperfect channel estimation was derived, yet the impact of channel estimation error on the accuracy of the computed secrecy capacity is not negligible. We derive the expression of the channel estimation error for least squares (LS) and minimum mean squared error (MMSE) channel estimation. The objective function for transmission power allocation is non-convex, so the traditional gradient method cannot be used to solve this power allocation problem. An improved sequential quadratic programming (ISQP) algorithm is therefore applied to solve the optimization problem. The numerical results show that ISQP outperforms other algorithms and that the power allocation it derives significantly increases the secrecy capacity.
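The core trade-off, splitting a fixed power budget between the information signal and artificial noise, can be sketched with a scalar toy model. The channel gains, noise power, and the assumption that AN degrades only the eavesdropper are all illustrative; the paper works with MIMO channels, imperfect CSI, and solves the non-convex allocation with ISQP rather than the grid search used here.

```python
import math

# Toy power split between signal and artificial noise for one link.
# All constants below are made-up scalars for illustration only.
P, sigma2 = 1.0, 0.1
h2, g2 = 1.0, 0.6   # legitimate / eavesdropper channel gains (|h|^2, |g|^2)

def secrecy_rate(phi):
    """Secrecy rate when a fraction phi of power carries the signal."""
    legit = math.log2(1 + phi * P * h2 / sigma2)
    # AN power (1 - phi) * P is assumed to degrade only the eavesdropper.
    eave = math.log2(1 + phi * P * g2 / ((1 - phi) * P * g2 + sigma2))
    return max(legit - eave, 0.0)

best_phi = max((i / 100 for i in range(1, 100)), key=secrecy_rate)
print(round(best_phi, 2), round(secrecy_rate(best_phi), 3))
```

Even in this toy, neither extreme (all power to signal or to noise) is optimal, which is why the allocation problem is worth solving.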

14 pages, 1834 KiB  
Article
You Only Look Once, But Compute Twice: Service Function Chaining for Low-Latency Object Detection in Softwarized Networks
by Zuo Xiang, Patrick Seeling and Frank H. P. Fitzek
Appl. Sci. 2021, 11(5), 2177; https://0-doi-org.brum.beds.ac.uk/10.3390/app11052177 - 02 Mar 2021
Cited by 10 | Viewed by 2666
Abstract
With increasing numbers of computer vision and object detection application scenarios, those requiring ultra-low service latency have become increasingly prominent, e.g., autonomous and connected vehicles or smart city applications. Incorporating machine learning through trained models in these scenarios can pose a computational challenge. The softwarization of networks provides opportunities to incorporate computing into the network, increasing flexibility by distributing workloads through offloading from client and edge nodes over in-network nodes to servers. In this article, we present an example of splitting the inference component of the YOLOv2 trained machine learning model between client, network, and service side processing to reduce the overall service latency. Assuming a client has 20% of the server's computational resources, we observe a more than 12-fold reduction in service latency with our service split compared to on-client processing, and a speed increase of more than 25% compared to performing everything on the server. Our approach is not only applicable to object detection, but can also be applied in a broad variety of machine learning-based applications and services.
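The intuition behind split inference can be shown with a toy latency model: once transfer times are counted, running part of the model on the client (and shipping compact intermediate features instead of raw frames) can beat both extremes. All numbers below are invented; the paper measures a real YOLOv2 split rather than this abstraction.

```python
# Toy latency model for splitting inference between client and server.
# All constants are illustrative, not measurements from the paper.
SERVER_SPEED = 1.0   # normalized compute throughput
CLIENT_SPEED = 0.2   # client has 20% of the server's resources
TOTAL_WORK = 1.0     # whole-model inference cost (server-seconds)
RAW_FRAME_TX = 4.0   # time to ship a raw frame to the server
FEATURE_TX = 0.5     # time to ship compact intermediate features

def latency(split):
    """split = fraction of inference done on the client before offloading."""
    client = split * TOTAL_WORK / CLIENT_SPEED
    transfer = RAW_FRAME_TX if split == 0 else FEATURE_TX if split < 1 else 0.0
    server = (1 - split) * TOTAL_WORK / SERVER_SPEED
    return client + transfer + server

all_client, all_server, split = latency(1.0), latency(0.0), latency(0.2)
print(all_client, all_server, split)  # the split placement wins
```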

19 pages, 3159 KiB  
Article
Foreground Objects Detection by U-Net with Multiple Difference Images
by Jae-Yeul Kim and Jong-Eun Ha
Appl. Sci. 2021, 11(4), 1807; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041807 - 18 Feb 2021
Cited by 6 | Viewed by 2897
Abstract
In video surveillance, robust detection of foreground objects is usually done by subtracting a background model from the current image. Most traditional approaches use a statistical method to model the background image. Recently, deep learning has also been widely used to detect foreground objects in video surveillance, and it shows dramatic improvement over traditional approaches. However, it is trained through supervised learning, which requires training samples with pixel-level labels; producing such samples takes a huge amount of time and is costly, whereas traditional algorithms operate unsupervised and require no training samples. Additionally, deep learning-based algorithms lack generalization power: they operate well on scenes similar to the training conditions but poorly on scenes that deviate from them. In this paper, we present a new method to detect foreground objects in video surveillance using multiple difference images as the input of a convolutional neural network, which provides improved generalization power compared to current deep learning-based methods. First, we adjust U-Net to use multiple difference images as input. Second, we show that training on all scenes in the CDnet 2014 dataset can improve the generalization power. Hyper-parameters such as the number of difference images and the interval between images in difference image computation are chosen by analyzing experimental results. We demonstrate that the proposed algorithm achieves improved performance in scenes not used in training compared to state-of-the-art deep learning and traditional unsupervised algorithms. Diverse experiments using various open datasets and real images show the feasibility of the proposed method.
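The network input described above, several difference images computed at different temporal intervals, can be sketched directly. The tiny frames and the intervals (1, 3, 5) below are arbitrary; the paper tunes the number of difference images and their spacing experimentally.

```python
# Sketch of the "multiple difference images" input: the difference between
# the current frame and several past frames (at chosen intervals) is
# stacked and fed to the network. Frame contents here are arbitrary.
def difference_image(frame_a, frame_b):
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def stacked_input(frames, intervals=(1, 3, 5)):
    """Return one difference image per interval, all against the last frame."""
    current = frames[-1]
    return [difference_image(current, frames[-1 - k]) for k in intervals]

# Six tiny 2x2 "frames": a static background, then a bright foreground pixel.
frames = [[[10, 10], [10, 10]] for _ in range(5)] + [[[10, 90], [10, 10]]]
stack = stacked_input(frames)
print(len(stack), stack[0])  # 3 difference images; the foreground pops out
```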

14 pages, 1438 KiB  
Article
Constrained Backtracking Matching Pursuit Algorithm for Image Reconstruction in Compressed Sensing
by Xue Bi, Lu Leng, Cheonshik Kim, Xinwen Liu, Yajun Du and Feng Liu
Appl. Sci. 2021, 11(4), 1435; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041435 - 05 Feb 2021
Cited by 13 | Viewed by 1807
Abstract
Image reconstruction based on sparse constraints is an important research topic in compressed sensing. Sparsity adaptive matching pursuit (SAMP) is a greedy pursuit reconstruction algorithm which reconstructs signals without prior information on the sparsity level and potentially offers better reconstruction performance than other greedy pursuit algorithms. However, SAMP remains sensitive to the step size selection at high sub-sampling ratios. To solve this problem, this paper proposes a constrained backtracking matching pursuit (CBMP) algorithm for image reconstruction. The composite strategy, including two kinds of constraints, effectively controls the increment of the estimated sparsity level at different stages and accurately estimates the true support set of images. Based on an analysis of the relationship between the signal and the measurement, an energy criterion is proposed as one constraint, and the four-to-one rule is improved as an extra constraint. Comprehensive experimental results demonstrate that the proposed CBMP yields better performance and greater stability than other greedy pursuit algorithms for image reconstruction.
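For readers unfamiliar with the family of algorithms, the sketch below shows the bare greedy loop that SAMP-style methods build on: repeatedly pick the dictionary atom most correlated with the residual and peel it off. CBMP's actual contributions (the energy criterion and the improved four-to-one rule constraining the estimated sparsity level) are not reproduced here, and the toy dictionary is orthonormal for simplicity.

```python
# Bare-bones (non-orthogonal) matching pursuit over a tiny dictionary.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, steps=2):
    """Greedy pursuit: atoms are assumed unit-norm, so correlation equals
    projection length. Returns selected atoms, coefficients, and residual."""
    residual = list(signal)
    support, coeffs = [], {}
    for _ in range(steps):
        best = max(range(len(atoms)),
                   key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[best])
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
        support.append(best)
        coeffs[best] = coeffs.get(best, 0.0) + c
    return support, coeffs, residual

# Orthonormal toy dictionary; the signal is 3*atom0 + 1*atom2.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [3.0, 0.0, 1.0]
support, coeffs, residual = matching_pursuit(signal, atoms)
print(sorted(support), coeffs)
```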

21 pages, 4080 KiB  
Article
Self-Embedding Fragile Watermarking Scheme to Detect Image Tampering Using AMBTC and OPAP Approaches
by Cheonshik Kim and Ching-Nung Yang
Appl. Sci. 2021, 11(3), 1146; https://0-doi-org.brum.beds.ac.uk/10.3390/app11031146 - 27 Jan 2021
Cited by 25 | Viewed by 2242
Abstract
Research on self-embedding watermarks is being actively conducted to solve personal privacy and copyright problems caused by image attacks. In this paper, we propose a self-embedding watermarking technique based on Absolute Moment Block Truncation Coding (AMBTC) for reconstructing images tampered with by cropping attacks and forgery. AMBTC is suitable as the recovery bits (watermark) for the tampered image because it offers excellent compression performance and image quality. Moreover, to improve the quality of the marked image, the Optimal Pixel Adjustment Process (OPAP) method is used when hiding the AMBTC codes in the cover image. To locate a damaged block in a marked image, authentication data must be hidden in each block along with the watermark; we employ a checksum for authentication. The watermark is embedded in the pixels of the cover image using the 3LSB and 2LSB, and the checksum is hidden in the LSB. Through the recovery procedure, the original marked image can be recovered from the tampered marked image. When the tampering ratio was 45%, the image (Lena) could be recovered at 36 dB. The proposed self-embedding method was verified through experiments, and the recovered images showed superior perceptual quality compared to previous methods.
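AMBTC, the compression scheme used here as the recovery watermark, reduces each block to a bitmap plus two quantization levels (the means of the pixels above and below the block mean). The sketch below encodes and decodes one 4x4 block with arbitrary pixel values; the embedding side (3LSB/2LSB hiding, OPAP, checksums) is not shown.

```python
# AMBTC compression of one 4x4 block: bitmap + two quantization levels.
def ambtc_encode(block):
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    high = [p for p in pixels if p >= mean]
    low = [p for p in pixels if p < mean]
    bitmap = [[1 if p >= mean else 0 for p in row] for row in block]
    return bitmap, round(sum(high) / len(high)), round(sum(low) / len(low))

def ambtc_decode(bitmap, high, low):
    # Each pixel is reconstructed as one of the two quantization levels.
    return [[high if bit else low for bit in row] for row in bitmap]

block = [[90, 92, 40, 42],
         [88, 91, 41, 43],
         [89, 90, 40, 44],
         [92, 89, 42, 41]]
bitmap, high, low = ambtc_encode(block)
print(high, low)                       # the two quantization levels
print(ambtc_decode(bitmap, high, low)[0])
```

The very small code (1 bit per pixel plus two levels per block) is what makes AMBTC practical as a hidden recovery payload.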

14 pages, 336 KiB  
Article
Practical Inner Product Encryption with Constant Private Key
by Yi-Fan Tseng, Zi-Yuan Liu and Raylin Tso
Appl. Sci. 2020, 10(23), 8669; https://0-doi-org.brum.beds.ac.uk/10.3390/app10238669 - 03 Dec 2020
Cited by 5 | Viewed by 1979
Abstract
Inner product encryption, first introduced by Katz et al., is a type of predicate encryption in which a ciphertext and a private key correspond to an attribute vector and a predicate vector, respectively. Decryption in this scheme is correct only if the attribute and predicate vectors satisfy the inner product predicate. In addition, the ability to use inner product encryption as an underlying building block to construct other useful cryptographic primitives has been demonstrated in the context of anonymous identity-based encryption and hidden vector encryption. However, the computation and communication costs of performing inner product encryption are currently very high. To resolve this problem, we introduce an efficient inner product encryption approach in this work. Specifically, the private key consists of only one G element and one Zp element, and decryption requires only one pairing computation. A formal security proof and implementation results are also presented. Compared with other state-of-the-art schemes, our scheme is the most efficient in terms of the number of pairing computations for decryption and the private key length.
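The predicate itself is simple to state: decryption succeeds exactly when the attribute vector (bound to the ciphertext) and the predicate vector (bound to the key) have inner product zero modulo p. The sketch below models only that functional check; none of the pairing-based cryptography is reproduced, and p and the vectors are toy values.

```python
# Functional model of the inner product predicate only (no cryptography).
p = 13  # toy modulus; real schemes use a large prime group order

def inner_product_holds(attribute, predicate):
    """True iff <attribute, predicate> == 0 (mod p), i.e. the key decrypts."""
    return sum(a * b for a, b in zip(attribute, predicate)) % p == 0

# A predicate vector orthogonal (mod p) to the first attribute vector.
predicate = [1, 12]          # i.e. (1, -1) mod 13
ok = inner_product_holds([5, 5], predicate)    # key can decrypt
fail = inner_product_holds([5, 7], predicate)  # decryption fails
print(ok, fail)
```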
18 pages, 2582 KiB  
Article
Hybrid Data Hiding Based on AMBTC Using Enhanced Hamming Code
by Cheonshik Kim, Dong-Kyoo Shin, Ching-Nung Yang and Lu Leng
Appl. Sci. 2020, 10(15), 5336; https://0-doi-org.brum.beds.ac.uk/10.3390/app10155336 - 02 Aug 2020
Cited by 25 | Viewed by 2436
Abstract
The image-based data hiding method is a technology used to transmit confidential information secretly. Since images (e.g., grayscale images) usually contain sufficient redundant information, they are a very suitable medium for hiding data. Absolute Moment Block Truncation Coding (AMBTC) is a compression method well suited to embedding data due to its very low complexity and acceptable distortion. However, because a compressed image contains far less redundant data than a grayscale image, embedding data in it is very challenging; this is the motivation for this research. Hamming codes, block codes that can detect up to two simultaneous bit errors and correct single-bit errors, are used to embed the secret bits. In this paper, we propose an effective data hiding method for the two quantization levels of each AMBTC block using Hamming codes. Bai and Chang introduced a method of applying the Hamming (7,4) code to the two quantization levels; however, their scheme is ineffective, and the image distortion error is relatively large. To reduce the image distortion errors, this paper introduces a way of optimizing codewords and reducing pixel distortion by utilizing the Hamming (7,4) code and lookup tables. In the experiments, when concealing 150,000 bits in the Lena image, the averages of the Normalized Cross-Correlation (NCC) and Mean-Squared Error (MSE) of our proposed method were 0.9952 and 37.9460, respectively, the best among the compared methods. Extensive experiments confirmed that the performance of the proposed method is satisfactory in terms of embedding capacity and image quality.
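The standard way Hamming (7,4) is used for data hiding is "matrix embedding": 3 secret bits are hidden in the LSBs of 7 cover values by flipping at most one LSB so that the parity-check syndrome equals the message. The sketch below shows only that mechanism; the paper's codeword optimization over the two AMBTC quantization levels via lookup tables is not attempted, and the pixel LSBs and message are arbitrary.

```python
# Matrix embedding with the Hamming (7,4) parity-check matrix: hide a
# 3-bit message in 7 LSBs by flipping at most one of them.
H = [[1, 0, 1, 0, 1, 0, 1],   # column i is the binary expansion of i+1,
     [0, 1, 1, 0, 0, 1, 1],   # least significant bit in the first row
     [0, 0, 0, 1, 1, 1, 1]]

def syndrome(bits):
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def embed(lsbs, message):
    """Flip at most one of 7 LSBs so their syndrome equals the message."""
    diff = [s ^ m for s, m in zip(syndrome(lsbs), message)]
    out = list(lsbs)
    if any(diff):
        # diff, read as a binary number, is the 1-based position to flip,
        # because flipping position k adds column k to the syndrome.
        pos = diff[0] * 1 + diff[1] * 2 + diff[2] * 4
        out[pos - 1] ^= 1
    return out

lsbs = [1, 0, 1, 1, 0, 0, 1]
message = [1, 0, 1]
stego = embed(lsbs, message)
changes = sum(a != b for a, b in zip(lsbs, stego))
print(syndrome(stego), changes)  # recovered message, LSBs flipped
```

The extractor needs only the 7 stego LSBs: recomputing the syndrome yields the message, with at most one cover value disturbed per 3 bits hidden.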
