
Data Compression and Complexity

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (5 July 2021) | Viewed by 9376

Special Issue Editors


Dr. Nithin Nagaraj
Guest Editor
National Institute of Advanced Studies (NIAS), Indian Institute of Science Campus, Bengaluru, Karnataka, India
Interests: compression-complexity; nonlinear time series analysis; causality testing; neuro-chaos inspired machine learning

Dr. Kushal Shah
Guest Editor
Indian Institute of Science Education and Research (IISER) Bhopal, Madhya Pradesh, India
Interests: dynamical systems; signal processing; machine learning; bioinformatics

Prof. Dr. Jerry D. Gibson
Guest Editor
Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106-9560, USA
Interests: information theoretic techniques for signal processing and machine learning; lossy source coding; rate distortion theory

Special Issue Information

Dear Colleagues,

Data compression, the art and science of removing redundancy to encode information with fewer bits than the original representation, is a mature subdiscipline of information theory. However, data compression is not only applied to efficient storage and transmission in communication systems; it also intersects with other disciplines such as machine learning, model selection, statistical inference, causality testing, and the study of complex systems.

Data compression is relevant not just to the analysis of time-series data, but also to the analysis of networks and other nonlinear data representations that arise in the modern study of physical, biological, and human-designed complex systems. These areas have employed a host of information-theoretic methods, such as permutation entropy, approximate entropy, sample entropy, Lempel–Ziv complexity, and effort-to-compress (ETC) complexity, with considerable success to yield unique insights.
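As a concrete, hedged illustration of one of these measures, the minimal Python sketch below counts Lempel–Ziv (1976) phrases in a symbolic sequence; the function name and the example sequences are our own illustrative choices rather than code from any paper in this issue.

```python
def lz76_complexity(sequence: str) -> int:
    """Count the phrases of a Lempel-Ziv (1976) parsing of a symbol string.

    A minimal sketch: each new phrase is the shortest block that has not
    already appeared in the history seen so far (overlap allowed).
    """
    s = str(sequence)
    i, complexity = 0, 0
    while i < len(s):
        length = 1
        # Grow the candidate phrase while it still occurs in the history.
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        complexity += 1   # a new phrase ends here
        i += length
    return complexity

# A periodic sequence has low complexity; an irregular one scores higher.
print(lz76_complexity("0101010101010101"))   # 3
print(lz76_complexity("0110100110010110"))   # 7
```

Normalizing such a count (for example, by n / log2(n) for a binary sequence of length n) gives the normalized LZ complexity commonly reported in time-series studies.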

This Special Issue aims to be a forum for researchers from diverse disciplines to present new insights into the theory and practice of data compression and to explore the rich interplay between information-theoretic methods and complex systems. Apart from novel algorithms that push the frontiers of state-of-the-art performance in data compression, we also welcome applications involving time series analysis, model selection, causal inference, and machine learning in real-world natural and human-engineered complex systems.

Dr. Nithin Nagaraj
Dr. Kushal Shah
Prof. Dr. Jerry D. Gibson
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data compression (lossless and lossy)
  • Universal source coding
  • Compression complexity
  • Complex systems
  • Time series analysis
  • Entropy coding
  • Information-theoretic analysis
  • Model selection
  • Dynamical complexity
  • Causal inference
  • Machine learning and data compression

Published Papers (5 papers)


Research

12 pages, 502 KiB  
Article
Compression-Complexity Measures for Analysis and Classification of Coronaviruses
by Naga Venkata Trinath Sai Munagala, Prem Kumar Amanchi, Karthi Balasubramanian, Athira Panicker and Nithin Nagaraj
Entropy 2023, 25(1), 81; https://doi.org/10.3390/e25010081 - 31 Dec 2022
Cited by 2 | Viewed by 1433
Abstract
Finding a vaccine or specific antiviral treatment for a global pandemic of virus diseases (such as the ongoing COVID-19) requires rapid analysis, annotation and evaluation of metagenomic libraries to enable quick and efficient screening of nucleotide sequences. Traditional sequence alignment methods are not suitable, and there is a need for fast, alignment-free techniques for sequence analysis. Information theory and data compression algorithms provide a rich set of mathematical and computational tools to capture essential patterns in biological sequences. In this study, we investigate the use of compression-complexity-based distance measures (Effort-to-Compress, or ETC, and Lempel–Ziv, or LZ, complexity) for analyzing genomic sequences. The proposed distance measure is used to successfully reproduce the phylogenetic trees for a mammalian dataset consisting of eight species clusters; a set of coronaviruses belonging to group I, group II, group III, and SARS-CoV-1; and a set of coronaviruses causing COVID-19 (SARS-CoV-2) and those not causing COVID-19. Having demonstrated the usefulness of these compression-complexity measures, we employ them for the automatic classification of COVID-19-causing genome sequences using machine learning techniques. Two flavors of SVM (linear and quadratic), along with linear discriminant and fine K-nearest neighbors classifiers, are used for classification. Using a dataset comprising 1001 coronavirus sequences (those causing COVID-19 and those not), a classification accuracy of 98% is achieved with a sensitivity of 95% and a specificity of 99.8%. This work could be extended further to enable medical practitioners to automatically identify and characterize coronavirus strains and their rapidly growing mutants in a fast and efficient fashion.
(This article belongs to the Special Issue Data Compression and Complexity)
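The article builds distances from ETC and LZ complexity; purely as a hedged illustration of the alignment-free idea, the sketch below computes a normalized compression distance (NCD) between two nucleotide strings using zlib as a stand-in compressor. The sequences and the helper name are invented for the example and are not from the paper's dataset.

```python
import zlib

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two symbol strings.

    Uses a general-purpose compressor (zlib) as a stand-in; the article
    itself uses ETC and Lempel-Ziv compression-complexity measures.
    """
    cx = len(zlib.compress(x.encode()))
    cy = len(zlib.compress(y.encode()))
    cxy = len(zlib.compress((x + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy fragments: a lightly mutated copy should score closer than an unrelated string.
seq_a = "ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAA" * 4
seq_b = seq_a.replace("CAC", "CAT")
seq_c = "GTACGTACGGTTAACCGGATCCTTAAGGCCAATTGGCCAATTGGA" * 4
print(ncd(seq_a, seq_b), ncd(seq_a, seq_c))
```

Distances of this kind can feed directly into standard hierarchical clustering to draw phylogenetic trees, which is the role the compression-complexity distances play in the paper.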

21 pages, 4274 KiB  
Article
A General Rate-Distortion Optimization Method for Block Compressed Sensing of Images
by Qunlin Chen, Derong Chen and Jiulu Gong
Entropy 2021, 23(10), 1354; https://doi.org/10.3390/e23101354 - 16 Oct 2021
Cited by 1 | Viewed by 1520
Abstract
Block compressed sensing (BCS) is a promising technology for image sampling and compression in resource-constrained applications, but it needs to balance the sampling rate and quantization bit-depth under a bit-rate constraint. In this paper, we summarize the commonly used CS quantization frameworks into a unified framework, and we propose a new bit-rate model and a model of the optimal bit-depth for this unified CS framework. The proposed bit-rate model reveals the relationship between the bit-rate, sampling rate, and bit-depth based on the information entropy of the generalized Gaussian distribution. The optimal bit-depth model can predict the optimal bit-depth of CS measurements at a given bit-rate. We then propose a general algorithm for choosing the sampling rate and bit-depth based on the proposed models. Experimental results show that the proposed algorithm achieves near-optimal rate-distortion performance for both the uniform quantization framework and the predictive quantization framework in BCS.
(This article belongs to the Special Issue Data Compression and Complexity)
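The bit-rate model in the paper is built on the information entropy of the generalized Gaussian distribution (GGD); the short sketch below only evaluates the GGD differential entropy and a textbook high-rate approximation of the entropy of a uniformly quantized GGD source. It is not the paper's model, and the parameter values are illustrative.

```python
import math

def ggd_entropy_bits(alpha: float, beta: float) -> float:
    """Differential entropy (bits) of a generalized Gaussian with scale
    alpha and shape beta: h = 1/beta + ln(2 * alpha * Gamma(1/beta) / beta)."""
    h_nats = 1.0 / beta + math.log(2.0 * alpha * math.gamma(1.0 / beta) / beta)
    return h_nats / math.log(2.0)

def quantized_rate_bits(alpha: float, beta: float, step: float) -> float:
    """Textbook high-rate approximation H(Q(X)) ~ h(X) - log2(step) for a
    uniform quantizer with the given step size (not the article's model)."""
    return ggd_entropy_bits(alpha, beta) - math.log2(step)

# Compare shapes at unit variance: the Gaussian (beta = 2) needs the most bits
# at a fixed step, consistent with it maximizing entropy among unit-variance laws.
for beta in (0.5, 1.0, 2.0):
    alpha = math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    print(beta, round(quantized_rate_bits(alpha, beta, step=0.1), 2))
```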

14 pages, 632 KiB  
Article
Coarse-Grained Pruning of Neural Network Models Based on Blocky Sparse Structure
by Lan Huang, Jia Zeng, Shiqi Sun, Wencong Wang, Yan Wang and Kangping Wang
Entropy 2021, 23(8), 1042; https://doi.org/10.3390/e23081042 - 13 Aug 2021
Cited by 1 | Viewed by 2291
Abstract
Deep neural networks achieve excellent performance in many research fields. However, many deep neural network models are over-parameterized: computing their weight matrices often consumes a lot of time and requires plenty of computing resources. To address these problems, a novel block-based division method and a special coarse-grained block pruning strategy are proposed in this paper to simplify and compress the fully connected structure, and the pruned weight matrices with a blocky structure are then stored in the Block Sparse Row (BSR) format to accelerate the calculation of the weight matrices. First, the weight matrices are divided into square sub-blocks based on spatial aggregation. Second, a coarse-grained block pruning procedure is used to scale down the model parameters. Finally, the BSR storage format, which is much friendlier to block sparse matrix storage and computation, is employed to store these pruned dense weight blocks and speed up the calculation. In experiments on the MNIST and Fashion-MNIST datasets, the trend of accuracy with different pruning granularities and sparsity levels is explored in order to analyze our method. The experimental results show that our coarse-grained block pruning method can compress the network and reduce the computational cost without greatly degrading classification accuracy. An experiment on the CIFAR-10 dataset shows that our block pruning strategy combines well with convolutional networks.
(This article belongs to the Special Issue Data Compression and Complexity)
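To make the block-pruning-plus-BSR idea concrete, here is a small, hedged sketch: it splits a dense weight matrix into square sub-blocks, zeros the blocks with the smallest L2 norms, and stores the result with SciPy's Block Sparse Row container. The block size, the norm criterion, and the keep ratio are illustrative choices, not the exact settings from the paper.

```python
import numpy as np
from scipy.sparse import bsr_matrix

def block_prune(weights: np.ndarray, block: int, keep_ratio: float) -> bsr_matrix:
    """Coarse-grained block pruning sketch returning a BSR matrix."""
    rows, cols = weights.shape
    assert rows % block == 0 and cols % block == 0, "pad the matrix first"
    # View the matrix as a grid of (block x block) sub-blocks and score each one.
    grid = weights.reshape(rows // block, block, cols // block, block)
    norms = np.linalg.norm(grid, axis=(1, 3))
    # Keep only the strongest blocks; zero the rest.
    threshold = np.quantile(norms, 1.0 - keep_ratio)
    mask = (norms >= threshold).repeat(block, 0).repeat(block, 1)
    return bsr_matrix(weights * mask, blocksize=(block, block))

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256)).astype(np.float32)
W_bsr = block_prune(W, block=16, keep_ratio=0.25)
x = rng.standard_normal(256).astype(np.float32)
y = W_bsr @ x                      # block-sparse matrix-vector product
print(W_bsr.nnz / W.size)          # roughly the keep ratio
```

Storing whole dense blocks rather than scattered individual weights keeps the memory access pattern regular, in line with the abstract's point that BSR is friendlier to block-sparse storage and computation.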

19 pages, 492 KiB  
Article
A Stochastic Model for Block Segmentation of Images Based on the Quadtree and the Bayes Code for It
by Yuta Nakahara and Toshiyasu Matsushima
Entropy 2021, 23(8), 991; https://doi.org/10.3390/e23080991 - 30 Jul 2021
Cited by 7 | Viewed by 1723
Abstract
In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model for the target data. However, in lossless image compression, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of the stochastic generative model is implicit. In these studies, it is difficult to discuss the difference between the expected code length and the entropy of the stochastic generative model. We address this difficulty for a class of images that exhibit non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the implicit stochastic generative model in a previous coding procedure. Our model is based on the quadtree, so it effectively represents variable-block-size segmentation of images. We then construct the Bayes code that is optimal for the proposed stochastic generative model. It requires a summation over all possible quadtrees weighted by their posterior probabilities. In general, its computational cost increases exponentially with the image size. However, we introduce an efficient algorithm that calculates it in polynomial order of the image size without loss of optimality. As a result, the derived algorithm achieves a better average coding rate than that of JBIG.
(This article belongs to the Special Issue Data Compression and Complexity)
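For intuition about what a quadtree-based segmentation looks like, the hedged sketch below recursively splits a square image into quadrants whenever the block variance exceeds a threshold. It produces a single segmentation; the model in the paper instead keeps all possible quadtrees and weights them by their posterior probabilities, so this is only an illustration of the representation, not of the Bayes code.

```python
import numpy as np

def quadtree_segments(img: np.ndarray, threshold: float, min_size: int = 2):
    """Return (top, left, size) leaves of a variance-driven quadtree split."""
    segments = []

    def split(top: int, left: int, size: int) -> None:
        block = img[top:top + size, left:left + size]
        if size <= min_size or block.var() <= threshold:
            segments.append((top, left, size))   # keep this block as one segment
            return
        half = size // 2
        for dt, dl in ((0, 0), (0, half), (half, 0), (half, half)):
            split(top + dt, left + dl, half)

    split(0, 0, img.shape[0])    # assumes a square image with power-of-two side
    return segments

# Toy example: one textured quadrant splits finely, the flat rest stays coarse.
rng = np.random.default_rng(1)
img = np.zeros((16, 16))
img[:8, :8] = rng.integers(0, 2, (8, 8))
print(len(quadtree_segments(img, threshold=0.05)))
```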

21 pages, 363 KiB  
Article
Tracking an Auto-Regressive Process with Limited Communication per Unit Time
by Rooji Jinan, Parimal Parag and Himanshu Tyagi
Entropy 2021, 23(3), 347; https://doi.org/10.3390/e23030347 - 15 Mar 2021
Cited by 2 | Viewed by 1372
Abstract
Samples from a high-dimensional first-order auto-regressive process generated by an independently and identically distributed random innovation sequence are observed by a sender that can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses the communication between sampling instants to refine the estimate of the latest sample, and we study the following question: is it better to accumulate communication over multiple slots to send better-refined estimates, making the receiver wait longer for every refinement, or to be fast but loose and send new information at every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is asymptotically universally optimal in the large-dimension limit. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.
(This article belongs to the Special Issue Data Compression and Complexity)
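As a toy, one-dimensional illustration of the trade-off studied in the paper, the sketch below simulates tracking an AR(1) process when each inter-sample communication slot carries a few bits produced by a uniform scalar quantizer of the current estimation error. The parameters and the scalar quantizer are our own simplifications; the article's analysis concerns high-dimensional processes and spherical codes.

```python
import numpy as np

def track_ar1(a=0.9, bits_per_slot=4, slots_per_sample=3, n=2000, seed=0):
    """Mean squared tracking error of a 'fast but loose' successive-update
    scheme on a scalar AR(1) process (illustrative simplification)."""
    rng = np.random.default_rng(seed)
    x, x_hat, errors = 0.0, 0.0, []
    levels = 2 ** bits_per_slot
    step = 8.0 / levels                              # fixed quantizer range [-4, 4)
    for _ in range(n):
        x = a * x + rng.standard_normal()            # new sample of the process
        x_hat = a * x_hat                            # receiver predicts it forward
        for _ in range(slots_per_sample):
            err = x - x_hat
            q = np.clip(np.round(err / step), -levels // 2, levels // 2 - 1) * step
            x_hat += q                               # refine with this slot's bits
            errors.append((x - x_hat) ** 2)
    return float(np.mean(errors))

# More slots spent refining the latest sample reduce the tracking error here.
print(track_ar1(slots_per_sample=1), track_ar1(slots_per_sample=3))
```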