Data Compression Algorithms and their Applications

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (30 November 2019) | Viewed by 35711

Special Issue Editor

Department of Applied Mathematics and Computer Science (DTU Compute), Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Interests: pattern matching; data compression; parallelism in modern computer architectures

Special Issue Information

Dear Colleagues,

Data compression is a classic research area in computer science focusing on the efficient storage and communication of data. Data compression is ubiquitous throughout science and engineering, and essentially any data of non-trivial size is stored or communicated in compressed form on any modern computer system. With rapid advances in data collection in areas such as e-commerce, astronomy, climatology, bioinformatics, and particle physics, the need for efficient data compression is stronger than ever.

We invite you to submit high-quality papers to this Special Issue on “Data Compression Algorithms and their Applications”, with subjects covering the whole range from theory to applications. The following is a (non-exhaustive) list of topics of interest:

  • Lossless data compression
  • Lossy data compression
  • Algorithms on compressed data
  • Compressed data structures
  • Applications of data compression

Prof. Philip Bille
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • lossless data compression
  • lossy data compression
  • algorithms on compressed data
  • compressed data structures
  • applications of data compression

Published Papers (9 papers)


Editorial


2 pages, 145 KiB  
Editorial
Editorial: Special Issue on Data Compression Algorithms and Their Applications
by Philip Bille
Algorithms 2020, 13(1), 28; https://0-doi-org.brum.beds.ac.uk/10.3390/a13010028 - 17 Jan 2020
Viewed by 3146
Abstract
This Special Issue of Algorithms is focused on data compression algorithms and their applications.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)

Research


18 pages, 1969 KiB  
Article
A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme
by Deloula Mansouri, Xiaohui Yuan and Abdeldjalil Saidani
Algorithms 2020, 13(4), 99; https://0-doi-org.brum.beds.ac.uk/10.3390/a13040099 - 20 Apr 2020
Cited by 9 | Viewed by 4811
Abstract
With the rapid evolution of DNA sequencing technology, a massive amount of genomic data is produced every day, mainly DNA sequences, demanding ever more storage and bandwidth. Unfortunately, managing, analyzing, and especially storing these large amounts of data has become a major scientific challenge in bioinformatics. To overcome these challenges, compression has become necessary. In this paper, we describe a new reference-free DNA compressor, abbreviated DNAC-SBE. DNAC-SBE is a lossless hybrid compressor that consists of three phases. First, starting from the most frequent base (Bi), the positions of each Bi are replaced with ones and the positions of the other, less frequent bases are replaced with zeros. Second, to encode the generated streams, we propose a new single-block encoding scheme (SBE) based on the positions of neighboring bits within the block, using two different techniques. Finally, the proposed algorithm dynamically assigns the shorter code length to each block. Results show that DNAC-SBE outperforms state-of-the-art compressors in terms of the conditions imposed on the compressed data, storage space, and data transfer rate, regardless of the file format or the size of the data.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
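To make the first phase concrete, here is a minimal Python sketch of the binary-stream generation described above (our own illustration, not the authors' implementation; the processing order and stream layout are assumptions):

```python
from collections import Counter

def binary_streams(seq):
    """Phase 1 as described in the abstract (illustrative sketch):
    for each base, in decreasing order of frequency, emit a bit-stream
    with 1 at that base's positions and 0 at the positions of the
    less frequent bases, then drop the encoded base and repeat."""
    remaining = list(seq)
    streams = {}
    for base, _ in Counter(seq).most_common():
        streams[base] = "".join("1" if c == base else "0" for c in remaining)
        remaining = [c for c in remaining if c != base]
    return streams

print(binary_streams("ACGTACGGA"))
# {'A': '100010001', 'G': '010011', 'C': '101', 'T': '1'}
# (the last stream is always all ones, so it need not be stored)
```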

21 pages, 412 KiB  
Article
Optimal Prefix Free Codes with Partial Sorting
by Jérémy Barbay
Algorithms 2020, 13(1), 12; https://0-doi-org.brum.beds.ac.uk/10.3390/a13010012 - 31 Dec 2019
Cited by 3 | Viewed by 5455
Abstract
We describe an algorithm computing an optimal prefix-free code for n unsorted positive weights in time within O(n(1 + lg α)) ⊆ O(n lg n), where the alternation α ∈ [1..n−1] approximates the minimal amount of sorting required by the computation. This asymptotic complexity is within a constant factor of optimal in the algebraic decision tree computational model, in the worst case over all instances of size n and alternation α. Such results refine the state-of-the-art complexity of Θ(n lg n) in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952. Besides the new analysis technique, the improvement is obtained by combining a new algorithm, inspired by van Leeuwen's algorithm for computing optimal prefix-free codes from sorted weights (known since 1976), with a relatively minor extension of Karp et al.'s deferred data structure for partially sorting a multiset according to the queries performed on it (known since 1988). Preliminary experimental results on text compression by words show α to be polynomially smaller than n, which suggests improvements of at most a constant multiplicative factor in the running time for such applications.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
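For context, the sorted-weights building block mentioned above, van Leeuwen's two-queue method, can be sketched in a few lines; the version below computes only the total cost of an optimal code and is our illustration, not the paper's partial-sorting algorithm:

```python
from collections import deque

def huffman_cost_sorted(weights):
    """Total cost of an optimal prefix-free code for weights given in
    ascending order, via van Leeuwen's two-queue method (linear time).
    Merged nodes are produced in nondecreasing order, so the two queue
    fronts always hold the two global minima."""
    leaves = deque(weights)   # assumed sorted ascending
    internal = deque()        # weights of merged (internal) nodes

    def pop_min():
        if internal and (not leaves or internal[0] < leaves[0]):
            return internal.popleft()
        return leaves.popleft()

    total = 0
    while len(leaves) + len(internal) > 1:
        merged = pop_min() + pop_min()
        total += merged       # each merge adds its weight to the cost
        internal.append(merged)
    return total

print(huffman_cost_sorted([1, 1, 2, 3, 5]))  # 25
```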

18 pages, 1556 KiB  
Article
Finding Patterns in Signals Using Lossy Text Compression
by Liat Rozenberg, Sagi Lotan and Dan Feldman
Algorithms 2019, 12(12), 267; https://0-doi-org.brum.beds.ac.uk/10.3390/a12120267 - 11 Dec 2019
Cited by 2 | Viewed by 3479
Abstract
Whether the source is an autonomous car, a robotic vacuum cleaner, or a quadcopter, signals from sensors tend to have hidden patterns that repeat themselves. For example, typical GPS traces from a smartphone contain periodic trajectories such as “home, work, home, work, ⋯”. Our goal in this study was to automatically reverse engineer such signals, identify their periodicity, and then use it to compress and de-noise them. To do so, we present a novel method that uses algorithms from the fields of pattern matching and text compression to represent the “language” of such signals. Common text compression algorithms are not tailored to handle such strings. Moreover, they are lossless, and cannot be used to recover noisy signals. To this end, we define the recursive run-length encoding (RRLE) method, which is a generalization of the well-known run-length encoding (RLE) method. We then suggest lossy and lossless algorithms to compress and de-noise such signals. Unlike previous results, running time and optimality guarantees are proved for each algorithm. Experimental results on synthetic and real data sets are provided. We demonstrate our system by showing how it can be used to turn commercial micro air vehicles into autonomous robots, by reverse engineering their unpublished communication protocols and using a laptop or an on-board micro-computer to control them. Our open source code may be useful both for the community of millions of toy-robot users and for researchers who may extend it to further protocols.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
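To give a flavor of the idea, the sketch below factors an exactly periodic signal as a repeated pattern and run-length encodes that pattern. It is a deliberate simplification of RRLE, which also handles nested and noisy repetitions; all names here are illustrative:

```python
from itertools import groupby

def rle(s):
    """Classic run-length encoding: a list of (symbol, run_length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(s)]

def shortest_period(s):
    """Smallest p such that s equals its first p characters repeated."""
    n = len(s)
    return next(p for p in range(1, n + 1)
                if n % p == 0 and s[:p] * (n // p) == s)

def rrle_sketch(s):
    """Simplified recursive-RLE flavor: represent s as a run-length
    encoded pattern plus a repetition count."""
    p = shortest_period(s)
    return (rle(s[:p]), len(s) // p)

print(rrle_sketch("aabb" * 6))      # ([('a', 2), ('b', 2)], 6)
print(rrle_sketch("homework" * 4))  # (rle of 'homework', 4)
```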

26 pages, 1370 KiB  
Article
Compression Challenges in Large Scale Partial Differential Equation Solvers
by Sebastian Götschel and Martin Weiser
Algorithms 2019, 12(9), 197; https://0-doi-org.brum.beds.ac.uk/10.3390/a12090197 - 17 Sep 2019
Cited by 4 | Viewed by 4201
Abstract
Solvers for partial differential equations (PDEs) are one of the cornerstones of computational science. For large problems, they involve huge amounts of data that need to be stored and transmitted on all levels of the memory hierarchy. Often, bandwidth is the limiting factor, due to the relatively small arithmetic intensity and, increasingly, due to the growing disparity between computing power and bandwidth. Consequently, data compression techniques have been investigated and tailored towards the specific requirements of PDE solvers over recent decades. This paper surveys data compression challenges and discusses examples of corresponding solution approaches for PDE problems, covering all levels of the memory hierarchy from mass storage up to main memory. We illustrate the concepts with particular methods and examples, and give references to alternatives.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
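As a generic illustration of one building block that recurs in this area (not a specific method from the survey), lossy coding of floating-point state often starts from uniform scalar quantization with a guaranteed pointwise error bound:

```python
import numpy as np

def quantize(field, abs_tol):
    """Uniform scalar quantization: store small integers q such that
    |field - q * step| <= abs_tol; the integers compress much better
    than raw float64 values (e.g., after entropy coding)."""
    step = 2.0 * abs_tol
    q = np.round(field / step).astype(np.int32)
    return q, step

def dequantize(q, step):
    return q * step

u = np.sin(np.linspace(0.0, np.pi, 1000))       # a smooth 1-D "solution" field
q, step = quantize(u, abs_tol=1e-4)
print(np.max(np.abs(u - dequantize(q, step))))  # <= 1e-4 by construction
```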

16 pages, 1875 KiB  
Article
Nearest Embedded and Embedding Self-Nested Trees
by Romain Azaïs
Algorithms 2019, 12(9), 180; https://0-doi-org.brum.beds.ac.uk/10.3390/a12090180 - 29 Aug 2019
Cited by 2 | Viewed by 2968
Abstract
Self-nested trees present a systematic form of redundancy in their subtrees and thus achieve optimal compression rates under directed acyclic graph (DAG) compression. A method for quantifying the degree of self-similarity of plants through self-nested trees was introduced by Godin and Ferraro in 2010. The procedure consists of computing a self-nested approximation, called the nearest embedding self-nested tree, that both embeds the plant and is the closest to it. In this paper, we propose a new algorithm that computes not only the nearest embedding self-nested tree, with a smaller overall complexity, but also the nearest embedded self-nested tree. We show through simulations that the latter is usually the closer of the two to the initial data, which suggests that this better approximation should be used as the preferred measure of the degree of self-similarity of plants.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
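The DAG compression the paper builds on stores each distinct subtree exactly once. A compact sketch for ordered labeled trees (our illustration; the paper's approximation algorithms are not reproduced):

```python
def dag_compress(tree, table=None):
    """DAG compression of an ordered labeled tree: identical subtrees
    share one node. A tree is a (label, [children]) pair; returns the
    DAG node id of the root, filling `table` (signature -> id)."""
    if table is None:
        table = {}
    label, children = tree
    sig = (label, tuple(dag_compress(c, table) for c in children))
    if sig not in table:
        table[sig] = len(table)   # allocate a new DAG node
    return table[sig]

leaf = ("x", [])
t = ("r", [("a", [leaf, leaf]), ("a", [leaf, leaf])])  # highly redundant tree
table = {}
dag_compress(t, table)
print(len(table))  # 3 DAG nodes represent all 7 tree nodes
```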

16 pages, 4444 KiB  
Article
Compaction of Church Numerals
by Isamu Furuya and Takuya Kida
Algorithms 2019, 12(8), 159; https://0-doi-org.brum.beds.ac.uk/10.3390/a12080159 - 08 Aug 2019
Cited by 1 | Viewed by 3052
Abstract
In this study, we address the problem of compaction of Church numerals. Church numerals are unary representations of natural numbers as lambda terms. We propose a novel scheme for decomposing a given natural number into an arithmetic expression using tetration, which enables us to obtain a compact representation of the lambda term that evaluates to the Church numeral of that number. For a natural number n, we prove that the size of the lambda term obtained by the proposed method is O((slog₂ n)(log n / log log n)). Moreover, we experimentally confirmed that the proposed method outperforms the binary representation of Church numerals on average when n is less than approximately 10,000.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
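For readers unfamiliar with the encoding, the sketch below (Python standing in for lambda terms) shows why plain Church numerals have size Θ(n) and why arithmetic structure helps: exponentiation of Church numerals is a constant-size combinator. It illustrates only the background, not the paper's tetration-based decomposition:

```python
# Church numeral n = \f.\x. f(f(...f(x)...)), i.e., n-fold application.
zero = lambda f: (lambda x: x)
succ = lambda n: (lambda f: (lambda x: f(n(f)(x))))

def church(n):
    """Build the unary Church numeral for n: term size grows as Theta(n)."""
    c = zero
    for _ in range(n):
        c = succ(c)
    return c

to_int = lambda c: c(lambda k: k + 1)(0)   # decode back to a Python int
print(to_int(church(5)))                   # 5

# Exponentiation is just application (exp = \m.\n. n m), so the term for
# m ** n is tiny even though the numeral it denotes is huge:
exp = lambda m, n: n(m)
print(to_int(exp(church(2), church(10))))  # 1024
```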

11 pages, 1681 KiB  
Article
A New Regularized Reconstruction Algorithm Based on Compressed Sensing for the Sparse Underdetermined Problem and Applications of One-Dimensional and Two-Dimensional Signal Recovery
by Bin Wang, Li Wang, Hao Yu and Fengming Xin
Algorithms 2019, 12(7), 126; https://0-doi-org.brum.beds.ac.uk/10.3390/a12070126 - 26 Jun 2019
Cited by 1 | Viewed by 3130
Abstract
Compressed sensing theory has been widely used to solve underdetermined equations in various fields and has achieved remarkable results. The regularized smooth L0 (ReSL0) reconstruction algorithm adds an error regularization term to the smooth L0 (SL0) algorithm, enabling good signal reconstruction in the presence of noise. However, the ReSL0 reconstruction algorithm still has some flaws: it retains SL0's original optimization method and Gauss approximation function, which suffer from a sawtooth effect in the later stages of optimization and less-than-ideal convergence. Therefore, we make two adjustments to the ReSL0 reconstruction algorithm: first, we introduce a CIPF function, which approximates the L0 norm better than the Gauss function; second, we combine the steepest descent method and Newton's method in the optimization step. We then propose a novel regularized recovery algorithm named combined regularized smooth L0 (CReSL0). Under the same experimental conditions, the CReSL0 algorithm is compared with other popular reconstruction algorithms. Overall, the CReSL0 algorithm achieves excellent reconstruction performance in terms of peak signal-to-noise ratio (PSNR) and run time for both one-dimensional Gauss signals and two-dimensional image reconstruction tasks.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
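For context, the SL0 baseline that ReSL0 and CReSL0 refine replaces the L0 "norm" with a smooth Gaussian surrogate that is sharpened gradually. A bare-bones SL0 sketch (our illustration; the parameter values are assumptions, and neither the CIPF function nor the combined steepest-descent/Newton update of the paper is reproduced):

```python
import numpy as np

def sl0(A, y, sigma_min=1e-3, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Smoothed-L0 sketch for underdetermined A x = y: approximate
    ||x||_0 by sum(1 - exp(-x_i^2 / (2 sigma^2))), take a few gradient
    steps per sigma, and project back onto the constraint set."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                    # minimum-L2-norm feasible start
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            x = x - mu * x * np.exp(-x**2 / (2.0 * sigma**2))
            x = x - A_pinv @ (A @ x - y)   # re-project onto {x : A x = y}
        sigma *= sigma_decay          # sharpen the L0 surrogate
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
x_hat = sl0(A, A @ x_true)
print(np.round(x_hat[[3, 17, 41]], 2))  # should be close to [1.0, -2.0, 0.5]
```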

10 pages, 742 KiB  
Article
Time-Universal Data Compression
by Boris Ryabko
Algorithms 2019, 12(6), 116; https://0-doi-org.brum.beds.ac.uk/10.3390/a12060116 - 29 May 2019
Cited by 3 | Viewed by 3344
Abstract
Nowadays, a variety of data compressors (or archivers) is available, each of which has its merits, and it is impossible to single out the best one. Thus, one faces the problem of choosing the best method to compress a given file, and the larger the file, the more important this problem becomes. It seems natural to try all the compressors and then choose the one that gives the shortest compressed file, then transfer (or store) the index of the best compressor (this requires log m bits, if m is the number of compressors available) along with the compressed file. The only problem is the time, which increases substantially due to the need to compress the file m times (in order to find the best compressor). We suggest a method of data compression whose performance is close to optimal, but for which the extra time needed is relatively small: the ratio of this extra time to the total computation time can be limited, asymptotically, by an arbitrary positive constant. In short, the main idea of the suggested approach is as follows: in order to find the best compressor, try all of them, but use only a small part of the file when doing so. Then apply the best data compressor to the whole file. Note that there are many situations where it may be necessary to find the best data compressor out of a given set; this is often done by comparing compressors empirically. One of the goals of this work is to turn such a selection process into a part of the data compression method itself, automating and optimizing it.
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
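The idea above is simple enough to sketch directly; the sampling fraction and function names below are our assumptions, not the paper's:

```python
import bz2, lzma, zlib

def time_universal_compress(data, compressors, sample_frac=0.05):
    """Run every compressor on a small prefix only, pick the one that
    compresses the prefix best, then apply just that compressor to the
    whole file. Storing the winner's index costs log2(m) bits for m
    compressors; the extra time is roughly sample_frac * m compressions."""
    sample = data[: max(1, int(len(data) * sample_frac))]
    best = min(range(len(compressors)),
               key=lambda i: len(compressors[i](sample)))
    return best, compressors[best](data)

data = b"the quick brown fox jumps over the lazy dog. " * 4000
idx, out = time_universal_compress(data, [zlib.compress, bz2.compress,
                                          lzma.compress])
print(idx, len(data), "->", len(out))
```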