Algorithms and Data-Structures for Compressed Computation

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (30 January 2021) | Viewed by 13702

Special Issue Editors


Guest Editor
Dr. Alberto Policriti
Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
Interests: algorithms for bioinformatics and computational biology; logic decision algorithms; algorithms and data structures for compressed computation

Guest Editor
Dr. Nicola Prezza
Department of Business and Management, LUISS Guido Carli, viale Romania 32, 00197 Rome, Italy
Interests: data structures; algorithms; data compression; bioinformatics

Special Issue Information

Dear Colleagues,

As the production of massive data has outpaced Moore’s law in many scientific areas, the very notion of an algorithm is changing to keep pace. Today’s algorithms must not only deal with large amounts of data, they must also use resources close to their theoretical limits. Against this backdrop, compression has become a fundamental tool for making the storage of such data possible in the first place.

This approach is motivated by the fact that massive data are often highly compressible, but it poses interesting algorithmic challenges: Can we compute directly on compressed data without decompressing it first? Can we devise measures and frameworks to compare different approaches? Can different techniques for compressed computation be unified?

Thanks to more than two decades of successful research on the subject, today we know that many algorithmic marvels are indeed possible. In fact, compression and computation are two sides of the same coin, as they exploit the same underlying structure both for storing and operating on data.

Despite numerous successes in the field, however, several challenges remain open, including (but not limited to) extending techniques to more/less structured and massive data. The goal of this Special Issue is precisely to advance research in the field of compressed computation. We welcome the submission of high-quality papers dealing with compressed data structures, data compression, string/tree/graph processing, and combinatorics.

Dr. Alberto Policriti
Dr. Nicola Prezza
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Compressed data structures
  • Data compression
  • String processing
  • Combinatorics on strings
  • Text indexing
  • Pattern matching

Published Papers (6 papers)


Editorial


2 pages, 134 KiB  
Editorial
Special Issue on Algorithms and Data-Structures for Compressed Computation
by Alberto Policriti and Nicola Prezza
Algorithms 2022, 15(12), 457; https://doi.org/10.3390/a15120457 - 02 Dec 2022
Viewed by 982
Abstract
As the production of massive data has outpaced Moore’s law in many scientific areas, the very notion of algorithms is transforming [...]

Research


23 pages, 795 KiB  
Article
Storing Set Families More Compactly with Top ZDDs
by Kotaro Matsuda, Shuhei Denzumi and Kunihiko Sadakane
Algorithms 2021, 14(6), 172; https://doi.org/10.3390/a14060172 - 31 May 2021
Cited by 1 | Viewed by 2263
Abstract
Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in ZDD size. In some cases, however, the size of ZDDs for representing large set families becomes too large to store in main memory. This paper proposes the top ZDD, a novel representation of ZDDs that uses less space than existing ones. The top ZDD is an extension of the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs have smaller sizes than ZDDs for real data.
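
To make the representation concrete, here is a minimal plain ZDD in Python: it is not the paper's top ZDD, only an illustration of the two reduction rules that give ZDDs their compression, zero-suppression and node sharing via a unique table. Class and method names are invented for this sketch.

```python
class ZDD:
    """Minimal zero-suppressed BDD over integer variables (illustrative only)."""
    BOT, TOP = 0, 1                      # terminals: empty family / family {emptyset}

    def __init__(self):
        self.nodes = [None, None]        # id -> (var, lo, hi); ids 0 and 1 are terminals
        self.unique = {}                 # (var, lo, hi) -> id  (node-sharing rule)

    def mk(self, var, lo, hi):
        """Canonical node for (var, lo, hi)."""
        if hi == self.BOT:               # zero-suppression rule: skip the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def single(self, items):
        """ZDD for the family containing exactly the set `items`."""
        node = self.TOP
        for v in sorted(items, reverse=True):    # smallest variable ends up at the root
            node = self.mk(v, self.BOT, node)
        return node

    def union(self, a, b):
        """Union of two families (memoisation omitted for brevity)."""
        if a == self.BOT: return b
        if b == self.BOT: return a
        if a == b: return a
        if a == self.TOP or b == self.TOP:
            small, big = (a, b) if a == self.TOP else (b, a)
            var, lo, hi = self.nodes[big]        # the empty set never contains var
            return self.mk(var, self.union(small, lo), hi)
        va, la, ha = self.nodes[a]
        vb, lb, hb = self.nodes[b]
        if va == vb: return self.mk(va, self.union(la, lb), self.union(ha, hb))
        if va < vb:  return self.mk(va, self.union(la, b), ha)
        return self.mk(vb, self.union(a, lb), hb)

z = ZDD()
fam = z.union(z.single({1, 3}), z.single({2, 3}))    # family {{1,3}, {2,3}}
print(len(z.nodes) - 2, "internal nodes")            # 4: the node for {3} is shared
```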

15 pages, 1072 KiB  
Article
Text Indexing for Regular Expression Matching
by Daniel Gibney and Sharma V. Thankachan
Algorithms 2021, 14(5), 133; https://doi.org/10.3390/a14050133 - 23 Apr 2021
Cited by 6 | Viewed by 2620
Abstract
Finding substrings of a text T that match a regular expression p is a fundamental problem. Despite being the subject of extensive research, no solution with a time complexity significantly better than O(|T||p|) has been found. Backurs and Indyk (FOCS 2016) established conditional lower bounds for the algorithmic problem, based on the Strong Exponential Time Hypothesis, that help explain this difficulty. A natural question is whether we can improve the time complexity for matching the regular expression by preprocessing the text T. We show that, conditioned on the Online Matrix–Vector Multiplication (OMv) conjecture, even with arbitrary polynomial preprocessing time, a regular expression query on a text cannot be answered in strongly sublinear time, i.e., in O(|T|^(1−ε)) time for any ε > 0. Furthermore, if we extend the OMv conjecture to a plausible conjecture regarding Boolean matrix multiplication with polynomial preprocessing time, which we call Online Matrix–Matrix Multiplication (OMM), we can strengthen this hardness result so that no solution has a query time of O(|T|^(3/2−ε)). These results hold for alphabet sizes three or greater. We then provide data structures that answer queries in O(|T||p|/τ) time, where τ ∈ [1, |T|] is fixed at construction. These include a solution that works for all regular expressions with Exp(τ)·|T| preprocessing time and space. For patterns containing only ‘concatenation’ and ‘or’ operators (the same type used in the hardness result), we provide (1) a deterministic solution that requires Exp(τ)·|T| log^2 |T| preprocessing time and space, and (2) when |p| ≤ |T|^z for z = 2^(o(log |T|)), a randomized solution with amortized query time that answers queries correctly with high probability, requiring Exp(τ)·|T|·2^(Ω(log |T|)) preprocessing time and space.
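
For intuition about the restricted pattern class in the hardness result, the sketch below handles regular expressions built from ‘concatenation’ and ‘or’ only: such a pattern denotes a finite set of strings, so a brute-force baseline can expand it (at a possibly exponential cost) and scan the text for every matching substring. This is only meant to make the query problem concrete; the classical O(|T||p|) bound instead comes from simulating the pattern's NFA over T, and the paper's data structures aim to beat scanning altogether. All names here are illustrative.

```python
def expand(p):
    """Expand a (concatenation|or)-only regex into its finite set of strings."""
    def parse_expr(i):                             # expr := term ('|' term)*
        strings, i = parse_term(i)
        while i < len(p) and p[i] == '|':
            more, i = parse_term(i + 1)
            strings |= more
        return strings, i

    def parse_term(i):                             # term := factor*
        strings = {''}
        while i < len(p) and p[i] not in '|)':
            if p[i] == '(':
                inner, i = parse_expr(i + 1)
                i += 1                             # skip the closing ')'
            else:
                inner, i = {p[i]}, i + 1           # single literal character
            strings = {s + t for s in strings for t in inner}
        return strings, i

    return parse_expr(0)[0]

def match_positions(text, pattern):
    """All (start, end) pairs such that text[start:end] matches the pattern."""
    hits = []
    for w in expand(pattern):
        start = text.find(w)
        while start != -1:
            hits.append((start, start + len(w)))
            start = text.find(w, start + 1)
    return sorted(hits)

print(expand("(ab|c)(d|ef)"))                       # {'abd', 'abef', 'cd', 'cef'}
print(match_positions("cdabefxcd", "(ab|c)(d|ef)")) # [(0, 2), (2, 6), (7, 9)]
```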

8 pages, 288 KiB  
Article
Compressed Communication Complexity of Hamming Distance
by Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai and Masayuki Takeda
Algorithms 2021, 14(4), 116; https://doi.org/10.3390/a14040116 - 03 Apr 2021
Cited by 3 | Viewed by 1831
Abstract
We consider the communication complexity of the Hamming distance of two strings. Bille et al. [SPIRE 2018] considered the communication complexity of the longest common prefix (LCP) problem in the setting where the two parties have their strings in a compressed form, i.e., represented by the Lempel–Ziv 77 factorization (LZ77) with/without self-references. We present a randomized public-coin protocol for the joint computation of the Hamming distance of two strings represented by LZ77 without self-references. Although our scheme is heavily based on Bille et al.’s LCP protocol, our complexity analysis is original and relies on Crochemore’s C-factorization and Rytter’s AVL-grammar. As a byproduct, we also show that LZ77 with/without self-references are not monotonic, in the sense that their sizes can increase by a factor of 4/3 when a prefix of the string is removed.
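
The sketch below computes the LZ77 factorization without self-references, the compressed form each party is assumed to hold in this setting: every phrase is either a fresh character or the longest prefix of the remaining text that already occurs in the previously factorized part. It is a quadratic-time textbook illustration, not the paper's protocol; names are invented.

```python
def lz77_no_self_ref(text):
    """Phrases as (source_pos, length) pairs or single literal characters."""
    phrases, i = [], 0
    while i < len(text):
        best_len, best_src = 0, -1
        for src in range(i):                       # candidate earlier occurrence
            l = 0
            # the copied substring must lie entirely inside text[:i] (no self-reference)
            while i + l < len(text) and src + l < i and text[src + l] == text[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, src
        if best_len == 0:
            phrases.append(text[i])                # fresh character
            i += 1
        else:
            phrases.append((best_src, best_len))   # copy from an earlier position
            i += best_len
    return phrases

print(lz77_no_self_ref("abababba"))   # ['a', 'b', (0, 2), (0, 2), (1, 1), (1, 1)]
```
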
19 pages, 385 KiB  
Article
Approximation Ratios of RePair, LongestMatch and Greedy on Unary Strings
by Danny Hucke and Carl Philipp Reh
Algorithms 2021, 14(2), 65; https://doi.org/10.3390/a14020065 - 20 Feb 2021
Cited by 2 | Viewed by 1713
Abstract
A grammar-based compressor is an algorithm that receives a word and outputs a context-free grammar that only produces this word. The approximation ratio for a single input word is the size of the grammar produced for this word divided by the size of a smallest grammar for this word. The worst-case approximation ratio of a grammar-based compressor for a given word length is the largest approximation ratio over all input words of that length. In this work, we study the worst-case approximation ratio of the algorithms Greedy, RePair and LongestMatch on unary strings, i.e., strings that only make use of a single symbol. Our main contribution is to show the improved upper bound of O((log n)^8 · (log log n)^3) for the worst-case approximation ratio of Greedy. In addition, we also show the lower bound of 1.34847194 for the worst-case approximation ratio of Greedy, and that RePair and LongestMatch have a worst-case approximation ratio of log_2(3).
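
As a reference point for these ratios, the denominator, i.e., the size of a smallest grammar for a unary word a^n, is always O(log n): a straight-line grammar can be built by repeated squaring following the binary expansion of n. The sketch below constructs such a grammar; the rule names and the size measure (total length of all right-hand sides) are the usual conventions, used here only for illustration.

```python
def squaring_grammar(n):
    """Straight-line grammar for the unary word a^n via repeated squaring."""
    assert n >= 1
    bits = bin(n)[2:]                              # binary expansion of n, MSB first
    rules, symbols = {}, ['a']                     # symbols[i] derives a^(2^i)
    for i in range(1, len(bits)):
        rules[f'X{i}'] = [symbols[i - 1]] * 2      # X_i -> X_{i-1} X_{i-1}
        symbols.append(f'X{i}')
    # start rule: one occurrence of symbols[i] for every set bit i of n
    rules['S'] = [symbols[i] for i, b in enumerate(reversed(bits)) if b == '1']
    size = sum(len(rhs) for rhs in rules.values()) # total right-hand-side length
    return rules, size

rules, size = squaring_grammar(1000)
print(size, rules['S'])   # grammar of size 24 for a word of length 1000
```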

21 pages, 844 KiB  
Article
Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees
by Dominik Köppl
Algorithms 2021, 14(2), 44; https://doi.org/10.3390/a14020044 - 29 Jan 2021
Cited by 5 | Viewed by 3026
Abstract
We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous non-overlapping factor table within small space, in linear or near-linear time, with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer substring compression queries for the Lempel–Ziv-78 factorization with a possible logarithmic multiplicative slowdown depending on the suffix tree representation used.
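
For context, the sketch below computes the LZ78 factorization itself with an explicit trie of previously seen phrases; each new phrase extends the longest phrase already in the trie by one character. The paper's contribution, answering such factorization queries on arbitrary substrings via compressed suffix tree representations, is not attempted here; this is only the textbook algorithm.

```python
def lz78(text):
    """LZ78 factorization as (phrase_index, extension_char) pairs;
    phrase index 0 denotes the empty phrase."""
    trie = {}                                      # char -> (phrase_id, child trie)
    factors, next_id, i = [], 1, 0
    while i < len(text):
        node, phrase_id = trie, 0
        # walk down the trie while the text continues an existing phrase
        while i < len(text) and text[i] in node:
            phrase_id, node = node[text[i]]
            i += 1
        if i < len(text):
            node[text[i]] = (next_id, {})          # new phrase = old phrase + next char
            factors.append((phrase_id, text[i]))
            i += 1
        else:
            factors.append((phrase_id, ''))        # input ended inside a known phrase
        next_id += 1
    return factors

print(lz78("abababba"))   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'b'), (1, '')]
```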
