Algorithms and Data-Structures for Compressed Computation

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (30 January 2021) | Viewed by 13702

Special Issue Editors


Guest Editor
Dr. Alberto Policriti
Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
Interests: algorithms for bioinformatics and computational biology; logic decision algorithms; algorithms and data structures for compressed computation

Guest Editor
Dr. Nicola Prezza
Department of Business and Management, LUISS Guido Carli, viale Romania 32, 00197 Rome, Italy
Interests: data structures; algorithms; data compression; bioinformatics

Special Issue Information

Dear Colleagues,

As the production of massive data has outpaced Moore’s law in many scientific areas, the very notion of an algorithm is changing to keep pace. Today’s algorithms must not only deal with large amounts of data, they must also use resources close to their theoretical limits. Against this backdrop, compression has become a fundamental tool for making the storage of such data possible in the first place.

This approach is motivated by the fact that massive data are often highly compressible, but it poses interesting algorithmic challenges: Can we compute directly on compressed data without decompressing it first? Can we devise measures and frameworks to compare different approaches? Can different techniques for compressed computation be unified?

Thanks to more than two decades of successful research on the subject, today we know that many algorithmic marvels are indeed possible. In fact, compression and computation are two sides of the same coin, as they exploit the same underlying structure both for storing and operating on data.

Despite numerous successes in the field, however, several challenges remain open, including (but not limited to) extending techniques to more/less structured and massive data. The goal of this Special Issue is precisely to advance research in the field of compressed computation. We welcome the submission of high-quality papers dealing with compressed data structures, data compression, string/tree/graph processing, and combinatorics.

Dr. Alberto Policriti
Dr. Nicola Prezza
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Compressed data structures
  • Data compression
  • String processing
  • Combinatorics on strings
  • Text indexing
  • Pattern matching

Published Papers (6 papers)


Editorial


2 pages, 134 KiB  
Editorial
Special Issue on Algorithms and Data-Structures for Compressed Computation
by Alberto Policriti and Nicola Prezza
Algorithms 2022, 15(12), 457; https://doi.org/10.3390/a15120457 - 02 Dec 2022
Viewed by 982
Abstract
As the production of massive data has outpaced Moore’s law in many scientific areas, the very notion of algorithms is transforming [...]

Research


23 pages, 795 KiB  
Article
Storing Set Families More Compactly with Top ZDDs
by Kotaro Matsuda, Shuhei Denzumi and Kunihiko Sadakane
Algorithms 2021, 14(6), 172; https://doi.org/10.3390/a14060172 - 31 May 2021
Cited by 1 | Viewed by 2263
Abstract
Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in ZDD size. In some cases, however, the size of ZDDs for representing large set families becomes too large to store in main memory. This paper proposes the top ZDD, a novel representation of ZDDs that uses less space than existing ones. The top ZDD is an extension of the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs have smaller sizes than ZDDs for real data.
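
To make the representation concrete, here is a minimal plain ZDD in Python: it is not the paper's top ZDD, only an illustration of the two reduction rules that give ZDDs their compression, zero-suppression and node sharing via a unique table. Class and method names are invented for this sketch.

```python
class ZDD:
    """Minimal zero-suppressed BDD over integer variables (illustrative only)."""
    BOT, TOP = 0, 1                      # terminals: empty family / family {emptyset}

    def __init__(self):
        self.nodes = [None, None]        # id -> (var, lo, hi); ids 0 and 1 are terminals
        self.unique = {}                 # (var, lo, hi) -> id  (node-sharing rule)

    def mk(self, var, lo, hi):
        """Canonical node for (var, lo, hi)."""
        if hi == self.BOT:               # zero-suppression rule: skip the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def single(self, items):
        """ZDD for the family containing exactly the set `items`."""
        node = self.TOP
        for v in sorted(items, reverse=True):    # smallest variable ends up at the root
            node = self.mk(v, self.BOT, node)
        return node

    def union(self, a, b):
        """Union of two families (memoisation omitted for brevity)."""
        if a == self.BOT: return b
        if b == self.BOT: return a
        if a == b: return a
        if a == self.TOP or b == self.TOP:
            small, big = (a, b) if a == self.TOP else (b, a)
            var, lo, hi = self.nodes[big]        # the empty set never contains var
            return self.mk(var, self.union(small, lo), hi)
        va, la, ha = self.nodes[a]
        vb, lb, hb = self.nodes[b]
        if va == vb: return self.mk(va, self.union(la, lb), self.union(ha, hb))
        if va < vb:  return self.mk(va, self.union(la, b), ha)
        return self.mk(vb, self.union(a, lb), hb)

z = ZDD()
fam = z.union(z.single({1, 3}), z.single({2, 3}))    # family {{1,3}, {2,3}}
print(len(z.nodes) - 2, "internal nodes")            # 4: the node for {3} is shared
```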

15 pages, 1072 KiB  
Article
Text Indexing for Regular Expression Matching
by Daniel Gibney and Sharma V. Thankachan
Algorithms 2021, 14(5), 133; https://doi.org/10.3390/a14050133 - 23 Apr 2021
Cited by 6 | Viewed by 2620
Abstract
Finding substrings of a text T that match a regular expression p is a fundamental problem. Despite being the subject of extensive research, no solution with a time complexity significantly better than O(|T||p|) has been found. Backurs and Indyk (FOCS 2016) established conditional lower bounds for the algorithmic problem, based on the Strong Exponential Time Hypothesis, that help explain this difficulty. A natural question is whether we can improve the time complexity for matching the regular expression by preprocessing the text T. We show that, conditioned on the Online Matrix–Vector Multiplication (OMv) conjecture, even with arbitrary polynomial preprocessing time, a regular expression query on a text cannot be answered in strongly sublinear time, i.e., in O(|T|^(1−ε)) time for any ε > 0. Furthermore, if we extend the OMv conjecture to a plausible conjecture regarding Boolean matrix multiplication with polynomial preprocessing time, which we call Online Matrix–Matrix Multiplication (OMM), we can strengthen this hardness result so that no solution has a query time of O(|T|^(3/2−ε)). These results hold for alphabet sizes three or greater. We then provide data structures that answer queries in O(|T||p|/τ) time, where τ ∈ [1, |T|] is fixed at construction. These include a solution that works for all regular expressions with Exp(τ)·|T| preprocessing time and space. For patterns containing only ‘concatenation’ and ‘or’ operators (the same type used in the hardness result), we provide (1) a deterministic solution that requires Exp(τ)·|T| log^2 |T| preprocessing time and space, and (2) when |p| ≤ |T|^z for z = 2^(o(log |T|)), a randomized solution with amortized query time that answers queries correctly with high probability, requiring Exp(τ)·|T|·2^(Ω(log |T|)) preprocessing time and space.
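
For intuition about the restricted pattern class in the hardness result, the sketch below handles regular expressions built from ‘concatenation’ and ‘or’ only: such a pattern denotes a finite set of strings, so a brute-force baseline can expand it (at a possibly exponential cost) and scan the text for every matching substring. This is only meant to make the query problem concrete; the classical O(|T||p|) bound instead comes from simulating the pattern's NFA over T, and the paper's data structures aim to beat scanning altogether. All names here are illustrative.

```python
def expand(p):
    """Expand a (concatenation|or)-only regex into its finite set of strings."""
    def parse_expr(i):                             # expr := term ('|' term)*
        strings, i = parse_term(i)
        while i < len(p) and p[i] == '|':
            more, i = parse_term(i + 1)
            strings |= more
        return strings, i

    def parse_term(i):                             # term := factor*
        strings = {''}
        while i < len(p) and p[i] not in '|)':
            if p[i] == '(':
                inner, i = parse_expr(i + 1)
                i += 1                             # skip the closing ')'
            else:
                inner, i = {p[i]}, i + 1           # single literal character
            strings = {s + t for s in strings for t in inner}
        return strings, i

    return parse_expr(0)[0]

def match_positions(text, pattern):
    """All (start, end) pairs such that text[start:end] matches the pattern."""
    hits = []
    for w in expand(pattern):
        start = text.find(w)
        while start != -1:
            hits.append((start, start + len(w)))
            start = text.find(w, start + 1)
    return sorted(hits)

print(expand("(ab|c)(d|ef)"))                       # {'abd', 'abef', 'cd', 'cef'}
print(match_positions("cdabefxcd", "(ab|c)(d|ef)")) # [(0, 2), (2, 6), (7, 9)]
```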

8 pages, 288 KiB  
Article
Compressed Communication Complexity of Hamming Distance
by Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai and Masayuki Takeda
Algorithms 2021, 14(4), 116; https://doi.org/10.3390/a14040116 - 03 Apr 2021
Cited by 3 | Viewed by 1831
Abstract
We consider the communication complexity of the Hamming distance of two strings. Bille et al. [SPIRE 2018] considered the communication complexity of the longest common prefix (LCP) problem in the setting where the two parties have their strings in a compressed form, i.e., represented by the Lempel–Ziv 77 factorization (LZ77) with/without self-references. We present a randomized public-coin protocol for the joint computation of the Hamming distance of two strings represented by LZ77 without self-references. Although our scheme is heavily based on Bille et al.’s LCP protocol, our complexity analysis is original and relies on Crochemore’s C-factorization and Rytter’s AVL-grammar. As a byproduct, we also show that LZ77 with/without self-references are not monotonic, in the sense that their sizes can increase by a factor of 4/3 when a prefix of the string is removed.
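
The sketch below computes the LZ77 factorization without self-references, the compressed form each party is assumed to hold in this setting: every phrase is either a fresh character or the longest prefix of the remaining text that already occurs in the previously factorized part. It is a quadratic-time textbook illustration, not the paper's protocol; names are invented.

```python
def lz77_no_self_ref(text):
    """Phrases as (source_pos, length) pairs or single literal characters."""
    phrases, i = [], 0
    while i < len(text):
        best_len, best_src = 0, -1
        for src in range(i):                       # candidate earlier occurrence
            l = 0
            # the copied substring must lie entirely inside text[:i] (no self-reference)
            while i + l < len(text) and src + l < i and text[src + l] == text[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, src
        if best_len == 0:
            phrases.append(text[i])                # fresh character
            i += 1
        else:
            phrases.append((best_src, best_len))   # copy from an earlier position
            i += best_len
    return phrases

print(lz77_no_self_ref("abababba"))   # ['a', 'b', (0, 2), (0, 2), (1, 1), (1, 1)]
```
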
19 pages, 385 KiB  
Article
Approximation Ratios of RePair, LongestMatch and Greedy on Unary Strings
by Danny Hucke and Carl Philipp Reh
Algorithms 2021, 14(2), 65; https://doi.org/10.3390/a14020065 - 20 Feb 2021
Cited by 2 | Viewed by 1713
Abstract
A grammar-based compressor is an algorithm that receives a word and outputs a context-free grammar that only produces this word. The approximation ratio for a single input word is the size of the grammar produced for this word divided by the size of a smallest grammar for this word. The worst-case approximation ratio of a grammar-based compressor for a given word length is the largest approximation ratio over all input words of that length. In this work, we study the worst-case approximation ratio of the algorithms Greedy, RePair and LongestMatch on unary strings, i.e., strings that only make use of a single symbol. Our main contribution is to show the improved upper bound of O((log n)^8 · (log log n)^3) for the worst-case approximation ratio of Greedy. In addition, we also show the lower bound of 1.34847194 for the worst-case approximation ratio of Greedy, and that RePair and LongestMatch have a worst-case approximation ratio of log_2(3).
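
As a reference point for these ratios, the denominator, i.e., the size of a smallest grammar for a unary word a^n, is always O(log n): a straight-line grammar can be built by repeated squaring following the binary expansion of n. The sketch below constructs such a grammar; the rule names and the size measure (total length of all right-hand sides) are the usual conventions, used here only for illustration.

```python
def squaring_grammar(n):
    """Straight-line grammar for the unary word a^n via repeated squaring."""
    assert n >= 1
    bits = bin(n)[2:]                              # binary expansion of n, MSB first
    rules, symbols = {}, ['a']                     # symbols[i] derives a^(2^i)
    for i in range(1, len(bits)):
        rules[f'X{i}'] = [symbols[i - 1]] * 2      # X_i -> X_{i-1} X_{i-1}
        symbols.append(f'X{i}')
    # start rule: one occurrence of symbols[i] for every set bit i of n
    rules['S'] = [symbols[i] for i, b in enumerate(reversed(bits)) if b == '1']
    size = sum(len(rhs) for rhs in rules.values()) # total right-hand-side length
    return rules, size

rules, size = squaring_grammar(1000)
print(size, rules['S'])   # grammar of size 24 for a word of length 1000
```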

21 pages, 844 KiB  
Article
Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees
by Dominik Köppl
Algorithms 2021, 14(2), 44; https://doi.org/10.3390/a14020044 - 29 Jan 2021
Cited by 5 | Viewed by 3026
Abstract
We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous non-overlapping factor table within small space, in linear or near-linear time, with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer substring compression queries for the Lempel–Ziv-78 factorization with a possible logarithmic multiplicative slowdown depending on the suffix tree representation used.
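
For context, the sketch below computes the LZ78 factorization itself with an explicit trie of previously seen phrases; each new phrase extends the longest phrase already in the trie by one character. The paper's contribution, answering such factorization queries on arbitrary substrings via compressed suffix tree representations, is not attempted here; this is only the textbook algorithm.

```python
def lz78(text):
    """LZ78 factorization as (phrase_index, extension_char) pairs;
    phrase index 0 denotes the empty phrase."""
    trie = {}                                      # char -> (phrase_id, child trie)
    factors, next_id, i = [], 1, 0
    while i < len(text):
        node, phrase_id = trie, 0
        # walk down the trie while the text continues an existing phrase
        while i < len(text) and text[i] in node:
            phrase_id, node = node[text[i]]
            i += 1
        if i < len(text):
            node[text[i]] = (next_id, {})          # new phrase = old phrase + next char
            factors.append((phrase_id, text[i]))
            i += 1
        else:
            factors.append((phrase_id, ''))        # input ended inside a known phrase
        next_id += 1
    return factors

print(lz78("abababba"))   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'b'), (1, '')]
```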
