String Matching and Its Applications

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 October 2019) | Viewed by 17536

Special Issue Editors


E-Mail Website
Guest Editor
LITIS EA 4108, Normastic FR3638, IRIB, University of Rouen Normandie, Normandie University, Rouen, France
Interests: stringology; bioinformatics; combinatorics on words

E-Mail Website
Guest Editor
Department of Mathematics and Computer Science, University of Catania, I-95125 Catania, Italy
Interests: algorithms on strings; algorithms on graphs; bioinformatics
Special Issues, Collections and Topics in MDPI journals

Special Issue Information

Dear Colleagues,

With the rapid growth of available data in almost all fields, there is large demand for efficient pattern-matching algorithms. We invite you to submit your latest research in the area of string matching (single or multiple, on-line or off-line, exact or approximate, in uncompressed or compressed form) describing new data structures and/or new algorithms. High-quality papers are solicited to address both theoretical and practical issues of string matching including, but not restricted to, natural language processing, text mining, bioinformatics (DNA, RNA, protein sequences), chemoinformatics, intrusion detection, security, plagiarism detection, digital forensics, video retrieval, and music analysis.

Prof. Dr. Thierry Lecroq
Prof. Dr. Simone Faro
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • exact string matching
  • approximate string matching
  • multiple string matching
  • development and analysis of algorithms
  • text indexing
  • data structures for string matching
  • information retrieval
  • searching for regularities
  • string matching in natural language processing
  • string matching in bioinformatics

Published Papers (4 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

11 pages, 829 KiB  
Article
Lyndon Factorization Algorithms for Small Alphabets and Run-Length Encoded Strings
by Sukhpal Singh Ghuman, Emanuele Giaquinta and Jorma Tarhio
Algorithms 2019, 12(6), 124; https://0-doi-org.brum.beds.ac.uk/10.3390/a12060124 - 21 Jun 2019
Viewed by 4230
Abstract
We present two modifications of Duval’s algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a [...] Read more.
We present two modifications of Duval’s algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a significant number of characters of the string. Moreover, it can be engineered to have linear time complexity in the worst case. When there is a run-length encoded string R of length ρ , the other algorithm computes the Lyndon factorization of R in O ( ρ ) time and in constant space. It is shown by experimental results that the new variations are faster than Duval’s original algorithm in many scenarios. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Show Figures

Figure 1

15 pages, 344 KiB  
Article
Applications of Non-Uniquely Decodable Codes to Privacy-Preserving High-Entropy Data Representation
by Muhammed Oğuzhan Külekci and Yasin Öztürk
Algorithms 2019, 12(4), 78; https://0-doi-org.brum.beds.ac.uk/10.3390/a12040078 - 17 Apr 2019
Cited by 1 | Viewed by 3952
Abstract
Non-uniquely-decodable (non-UD) codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non–prefix–free codes, where a code-word can be a prefix of other(s), and thus, the code-word boundary information is essential [...] Read more.
Non-uniquely-decodable (non-UD) codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non–prefix–free codes, where a code-word can be a prefix of other(s), and thus, the code-word boundary information is essential for correct decoding. Due to their inherent unique decodability problem, such non-UD codes have not received much attention except a few studies, in which using compressed data structures to represent the disambiguation information efficiently had been previously proposed. It had been shown before that the compression ratio can get quite close to Huffman/Arithmetic codes with an additional capability of providing direct access in compressed data, which is a missing feature in the regular Huffman codes. In this study we investigate non-UD codes in another dimension addressing the privacy of the high-entropy data. We particularly focus on such massive volumes, where typical examples are encoded video or similar multimedia files. Representation of such a volume with non–UD coding creates two elements as the disambiguation information and the payload, where decoding the original data from these elements becomes hard when one of them is missing. We make use of this observation for privacy concerns. and study the space consumption as well as the hardness of that decoding. We conclude that non-uniquely-decodable codes can be an alternative to selective encryption schemes that aim to secure only part of the data when data is huge. We provide a freely available software implementation of the proposed scheme as well. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Show Figures

Figure 1

20 pages, 1984 KiB  
Article
Permuted Pattern Matching Algorithms on Multi-Track Strings
by Diptarama Hendrian, Yohei Ueki, Kazuyuki Narisawa, Ryo Yoshinaka and Ayumi Shinohara
Algorithms 2019, 12(4), 73; https://0-doi-org.brum.beds.ac.uk/10.3390/a12040073 - 08 Apr 2019
Cited by 3 | Viewed by 5030
Abstract
A multi-track string is a tuple of strings of the same length. Given the pattern and text of two multi-track strings, the permuted pattern matching problem is to find the occurrence positions of all permutations of the pattern in the text. In this [...] Read more.
A multi-track string is a tuple of strings of the same length. Given the pattern and text of two multi-track strings, the permuted pattern matching problem is to find the occurrence positions of all permutations of the pattern in the text. In this paper, we propose several algorithms for permuted pattern matching. Our first algorithm, which is based on the Knuth–Morris–Pratt (KMP) algorithm, has a fast theoretical computing time with O ( m k ) as the preprocessing time and O ( n k log σ ) as the matching time, where n, m, k, σ , and occ denote the length of the text, the length of the pattern, the number of strings in the multi-track, the alphabet size, and the number of occurrences of the pattern, respectively. We then improve the KMP-based algorithm by using an automaton, which has a better experimental running time. The next proposed algorithms are based on the Boyer–Moore algorithm and the Horspool algorithm that try to perform pattern matching. These algorithms are the fastest experimental algorithms. Furthermore, we propose an extension of the AC-automaton algorithm that can solve dictionary matching on multi-tracks, which is a task to find multiple multi-track patterns in a multi-track text. Finally, we propose filtering algorithms that can perform permuted pattern matching quickly in practice. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Show Figures

Figure 1

18 pages, 353 KiB  
Article
Efficient Algorithms for Computing the Inner Edit Distance of a Regular Language via Transducers
by Lila Kari, Stavros Konstantinidis, Steffen Kopecki and Meng Yang
Algorithms 2018, 11(11), 165; https://0-doi-org.brum.beds.ac.uk/10.3390/a11110165 - 23 Oct 2018
Viewed by 2829
Abstract
The concept of edit distance and its variants has applications in many areas such as computational linguistics, bioinformatics, and synchronization error detection in data communications. Here, we revisit the problem of computing the inner edit distance of a regular language given via a [...] Read more.
The concept of edit distance and its variants has applications in many areas such as computational linguistics, bioinformatics, and synchronization error detection in data communications. Here, we revisit the problem of computing the inner edit distance of a regular language given via a Nondeterministic Finite Automaton (NFA). This problem relates to the inherent maximal error-detecting capability of the language in question. We present two efficient algorithms for solving this problem, both of which execute in time O ( r 2 n 2 d ) , where r is the cardinality of the alphabet involved, n is the number of transitions in the given NFA, and d is the computed edit distance. We have implemented one of the two algorithms and present here a set of performance tests. The correctness of the algorithms is based on the connection between word distances and error detection and the fact that nondeterministic transducers can be used to represent the errors (resp., edit operations) involved in error-detection (resp., in word distances). Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Show Figures

Figure 1

Back to TopTop