skip to main content
10.1007/978-3-031-43980-3guideproceedingsBook PagePublication PagesConference Proceedingsacm-pubtype
String Processing and Information Retrieval: 30th International Symposium, SPIRE 2023, Pisa, Italy, September 26–28, 2023, Proceedings
2023 Proceeding
Publisher:
  • Springer-Verlag
  • Berlin, Heidelberg
Conference:
International Symposium on String Processing and Information RetrievalPisa, Italy26 September 2023
ISBN:
978-3-031-43979-7
Published:
12 October 2023

Reflects downloads up to 01 Nov 2024Bibliometrics
Abstract

No abstract available.

front-matter
Front Matter
Pages i–xix
back-matter
Back Matter
Article
Longest Common Prefix Arrays for Succinct k-Spectra
Abstract

The k-spectrum of a string is the set of all distinct substrings of length k occurring in the string. K-spectra have many applications in bioinformatics including pseudoalignment and genome assembly. The Spectral Burrows-Wheeler Transform (SBWT) ...

Article
On Suffix Tree Detection
Abstract

A suffix tree is a fundamental data structure for string processing and information retrieval, however, its structure is still not well understood. The suffix trees reverse engineering problem, which its research aims at reducing this gap, is the ...

Article
Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph
Abstract

In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing ...

Article
Evaluating Regular Path Queries on Compressed Adjacency Matrices
Abstract

Regular Path Queries (RPQs), which are essentially regular expressions to be matched against the labels of paths in labeled graphs, are at the core of graph database query languages like SPARQL. A way to solve RPQs is to translate them into a ...

Article
Approximate Cartesian Tree Matching: An Approach Using Swaps
Abstract

Cartesian tree pattern matching consists of finding all the factors of a text that have the same Cartesian tree than a given pattern. There already exist theoretical and practical solutions for the exact case. In this paper, we propose the first ...

Article
Optimal Wheeler Language Recognition
Abstract

A Wheeler automaton is a finite state automaton whose states admit a total Wheeler order, reflecting the co-lexicographic order of the strings labeling source-to-node paths). A Wheeler language is a regular language admitting an accepting Wheeler ...

Article
Approximation and Fixed Parameter Algorithms for the Approximate Cover Problem
Abstract

Amir et al. (CPM 2017) introduce the approximate string cover problem (ACP) motivated by applications including molecular biology, coding, automata theory, formal language theory and combinatorics. A cover of a string T is a string C for which ...

Article
Data Structures for SMEM-Finding in the PBWT
Abstract

The positional Burrows–Wheeler Transform (PBWT) was presented as a means to find set-maximal exact matches (SMEMs) in haplotype data via the computation of the divergence array. Although run-length encoding the PBWT has been previously considered, ...

Article
Compressibility Measures for Two-Dimensional Data
Abstract

In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the measure defined in terms of the smallest string attractor, and the measure defined in terms of the number of distinct ...

Article
From de Bruijn Graphs to Variation Graphs – Relationships Between Pangenome Models
Abstract

Pangenomes serve as a framework for joint analysis of genomes of related organisms. Several pangenome models were proposed, offering different functionalities, applications provided by available tools, their efficiency etc. Among them, two graph-...

Article
CAGE: Cache-Aware Graphlet Enumeration
Abstract

When information is (implicitly or explicitly) linked in its own nature, and is modeled as a network, retrieving patterns can benefit from this linked structure. In networks, “graphlets” (connected induced subgraphs of a given size k) are the ...

Article
Space-Time Trade-Offs for the LCP Array of Wheeler DFAs
Abstract

Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array ...

Article
Computing All-vs-All MEMs in Grammar-Compressed Text
Abstract

We describe a compression-aware method to find all-vs-all maximal exact matches (MEM) among strings of a repetitive collection . The key concept in our work is the construction of a fully-balanced grammar from that meets a property that we call ...

Article
Sublinear Time Lempel-Ziv (LZ77) Factorization
Abstract

The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet with , and on a word RAM of width , it can be computed in ...

Article
New Advances in Rightmost Lempel-Ziv
Abstract

The Lempel-Ziv (LZ) 77 factorization of a string is a widely-used algorithmic tool that plays a central role in compression and indexing. For a length-n string over a linearly-sortable alphabet, e.g., with , it can be computed in time. It is ...

Article
Engineering a Textbook Approach to Index Massive String Dictionaries
Abstract

We study the problem of engineering space-time efficient indexes that support membership and lexicographic (rank) queries on very large static dictionaries of strings.

Our solution is based on a very simple approach that consists of decoupling ...

Article
Count-Min Sketch with Variable Number of Hash Functions: An Experimental Study
Abstract

Conservative Count-Min, a stronger version of the popular Count-Min sketch [Cormode, Muthukrishnan 2005], is an online-maintained hashing-based sketch summarizing element frequency information of a stream. Although several works attempted to ...

Article
Dynamic Compact Planar Embeddings
Abstract

This paper presents a way to compactly represent dynamic connected planar embeddings, which may contain self loops and multi-edges, in bits, to support basic navigation in time and edge and vertex insertion and deletion in time, where n and m ...

Article
A Simple Grammar-Based Index for Finding Approximately Longest Common Substrings
Abstract

We show how, given positive constants and , and an -balanced straight-line program with g rules for a text T[1..n], we can build an O(g)-space index that, given a pattern P[1..m], in time finds w.h.p. a substring of P that occurs in T and whose ...

Article
On the Number of Factors in the LZ-End Factorization
Abstract

Kreft and Navarro [DCC 2010] introduced a restricted variant of the well-known Lempel-Ziv factorization, called the LZ-End factorization. Only recently Kempa and Saha [SODA2022] were able to obtain a good upper bound on the size of the LZ-End ...

Article
Non-overlapping Indexing in BWT-Runs Bounded Space
Abstract

We revisit the non-overlapping indexing problem for an efficient repetition-aware solution. The problem is to index a text T[1..n], such that whenever a pattern P[1..p] comes as a query, we can report the largest set of non-overlapping occurrences ...

Article
Efficient Parameterized Pattern Matching in Sublinear Space
Abstract

The parameterized matching problem is a variant of string matching, which is to search for all parameterized occurrences of a pattern P in a text T. In considering matching algorithms, the combinatorial natures of strings, especially periodicity, ...

Article
Largest Repetition Factorization of Fibonacci Words
Abstract

A factorization of a string w is said to be a repetition factorization of w if every factor in the factorization is a repetition (i.e., the factor has a period shorter than or equal to the half of its length). Inoue et al. [TOCS 2022] showed how ...

Article
String Covers of a Tree Revisited
Abstract

We consider covering labeled trees with a collection of paths with the same string label, called a (string) cover of a tree. This problem was originated by Radoszewski et al. (SPIRE 2021), who show how to compute all covers of a directed rooted ...

Article
Compacting Massive Public Transport Data
Abstract

In this work, we present a compact method for storing and indexing users’ trips across transport networks. This research is part of a larger project focused on providing transportation managers with the tools to analyze the need for improvements ...

Article
Constant Time and Space Updates for the Sigma-Tau Problem
Abstract

Sawada and Williams in [SODA 2018] and [ACM Trans. Alg. 2020] gave algorithms for constructing Hamiltonian paths and cycles in the Sigma-Tau graph, thereby solving a problem of Nijenhuis and Wilf that had been open for over 40 years. The Sigma-Tau ...

Article
Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings
Abstract

A string w is called a minimal absent word (MAW) for a string S if w does not occur as a substring in S and all proper substrings of w occur in S. MAWs are well-studied combinatorial string objects that have potential applications in areas ...

Article
Frequency-Constrained Substring Complexity
Abstract

We introduce the notion of frequency-constrained substring complexity. For any finite string, it counts the distinct substrings of the string per length and frequency class. For a string x of length n and a partition of [n] in intervals, , the ...

Contributors
  • Institute of Information Science and Technologies "Alessandro Faedo"
  • University of Pisa
  • University of Pisa
Please enable JavaScript to view thecomments powered by Disqus.

Recommendations