No abstract available.
Front Matter
Longest Common Prefix Arrays for Succinct k-Spectra
The k-spectrum of a string is the set of all distinct substrings of length k occurring in the string. K-spectra have many applications in bioinformatics including pseudoalignment and genome assembly. The Spectral Burrows-Wheeler Transform (SBWT) ...
On Suffix Tree Detection
A suffix tree is a fundamental data structure for string processing and information retrieval, however, its structure is still not well understood. The suffix trees reverse engineering problem, which its research aims at reducing this gap, is the ...
Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph
In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing ...
Evaluating Regular Path Queries on Compressed Adjacency Matrices
Regular Path Queries (RPQs), which are essentially regular expressions to be matched against the labels of paths in labeled graphs, are at the core of graph database query languages like SPARQL. A way to solve RPQs is to translate them into a ...
Approximate Cartesian Tree Matching: An Approach Using Swaps
Cartesian tree pattern matching consists of finding all the factors of a text that have the same Cartesian tree than a given pattern. There already exist theoretical and practical solutions for the exact case. In this paper, we propose the first ...
Optimal Wheeler Language Recognition
A Wheeler automaton is a finite state automaton whose states admit a total Wheeler order, reflecting the co-lexicographic order of the strings labeling source-to-node paths). A Wheeler language is a regular language admitting an accepting Wheeler ...
Approximation and Fixed Parameter Algorithms for the Approximate Cover Problem
Amir et al. (CPM 2017) introduce the approximate string cover problem (ACP) motivated by applications including molecular biology, coding, automata theory, formal language theory and combinatorics. A cover of a string T is a string C for which ...
Data Structures for SMEM-Finding in the PBWT
The positional Burrows–Wheeler Transform (PBWT) was presented as a means to find set-maximal exact matches (SMEMs) in haplotype data via the computation of the divergence array. Although run-length encoding the PBWT has been previously considered, ...
Compressibility Measures for Two-Dimensional Data
In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the measure defined in terms of the smallest string attractor, and the measure defined in terms of the number of distinct ...
From de Bruijn Graphs to Variation Graphs – Relationships Between Pangenome Models
Pangenomes serve as a framework for joint analysis of genomes of related organisms. Several pangenome models were proposed, offering different functionalities, applications provided by available tools, their efficiency etc. Among them, two graph-...
CAGE: Cache-Aware Graphlet Enumeration
When information is (implicitly or explicitly) linked in its own nature, and is modeled as a network, retrieving patterns can benefit from this linked structure. In networks, “graphlets” (connected induced subgraphs of a given size k) are the ...
Space-Time Trade-Offs for the LCP Array of Wheeler DFAs
Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array ...
Sublinear Time Lempel-Ziv (LZ77) Factorization
The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that plays a central role in data compression and indexing. For a length-n string over integer alphabet with , and on a word RAM of width , it can be computed in ...
New Advances in Rightmost Lempel-Ziv
The Lempel-Ziv (LZ) 77 factorization of a string is a widely-used algorithmic tool that plays a central role in compression and indexing. For a length-n string over a linearly-sortable alphabet, e.g., with , it can be computed in time. It is ...
Engineering a Textbook Approach to Index Massive String Dictionaries
We study the problem of engineering space-time efficient indexes that support membership and lexicographic (rank) queries on very large static dictionaries of strings.
Our solution is based on a very simple approach that consists of decoupling ...
Count-Min Sketch with Variable Number of Hash Functions: An Experimental Study
Conservative Count-Min, a stronger version of the popular Count-Min sketch [Cormode, Muthukrishnan 2005], is an online-maintained hashing-based sketch summarizing element frequency information of a stream. Although several works attempted to ...
Dynamic Compact Planar Embeddings
This paper presents a way to compactly represent dynamic connected planar embeddings, which may contain self loops and multi-edges, in bits, to support basic navigation in time and edge and vertex insertion and deletion in time, where n and m ...
A Simple Grammar-Based Index for Finding Approximately Longest Common Substrings
We show how, given positive constants and , and an -balanced straight-line program with g rules for a text T[1..n], we can build an O(g)-space index that, given a pattern P[1..m], in time finds w.h.p. a substring of P that occurs in T and whose ...
On the Number of Factors in the LZ-End Factorization
Kreft and Navarro [DCC 2010] introduced a restricted variant of the well-known Lempel-Ziv factorization, called the LZ-End factorization. Only recently Kempa and Saha [SODA2022] were able to obtain a good upper bound on the size of the LZ-End ...
Non-overlapping Indexing in BWT-Runs Bounded Space
We revisit the non-overlapping indexing problem for an efficient repetition-aware solution. The problem is to index a text T[1..n], such that whenever a pattern P[1..p] comes as a query, we can report the largest set of non-overlapping occurrences ...
Efficient Parameterized Pattern Matching in Sublinear Space
The parameterized matching problem is a variant of string matching, which is to search for all parameterized occurrences of a pattern P in a text T. In considering matching algorithms, the combinatorial natures of strings, especially periodicity, ...
Largest Repetition Factorization of Fibonacci Words
A factorization of a string w is said to be a repetition factorization of w if every factor in the factorization is a repetition (i.e., the factor has a period shorter than or equal to the half of its length). Inoue et al. [TOCS 2022] showed how ...
String Covers of a Tree Revisited
We consider covering labeled trees with a collection of paths with the same string label, called a (string) cover of a tree. This problem was originated by Radoszewski et al. (SPIRE 2021), who show how to compute all covers of a directed rooted ...
Compacting Massive Public Transport Data
In this work, we present a compact method for storing and indexing users’ trips across transport networks. This research is part of a larger project focused on providing transportation managers with the tools to analyze the need for improvements ...
Constant Time and Space Updates for the Sigma-Tau Problem
Sawada and Williams in [SODA 2018] and [ACM Trans. Alg. 2020] gave algorithms for constructing Hamiltonian paths and cycles in the Sigma-Tau graph, thereby solving a problem of Nijenhuis and Wilf that had been open for over 40 years. The Sigma-Tau ...
Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings
A string w is called a minimal absent word (MAW) for a string S if w does not occur as a substring in S and all proper substrings of w occur in S. MAWs are well-studied combinatorial string objects that have potential applications in areas ...
Frequency-Constrained Substring Complexity
We introduce the notion of frequency-constrained substring complexity. For any finite string, it counts the distinct substrings of the string per length and frequency class. For a string x of length n and a partition of [n] in intervals, , the ...
Recommendations
The integration of french language processing in an information retrieval
RIAO '97: Computer-Assisted Information Searching on Internet - Volume 2Cet article décrit les approches que nous avons implantées dans le cadre d'une collaboration de recherche entre nos deux groupes. Ces approches visent à créer une représentation plus précise pour les documents et les requêtes dans un système de ...