PVLDB: Vol 17, No 8

Volume 17, Issue 8, April 2024

Editors: Meihui Zhang (Beijing Institute of Technology), Cyrus Shahabi (University of Southern California)

Publisher: VLDB Endowment

ISSN: 2150-8097


Issue Downloads

Front matter (Cover, Contents, Organization, Letter from the editors in chief) (PDF)


research-article

FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework

Pages 1788–1801, https://doi.org/10.14778/3659437.3659438

Dynamic graph random walk (DGRW) has emerged as a practical tool for capturing structural relations within a graph. Executing DGRW efficiently on GPUs presents several challenges. First, existing sampling methods demand a pre-processing buffer, causing ...

research-article

Accelerating String-Key Learned Index Structures via Memoization-Based Incremental Training

Pages 1802–1815, https://doi.org/10.14778/3659437.3659439

Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their ...

research-article

Truss-Based Community Search over Streaming Directed Graphs

Pages 1816–1829, https://doi.org/10.14778/3659437.3659440

Community search aims to retrieve dense subgraphs that contain the query vertices. While many effective community models and algorithms have been proposed in the literature, none of them address the unique challenges posed by streaming graphs, where ...

research-article

InferDB: In-Database Machine Learning Inference Using Indexes

Pages 1830–1842, https://doi.org/10.14778/3659437.3659441

The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing ...

research-article

AAA: An Adaptive Mechanism for Locally Differentially Private Mean Estimation

Pages 1843–1855, https://doi.org/10.14778/3659437.3659442

Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems, including Chrome, iOS, macOS, and Windows. The main idea is that each individual perturbs their own data locally, and only submits the ...

research-article

Accelerating Merkle Patricia Trie with GPU

Pages 1856–1869, https://doi.org/10.14778/3659437.3659443

Merkle Patricia Trie (MPT) is a type of trie structure that offers efficient lookup and insert operations for immutable data systems that require multi-version access and tamper-evident controls, such as blockchains and verifiable databases. The ...

research-article

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

Pages 1870–1883, https://doi.org/10.14778/3659437.3659444

The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address ...

research-article

Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction

Pages 1884–1897, https://doi.org/10.14778/3659437.3659445

Database Management Systems (DBMSs) are widely used to efficiently store and retrieve data. DBMSs usually support various metadata, e.g., integrity constraints for ensuring data integrity and indexes for locating data. DBMSs can further utilize these ...

research-article

From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying

Pages 1898–1910, https://doi.org/10.14778/3659437.3659446

Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data ...

research-article

Oasis: An Optimal Disjoint Segmented Learned Range Filter

Pages 1911–1924, https://doi.org/10.14778/3659437.3659447

Learning-enhanced data structures have inspired the development of learned range filters, which achieve significantly lower false positive rates (FPR) than traditional non-learned range filters. Their core idea is to employ piece-wise linear functions that ...

research-article

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes

Pages 1925–1938, https://doi.org/10.14778/3659437.3659448

Discovering tables in poorly maintained data lakes is a significant challenge in data management. Two key tasks, crucial for data integration, analysis, and machine learning, are identifying joinable and unionable tables. However, there's a lack of a ...

research-article

GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization

Pages 1939–1952, https://doi.org/10.14778/3659437.3659449

Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behaviours. Determining the appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As there is ...

research-article

Raising the ClaSS of Streaming Time Series Segmentation

Pages 1953–1966, https://doi.org/10.14778/3659437.3659450

Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g., caused by external events or internal state changes, ...

research-article

Fast Local Subgraph Counting

Pages 1967–1980, https://doi.org/10.14778/3659437.3659451

We study local subgraph counting queries, Q = (p, o), to count how many times a given k-node pattern graph p appears around every node υ in a data graph G when the given center node o in p maps to υ. Such local subgraph counting becomes important in GNNs ...

research-article

ReAcTable: Enhancing ReAct for Table Question Answering

Pages 1981–1994, https://doi.org/10.14778/3659437.3659452

Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical ...

research-article

NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments

Pages 1995–2008, https://doi.org/10.14778/3659437.3659453

Graph Neural Networks (GNNs) have shown exceptional performance across a wide range of applications. Current frameworks leverage CPU-GPU heterogeneous environments for GNN model training, incorporating mini-batch and sampling techniques to mitigate GPU ...

research-article

Rapidash: Efficient Detection of Constraint Violations

Pages 2009–2021, https://doi.org/10.14778/3659437.3659454

Denial Constraints (DCs) are a well-established formalism that captures a wide range of commonly encountered integrity constraints, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there ...

research-article

Differentially Private Data Generation with Missing Data

Pages 2022–2035, https://doi.org/10.14778/3659437.3659455

Although several works succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of ...

research-article

Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

Pages 2036–2049, https://doi.org/10.14778/3659437.3659456

As the number of pre-trained machine learning (ML) models is growing exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files. This is largely due ...

research-article

Fight Fire with Fire: Towards Robust Graph Neural Networks on Dynamic Graphs via Actively Defense

Pages 2050–2063, https://doi.org/10.14778/3659437.3659457

Graph neural networks (GNNs) have achieved great success on various graph tasks. However, recent studies have revealed that GNNs are vulnerable to injection attacks. Due to the openness of platforms, attackers can inject malicious nodes with carefully ...

research-article

SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads

Pages 2064–2076, https://doi.org/10.14778/3659437.3659458

Prefetching is a crucial technique employed in traditional databases to enhance interactivity, particularly in the context of data exploration. Data exploration is a query processing paradigm in which users search for insights buried in the data, often ...

research-article

Contributions Estimation in Federated Learning: A Comprehensive Experimental Evaluation

Pages 2077–2090, https://doi.org/10.14778/3659437.3659459

Federated Learning (FL) provides a privacy-preserving and decentralized approach to collaborative machine learning for multiple FL clients. The contribution estimation mechanism in FL is extensively studied within the database community, which aims to ...

research-article

Visualization-Aware Time Series Min-Max Caching with Error Bound Guarantees

Pages 2091–2103, https://doi.org/10.14778/3659437.3659460

This paper addresses the challenges in interactive visual exploration of large multi-variate time series data. Traditional data reduction techniques may improve latency but can distort visualizations. State-of-the-art methods aimed at 100% accurate ...

research-article

Chorus: Foundation Models for Unified Data Discovery and Exploration

Pages 2104–2114, https://doi.org/10.14778/3659437.3659461

We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly ...

research-article

Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware

Pages 2115–2122, https://doi.org/10.14778/3659437.3659462

This paper explores the intersection of operating systems and database systems, focusing on the potential of specialized kernels for cloud-native database systems. Although the idea of custom, DBMS-optimized OS kernels is old, it is largely unrealized ...


Proceedings of the VLDB Endowment
