Volume 17, Issue 8, April 2024
Publisher: VLDB Endowment
ISSN: 2150-8097
FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework

Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing ...

Accelerating String-Key Learned Index Structures via Memoization-Based Incremental Training

Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their ...

Truss-Based Community Search over Streaming Directed Graphs

Community search aims to retrieve dense subgraphs that contain the query vertices. While many effective community models and algorithms have been proposed in the literature, none of them address the unique challenges posed by streaming graphs, where ...

InferDB: In-Database Machine Learning Inference Using Indexes

The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing ...

AAA: An Adaptive Mechanism for Locally Differentially Private Mean Estimation

Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems, including Chrome, iOS, MacOS, and Windows. The main idea is that each individual perturbs their own data locally, and only submits the ...

Accelerating Merkle Patricia Trie with GPU

Merkle Patricia Trie (MPT) is a type of trie structure that offers efficient lookup and insert operators for immutable data systems that require multi-version access and tamper-evident controls, such as blockchains and verifiable databases. The ...

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address ...

Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database Construction

Database Management Systems (DBMSs) are widely used to efficiently store and retrieve data. DBMSs usually support various metadata, e.g., integrity constraints for ensuring data integrity and indexes for locating data. DBMSs can further utilize these ...

From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying

Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data ...

Oasis: An Optimal Disjoint Segmented Learned Range Filter

The learning-enhanced data structure has inspired the development of the range filter, bringing significantly better false positive rate (FPR) than traditional non-learned range filters. Its core idea is to employ piece-wise linear functions that ...

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes

Discovering tables from poorly maintained data lakes is a significant challenge in data management. Two key tasks are identifying joinable and unionable tables, crucial for data integration, analysis, and machine learning. However, there's a lack of a ...

GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization

Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behaviours. Determining the appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As there is ...

Raising the ClaSS of Streaming Time Series Segmentation

Ubiquitous sensors today emit high frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal state changes, ...

Fast Local Subgraph Counting

We study local subgraph counting queries, Q = (p, o), to count how many times a given k-node pattern graph p appears around every node v in a data graph G when the given center node o in p maps to v. Such local subgraph counting becomes important in GNNs ...

ReAcTable: Enhancing ReAct for Table Question Answering

Table Question Answering (TQA) presents a substantial challenge at the intersection of natural language processing and data analytics. This task involves answering natural language (NL) questions on top of tabular data, demanding proficiency in logical ...

NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments

Graph Neural Networks (GNNs) have shown exceptional performance across a wide range of applications. Current frameworks leverage CPU-GPU heterogeneous environments for GNN model training, incorporating mini-batch and sampling techniques to mitigate GPU ...

Rapidash: Efficient Detection of Constraint Violations

Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there ...

Differentially Private Data Generation with Missing Data

Despite several works that succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of ...

Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

As the number of pre-trained machine learning (ML) models is growing exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files. This is largely due ...

Fight Fire with Fire: Towards Robust Graph Neural Networks on Dynamic Graphs via Actively Defense

Graph neural networks (GNNs) have achieved great success on various graph tasks. However, recent studies have revealed that GNNs are vulnerable to injective attacks. Due to the openness of platforms, attackers can inject malicious nodes with carefully ...

SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads

Prefetching is a crucial technique employed in traditional databases to enhance interactivity, particularly in the context of data exploration. Data exploration is a query processing paradigm in which users search for insights buried in the data, often ...

Contributions Estimation in Federated Learning: A Comprehensive Experimental Evaluation

Federated Learning (FL) provides a privacy-preserving and decentralized approach to collaborative machine learning for multiple FL clients. The contribution estimation mechanism in FL is extensively studied within the database community, which aims to ...

Visualization-Aware Time Series Min-Max Caching with Error Bound Guarantees

This paper addresses the challenges in interactive visual exploration of large multi-variate time series data. Traditional data reduction techniques may improve latency but can distort visualizations. State-of-the-art methods aimed at 100% accurate ...

Chorus: Foundation Models for Unified Data Discovery and Exploration

We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly ...

Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware

This paper explores the intersection of operating systems and database systems, focusing on the potential of specialized kernels for cloud-native database systems. Although the idea of custom, DBMS-optimized OS kernels is old, it is largely unrealized ...
