AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
MUter: Machine unlearning on adversarially trained models
Machine unlearning is an emerging task of removing the influence of selected
training datapoints from a trained model upon data deletion requests, which echoes the …
SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation
With evolving data regulations, machine unlearning (MU) has become an important tool for
fostering trust and safety in today's AI models. However, existing MU methods focusing on …
Citation: A key to building responsible and accountable large language models
Large Language Models (LLMs) bring transformative benefits alongside unique challenges,
including intellectual property (IP) and ethical concerns. This position paper explores a …
DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …
Intriguing properties of data attribution on diffusion models
Data attribution seeks to trace model outputs back to training data. With the recent
development of diffusion models, data attribution has become a desired module to properly …
Contextual Confidence and Generative AI
Generative AI models perturb the foundations of effective human communication. They
present new challenges to contextual confidence, disrupting participants' ability to identify …
Adaptive instrument design for indirect experiments
Y Chandak, S Shankar, V Syrgkanis… - The Twelfth International …, 2023 - openreview.net
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …
Merging by matching models in task subspaces
Model merging aims to cheaply combine individual task-specific models into a single
multitask model. In this work, we view past merging methods as leveraging different notions …
Structured inverse-free natural gradient: Memory-efficient & numerically-stable KFAC for large neural nets
Second-order methods for deep learning--such as KFAC--can be useful for neural net
training. However, they are often memory-inefficient and numerically unstable for low …