AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Muter: Machine unlearning on adversarially trained models

J Liu, M Xue, J Lou, X Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Machine unlearning is an emerging task of removing the influence of selected
training datapoints from a trained model upon data deletion requests, which echoes the …

SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation

C Fan, J Liu, Y Zhang, D Wei, E Wong, S Liu - arXiv preprint arXiv …, 2023 - arxiv.org
With evolving data regulations, machine unlearning (MU) has become an important tool for
fostering trust and safety in today's AI models. However, existing MU methods focusing on …

Citation: A key to building responsible and accountable large language models

J Huang, KCC Chang - arXiv preprint arXiv:2307.02185, 2023 - arxiv.org
Large Language Models (LLMs) bring transformative benefits alongside unique challenges,
including intellectual property (IP) and ethical concerns. This position paper explores a …

DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models

Y Kwon, E Wu, K Wu, J Zou - arXiv preprint arXiv:2310.00902, 2023 - arxiv.org
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …

Intriguing properties of data attribution on diffusion models

X Zheng, T Pang, C Du, J Jiang, M Lin - arXiv preprint arXiv:2311.00500, 2023 - arxiv.org
Data attribution seeks to trace model outputs back to training data. With the recent
development of diffusion models, data attribution has become a desired module to properly …

Contextual Confidence and Generative AI

S Jain, Z Hitzig, P Mishkin - arXiv preprint arXiv:2311.01193, 2023 - arxiv.org
Generative AI models perturb the foundations of effective human communication. They
present new challenges to contextual confidence, disrupting participants' ability to identify …

Adaptive instrument design for indirect experiments

Y Chandak, S Shankar, V Syrgkanis… - The Twelfth International …, 2023 - openreview.net
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …

Merging by matching models in task subspaces

D Tam, M Bansal, C Raffel - arXiv preprint arXiv:2312.04339, 2023 - arxiv.org
Model merging aims to cheaply combine individual task-specific models into a single
multitask model. In this work, we view past merging methods as leveraging different notions …

Structured inverse-free natural gradient: Memory-efficient & numerically-stable KFAC for large neural nets

W Lin, F Dangel, R Eschenhagen, K Neklyudov… - arXiv preprint arXiv …, 2023 - arxiv.org
Second-order methods for deep learning, such as KFAC, can be useful for neural net
training. However, they are often memory-inefficient and numerically unstable for low …