
[Quantization] bring quantization to diffusers core #9174

Closed
sayakpaul opened this issue Aug 14, 2024 · 15 comments
@sayakpaul
Member

sayakpaul commented Aug 14, 2024

Now that we have a working PoC (#9165) of NF4 quantization through bitsandbytes and also this through optimum.quanto, it's time to bring quantization more formally into diffusers 🎸

In this issue, I want to devise a rough plan to attack the integration. We are going to start with bitsandbytes and then slowly increase the list of our supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through peft (guide).
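For context, a minimal sketch of what LoRA fine-tuning through peft could look like on the Flux transformer (the target module names and hyperparameters are illustrative assumptions, not the guide's exact recipe):

```python
# Minimal sketch: attach LoRA adapters to the Flux transformer via peft.
# Assumptions: rank, alpha, and target module names are illustrative only.
import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig, get_peft_model

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()  # only the LoRA weights are trainable
```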

Three PRs are expected:

  • Introduce a base quantization config class like we have in transformers.
  • Introduce bitsandbytes-related utilities to handle the pre- and post-processing of layers when injecting bitsandbytes layers. Example is here.
  • Introduce a bitsandbytes config (example) and a quantization loader mixin aka QuantizationLoaderMixin. This loader will enable passing a quantization config to from_pretrained() of a ModelMixin and will tackle how to modify and prepare the model for the provided quantization config. It will also allow us to serialize the model according to the quantization config. A rough sketch of the intended user-facing API is below.
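A rough sketch of the shape this could take, mirroring transformers' API (names are assumptions until the PRs land):

```python
# Sketch of the proposed interface, mirroring transformers.
# Assumption: a diffusers-level BitsAndBytesConfig accepted by from_pretrained().
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("flux-transformer-nf4")  # serialized per the quant config
```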

@DN6 @SunMarc sounds good?

@HighCWu

HighCWu commented Aug 15, 2024

I don't know if my work from a few days ago can help you. Flux NF4 in diffusers: HighCWu/flux-4bit. I directly used transformers to wrap FluxTransformer2DModel, so that it can directly reuse the quantization configuration of transformers.
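The wrapping idea, sketched below with hypothetical class and field names (not taken from HighCWu/flux-4bit): put the diffusers transformer inside a transformers PreTrainedModel so that transformers' quantization path applies to it.

```python
# Sketch of wrapping a diffusers model in a transformers PreTrainedModel.
# Assumptions: FluxWrapperConfig/FluxWrapper and the flux_config field are
# hypothetical names for illustration.
from transformers import PretrainedConfig, PreTrainedModel
from diffusers import FluxTransformer2DModel

class FluxWrapperConfig(PretrainedConfig):
    model_type = "flux-wrapper"  # hypothetical model_type

    def __init__(self, flux_config=None, **kwargs):
        super().__init__(**kwargs)
        self.flux_config = flux_config or {}

class FluxWrapper(PreTrainedModel):
    config_class = FluxWrapperConfig

    def __init__(self, config):
        super().__init__(config)
        # The inner diffusers model holds the actual weights; quantized
        # loading through transformers replaces its Linear layers too.
        self.model = FluxTransformer2DModel.from_config(config.flux_config)

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

# With the wrapper saved as a transformers checkpoint, 4-bit loading becomes:
# FluxWrapper.from_pretrained(repo_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
```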

@Ednaordinary

Here's another new quantization scheme: GGUF
https://huggingface.co/city96/FLUX.1-dev-gguf
It's very recent but has extremely promising results, possibly better than NF4 at the same bits per weight (bpw). It may be DiT-only, though.
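GGUF loading was not supported in diffusers at the time of this comment; later releases added a GGUFQuantizationConfig for single-file checkpoints. A sketch, assuming a diffusers version with GGUF support and that the quant file name below exists in city96's repo:

```python
# Sketch of loading a GGUF Flux checkpoint with a recent diffusers release.
# Assumption: the flux1-dev-Q8_0.gguf file name is illustrative.
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q8_0.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```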

@sayakpaul
Member Author

Thanks for all the links, but let's keep this issue solely for what is laid out in the OP so that we don't lose track.

@bghira
Contributor

bghira commented Aug 15, 2024

Getting distracted with amazing features... THE BEST KIND of distraction... haha! I think we just need separate issues to track those.

@matthewdouglas
Member

I don't see why NF4 serialization shouldn't work with accelerate. Maybe the doc is out of date and/or it's just not well tested. @muellerzr could I be missing something here?

@sayakpaul
Member Author

sayakpaul commented Aug 15, 2024

> I don't see why NF4 serialization shouldn't work with accelerate. Maybe the doc is out of date and/or it's just not well tested. @muellerzr could I be missing something here?

Cc: @SunMarc as well.

I am all in for depending on accelerate for this stuff, just as we delegate our sharding and device mapping to accelerate (differing from transformers there).

Another potential downside I can see with relying on just accelerate for quantization is that we're limited to just bitsandbytes there (my reference here). But torchao and quanto also work quite well with diffusion models in different situations. So, having an integration would make more sense here IMO.
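For instance, a minimal sketch of quantizing a diffusers model with optimum.quanto (the model id and the qint8 choice are illustrative):

```python
# Minimal sketch: quantize a diffusers model's weights with optimum.quanto.
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize(transformer, weights=qint8)  # swap Linear weights for qint8 versions
freeze(transformer)  # materialize the quantized weights in place
```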

But I can totally envision us moving all that transformers quantization stuff (and potentially diffusers, too) to accelerate and doing the needed refactoring in the future.

Just thinking out loud.

@bghira
Contributor

bghira commented Aug 15, 2024

In an ideal world, quanto would be a sort of go-between for accelerate, transformers, and diffusers to access other quantisation libraries, no? On paper, to me at least, it feels like a smart idea, as it would present a single interface for all HF projects to use quantisation, and it would be consistently implementable everywhere without accelerate being involved, if needed.

@muellerzr
Contributor

muellerzr commented Aug 15, 2024

That is also how I viewed it/thought about it. (Accelerate could be a flow-thru, potentially, but yes IMO quanto)

@sayakpaul
Member Author

So, when that potential lands, I wouldn't mind going through the refactoring myself, but till then I think we will have to consider the integration as laid out. WDYT?

Would like to reiterate that I would LOVE to have the flow-through as soon as it's there :)

@lonngxiang

Very much looking forward to it.

@lonngxiang

> I don't know if my work from a few days ago can help you. Flux NF4 in diffusers: HighCWu/flux-4bit. I directly used transformers to wrap FluxTransformer2DModel, so that it can directly reuse the quantization configuration of transformers.

The generation rate is very slow


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 24, 2024
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Sep 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 19, 2024
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Oct 19, 2024
@sayakpaul
Member Author

Not stale. See #9213.

@sayakpaul
Member Author

We merged #9213. Feel free to try it out and report any issues you observe.
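A short usage sketch of the merged bitsandbytes integration (the checkpoint id and prompt are illustrative):

```python
# Usage sketch: load the Flux transformer in NF4 via diffusers' BitsAndBytesConfig
# and plug it into the full pipeline.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep idle components off the GPU
image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```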
