
[Quantization] bring quantization to diffusers core #9174

Closed
sayakpaul opened this issue Aug 14, 2024 · 15 comments
@sayakpaul
Member

sayakpaul commented Aug 14, 2024

Now that we have a working PoC (#9165) of NF4 quantization through bitsandbytes and also this through optimum.quanto, it's time to bring quantization more formally into diffusers 🎸

In this issue, I want to devise a rough plan to attack the integration. We are going to start with bitsandbytes and then slowly increase the list of our supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through peft (guide).
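For context, a minimal sketch of what LoRA fine-tuning through peft could look like on the Flux transformer (the target module names and hyperparameters are illustrative assumptions, not the guide's exact recipe):

```python
# Minimal sketch: attach LoRA adapters to the Flux transformer via peft.
# Assumptions: rank, alpha, and target module names are illustrative only.
import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig, get_peft_model

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()  # only the LoRA weights are trainable
```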

Three PRs are expected:

  • Introduce a base quantization config class like we have in transformers.
  • Introduce bitsandbytes-related utilities to handle the pre- and post-processing of layers when injecting bitsandbytes layers. Example is here.
  • Introduce a bitsandbytes config (example) and a quantization loader mixin aka QuantizationLoaderMixin. This loader will enable passing a quantization config to from_pretrained() of a ModelMixin and will tackle how to modify and prepare the model for the provided quantization config. It will also allow us to serialize the model according to the quantization config. A rough sketch of the intended user-facing API is below.
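A rough sketch of the shape this could take, mirroring transformers' API (names are assumptions until the PRs land):

```python
# Sketch of the proposed interface, mirroring transformers.
# Assumption: a diffusers-level BitsAndBytesConfig accepted by from_pretrained().
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("flux-transformer-nf4")  # serialized per the quant config
```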

@DN6 @SunMarc sounds good?

@HighCWu

HighCWu commented Aug 15, 2024

I don't know if my work from a few days ago can help you. Flux NF4 in diffusers: HighCWu/flux-4bit. I directly used transformers to wrap FluxTransformer2DModel, so that it can directly reuse the quantization configuration of transformers.
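The wrapping idea, sketched below with hypothetical class and field names (not taken from HighCWu/flux-4bit): put the diffusers transformer inside a transformers PreTrainedModel so that transformers' quantization path applies to it.

```python
# Sketch of wrapping a diffusers model in a transformers PreTrainedModel.
# Assumptions: FluxWrapperConfig/FluxWrapper and the flux_config field are
# hypothetical names for illustration.
from transformers import PretrainedConfig, PreTrainedModel
from diffusers import FluxTransformer2DModel

class FluxWrapperConfig(PretrainedConfig):
    model_type = "flux-wrapper"  # hypothetical model_type

    def __init__(self, flux_config=None, **kwargs):
        super().__init__(**kwargs)
        self.flux_config = flux_config or {}

class FluxWrapper(PreTrainedModel):
    config_class = FluxWrapperConfig

    def __init__(self, config):
        super().__init__(config)
        # The inner diffusers model holds the actual weights; quantized
        # loading through transformers replaces its Linear layers too.
        self.model = FluxTransformer2DModel.from_config(config.flux_config)

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

# With the wrapper saved as a transformers checkpoint, 4-bit loading becomes:
# FluxWrapper.from_pretrained(repo_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
```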

@Ednaordinary

Here's another new quantization scheme: GGUF
https://huggingface.co/city96/FLUX.1-dev-gguf
It's very recent but has extremely promising results, possibly better than NF4 at the same bits per weight (bpw). It may be DiT-only, though.
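GGUF loading was not supported in diffusers at the time of this comment; later releases added a GGUFQuantizationConfig for single-file checkpoints. A sketch, assuming a diffusers version with GGUF support and that the quant file name below exists in city96's repo:

```python
# Sketch of loading a GGUF Flux checkpoint with a recent diffusers release.
# Assumption: the flux1-dev-Q8_0.gguf file name is illustrative.
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q8_0.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```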

@sayakpaul
Member Author

Thanks for all the links, but let's keep this issue solely for what is laid out in the OP so that we don't lose track.

@bghira
Contributor

bghira commented Aug 15, 2024

Getting distracted with amazing features... THE BEST KIND of distraction... haha! I think we just need separate issues to track those.

@matthewdouglas
Member

I don't see why NF4 serialization shouldn't work with accelerate. Maybe the doc is out of date and/or it's just not well tested. @muellerzr could I be missing something here?

@sayakpaul
Member Author

sayakpaul commented Aug 15, 2024

> I don't see why NF4 serialization shouldn't work with accelerate. Maybe the doc is out of date and/or it's just not well tested. @muellerzr could I be missing something here?

Cc: @SunMarc as well.

I am all in for depending on accelerate for this stuff, just as we delegate our sharding and device mapping to accelerate (differing from transformers there).

Another potential downside I can see with relying on just accelerate for quantization is that we're limited to just bitsandbytes there (my reference here). But torchao and quanto also work quite well with diffusion models in different situations. So, having an integration would make more sense here IMO.
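For instance, a minimal sketch of quantizing a diffusers model with optimum.quanto (the model id and the qint8 choice are illustrative):

```python
# Minimal sketch: quantize a diffusers model's weights with optimum.quanto.
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize(transformer, weights=qint8)  # swap Linear weights for qint8 versions
freeze(transformer)  # materialize the quantized weights in place
```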

But I can totally envision us moving all that transformers quantization stuff (and potentially diffusers, too) to accelerate and doing the needed refactoring in the future.

Just thinking out loud.

@bghira
Contributor

bghira commented Aug 15, 2024

In an ideal world, quanto would be a sort of go-between for accelerate, transformers, and diffusers to access other quantisation libraries, no? On paper, to me at least, it feels like a smart idea, as it would present a single interface for all HF projects to use quantisation, and it would be consistently implementable everywhere without accelerate being involved, if needed.

@muellerzr
Contributor

muellerzr commented Aug 15, 2024

That is also how I viewed it/thought about it. (Accelerate could be a flow-thru, potentially, but yes IMO quanto)

@sayakpaul
Member Author

So, when that potential lands, I wouldn't mind going through the refactoring myself, but till then I think we will have to consider the integration as laid out. WDYT?

Would like to reiterate that I would LOVE to have the flow-through as soon as it's there :)

@lonngxiang

Very much looking forward to it.

@lonngxiang

> I don't know if my work from a few days ago can help you. Flux NF4 in diffusers: HighCWu/flux-4bit. I directly used transformers to wrap FluxTransformer2DModel, so that it can directly reuse the quantization configuration of transformers.

The generation rate is very slow


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 24, 2024
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Sep 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 19, 2024
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Oct 19, 2024
@sayakpaul
Member Author

Not stale. See #9213.

@sayakpaul
Member Author

We merged #9213. Feel free to try it out and report any issues you observe.
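A short usage sketch of the merged bitsandbytes integration (the checkpoint id and prompt are illustrative):

```python
# Usage sketch: load the Flux transformer in NF4 via diffusers' BitsAndBytesConfig
# and plug it into the full pipeline.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep idle components off the GPU
image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```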
