[Quantization] bring quantization to diffusers core #9174
Comments
I don't know if my work from a few days ago can help you. Flux NF4 in diffusers: HighCWu/flux-4bit. I directly used transformers to wrap FluxTransformer2D so that it can directly reuse the quantization configuration of transformers.
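Concretely, the transformers-side piece of that approach looks something like this (a minimal sketch: the config is standard `transformers` API, while the wrapped-model usage at the end is hypothetical and only indicates the shape of the idea):

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 config, reused as-is from transformers' bitsandbytes integration.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# A transformers-registered wrapper around the Flux transformer could then
# accept the config directly (hypothetical wrapper class, for illustration):
# model = WrappedFluxTransformer2D.from_pretrained(
#     "HighCWu/flux-4bit", quantization_config=nf4_config
# )
```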
Here's another new quantization scheme: GGUF.
Thanks for all the links, but let's keep this issue solely for what is laid out in the OP so that we don't lose track.
Getting distracted with amazing features... THE BEST KIND of distraction... haha! I think we just need separate issues for those to track them.
I don't see why NF4 serialization shouldn't work with accelerate. Maybe the doc is out of date and/or it's just not well tested. @muellerzr, could I be missing something here?
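For reference, the accelerate path under discussion looks roughly like this (a sketch based on accelerate's documented `BnbQuantizationConfig` and `load_and_quantize_model` utilities; the model definition and checkpoint path are placeholders):

```python
import torch
import torch.nn as nn
from accelerate import init_empty_weights
from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model

class TinyNet(nn.Module):  # stand-in for the real model definition
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4096, 4096)

    def forward(self, x):
        return self.proj(x)

bnb_config = BnbQuantizationConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Build the model skeleton without allocating real weights,
# then load and quantize the checkpoint into it.
with init_empty_weights():
    empty_model = TinyNet()

quantized_model = load_and_quantize_model(
    empty_model,
    bnb_quantization_config=bnb_config,
    weights_location="path/to/checkpoint",  # placeholder path
    device_map="auto",
)
# Saving the NF4-quantized weights back out is the step that,
# per the discussion above, may not be supported yet.
```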
Cc: @SunMarc as well. I am all in for depending on […]. Also, another potential downside I can see with relying on just […]. But I can totally envision us moving all that […]. Just thinking out loud.
In an ideal world, quanto would be a sort of go-between for accelerate, transformers, and diffusers to access other quantisation libraries, no? On paper, to me at least, it feels like a smart idea, as it would present a single interface for all HF projects to use quantisation, and it would be consistently implementable everywhere without accelerate being involved, if needed.
That is also how I viewed it/thought about it. (Accelerate could be a flow-thru, potentially, but yes, IMO quanto.)
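As a concrete picture of that single interface, quanto's public API is already backend-agnostic (a minimal sketch using optimum.quanto's `quantize`/`freeze` calls; the model here is just an illustrative torch module):

```python
import torch.nn as nn
from optimum.quanto import quantize, freeze, qint4

# Any torch module works; quanto swaps eligible layers in place.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

quantize(model, weights=qint4)  # mark layers for int4 weight quantization
freeze(model)                   # materialize the quantized weights
```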
So, when that potential flow-through is in, I wouldn't mind going through the refactoring myself, but till then I think we will have to consider the integration as laid out. WDYT? Would like to reiterate that I would LOVE to have the flow-through as soon as it's there :)
Very much looking forward to it.
The generation rate is very slow.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale. See #9213.
We merged #9213. Feel free to try it out and report issues if observed.
Now that we have a working PoC (#9165) of NF4 quantization through `bitsandbytes` and also this through `optimum.quanto`, it's time to bring in quantization more formally in `diffusers` 🎸

In this issue, I want to devise a rough plan to attack the integration. We are going to start with `bitsandbytes` and then slowly grow the list of our supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through `peft` (guide).

Three PRs are expected (see the sketch after the notes for how the resulting API could look):

1. Quantization config support, similar to `transformers`.
2. `bitsandbytes`-related utilities to handle processing and post-processing of layers for injecting `bitsandbytes` layers. Example is here.
3. `bitsandbytes` config (example) and a quantization loader mixin aka `QuantizationLoaderMixin`. This loader will enable passing a quantization config to `from_pretrained()` of a `ModelMixin` and will tackle how to modify and prepare the model for the provided quantization config. This will also allow us to serialize the model according to the quantization config.

Notes:

- Saving and loading quantized models is already supported through `accelerate` (guide), but this doesn't yet support NF4 serialization.

@DN6 @SunMarc sounds good?
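To make the target concrete, the end-user API from PR 3 could look roughly like this (a sketch assuming a `diffusers`-native `BitsAndBytesConfig`, mirroring the shape the integration eventually took in #9213; the checkpoint name is illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The quantization config flows through from_pretrained() of a ModelMixin;
# the loader handles layer injection and weight quantization internally.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```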