Support for Flux model in diffusers #763
Comments
I started to work on the T5 encoder and need help with the expected call flow when exporting the model (not inference yet). Changes are in https://github.com/yahavb/optimum-neuron/tree/main/optimum/exporters/neuron. I can't figure out how to add the sharding functions into the flow. Here are the steps (followed by the changes I made):
I can't figure out how to trigger the call to create_optimized_model, so I would appreciate any hint on whether I'm heading in the right direction.
I also see that optimum/exporters/neuron/model_configs/traced_configs.py already includes tensor_parallel_size and calls neuronx_distributed.parallel_layers.load when tp > 1. Isn't this the flow you meant in Text encoder 2 above?
Hi @yahavb, as you mentioned, the attributes and functions related to the sharding of the T5 encoder shall be within its NeuronConfig class. To see how the sharding is done, let's take the T5 encoder as an example:

Up until this point in the export flow, the model is not yet sharded, but we have the elements we need for the sharding within the neuron config of the T5 encoder.

Besides, for sharding the model we actually call the `get_parallel_callable` of the neuron config during tracing.
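For a concrete picture, here is a minimal, hypothetical sketch of what such a `get_parallel_callable` could look like. It is not the actual optimum-neuron implementation: the method name comes from the discussion in this issue, while the factory pattern, the model surgery, and the use of `neuronx_distributed` parallel layers below are illustrative assumptions.

```python
# Hypothetical sketch only; not the real optimum-neuron code.
# Assumes the tracing machinery has already initialized the tensor-parallel
# process group (e.g. via neuronx_distributed's parallel tracing utilities).
import torch
from transformers import T5EncoderModel
from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear


def get_parallel_callable(model_name_or_path: str, batch_size: int, sequence_length: int):
    """Return a factory that builds one shard of the T5 encoder per tensor-parallel rank."""

    def load_sharded_encoder():
        model = T5EncoderModel.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16)
        for block in model.encoder.block:
            attn = block.layer[0].SelfAttention
            # Column-parallel projections split the output dimension across ranks...
            attn.q = ColumnParallelLinear(attn.q.in_features, attn.q.out_features, bias=False, gather_output=False)
            attn.k = ColumnParallelLinear(attn.k.in_features, attn.k.out_features, bias=False, gather_output=False)
            attn.v = ColumnParallelLinear(attn.v.in_features, attn.v.out_features, bias=False, gather_output=False)
            # ...while the row-parallel output projection consumes the already-split activations.
            attn.o = RowParallelLinear(attn.o.in_features, attn.o.out_features, bias=False, input_is_parallel=True)
        # A real implementation would also copy the matching weight shard into each
        # parallel layer; omitted here for brevity.
        model.eval()
        example_inputs = torch.ones((batch_size, sequence_length), dtype=torch.long)
        return model, example_inputs

    return load_sharded_encoder
```

The key point is that the model is only built and sharded inside the returned callable, so the tracing machinery can invoke it once per tensor-parallel rank.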
Yeah, it's what I meant in the description for Text encoder 2. Let me know if it's not clear enough.
Feature request
Hey @yahavb, to continue our discussion on supporting the Flux model here.
1st step: Export support
Add export support for each component:

- Text encoder (CLIP): No need to add a wrapper, it should already be handled by `CLIPTextNeuronConfig`.
- Text encoder 2 (T5): Can reuse the `T5EncoderForDiffusersNeuronConfig` class, in which the `T5EncoderWrapper` is equivalent to `TracingT5TextEncoderWrapper`. For TP support, we will need to enable the `tensor_parallel_size` arg for many functions in `optimum/exporters/neuron/__main__.py` (eg. `_get_submodels_and_neuron_configs_for_stable_diffusion`), and we need to write a `get_parallel_callable` for the neuron config class to shard the model; we can take the one in `T5EncoderForTransformersNeuronConfig` as a reference.
- Transformer: In the script, `transformer_embedders` / `transformer_blocks` / `single_transformer_blocks` / `transformer_out_layers` are traced separately. Would it be possible to fit onto Neuron devices while tracing the whole transformer together? If not, in Optimum Neuron logic each traced component means a NeuronConfig class, so you will need to create a class for each component and build a function to fetch each module (similar to `_get_submodels_and_neuron_configs_for_stable_diffusion`); otherwise, you would be able to create a single `FluxTransformerNeuronConfig` and add the TP config there. (A rough sketch of the component split follows this list.)
- VAE: Nothing to do, already supported.
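As referenced in the Transformer item above, here is a rough, hypothetical sketch of how the Flux transformer could be grouped into the four traced pieces named there. The helper name and the exact grouping are assumptions; `FluxTransformer2DModel` and its submodule attributes come from the diffusers implementation.

```python
# Hypothetical helper, not optimum-neuron API: groups the Flux transformer into the
# four pieces mentioned above so that each group could map to its own NeuronConfig
# (or be dropped entirely if a single FluxTransformerNeuronConfig fits on device).
import torch
from diffusers import FluxTransformer2DModel


def get_flux_transformer_submodules(model_id: str = "black-forest-labs/FLUX.1-dev"):
    transformer = FluxTransformer2DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    return {
        # input / conditioning embedders (attribute names from diffusers' implementation)
        "transformer_embedders": torch.nn.ModuleList(
            [transformer.x_embedder, transformer.context_embedder, transformer.time_text_embed]
        ),
        # joint (double-stream) blocks
        "transformer_blocks": transformer.transformer_blocks,
        # single-stream blocks
        "single_transformer_blocks": transformer.single_transformer_blocks,
        # final norm + projection back to the latent space
        "transformer_out_layers": torch.nn.ModuleList([transformer.norm_out, transformer.proj_out]),
    }
```

If the whole transformer fits on the Neuron devices when traced as one unit, this split becomes unnecessary and a single `FluxTransformerNeuronConfig` carrying the TP config is enough.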
Test
Here is an example of testing the compilation:
```bash
optimum-cli export neuron --model black-forest-labs/FLUX.1-dev --tensor_parallel_size 8 --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 1 --sequence_length 512 --torch_dtype bfloat16 flux_neuron/
```
2nd step: Inference support
Depends on the 1st step to decide whether we need to override some functions in `FluxPipeline` (if we trace the transformer as multiple modules); whatever we need to override has to go under `optimum/neuron/pipelines/diffusers`. We will also need to add the corresponding pipeline class under `optimum/neuron/modeling_diffusion.py`.
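For reference, a hedged sketch of what inference could look like once that class exists; `NeuronFluxPipeline` below is only a placeholder name, and the loading pattern mirrors the existing `NeuronStableDiffusionPipeline`.

```python
# Placeholder class name; the actual class to be added under
# optimum/neuron/modeling_diffusion.py may differ.
from optimum.neuron import NeuronFluxPipeline

# Load the artifacts produced by the export command in the 1st step.
pipe = NeuronFluxPipeline.from_pretrained("flux_neuron/")

image = pipe(
    prompt="A cat holding a sign that says hello world",
    height=1024,
    width=1024,
    num_inference_steps=28,
).images[0]
image.save("flux_neuron.png")
```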
Reference PRs:
Motivation
Add Flux support
Your contribution
Will collaborate with @yahavb on this.