
Support for Flux model in diffusers #763

Open · JingyaHuang opened this issue Jan 15, 2025 · 3 comments

JingyaHuang (Collaborator) commented Jan 15, 2025

Feature request

Hey @yahavb, to continue our discussion on supporting the Flux model here.

1st step: Export support

Add export support for each component

  • Text encoder 1

No need to add a wrapper; this should already be handled by CLIPTextNeuronConfig.

  • Text encoder 2

We can reuse the T5EncoderForDiffusersNeuronConfig class, in which the T5EncoderWrapper is equivalent to TracingT5TextEncoderWrapper.
For tensor parallelism (tp) support, we will need to add a tensor_parallel_size argument to several functions in optimum/exporters/neuron/__main__.py (e.g. _get_submodels_and_neuron_configs_for_stable_diffusion), and write a get_parallel_callable for the neuron config class to shard the model; the one in T5EncoderForTransformersNeuronConfig can serve as a reference.

  • Transformer

In the script, transformer_embedders / transformer_blocks / single_transformer_blocks / transformer_out_layers are traced separately. Would it be possible to fit on Neuron devices while tracing the whole transformer together? If not, then in the Optimum Neuron logic each traced component corresponds to a NeuronConfig class, so you will need to create a class for each component and build a function to fetch each module (similar to _get_submodels_and_neuron_configs_for_stable_diffusion). Otherwise, you can create a single FluxTransformerNeuronConfig and add the tp config there (a rough skeleton is sketched after this list).

  • VAE decoder

Nothing to do, already supported
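
For the single-config option mentioned under Transformer, a rough skeleton is shown below. The class name FluxTransformerNeuronConfig does not exist yet, and the base class, attributes, and input/output names are assumptions modelled on the existing diffusion configs in optimum/exporters/neuron/model_configs, not a definitive implementation.

```python
# Speculative skeleton only: FluxTransformerNeuronConfig does not exist yet in
# optimum-neuron; the base class and attribute names below are assumptions based
# on how the existing UNet/transformer diffusion configs are structured.
from typing import Dict, List

from optimum.exporters.neuron.config import VisionNeuronConfig  # assumed base class


class FluxTransformerNeuronConfig(VisionNeuronConfig):
    # A single config for the whole transformer; the tp setup would live here so
    # the model is traced as one module instead of four separate ones.
    MODEL_TYPE = "flux-transformer"

    @property
    def inputs(self) -> List[str]:
        # Input names are placeholders; they must match the dummy input generators.
        return ["hidden_states", "encoder_hidden_states", "pooled_projections", "timestep"]

    @property
    def outputs(self) -> List[str]:
        return ["out_hidden_states"]

    def patch_model_for_export(self, model, dummy_inputs: Dict):
        # When tensor_parallel_size > 1, return the callable that loads and shards
        # the transformer inside each tp worker (see the T5 encoder discussion
        # below); otherwise return a plain traced wrapper around `model`.
        ...
```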

Test

Here is an example of testing the compilation:

optimum-cli export neuron --model black-forest-labs/FLUX.1-dev --tensor_parallel_size 8 --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 1 --sequence_length 512 --torch_dtype bfloat16 flux_neuron/

2nd step: Inference support

Whether we need to override some functions of FluxPipeline depends on the 1st step (i.e. whether we trace the transformer as multiple modules); in any case, the overrides go under optimum/neuron/pipelines/diffusers. We will also need to add the following class to optimum/neuron/modeling_diffusion.py:

class NeuronFluxPipeline(NeuronDiffusionPipelineBase, FluxPipeline):
    main_input_name = "prompt"
    auto_model_class = FluxPipeline
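
Once that class exists, loading and running the exported artifacts should follow the same pattern as the existing Neuron diffusion pipelines. The snippet below is a hedged usage sketch, not a final API: NeuronFluxPipeline is not implemented yet, and its import location is assumed.

```python
# Hypothetical usage sketch: NeuronFluxPipeline does not exist yet; the loading
# path mirrors the existing Neuron diffusion pipelines in optimum-neuron.
from optimum.neuron import NeuronFluxPipeline  # assumed export location

# Directory produced by the optimum-cli command shown in the Test section above.
pipe = NeuronFluxPipeline.from_pretrained("flux_neuron/")
image = pipe(prompt="A cat holding a sign that says hello world").images[0]
image.save("flux_neuron.png")
```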

Reference PRs:

Motivation

Add Flux support

Your contribution

Will collaborate with @yahavb on this.

@JingyaHuang JingyaHuang self-assigned this Jan 15, 2025
yahavb (Contributor) commented Jan 21, 2025

I started working on the T5 encoder and need help with the expected call flow for model export (not inference yet). Changes are in https://github.com/yahavb/optimum-neuron/tree/main/optimum/exporters/neuron. I can't figure out how to add the sharding functions into the flow. Here are the steps I followed (and the changes I made):

  1. Added the tp size in _get_submodels_and_neuron_configs_for_stable_diffusion and passed it to T5EncoderForDiffusersNeuronConfig.
  2. In T5EncoderForDiffusersNeuronConfig I added create_optimized_model, which calls a new wrapper, and get_parallel_callable, which does the same thing.
  3. I added the T5EncoderWrapperWithTP wrapper in model_wrappers.py, and text_encoder_sharding.py with all the sharding functions.

I can't figure out how to trigger the call to create_optimized_model, so I would appreciate any hint on whether I'm going in the right direction.

yahavb (Contributor) commented Jan 22, 2025

I also see that optimum/exporters/neuron/model_configs/traced_configs.py already includes tensor_parallel_size and calls neuronx_distributed.parallel_layers.load when tp > 1. Isn't this the flow you meant for Text encoder 2 above?

JingyaHuang (Collaborator, Author) commented

Hi @yahavb, as you mentioned, the attributes and functions related to the sharding of the T5 encoder should live in the NeuronConfig (e.g. tensor_parallel_size and create_optimized_model). The idea is that everything model-specific should be within the Neuron config dedicated to that model (I saw you put the T5 config in optimum/exporters/neuron/__main__.py; there we only put functions that are general to all models).

To see where the sharding is done, let's take the T5 encoder as an example:

  • 1st step: load the vanilla model and define its neuron config (model.neuron_config)

[In __main__.py] main_export -> load_models_and_neuron_configs -> get_submodels_and_neuron_configs -> _get_submodels_and_neuron_configs_for_stable_diffusion -> [In utils.py] get_diffusion_models_for_export (you shall add tensor_parallel_size as an argument, as in get_encoder_decoder_models_for_export) -> instantiate the neuron config as text_encoder_config_constructor.

Until here, the model is not yet sharded, but we have the elements we need for the sharding within the neuron config of the T5 encoder (tensor_parallel_size; the wrapper; patch_model_for_export, which returns the callable sent to neuronx_distributed.trace.parallel_model_trace; and generate_io_aliases, which we don't need for Flux).

  • 2nd step: send the callable defining how to shard the model to neuronx_distributed.trace.parallel_model_trace during the export

[In __main__.py] main_export -> [In convert.py] export_neuronx, in which we call config.patch_model_for_export (what we defined in the 1st step) before sending the result to trace.parallel_model_trace (a simplified sketch of this hand-off is below).

Besides, for sharding the model we actually call ParallelizersManager.parallelizer_for_model, which is shared with the training...
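
To make the hand-off concrete, here is a heavily simplified sketch of the 2nd step, not the literal convert.py code; the helper name export_with_tp, the simplified signature of patch_model_for_export, and the exact keyword arguments of parallel_model_trace are assumptions.

```python
# Simplified sketch of the export hand-off described above (not actual
# optimum-neuron code): the neuron config supplies the sharding callable, and
# the exporter forwards it to neuronx_distributed's parallel tracing.
import neuronx_distributed


def export_with_tp(neuron_config, model, dummy_inputs, tensor_parallel_size):
    # patch_model_for_export (defined on the neuron config, 1st step) returns the
    # callable that loads + shards the model inside each tensor-parallel worker.
    # Its exact signature is simplified here.
    checkpoint_loader = neuron_config.patch_model_for_export(model, dummy_inputs)

    # parallel_model_trace replays the callable on every tp rank and compiles the
    # sharded graph; the keyword names are assumptions about the nxd API.
    return neuronx_distributed.trace.parallel_model_trace(
        checkpoint_loader,
        tuple(dummy_inputs.values()),
        tp_degree=tensor_parallel_size,
    )
```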

So,

  • You shall complete T5EncoderForDiffusersNeuronConfig (I built this class for PixArt) with everything we need for tp. It should be quite similar to T5EncoderForTransformersNeuronConfig, the one we have for T5 text generation; the differences are that we don't need past key values as output, and we use a different encoder wrapper (no need to initialize the past key values with the encoder). A rough sketch of the pieces to add is below.

Yeah, that's what I meant in the description for Text encoder 2. Let me know if it's not clear enough.
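
As a rough illustration of where those pieces would live, the additions to T5EncoderForDiffusersNeuronConfig might look roughly like the sketch below. The helper _load_and_shard_t5_encoder is a placeholder; only the roles of patch_model_for_export, the no-past-key-values wrapper, and the absence of io aliases are taken from the discussion above.

```python
# Speculative sketch of the tp additions to T5EncoderForDiffusersNeuronConfig;
# everything model-specific stays inside this config, per the discussion above.
from functools import partial

from transformers import T5EncoderModel


def _load_and_shard_t5_encoder(model_name_or_path, sequence_length):
    """Placeholder: runs inside every tp worker spawned by parallel_model_trace."""
    model = T5EncoderModel.from_pretrained(model_name_or_path)
    model.eval()
    # ...wrap with the diffusion-specific encoder wrapper (no past key values) and
    # replace attention/MLP linears with neuronx_distributed parallel layers, as
    # done for T5EncoderForTransformersNeuronConfig...
    aliases = {}  # generate_io_aliases is not needed for Flux
    return model, aliases


class T5EncoderForDiffusersNeuronConfig:  # real base class omitted in this sketch
    def patch_model_for_export(self, model, **kwargs):
        # Return the callable handed to neuronx_distributed.trace.parallel_model_trace
        # (2nd step above); no sharding happens until each tp rank invokes it.
        return partial(
            _load_and_shard_t5_encoder,
            model_name_or_path=model.config._name_or_path,
            sequence_length=kwargs.get("sequence_length"),
        )
```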
