
Isn't there any inference script besides training? #1

Open
zhangpzh opened this issue Feb 11, 2025 · 1 comment

Comments

@zhangpzh

As suggested by the title.

@codepassionor
Owner

Hi, for testing, there are three key modules to consider. Using TokenFlow as an example:

  1. Loading the Pretrained Adapter:
    In the video editing pipeline, you need to load the trained adapter. You can find the relevant implementation here:
    Code Reference.
    Additionally, LoRA weight control over timesteps is handled here.

  2. Processing Text Embeddings & Token Integration:
    After extracting the text embeddings, you need to load the trained shared token weights and concatenate them with the unshared token weights extracted from the current video before passing them into the U-Net. You can refer to the corresponding lines in the code.

  3. Applying Bilateral Filtering During DDIM Inversion:
    During DDIM inversion, we use a hook function on the U-Net to filter the noisy latents. For TokenFlow, this is implemented here.
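Steps 1 and 2 above can be sketched roughly as follows. This is an illustrative assumption, not the repository's actual code: `lora_scale_for_timestep`, `build_text_embeddings`, the token counts, and the embedding dimension are all hypothetical names and values (with diffusers, the adapter itself would typically be loaded via something like `pipe.load_lora_weights(...)`).

```python
import torch

# Step 1 (sketch): timestep-dependent control of the LoRA scale.
# Hypothetical schedule -- stronger LoRA influence at noisier timesteps.
def lora_scale_for_timestep(t: int, t_max: int = 1000) -> float:
    return t / t_max

# Step 2 (sketch): concatenate shared (trained) and unshared (per-video)
# token embeddings with the prompt embeddings before the U-Net sees them.
def build_text_embeddings(prompt_emb, shared_tokens, unshared_tokens):
    # prompt_emb:      (batch, seq, dim) -- from the text encoder
    # shared_tokens:   (n_shared, dim)   -- loaded from the trained checkpoint
    # unshared_tokens: (n_unshared, dim) -- extracted from the current video
    batch = prompt_emb.shape[0]
    extra = torch.cat([shared_tokens, unshared_tokens], dim=0)
    extra = extra.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([prompt_emb, extra], dim=1)

prompt_emb = torch.randn(2, 77, 768)
shared = torch.randn(4, 768)
unshared = torch.randn(4, 768)
emb = build_text_embeddings(prompt_emb, shared, unshared)
print(emb.shape)  # torch.Size([2, 85, 768])
```

The concatenated embeddings are then passed to the U-Net as its text-conditioning input in place of the plain prompt embeddings.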

For other 2D U-Net-based video editing algorithms like Vid2Vid, you should follow a similar approach:

  • Load the trained LoRA adapter in the pipeline (ensuring version compatibility, e.g., a LoRA trained on SD 1.5 should be used with an SD 1.5-based U-Net).
  • Identify where the text embeddings are processed and concatenate the shared and unshared tokens.
  • Define a custom hook function to handle DDIM inversion, applying bilateral filtering as needed.
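The hook-based filtering in the last bullet can be sketched with PyTorch's `register_forward_pre_hook`, which intercepts the latents before each U-Net call. The bilateral filter below is a naive, simplified illustration (spatial Gaussian weighted by intensity difference), not the paper's exact implementation, and `DummyUNet` is a stand-in for the real denoiser:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bilateral_filter(x, ksize=3, sigma_s=1.0, sigma_r=0.5):
    """Naive bilateral filter over the spatial dims of (B, C, H, W) latents."""
    pad = ksize // 2
    xp = F.pad(x, (pad, pad, pad, pad), mode="replicate")
    out = torch.zeros_like(x)
    wsum = torch.zeros_like(x)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            shifted = xp[..., pad + dy : pad + dy + x.shape[-2],
                              pad + dx : pad + dx + x.shape[-1]]
            # spatial weight (distance) times range weight (intensity difference)
            w_s = torch.exp(torch.tensor(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)))
            w_r = torch.exp(-((shifted - x) ** 2) / (2 * sigma_r ** 2))
            w = w_s * w_r
            out += w * shifted
            wsum += w
    return out / wsum

class DummyUNet(nn.Module):
    # stand-in for the real U-Net noise predictor
    def forward(self, latents, t):
        return latents

unet = DummyUNet()

def filter_latents_hook(module, args):
    # pre-hook: filter the noisy latents before the U-Net processes them
    latents, t = args
    return (bilateral_filter(latents), t)

handle = unet.register_forward_pre_hook(filter_latents_hook)
latents = torch.randn(1, 4, 8, 8)
out = unet(latents, 10)
handle.remove()
```

Registering the hook leaves the pipeline's own forward code untouched, which is why the same pattern transfers across TokenFlow, Vid2Vid, and similar 2D U-Net pipelines.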

Additionally, I have updated the repository with all the missing bilateral filtering implementations for the remaining algorithms in our paper. You can now find them here:

  • Text2Video: text2video(model_lora.py)
  • Vid2Vid: test_vid2vid_zero_lora.py
  • VidTome: main.py
  • DDIM Inversion Implementation: ddim_inversion.py
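For reference, the core DDIM inversion update (independent of any particular repository) can be sketched as below; `ddim_invert` and the toy zero-noise model are illustrative, not the code in `ddim_inversion.py`:

```python
import torch

@torch.no_grad()
def ddim_invert(latents, eps_model, alphas_cumprod, timesteps):
    """Deterministic DDIM inversion: map a clean latent back toward noise.
    eps_model(x, t) -> predicted noise; alphas_cumprod is indexed by timestep."""
    x = latents
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, t_cur)
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        # predict x0, then step forward to the next (noisier) timestep
        x0 = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return x

# toy usage: with a zero-noise model the latent is only rescaled
alphas = torch.linspace(0.999, 0.01, 1000)
zero_model = lambda x, t: torch.zeros_like(x)
x = torch.randn(1, 4, 8, 8)
inv = ddim_invert(x, zero_model, alphas, list(range(0, 1000, 100)))
```

In the actual pipelines, `eps_model` would be the (LoRA-adapted) U-Net, and the bilateral-filtering hook described above runs inside each of these inversion steps.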

For algorithms not covered in our paper, you can follow the same steps used for TokenFlow to conduct testing.

I plan to further refine the testing code in the coming days by modularizing the different components and improving the README for better clarity.
