Diffusion Models

Image Generation

FLUX.1 [Code]
- Black Forest Labs
- Text-to-image generation
- Models
  - FLUX.1-dev: https://huggingface.co/black-forest-labs/FLUX.1-dev
  - FLUX.1-schnell: https://huggingface.co/black-forest-labs/FLUX.1-schnell
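
The two checkpoints are typically sampled with different settings. The sketch below assumes the Hugging Face `diffusers` library and its `FluxPipeline`; the step counts, guidance scales, and sequence lengths are the values used in the example snippets on the model cards, not tuned recommendations, and `flux_call_kwargs` and `generate` are hypothetical helper names.

```python
# Sampling settings taken from the example snippets on the two model cards.
# Treat them as starting points (an assumption of this sketch), not as tuned values.
FLUX_SETTINGS = {
    "black-forest-labs/FLUX.1-dev": {
        "num_inference_steps": 50,
        "guidance_scale": 3.5,
        "max_sequence_length": 512,
    },
    "black-forest-labs/FLUX.1-schnell": {
        "num_inference_steps": 4,
        "guidance_scale": 0.0,
        "max_sequence_length": 256,
    },
}


def flux_call_kwargs(repo_id: str) -> dict:
    """Return a copy of the sampling kwargs for a FLUX.1 checkpoint."""
    return dict(FLUX_SETTINGS[repo_id])


def generate(repo_id: str, prompt: str, seed: int = 0):
    """Text-to-image generation; needs torch, diffusers, and enough GPU memory."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(prompt, generator=generator, **flux_call_kwargs(repo_id))
    return result.images[0]
```

For example, `generate("black-forest-labs/FLUX.1-schnell", "a cat holding a sign").save("out.png")` would run the 4-step variant, assuming the weights can be downloaded.
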
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206) [arXiv] [Blog]
- Stability AI
- Stable Diffusion 3 (SD3)
- Multimodal Diffusion Transformer (MMDiT)
- Models
  - Stable Diffusion 3 Medium: https://huggingface.co/stabilityai/stable-diffusion-3-medium
Scalable Diffusion Models with Transformers (ICCV 2023) [arXiv] [Paper] [Code] [Homepage]
- UC Berkeley & NYU
- DiT

Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis [Technical Report]
- Kuaishou Kolors
- Text-to-image generation
- Model: https://huggingface.co/Kwai-Kolors/Kolors
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (arXiv:2307.01952) [arXiv]
- Stability AI
- Models
  - https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
  - https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
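
The base and refiner checkpoints can be chained so that the base model handles the high-noise part of the schedule and the refiner finishes the rest. This is a minimal sketch assuming the `diffusers` SDXL pipelines and their `denoising_end` / `denoising_start` arguments; the 40 steps and 0.8 handoff are illustrative values, and `split_steps` is a hypothetical helper that only approximates how the steps are divided.

```python
BASE_ID = "stabilityai/stable-diffusion-xl-base-1.0"
REFINER_ID = "stabilityai/stable-diffusion-xl-refiner-1.0"


def split_steps(num_inference_steps: int, handoff: float) -> tuple[int, int]:
    """Approximate how many denoising steps the base and the refiner each run."""
    if not 0.0 < handoff < 1.0:
        raise ValueError("handoff must be strictly between 0 and 1")
    base_steps = round(num_inference_steps * handoff)
    return base_steps, num_inference_steps - base_steps


def generate(prompt: str, num_inference_steps: int = 40, handoff: float = 0.8):
    """Base-then-refiner generation; needs torch, diffusers, and a CUDA GPU."""
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        BASE_ID, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    ).to("cuda")
    # Reuse the second text encoder and the VAE to avoid loading them twice.
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        REFINER_ID,
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    # The base model stops early and returns latents instead of a decoded image.
    latents = base(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_end=handoff,
        output_type="latent",
    ).images
    # The refiner resumes from the same point in the noise schedule.
    return refiner(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_start=handoff,
        image=latents,
    ).images[0]
```

With these defaults, `split_steps(40, 0.8)` gives 32 base steps and 8 refiner steps.
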
High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) [Paper] [arXiv] [Code]
- LMU Munich & Runway ML
- Latent Diffusion Models (LDMs)
- Models
  - Stable-Diffusion-v1-5: https://huggingface.co/runwayml/stable-diffusion-v1-5
    - Initialized with the weights of the Stable-Diffusion-v1-2 checkpoint, then fine-tuned for 595k steps at 512x512 resolution.
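
To make the "latent" part of LDMs concrete, the arithmetic below shows how small the tensor being denoised is at the 512x512 training resolution. It assumes the Stable Diffusion v1 autoencoder settings (8x spatial downsampling, 4 latent channels), which are not stated in this list; `latent_shape` and `compression_ratio` are hypothetical helper names.

```python
def latent_shape(height: int, width: int, downsample: int = 8, channels: int = 4):
    """Shape (C, H, W) of the latent for an RGB image of the given size.

    The defaults are the assumed Stable Diffusion v1 autoencoder settings.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(height: int, width: int, downsample: int = 8, channels: int = 4):
    """Ratio of RGB pixel values to latent values."""
    c, h, w = latent_shape(height, width, downsample, channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the diffusion model operates on 48 times fewer values than it would in pixel space.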

Video Generation

Stable Video 4D (SV4D)
- Stability AI
- Model: https://huggingface.co/stabilityai/sv4d
  - Generates 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 reference frames of the same size.
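
The 40 output frames form a time-by-view grid. The sketch below only enumerates that grid to show where the count comes from; the frame-major ordering and the name `sv4d_grid` are assumptions of this example, not the layout used by the SV4D code.

```python
from itertools import product

NUM_FRAMES = 5  # video frames (time steps)
NUM_VIEWS = 8   # camera views per time step


def sv4d_grid(num_frames: int = NUM_FRAMES, num_views: int = NUM_VIEWS):
    """List every (frame_index, view_index) pair, frame-major (assumed order)."""
    return list(product(range(num_frames), range(num_views)))


grid = sv4d_grid()
print(len(grid))  # 40
print(grid[:3])   # [(0, 0), (0, 1), (0, 2)]
```
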
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (arXiv:2311.15127) [arXiv] [Blog]
- Stability AI
- Stable Video Diffusion (SVD)
- Text-to-video and image-to-video generation
- Models
  - https://huggingface.co/stabilityai/stable-video-diffusion-img2vid
    - Generates 14 frames at 576x1024 resolution, given a context frame of the same size.
  - https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
    - Fine-tuned from SVD-img2vid.
    - Generates 25 frames at 576x1024 resolution, given a context frame of the same size.
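
Image-to-video generation with either checkpoint can be sketched as follows, assuming the `diffusers` `StableVideoDiffusionPipeline`. The frame counts come from the entries above; `decode_chunk_size=8`, `fps=7`, and the helper names `frames_for` and `animate` are illustrative choices for this example.

```python
# Frame counts per checkpoint, as listed above.
SVD_NUM_FRAMES = {
    "stabilityai/stable-video-diffusion-img2vid": 14,
    "stabilityai/stable-video-diffusion-img2vid-xt": 25,
}


def frames_for(repo_id: str) -> int:
    """Number of frames the given SVD checkpoint generates."""
    return SVD_NUM_FRAMES[repo_id]


def animate(repo_id: str, image_path: str, out_path: str = "generated.mp4", seed: int = 42):
    """Animate one context image; needs torch, diffusers, and a GPU."""
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        repo_id, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

    # PIL's resize takes (width, height); the model expects 576x1024 (H x W).
    image = load_image(image_path).resize((1024, 576))
    generator = torch.manual_seed(seed)
    frames = pipe(
        image,
        num_frames=frames_for(repo_id),
        decode_chunk_size=8,  # decode a few frames at a time to save memory
        generator=generator,
    ).frames[0]
    export_to_video(frames, out_path, fps=7)
    return out_path
```

A call such as `animate("stabilityai/stable-video-diffusion-img2vid-xt", "input.png")` would write a 25-frame clip, assuming the weights are available locally or can be downloaded.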