- FLUX.1 [Code]
- Black Forest Labs
- Text-to-image generation
- Models
- FLUX.1-dev: https://huggingface.co/black-forest-labs/FLUX.1-dev
- FLUX.1-schnell: https://huggingface.co/black-forest-labs/FLUX.1-schnell
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206) [arXiv] [Blog]
- Stability AI
- Stable Diffusion 3 (SD3)
- Multimodal Diffusion Transformer (MMDiT)
- Models
- Stable Diffusion 3 Medium: https://huggingface.co/stabilityai/stable-diffusion-3-medium
- Scalable Diffusion Models with Transformers (ICCV 2023) [arXiv] [Paper] [Code] [Homepage]
- UC Berkeley & NYU
- DiT
- Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis [Technical Report]
- Kuaishou Kolors
- Text-to-image generation
- Model: https://huggingface.co/Kwai-Kolors/Kolors
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (arXiv:2307.01952) [arXiv]
- High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) [Paper] [arXiv] [Code]
- LMU Munich & Runway ML
- Latent Diffusion Models (LDMs)
- Models
- Stable-Diffusion-v1-5: https://huggingface.co/runwayml/stable-diffusion-v1-5
- Initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at 512x512 resolution.
- Stable Video 4D (SV4D)
- Stability AI
- Model: https://huggingface.co/stabilityai/sv4d
- Generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 reference frames of the same size.
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (arXiv:2311.15127) [arXiv] [Blog]
- Stability AI
- Stable Video Diffusion (SVD)
- Text-to-video and image-to-video generation
- Models
- https://huggingface.co/stabilityai/stable-video-diffusion-img2vid
- Generate 14 frames at resolution 576x1024 given a context frame of the same size.
- https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
- Fine-tuned from SVD-img2vid.
- Generate 25 frames at resolution 576x1024 given a context frame of the same size.
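The two SVD checkpoints above differ mainly in how many frames they produce, so a small lookup table can keep the settings next to the repository IDs. The following is a minimal sketch, not an official recipe: `SVDCheckpoint`, `SVD_CHECKPOINTS`, and `image_to_video` are hypothetical names introduced here for illustration, and the sketch assumes that `torch`, `diffusers`, and `accelerate` are installed, that a CUDA GPU is available, and that the `fp16` weight variant is used. The frame counts and resolution come from the list entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVDCheckpoint:
    """Hypothetical record pairing a Hugging Face repo with its listed settings."""
    repo_id: str
    num_frames: int
    height: int = 576
    width: int = 1024


# Frame counts and resolution as stated in the list entries above.
SVD_CHECKPOINTS = {
    "svd": SVDCheckpoint("stabilityai/stable-video-diffusion-img2vid", 14),
    "svd-xt": SVDCheckpoint("stabilityai/stable-video-diffusion-img2vid-xt", 25),
}


def image_to_video(image_path, out_path, name="svd-xt", fps=7):
    """Generate a short clip from one context image (assumes a CUDA GPU)."""
    # Heavy imports stay inside the function so the table above can be
    # inspected without torch or diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    ckpt = SVD_CHECKPOINTS[name]
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        ckpt.repo_id, torch_dtype=torch.float16, variant="fp16"
    )
    # Moves submodules to the GPU only while they run, to reduce peak memory.
    pipe.enable_model_cpu_offload()

    # The context frame is resized to the resolution the checkpoint expects.
    image = load_image(image_path).resize((ckpt.width, ckpt.height))
    frames = pipe(
        image,
        height=ckpt.height,
        width=ckpt.width,
        num_frames=ckpt.num_frames,
        decode_chunk_size=8,  # decode a few frames at a time to save memory
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `image_to_video("input.png", "clip.mp4", name="svd")` would select the 14-frame checkpoint; the file names here are placeholders.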
- LLM: Large Language Model