The downsampling ratio of VAE in maisi #1841
DopamineLcy asked this question in Q&A (unanswered)

The downsampling ratio of the VAE in maisi is 4, i.e. a [128, 128, 128] patch results in a [32, 32, 32] latent. However, 32^3 = 32768 is very large for a transformer-based model. Is there a pre-trained VAE with a larger downsampling ratio, such as 8 ([128, 128, 128] -> [16, 16, 16])? Thank you!
Replies: 1 comment

We currently do not have an 8x8x8 downsampling VAE. Thank you for letting us know that this is desired!
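
To make the size gap in the question concrete, here is a back-of-the-envelope sketch in plain Python (not MAISI or MONAI code; the helper name and defaults are purely illustrative) comparing the latent token counts implied by 4x and 8x isotropic downsampling of a [128, 128, 128] patch:

```python
# Illustrative only: compares latent grid sizes and token counts for
# different isotropic VAE downsampling factors on a [128, 128, 128] patch.

def latent_tokens(patch_size=(128, 128, 128), factor=4):
    """Latent spatial shape and total token count for an isotropic
    downsampling factor (hypothetical helper, not a MAISI API)."""
    latent_shape = tuple(s // factor for s in patch_size)
    num_tokens = 1
    for s in latent_shape:
        num_tokens *= s
    return latent_shape, num_tokens

for factor in (4, 8):
    shape, tokens = latent_tokens(factor=factor)
    print(f"downsample x{factor}: latent {shape} -> {tokens} tokens")

# downsample x4: latent (32, 32, 32) -> 32768 tokens
# downsample x8: latent (16, 16, 16) -> 4096 tokens
```

An 8x factor would cut the sequence length a transformer operates on from 32768 to 4096 tokens, which is why the larger ratio is attractive here.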