
Using a VQ-VAE to compress images and then modeling the latent variables with a diffusion model does not generate good pictures. #6


Description

@yh-xxx

I use the VAE to encode x, then train the diffusion model on the latents:

```python
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)
x = vae.encode(x).latent_dist.sample().mul_(0.18215)
```
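For context on the constant above: 0.18215 is the Stable Diffusion latent scaling factor, chosen so that the scaled latents have roughly unit standard deviation before the diffusion model sees them. A minimal sketch with dummy NumPy latents (the values are illustrative, not taken from the real VAE) shows the intended round trip:

```python
import numpy as np

SCALE = 0.18215  # Stable Diffusion latent scaling factor

rng = np.random.default_rng(0)
# Dummy "raw" VAE latents with a std of roughly 1/SCALE,
# mimicking the typical scale of the SD VAE posterior samples.
raw = rng.normal(0.0, 1.0 / SCALE, size=(4, 4, 32, 32))

scaled = raw * SCALE       # what the diffusion model is trained on
restored = scaled / SCALE  # what is passed to vae.decode at sampling time

print(scaled.std())                # close to 1.0
print(np.allclose(restored, raw))  # True: scaling is lossless
```

If the diffusion model's samples z do not have statistics close to these scaled latents (roughly zero mean, unit std), dividing by 0.18215 and decoding will produce degraded images, so checking the std of z before decoding is a cheap sanity check.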

Sampling:

```python
with torch.no_grad():
    z = ema_sample_method(opt.n_sample, z_shape, guide_w=opt.w)
    x_gen = vae.decode(z / 0.18215).sample
```

The generated images are of poor quality:

[image: image_ep400_w0 3_ema]

I hope there is a solution.
