
Using a VQ-VAE to compress images and then modeling the latent variables with a diffusion model does not generate good pictures. #6


Description

@yh-xxx

I use the VAE to encode x, then train the diffusion model on the latents:

```python
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)
x = vae.encode(x).latent_dist.sample().mul_(0.18215)
```
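For context on the constant above: 0.18215 is the Stable Diffusion latent scaling factor, chosen so that the scaled latents have roughly unit standard deviation before the diffusion model sees them. A minimal sketch with dummy NumPy latents (the values are illustrative, not taken from the real VAE) shows the intended round trip:

```python
import numpy as np

SCALE = 0.18215  # Stable Diffusion latent scaling factor

rng = np.random.default_rng(0)
# Dummy "raw" VAE latents with a std of roughly 1/SCALE,
# mimicking the typical scale of the SD VAE posterior samples.
raw = rng.normal(0.0, 1.0 / SCALE, size=(4, 4, 32, 32))

scaled = raw * SCALE       # what the diffusion model is trained on
restored = scaled / SCALE  # what is passed to vae.decode at sampling time

print(scaled.std())                # close to 1.0
print(np.allclose(restored, raw))  # True: scaling is lossless
```

If the diffusion model's samples z do not have statistics close to these scaled latents (roughly zero mean, unit std), dividing by 0.18215 and decoding will produce degraded images, so checking the std of z before decoding is a cheap sanity check.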

Sampling:

```python
with torch.no_grad():
    z = ema_sample_method(opt.n_sample, z_shape, guide_w=opt.w)
    x_gen = vae.decode(z / 0.18215).sample
```

The generated images are of poor quality:

[image: image_ep400_w0 3_ema]

I hope there is a solution.
