Latent Diffusion / DiT
Latent Diffusion Models (High-Resolution Image Synthesis with Latent Diffusion Models)
https://arxiv.org/pdf/2112.10752
Goal: generate high-resolution images.
The noise predictor is trained in the latent space of a pretrained autoencoder rather than in pixel space.
Earlier models used a U-Net with attention modules at each layer for noise prediction.
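To make the latent-space idea concrete, here is a minimal NumPy sketch of one training step. It is an illustration under stated assumptions, not the paper's implementation: `toy_encode` is a hypothetical stand-in for the learned encoder (plain average pooling), `toy_noise_predictor` is a placeholder for the trainable network, and the linear beta schedule is borrowed from DDPM.

```python
import numpy as np

def toy_encode(x, factor=8):
    """Hypothetical stand-in for the autoencoder's encoder: average-pool by `factor`.
    A real LDM uses a learned encoder; this only mimics the spatial downsampling."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def add_noise(z0, t, alpha_bar, rng):
    """DDPM forward process, applied to the latent z0 instead of the pixels."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

def toy_noise_predictor(z_t, t):
    """Placeholder for the trainable noise predictor; always predicts zeros."""
    return np.zeros_like(z_t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule (DDPM-style)
alpha_bar = np.cumprod(1.0 - betas)        # cumulative product of (1 - beta)

x = rng.standard_normal((3, 256, 256))     # fake RGB image, channels first
z0 = toy_encode(x)                         # latent of shape (3, 32, 32)
z_t, eps = add_noise(z0, t=500, alpha_bar=alpha_bar, rng=rng)

# Training objective: mean squared error between true and predicted noise.
loss = np.mean((eps - toy_noise_predictor(z_t, 500)) ** 2)
print(z0.shape, z_t.shape)
```

The point of the sketch is the shapes: the network only ever sees the 32x32 latent, which has 64 times fewer spatial positions than the 256x256 image.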
DiT (Scalable Diffusion Models with Transformers)
https://arxiv.org/pdf/2212.09748
The diffusion process in DDPM does not require a U-Net; DiT replaces the U-Net with a ViT.
DiT is based on the Vision Transformer (ViT) architecture, which operates on sequences of patches.
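A short sketch of what "sequences of patches" means in practice. The helper names `patchify` and `unpatchify` are illustrative, not taken from the DiT code; the 4x32x32 latent and patch size 2 are assumed here as a typical configuration. A real model would follow `patchify` with a learned linear embedding and positional encodings, which are omitted.

```python
import numpy as np

def patchify(z, p):
    """Split a (C, H, W) latent into non-overlapping p x p patches.
    Returns an array of shape (num_tokens, C * p * p)."""
    c, h, w = z.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    z = z.reshape(c, h // p, p, w // p, p)
    z = z.transpose(1, 3, 0, 2, 4)          # (H/p, W/p, C, p, p)
    return z.reshape((h // p) * (w // p), c * p * p)

def unpatchify(tokens, c, h, w, p):
    """Inverse of patchify: rebuild the (C, H, W) latent from the token sequence."""
    z = tokens.reshape(h // p, w // p, c, p, p)
    z = z.transpose(2, 0, 3, 1, 4)          # (C, H/p, p, W/p, p)
    return z.reshape(c, h, w)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 32, 32))        # assumed latent shape
tokens = patchify(z, p=2)                   # 16 * 16 = 256 tokens of dimension 16
restored = unpatchify(tokens, 4, 32, 32, p=2)
print(tokens.shape)
```

Halving the patch size quadruples the number of tokens, so the patch size directly trades compute for spatial granularity.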