---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---
SVG: Latent Diffusion Model without Variational Autoencoder
Model Description
SVG is a latent diffusion framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative quality and transfer to downstream tasks, while keeping efficiency comparable to standard VAE-based latent diffusion models.
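SVG changes the latent space, so the diffusion process itself acts on feature vectors exactly as it would on VAE latents. The sketch below shows that step under stated assumptions. It uses a standard DDPM-style forward process with a cosine noise schedule. The function names `cosine_alpha_bar` and `forward_noise` are hypothetical. SVG's actual training objective and schedule may differ, so consult the repository.

```python
import math
import random
from typing import List

def cosine_alpha_bar(t: int, num_steps: int, s: float = 0.008) -> float:
    """Cumulative signal fraction alpha_bar(t) under a cosine schedule.

    Assumption for illustration; not necessarily the schedule SVG uses.
    """
    def f(u: float) -> float:
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def forward_noise(z0: List[float], t: int, num_steps: int, rng: random.Random) -> List[float]:
    """Sample z_t ~ N(sqrt(a) * z0, (1 - a) * I) for a flattened feature-space latent z0."""
    a = cosine_alpha_bar(t, num_steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in z0]

rng = random.Random(0)
z0 = [0.5] * 8  # toy stand-in for a flattened semantic feature latent
print(forward_noise(z0, 0, 1000, rng) == z0)  # True: no noise is added at t = 0
```

The noising step never inspects where `z0` came from. That independence is what lets a feature space stand in for a VAE latent space without changing the diffusion machinery.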
Key features:
- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Includes a lightweight residual encoder that captures fine-grained details.
- Achieves strong performance on both generation and perception tasks.
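The sketch below shows how frozen semantic features and a residual branch could form a single latent. Two parts are assumptions: the residual encoder is modeled as one hypothetical linear projection per patch, and the two feature sets are combined by channel-wise concatenation. Neither is confirmed here as SVG's exact design. `residual_encoder`, `build_latent`, and the toy dimensions are illustrative only.

```python
from typing import List

Token = List[float]  # one feature vector per spatial token

def residual_encoder(patches: List[List[float]], weights: List[List[float]]) -> List[Token]:
    """Hypothetical lightweight residual encoder: a single linear projection per patch.

    `weights` has shape (out_dim, patch_dim); each output channel is a dot product.
    """
    return [[sum(w * p for w, p in zip(row, patch)) for row in weights] for patch in patches]

def build_latent(semantic: List[Token], residual: List[Token]) -> List[Token]:
    """Concatenate frozen semantic channels with residual channels token by token
    (combination rule assumed for this sketch)."""
    if len(semantic) != len(residual):
        raise ValueError("semantic and residual features must cover the same tokens")
    return [s + r for s, r in zip(semantic, residual)]

# Toy usage: 4 tokens, 3 semantic channels (stand-in for frozen DINOv3 output),
# plus 2 residual channels computed from 4-value pixel patches.
semantic = [[0.1, 0.2, 0.3] for _ in range(4)]
patches = [[1.0, 0.0, 0.0, 1.0] for _ in range(4)]
weights = [[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 0.0, 0.0]]
latent = build_latent(semantic, residual_encoder(patches, weights))
print(len(latent), len(latent[0]))  # 4 5
```

In this sketch the semantic channels pass through unchanged, and the small residual branch contributes only a few extra channels. This mirrors the description above: the semantic features are left unmodified, and the residual branch adds fine-grained detail.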
How to Use
For code and instructions, see the GitHub repository:
https://github.com/shiml20/SVG
Official project page:
https://howlin-wang.github.io/svg/
arXiv paper: