---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---

# SVG: Latent Diffusion Model without Variational Autoencoder

## Model Description

SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while keeping efficiency comparable to standard VAE-based latent diffusion models.

Key features:

- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Delivers strong performance on both generation and perception tasks.

## How to Use

For code and instructions, see the GitHub repository: [https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)

Official project page: [https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)

arXiv paper: [https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)
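The following Python sketch gives a rough picture of the feature-space latent idea from the Model Description: how such a latent might be assembled and then noised for diffusion training. It is not code from the SVG repository. `semantic_encoder` and `residual_encoder` are hypothetical stand-ins for the frozen DINOv3 backbone and the lightweight residual encoder. Two further choices are assumptions made for illustration: the channel-wise concatenation used to combine the two feature sets, and the linear noise interpolation, which is a generic flow-matching-style schedule. Refer to the GitHub repository for the actual implementation.

```python
import random
from typing import Callable, List

Vector = List[float]


def svg_style_latent(
    image_patches: List[Vector],
    semantic_encoder: Callable[[Vector], Vector],
    residual_encoder: Callable[[Vector], Vector],
) -> List[Vector]:
    """Build one latent token per patch (illustrative sketch only).

    Assumption: semantic features and residual detail features are
    combined by concatenating them along the channel axis.
    """
    latent = []
    for patch in image_patches:
        sem = semantic_encoder(patch)   # hypothetical stand-in for frozen DINOv3 features
        res = residual_encoder(patch)   # hypothetical stand-in for the residual encoder
        latent.append(sem + res)        # list concatenation = channel-wise concat
    return latent


def noisy_latent(latent: List[Vector], t: float, rng: random.Random) -> List[Vector]:
    """Interpolate between the clean latent (t=0) and Gaussian noise (t=1).

    Assumption: a generic linear (flow-matching-style) schedule, used here
    only to show that diffusion operates directly on the feature space.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [[(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in token] for token in latent]
```

For example, with toy encoders `lambda p: [sum(p), max(p)]` and `lambda p: [p[0] - p[-1]]`, each patch becomes a 3-channel token. A real semantic backbone would produce far wider tokens, which is the "high-dimensional" latent the card refers to.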