---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references:
- https://arxiv.org/abs/2510.15301
---
# SVG: Latent Diffusion Model without Variational Autoencoder
## Model Description
SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
Key features:
- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables strong generation and perception performance.
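The two-branch idea in the list above can be pictured with a minimal NumPy sketch. Everything here is an illustrative assumption rather than the repository's code: the encoders are stand-in random linear maps (not DINOv3 or the actual residual encoder), the sizes `PATCH`, `SEM_DIM`, and `RES_DIM` are hypothetical, and channel-wise concatenation is just one plausible way to combine the two feature sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
PATCH = 16      # patch side length in pixels
SEM_DIM = 384   # width of the (frozen) semantic features
RES_DIM = 8     # width of the lightweight residual features


def patchify(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)


# Stand-ins for the two encoders: fixed random linear maps.
# A real setup would use a pretrained vision backbone and a small trained network.
W_SEM = rng.standard_normal((PATCH * PATCH * 3, SEM_DIM)) * 0.01
W_RES = rng.standard_normal((PATCH * PATCH * 3, RES_DIM)) * 0.01


def encode(image: np.ndarray) -> np.ndarray:
    """Return per-patch latents: semantic features plus residual detail features."""
    patches = patchify(image)
    semantic = patches @ W_SEM   # high-dimensional semantic branch
    residual = patches @ W_RES   # low-dimensional fine-detail branch
    # Assumption: the branches are combined by concatenating along channels.
    return np.concatenate([semantic, residual], axis=-1)


image = rng.random((64, 64, 3))
latent = encode(image)
print(latent.shape)  # (16, 392): 4x4 patches, 384 + 8 channels
```

In this sketch a diffusion model would operate on `latent` in place of a VAE latent; see the GitHub repository for the actual architecture and dimensions.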
## How to Use
For code and instructions, see the GitHub repository:
[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)
Official project page:
[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)
arXiv paper:
[https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)