---
language: en
license: mit
tags:
- diffusion
- autoencoder
- feature-space
- svg
references: 
- https://arxiv.org/abs/2510.15301
---

# SVG: Latent Diffusion Model without Variational Autoencoder

## Model Description

SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.

Key features:

- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables strong generation and perception performance.
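
As a rough illustration of the first two points, the sketch below shows one way a per-patch latent could combine frozen semantic features with a small residual branch. It is not the SVG implementation: the random linear maps only stand in for the real encoders, joining the two branches by channel-wise concatenation is an assumption, and every name and size here is hypothetical. Refer to the repository for the actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_PATCHES = 256    # e.g. a 16x16 grid of image patches
PATCH_DIM = 768      # flattened pixels per patch (16 * 16 * 3)
SEMANTIC_DIM = 384   # channel width of the frozen semantic features
RESIDUAL_DIM = 8     # small channel width of the residual branch

# Stand-ins for the two encoders: fixed random linear maps.
# A real setup would use a frozen self-supervised backbone and a
# trained lightweight residual encoder instead.
W_semantic = rng.standard_normal((PATCH_DIM, SEMANTIC_DIM)) / np.sqrt(PATCH_DIM)
W_residual = rng.standard_normal((PATCH_DIM, RESIDUAL_DIM)) / np.sqrt(PATCH_DIM)


def encode(patches: np.ndarray) -> np.ndarray:
    """Map (num_patches, PATCH_DIM) patches to a combined latent."""
    semantic = patches @ W_semantic   # high-dimensional semantic part
    residual = patches @ W_residual   # low-dimensional detail part
    # Assumption: the two parts are joined along the channel axis.
    return np.concatenate([semantic, residual], axis=-1)


patches = rng.standard_normal((NUM_PATCHES, PATCH_DIM))
latent = encode(patches)
print(latent.shape)  # (256, 392)
```

In this toy setup the diffusion model would operate on the 392-channel latent, most of which comes from the semantic branch.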


## How to Use

For code and instructions, see the GitHub repository:

[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)


Official project page:

[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)

arXiv paper:

[https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)