metadata

language: en
license: mit
tags:
  - diffusion
  - autoencoder
  - feature-space
  - svg
references:
  - https://arxiv.org/abs/2510.15301

SVG: Latent Diffusion Model without Variational Autoencoder

Model Description

SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.

Key features:

Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
Includes a lightweight residual encoder for refining fine-grained details.
Enables strong generation and perception performance.
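
To make the contrast between the two latent spaces concrete, here is a minimal shape-level sketch. It is not the SVG implementation. The image size, patch size, channel counts, the `LatentShape` and `feature_space_latent` names, and the choice to fuse the residual encoder output with the semantic features by channel-wise concatenation are all illustrative assumptions; see the GitHub repository for the actual architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    """Spatial grid and per-token channel count of a latent (hypothetical helper)."""
    grid: int      # tokens per side
    channels: int  # channels per token

    @property
    def tokens(self) -> int:
        return self.grid * self.grid

    @property
    def size(self) -> int:
        return self.tokens * self.channels


def patch_grid(image_size: int, stride: int) -> int:
    """Number of tokens per side for a given downsampling stride or patch size."""
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    return image_size // stride


def feature_space_latent(image_size: int, patch_size: int,
                         semantic_dim: int, residual_dim: int) -> LatentShape:
    """Latent made of semantic features plus residual channels (assumed concatenation)."""
    return LatentShape(patch_grid(image_size, patch_size), semantic_dim + residual_dim)


def fuse(semantic_tokens: list[list[float]],
         residual_tokens: list[list[float]]) -> list[list[float]]:
    """Concatenate residual channels onto each semantic token."""
    if len(semantic_tokens) != len(residual_tokens):
        raise ValueError("token counts must match")
    return [s + r for s, r in zip(semantic_tokens, residual_tokens)]


# Assumed numbers, for illustration only: a 256x256 image, a VAE with 8x
# downsampling and 4 channels, and a ViT-style encoder with 16x16 patches,
# 384-dim semantic features, and 8 residual channels.
vae = LatentShape(patch_grid(256, 8), 4)
svg_like = feature_space_latent(256, 16, semantic_dim=384, residual_dim=8)
print(vae.tokens, vae.channels)            # fewer channels, more tokens
print(svg_like.tokens, svg_like.channels)  # fewer tokens, many more channels
```

Under these assumed numbers the feature-space latent has fewer spatial tokens than the VAE latent but far more channels per token, which is the sense in which it is "high-dimensional".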

How to Use

For code and instructions, see the GitHub repository:

https://github.com/shiml20/SVG

Official project page:

https://howlin-wang.github.io/svg/

arXiv paper:

https://arxiv.org/abs/2510.15301