Dataset: none-yet/anime-captions
How to use dixisouls/anime-diffusion with Diffusers:
```bash
pip install -U diffusers transformers accelerate
```
```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "dixisouls/anime-diffusion", dtype=torch.bfloat16, device_map="cuda"
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
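For anime-oriented generation it often helps to pass an anime-flavored prompt, a negative prompt, and explicit sampling settings. The snippet below is a minimal sketch assuming the repository loads as a standard Stable Diffusion text-to-image pipeline; the prompt text and parameter values are illustrative, not values published with this model.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "dixisouls/anime-diffusion", dtype=torch.bfloat16, device_map="cuda"
)

# Illustrative prompt and sampling settings; tune to taste.
prompt = "anime portrait of a silver-haired girl, cherry blossoms, soft lighting, detailed"
negative_prompt = "lowres, blurry, bad anatomy, worst quality"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("anime_sample.png")
```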
A UNet2DConditionModel fine-tuned for anime-style image generation, based on Stable Diffusion v1.4. The other pipeline components are off-the-shelf:
| Component | Model ID |
|---|---|
| VAE | stabilityai/sd-vae-ft-mse |
| Text Encoder | openai/clip-vit-large-patch14 |
| Tokenizer | openai/clip-vit-large-patch14 |
To load the components individually:

```python
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import CLIPTextModel, CLIPTokenizer

# Load the stock components
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

# Load the fine-tuned EMA weights into the UNet
weights_path = hf_hub_download(repo_id="dixisouls/anime-diffusion", filename="model.safetensors")
unet.load_state_dict(load_file(weights_path))

# Use DDIMScheduler for inference
scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_schedule="linear",
    clip_sample=False,
    prediction_type="epsilon",
)
```
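To generate images from these manually loaded components, they can be wired into a StableDiffusionPipeline. This is a minimal sketch rather than code shipped with the model; it assumes the fine-tuned UNet is a drop-in replacement for the SD v1.4 UNet, and it skips the safety checker and feature extractor for brevity.

```python
from diffusers import StableDiffusionPipeline

# Wire the manually loaded components into a text-to-image pipeline.
pipe = StableDiffusionPipeline(
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    scheduler=scheduler,
    safety_checker=None,        # skipped here for brevity
    feature_extractor=None,
    requires_safety_checker=False,
).to("cuda")

# Illustrative prompt; not published with the model.
image = pipe("anime illustration of a city street at dusk", num_inference_steps=50).images[0]
image.save("manual_pipeline_sample.png")
```

Passing safety_checker=None with requires_safety_checker=False simply omits the NSFW filter that the stock Stable Diffusion pipeline bundles; keep the checker if you need that filtering.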
See the companion Hugging Face Space for a full interactive demo.
Base model: CompVis/stable-diffusion-v1-4