# Walkyrie 1.3B – Text-to-Image
Walkyrie is a Text-to-Image diffusion model derived from Wan2.1-T2V-1.3B.
The text encoder (UMT5) was pruned to ~1B parameters and the model was re-trained for image generation, converting the original Text-to-Video architecture into a high-quality Text-to-Image pipeline.
| Version | Release repo |
|---|---|
| Preview1.0 | kpsss34/Walkyrie-1.3B-v1.0 |
| anime style | Coming soon |
| Turbo | Coming soon |
## Updates
- May 11, 2026 – Re-trained to reduce plastic-looking color. For realism, lower the CFG to 2.5–3.0 and reduce inference to just 20 steps.
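The CFG value above is the standard classifier-free guidance scale. As a rough sketch of what it controls (a generic formulation, not necessarily the pipeline's exact internals), each denoising step blends the conditional and unconditional noise predictions:

```python
import torch

def cfg_combine(uncond, cond, guidance_scale):
    # Classifier-free guidance: push the conditional prediction away from
    # the unconditional one by `guidance_scale`. Higher values follow the
    # prompt more strictly but can over-saturate; 2.5-3.0 is the sweet
    # spot recommended above.
    return uncond + guidance_scale * (cond - uncond)

uncond = torch.zeros(4)  # stand-in for the unconditional noise prediction
cond = torch.ones(4)     # stand-in for the prompt-conditioned prediction
out = cfg_combine(uncond, cond, 3.0)
print(out)  # tensor([3., 3., 3., 3.])
```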
Try the custom nodes in ComfyUI for the best quality.
## Use with ComfyUI
```shell
git clone https://github.com/kpsss34/walkyrie.git
```
1. Download the merged model to `ComfyUI/models/checkpoints` (filename: `Walkyrie_bf16.safetensors` or `Walkyrie_fp8.safetensors`).
2. Make sure diffusers version 0.33.0 is installed (reinstall it if necessary).
3. Open ComfyUI and search for the Walkyrie node.
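Steps 1 and 2 might look like this in a shell, run from your ComfyUI root. The checkpoint's exact location on the Hub and the local paths are assumptions; verify them against the release repo.

```shell
# Fetch the merged checkpoint into ComfyUI's model directory
# (assumes the file is hosted in the release repo listed above).
huggingface-cli download kpsss34/Walkyrie-1.3B-v1.0 Walkyrie_bf16.safetensors \
  --local-dir models/checkpoints

# Pin diffusers to the version the custom nodes expect.
pip install diffusers==0.33.0
```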
## Install dependencies
```shell
pip install git+https://github.com/huggingface/diffusers.git transformers accelerate torch torchvision ftfy
git clone https://github.com/kpsss34/Walkyrie-1.3B.git
cd Walkyrie-1.3B
```
## Basic inference
```python
import torch
from pipeline_walkyrie import pipeline_walkyrie

device = "cuda" if torch.cuda.is_available() else "cpu"
model_dtype = torch.bfloat16
model_id = "kpsss34/Walkyrie-1.3B-v1.0"

pipe = pipeline_walkyrie.from_pretrained(
    model_id,
    torch_dtype=model_dtype,
)
pipe.enable_model_cpu_offload()  # or pipe.to(device) if you have enough VRAM

prompt = "a portrait of a young woman in a nightclub, cinematic film still, ultra wide aspect ratio, oval bokeh, soft highlight bloom, teal orange grading, film grain, moody lighting"
negative_prompt = ""

# Seeded generator for reproducible results
generator = torch.Generator(device=device).manual_seed(0)

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=20,
    guidance_scale=3.0,
    generator=generator,
    output_type="pil",
).frames[0]  # the pipeline keeps the video-style output shape; take the first frame

output.save("output.png")
```
## Memory-efficient inference (CPU offload)

```python
pipe.enable_model_cpu_offload()
```
## Model Details
| Property | Value |
|---|---|
| Base model | Wan2.1-T2V-1.3B |
| Task | Text-to-Image |
| Text Encoder | UMT5 (pruned to ~1B) |
| VAE | AutoencoderKLWan |
| Scheduler | FlowMatchEulerDiscreteScheduler |
| Precision | bfloat16 |
| Resolution | 1024×768, 768×1024 (recommended) |
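The recommended resolutions divide cleanly into the latent grid the diffusion transformer works on. A small sanity check, assuming the 8× spatial downsampling typical of Wan-family VAEs (an assumption, not stated in the table):

```python
# Latent-space size for a given output resolution, assuming an 8x
# spatial VAE downsampling factor (typical for Wan-family VAEs).
VAE_SCALE = 8

def latent_size(height, width, scale=VAE_SCALE):
    # Diffusion runs in latent space; each latent cell covers
    # scale x scale output pixels, so dimensions should divide evenly.
    assert height % scale == 0 and width % scale == 0, "use multiples of 8"
    return height // scale, width // scale

print(latent_size(1024, 768))  # (128, 96)
```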
## What's Different from Wan2.1
- Text encoder pruned – UMT5 reduced to ~1B parameters for faster inference and lower VRAM usage
- Re-trained for T2I – fine-tuned specifically for image generation instead of video
## Hardware Requirements
| VRAM | Setting |
|---|---|
| 16 GB+ | Full precision bfloat16 |
| 6–8 GB | `enable_model_cpu_offload()` |
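A back-of-the-envelope estimate of the weights-only footprint, using the parameter counts from this card (~1.3B diffusion model + ~1B pruned text encoder). Activations and the VAE add more on top, so treat these as lower bounds:

```python
def param_memory_gb(n_params, bytes_per_param):
    # Weights-only memory; activations, the VAE, and framework overhead
    # are not included.
    return n_params * bytes_per_param / 1024**3

total_params = 1.3e9 + 1.0e9  # DiT + pruned UMT5, per the model card
for name, bpp in [("bf16", 2), ("fp8", 1)]:
    print(f"{name}: ~{param_memory_gb(total_params, bpp):.1f} GB of weights")
```

This is why the fp8 checkpoint pairs well with the 6–8 GB tier when combined with CPU offload.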
## License
This model is released under the Apache 2.0 License.
Free to use for both research and commercial purposes.
## Citation
If you use this model in your work, please credit:
Walkyrie 1.3B – Text-to-Image model derived from Wan2.1-T2V-1.3B
https://huggingface.co/kpsss34/Walkyrie-1.3B-v1.0
https://github.com/kpsss34
https://huggingface.co/kpsss34