# Fish Audio S2 Pro — FP8 (AEmotionStudio Mirror)
FP8 weight-only quantization of Fish Audio S2 Pro.
## Details
| Property | Value |
|---|---|
| Source model | fishaudio/s2-pro |
| Quantization | Per-row symmetric FP8 (float8_e4m3fn) |
| Linear layers quantized | 201 |
| FP8 params | 4.048B |
| BF16 params | 0.514B |
| Model size | 4.73 GB |
| VRAM requirement | ~12 GB |
## How it works
All `nn.Linear` weight matrices are quantized to `float8_e4m3fn` with per-row
float32 scale factors. Non-linear weights (embeddings, layer norms, codec) remain
in bfloat16. No external quantization library is needed — dequantization is pure PyTorch:

```python
W_bf16 = W_fp8.to(torch.bfloat16) * scale
```
## Usage with ComfyUI-FFMPEGA

This model is downloaded automatically by the ComfyUI-FFMPEGA extension and used for its TTS and voice-cloning features when FP8 precision is selected.
## License
Fish Audio Research License — see LICENSE file.
- ✅ Free for research and non-commercial use
- ❌ Commercial use requires a separate license from Fish Audio (contact: business@fish.audio)
Built with Fish Audio.