HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Official repository of the paper "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)

HQ-SVC is an efficient framework for high-quality zero-shot singing voice conversion (SVC) in low-resource scenarios. It achieves disentanglement of content and speaker features via a unified decoupled codec, and enhances synthesis quality through multi-feature fusion and progressive optimization.

Unlike existing methods that demand large datasets or heavy computational resources, HQ-SVC combines:

🚀 Zero-shot conversion for unseen speakers without fine-tuning
⚡ Low-resource training (single consumer-grade GPU, <80h data)
🎧 Dual capabilities: high-quality singing voice conversion + voice super-resolution
🎯 Superior naturalness and speaker similarity compared to SOTA methods

🗞 News

[2025-11-08] 🎉 Paper accepted by AAAI 2026
[2025-11-12] 🎉 arXiv paper released
[2025-11-12] 🎉 Demo released
[2025-12-24] 🎉 Inference code and pre-trained models released

📅 Release Plan

arXiv preprint
Online demo
Inference code
Pre-trained models
Training code

✨ New features

Singing style control
Improved quality

🎸 Try Inference

1. Download Code and Environment

Tested only on Linux with CUDA >= 11.8.
Windows users can deploy and run it through WSL (Ubuntu).
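The CUDA >= 11.8 requirement above can be checked before downloading anything. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_cuda_version` are hypothetical, and the comparison relies on `sort -V` from GNU coreutils.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort), so 11.10 is correctly treated as newer than 11.8.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the CUDA version string in $1 meets the 11.8 minimum
# stated in this README. Hypothetical helper, for illustration only.
check_cuda_version() {
    if version_ge "$1" "11.8"; then
        echo "CUDA $1: OK"
    else
        echo "CUDA $1: older than 11.8 (untested)"
        return 1
    fi
}
```

The version string could come from, for instance, `nvcc --version` or the header of `nvidia-smi`. Which of these reflects the CUDA runtime that the bundled environment actually uses is not stated in this README.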

git clone https://github.com/ShawnPi233/HQ-SVC.git
cd HQ-SVC

wget -c https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz

wget -c https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz # Optional mirror
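Since the mirror is an alternative source for the same archive, the two download commands above can be folded into one fallback step. This is a sketch under that assumption; `fetch_first_available` is a hypothetical helper, not something shipped with HQ-SVC.

```shell
# Try each URL in turn with the given downloader until one succeeds.
# $1 is the downloader command (e.g. "wget -c"); the rest are candidate URLs.
fetch_first_available() {
    downloader=$1
    shift
    for url in "$@"; do
        # $downloader is left unquoted on purpose so "wget -c" splits into words.
        if $downloader "$url"; then
            echo "downloaded from: $url"
            return 0
        fi
        echo "failed: $url" >&2
    done
    return 1
}

# Usage with the two URLs from this README:
#   fetch_first_available "wget -c" \
#       https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz \
#       https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz
```

Because `wget -c` resumes partial files, retrying the same URL after an interrupted transfer continues the download rather than restarting it.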

2. Extract Environment

mkdir -p venv
tar -xzf environment.tar.gz -C venv
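A truncated download can leave the extracted environment incomplete, so a quick layout check before activation may save time. The sketch below is illustrative only: `check_env_layout` is a hypothetical helper, `bin/activate` is taken from the activation step of this README, and the presence of `bin/python` is an assumption about the archive's contents.

```shell
# Check that an unpacked environment directory contains the files the later
# steps rely on. $1 is the directory (this README uses "venv").
check_env_layout() {
    env_dir=$1
    missing=0
    # bin/activate is used by the activation step; bin/python is an assumption.
    for f in bin/activate bin/python; do
        if [ ! -e "$env_dir/$f" ]; then
            echo "missing: $env_dir/$f"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "environment layout looks complete"
    fi
    return "$missing"
}

# Usage: check_env_layout venv
```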

3. Activate Environment

source venv/bin/activate

4. Run

export HF_ENDPOINT=https://hf-mirror.com # Optional mirror
python gradio_app.py

If you encounter the error "Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))", run the following command first, then start the app again:

unset LD_LIBRARY_PATH
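`unset` removes the variable for the rest of the shell session. If other tools in the same session still need `LD_LIBRARY_PATH`, one alternative is to drop it for the launched process only. This is a sketch using the standard `env -u` option; the wrapper name `run_without_ld_library_path` is hypothetical.

```shell
# Run a command with LD_LIBRARY_PATH removed from its environment only,
# leaving the variable untouched in the calling shell.
run_without_ld_library_path() {
    env -u LD_LIBRARY_PATH "$@"
}

# Usage: run_without_ld_library_path python gradio_app.py
```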

Zero-shot Super-Resolution (16 kHz to 44.1 kHz): Input only source audio

Zero-shot Singing Voice Conversion: Input both source audio and target audio
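The rule for the two modes above can be written down as a small decision function. This is purely illustrative and is not the logic of `gradio_app.py`; the function name `select_mode` and the mode labels are made up for the sketch.

```shell
# Pick the mode from the inputs, following the rule stated above:
# source only -> super-resolution, source plus target -> voice conversion.
select_mode() {
    source_audio=${1:-}
    target_audio=${2:-}
    if [ -z "$source_audio" ]; then
        echo "error: source audio is required" >&2
        return 1
    fi
    if [ -z "$target_audio" ]; then
        echo "super_resolution"
    else
        echo "voice_conversion"
    fi
}

# Usage:
#   select_mode song.wav              # source only
#   select_mode song.wav singer.wav   # source and target
```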

📜 Citation

If you use HQ-SVC in your research, please cite our work:

@inproceedings{bai2026hqsvc,
  author    = {Bingsong Bai and Yizhong Geng and Fengping Wang and Cong Wang and Puyuan Guo and Yingming Gao and Ya Li},
  title     = {HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {40},
  number    = {36},
  pages     = {30013--30021},
  year      = {2026},
  doi       = {10.1609/aaai.v40i36.40249},
  url       = {https://doi.org/10.1609/aaai.v40i36.40249}
}

🙏 Acknowledgement

We thank the open-source communities behind:

Paper: arXiv 2511.08496, published Nov 11, 2025