HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios
Official repository for the paper "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)
News
- [2025-11-08] Paper accepted by AAAI 2026
- [2025-11-12] arXiv paper released
- [2025-11-12] Demo released
- [2025-12-24] Inference code and pre-trained models released
Release Plan
- arXiv preprint
- Online demo
- Inference code
- Pre-trained models
- Training code
New features
- Singing style control
- Improved quality
HQ-SVC is an efficient framework for high-quality zero-shot singing voice conversion (SVC) in low-resource scenarios. It achieves disentanglement of content and speaker features via a unified decoupled codec, and enhances synthesis quality through multi-feature fusion and progressive optimization.
Unlike existing methods that demand large datasets or heavy computational resources, HQ-SVC combines:
- Zero-shot conversion for unseen speakers without fine-tuning
- Low-resource training (a single consumer-grade GPU and under 80 hours of data)
- Dual capabilities: high-quality singing voice conversion and voice super-resolution
- Better naturalness and speaker similarity than SOTA methods
Try Inference
1. Download Code and Environment
git clone https://github.com/ShawnPi233/HQ-SVC.git
cd HQ-SVC
wget -c https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz
wget -c https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz # optional mirror
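The two wget commands above fetch the same archive from the main site and from a mirror, so only one of them needs to succeed. A minimal sketch of trying the main URL first and falling back to the mirror is shown below. The function name download_env and the two variable names are hypothetical and not part of this repository.

```bash
# Hypothetical helper, not part of the HQ-SVC repository.
PRIMARY_URL="https://huggingface.co/shawnpi/HQ-SVC/resolve/main/environment.tar.gz"
MIRROR_URL="https://hf-mirror.com/shawnpi/HQ-SVC/resolve/main/environment.tar.gz"

download_env() {
    # -c resumes a partial download, so re-running after an interruption
    # continues the existing file instead of starting over.
    # The mirror is only contacted if the first wget exits with an error.
    wget -c "$PRIMARY_URL" || wget -c "$MIRROR_URL"
}

# Usage (run inside the HQ-SVC directory):
#   download_env
```

Both URLs resolve to the same file name, so the later tar step does not need to know which source was used.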
2. Unpack Environment
mkdir -p venv
tar -xzf environment.tar.gz -C venv
3. Activate Environment
source venv/bin/activate
4. Download Pretrained Models
export HF_HUB_ENABLE_HF_TRANSFER=0
huggingface-cli download shawnpi/HQ-SVC --include "utils/pretrain/*" --local-dir . --local-dir-use-symlinks False
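Because the command above uses --include "utils/pretrain/*" with --local-dir ., the weights should end up under utils/pretrain in the repository root. A quick sanity check before launching the app might look like the sketch below. The function name check_pretrain is hypothetical, and the check only confirms that the directory exists and is not empty. It does not validate individual model files.

```bash
# Hypothetical helper, not part of the HQ-SVC repository.
check_pretrain() {
    # Default to the directory implied by the download command above.
    dir="${1:-utils/pretrain}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # ls -A includes dotfiles but omits . and .., so empty output
    # means the directory has no entries at all.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Usage (run inside the HQ-SVC directory):
#   check_pretrain
```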
5. Run
python gradio_app.py
- If you see the error Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
- run the following command, then start the app again
unset LD_LIBRARY_PATH
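Running unset LD_LIBRARY_PATH changes the variable for the rest of the shell session. If other tools in the same session rely on it, one alternative is to clear it for the app launch only. The sketch below does this with a subshell. The function name run_without_ld_path is hypothetical, and whether it resolves the segmentation fault on a given machine is an assumption carried over from the workaround above.

```bash
# Hypothetical helper, not part of the HQ-SVC repository.
run_without_ld_path() {
    # The parentheses start a subshell, so the unset applies only to the
    # command given as arguments and the calling shell keeps its value.
    ( unset LD_LIBRARY_PATH; "$@" )
}

# Usage:
#   run_without_ld_path python gradio_app.py
```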
Zero-shot Super-Resolution (16 kHz to 44.1 kHz): Input only source audio
Zero-shot Singing Voice Conversion: Input both source audio and target audio
Citation
If you use HQ-SVC in your research, please cite our work:
@article{bai2025hq,
title={HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios},
author={Bai, Bingsong and Geng, Yizhong and Wang, Fengping and Wang, Cong and Guo, Puyuan and Gao, Yingming and Li, Ya},
journal={arXiv preprint arXiv:2511.08496},
year={2025}
}
Acknowledgement
We thank the open-source communities behind: