ACE-Step 1.5 GGUF
Pre-quantized GGUF models for acestep.cpp, a portable C++17 implementation of ACE-Step 1.5 music generation built on GGML.
Text + lyrics in, stereo 48kHz audio out. Runs on CPU, CUDA, Metal, Vulkan.
Quick start
git clone --recurse-submodules https://github.com/ServeurpersoCom/acestep.cpp
cd acestep.cpp
pip install huggingface_hub
./models.sh # downloads Q8_0 turbo essentials (~7.7 GB)
mkdir build && cd build
cmake .. -DGGML_CUDA=ON
cmake --build . --config Release -j$(nproc)
cd ..
./build/ace-server --models ./models --host 0.0.0.0 --port 8085
Open http://localhost:8085 in your browser. The embedded WebUI handles everything: write a caption, set lyrics, generate, play, and download tracks.
Models are loaded on demand and swapped automatically from the UI.
CLI tools (without the server)
# LM: generate lyrics + audio codes
./build/ace-lm \
--request /tmp/request.json \
--lm models/acestep-5Hz-lm-4B-Q8_0.gguf
# DiT + VAE: synthesize audio
./build/ace-synth \
--request /tmp/request0.json \
--embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
--dit models/acestep-v15-turbo-Q8_0.gguf \
--vae models/vae-BF16.gguf
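The two CLI calls above can also be driven from a script. The following is a minimal sketch, not part of acestep.cpp: it only assembles the same argument lists shown above, using the same flags and file names. The function name `build_pipeline_commands` and its parameters are hypothetical, and the request file paths are simply the ones used in the example.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(models_dir="models", build_dir="build",
                            lm_request="/tmp/request.json",
                            synth_request="/tmp/request0.json"):
    """Return (ace-lm argv, ace-synth argv) mirroring the CLI example above."""
    models = Path(models_dir)
    bin_dir = Path(build_dir)
    # Stage 1, LM: generate lyrics + audio codes.
    lm_cmd = [
        str(bin_dir / "ace-lm"),
        "--request", lm_request,
        "--lm", str(models / "acestep-5Hz-lm-4B-Q8_0.gguf"),
    ]
    # Stage 2, DiT + VAE: synthesize audio.
    synth_cmd = [
        str(bin_dir / "ace-synth"),
        "--request", synth_request,
        "--embedding", str(models / "Qwen3-Embedding-0.6B-Q8_0.gguf"),
        "--dit", str(models / "acestep-v15-turbo-Q8_0.gguf"),
        "--vae", str(models / "vae-BF16.gguf"),
    ]
    return lm_cmd, synth_cmd


if __name__ == "__main__":
    for cmd in build_pipeline_commands():
        # Print the shell-quoted command; pass `cmd` to
        # subprocess.run(cmd, check=True) to actually execute it.
        print(shlex.join(cmd))
```

Building the commands as lists rather than strings avoids shell quoting problems if a model directory contains spaces.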
Available models
Text encoder
| File | Quant | Size |
|---|---|---|
| Qwen3-Embedding-0.6B-BF16.gguf | BF16 | 1.2 GB |
| Qwen3-Embedding-0.6B-Q8_0.gguf | Q8_0 | 748 MB |
Frozen Qwen3 encoder (28 layers, 1024-dim). The DiT was trained end-to-end with this exact model. The CondEncoder projection weights (1024 to 2048) are baked into every DiT checkpoint, so the text encoder is architecturally locked to the 0.6B model.
LM (Qwen3 causal, audio code generation)
| File | Params | Quant | Size |
|---|---|---|---|
| acestep-5Hz-lm-4B-BF16.gguf | 4B | BF16 | 7.9 GB |
| acestep-5Hz-lm-4B-Q8_0.gguf | 4B | Q8_0 | 4.2 GB |
| acestep-5Hz-lm-4B-Q6_K.gguf | 4B | Q6_K | 3.3 GB |
| acestep-5Hz-lm-4B-Q5_K_M.gguf | 4B | Q5_K_M | 2.9 GB |
| acestep-5Hz-lm-1.7B-BF16.gguf | 1.7B | BF16 | 3.5 GB |
| acestep-5Hz-lm-1.7B-Q8_0.gguf | 1.7B | Q8_0 | 1.9 GB |
| acestep-5Hz-lm-0.6B-BF16.gguf | 0.6B | BF16 | 1.3 GB |
| acestep-5Hz-lm-0.6B-Q8_0.gguf | 0.6B | Q8_0 | 677 MB |
The small LMs (0.6B and 1.7B) ship only in BF16 and Q8_0 because they are too small to tolerate more aggressive quantization. The 4B LM has no Q4_K_M build because that quantization breaks audio code generation.
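As a rough aid for choosing an LM file, here is a small sketch that picks the largest file from the table above that fits a size budget. The sizes are copied from the table; the helper name `largest_lm_within` is hypothetical. Note the assumptions: file size is only a proxy (a larger file is not guaranteed to sound better), and the on-disk size is not the full runtime memory footprint.

```python
# File sizes in GB, copied from the LM table above (677 MB written as 0.677 GB).
LM_FILES = {
    "acestep-5Hz-lm-4B-BF16.gguf": 7.9,
    "acestep-5Hz-lm-4B-Q8_0.gguf": 4.2,
    "acestep-5Hz-lm-4B-Q6_K.gguf": 3.3,
    "acestep-5Hz-lm-4B-Q5_K_M.gguf": 2.9,
    "acestep-5Hz-lm-1.7B-BF16.gguf": 3.5,
    "acestep-5Hz-lm-1.7B-Q8_0.gguf": 1.9,
    "acestep-5Hz-lm-0.6B-BF16.gguf": 1.3,
    "acestep-5Hz-lm-0.6B-Q8_0.gguf": 0.677,
}


def largest_lm_within(budget_gb):
    """Return the name of the largest LM file not exceeding budget_gb, or None."""
    fitting = {name: size for name, size in LM_FILES.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (0.5, 2.0, 3.0, 8.0):
        print(budget, "GB ->", largest_lm_within(budget))
```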
DiT (flow matching diffusion transformer)
Standard (2B)
Available for 7 variants: turbo, sft, sftturbo50, base, turbo-shift1, turbo-shift3, turbo-continuous.
| Quant | Size per variant |
|---|---|
| BF16 | 4.5 GB |
| Q8_0 | 2.4 GB |
| Q6_K | 1.9 GB |
| Q5_K_M | 1.6 GB |
| Q4_K_M | 1.4 GB |
XL (4B)
Available for 4 variants: xl-turbo, xl-sft, xl-sftturbo50, xl-base.
| Quant | Size per variant |
|---|---|
| BF16 | 9.3 GB |
| Q8_0 | 5.0 GB |
| Q6_K | 3.9 GB |
| Q5_K_M | 3.3 GB |
| Q4_K_M | 2.8 GB |
Turbo variants run in 8 steps; SFT and Base variants need 32-50 steps. SftTurbo50 is a weight merge that adds some of the SFT richness while keeping Turbo's low step count.
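To see why the step count matters, here is a generic fixed-step Euler integrator of the kind used for flow matching sampling. This is a toy sketch and not acestep.cpp's sampler: the velocity function stands in for a DiT forward pass, and the time direction (0 to 1), the toy velocity field, and all names are assumptions made for illustration. In this sketch each step costs one velocity evaluation, so 8 steps cost 8 evaluations and 50 steps cost 50.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)  # in a real sampler this would be the model call
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy field that moves x along a straight line reaching `target` at t=1.

    Also returns a counter so the number of evaluations can be inspected.
    """
    calls = {"n": 0}

    def velocity(x, t):
        calls["n"] += 1
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

    return velocity, calls


if __name__ == "__main__":
    for steps in (8, 50):
        velocity, calls = make_toy_velocity([1.0, -2.0])
        out = euler_sample(velocity, [0.0, 0.0], steps)
        print(steps, "steps,", calls["n"], "evaluations,",
              [round(v, 6) for v in out])
```

The toy field is exactly straight, so both step counts land on the target; a learned velocity field is generally curved, which is where extra steps can help.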
VAE
| File | Size |
|---|---|
| vae-BF16.gguf | 322 MB |
Always BF16 (small, bandwidth-bound, quality-critical).
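As a worked example of the sizes in the tables above, the sketch below adds up one file per pipeline stage at Q8_0 (plus the BF16 VAE). The choice of these four files is an assumption, taken from the file names used in the CLI example; the result is consistent with the "~7.7 GB" figure quoted for the Q8_0 turbo essentials in the quick start. Sizes are treated as decimal GB.

```python
# Sizes in GB copied from the tables above (MB values divided by 1000).
ESSENTIALS_GB = {
    "Qwen3-Embedding-0.6B-Q8_0.gguf": 0.748,  # text encoder
    "acestep-5Hz-lm-4B-Q8_0.gguf": 4.2,       # LM
    "acestep-v15-turbo-Q8_0.gguf": 2.4,       # DiT, standard 2B turbo
    "vae-BF16.gguf": 0.322,                   # VAE
}

total_gb = sum(ESSENTIALS_GB.values())

if __name__ == "__main__":
    for name, size in ESSENTIALS_GB.items():
        print(f"{size:6.3f} GB  {name}")
    print(f"{total_gb:6.2f} GB  total")
```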
Pipeline
Compose (ace-lm):
Caption -> Qwen3 LM (0.6B/1.7B/4B) -> metadata + lyrics + audio codes (5Hz)
Synthesize (ace-synth / ace-server):
Caption + Lyrics -> Text-Enc (Qwen3-Embedding-0.6B) -> CondEncoder
Audio codes 5Hz -> FSQ detokenizer (neural net in DiT GGUF) -> latents 25Hz
LoRA (optional) -> DiT (flow matching, Euler steps) -> latents 25Hz -> VAE decode -> WAV 48kHz
Cover modes (ace-server):
Source audio -> VAE encode -> latents 25Hz -> DiT context
Reference audio -> timbre conditioning
The LM and DiT were co-trained on the same music data. The LM operates at 5Hz (each token = 200ms of music, vocabulary of 64000 learned codes) and builds the global musical structure autoregressively with creative sampling. The DiT takes over at 25Hz (one frame every 40ms) and uses flow matching to render the high-frequency details: timbre, transients, vocal articulation, stereo imaging.
Both stages support batching for parallel generation.
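The rates above translate directly into sequence lengths. The following sketch computes nominal counts for a track of a given duration from the three rates stated in this document (5 Hz audio codes, 25 Hz latents, 48 kHz audio). The function name `pipeline_lengths` is hypothetical, and the actual implementation may pad or round differently.

```python
LM_RATE_HZ = 5           # audio codes per second (200 ms each)
LATENT_RATE_HZ = 25      # DiT latent frames per second (40 ms each)
SAMPLE_RATE_HZ = 48_000  # output samples per second, per channel


def pipeline_lengths(duration_s):
    """Nominal sequence lengths at each stage for a track of duration_s seconds."""
    return {
        "audio_codes": duration_s * LM_RATE_HZ,
        "latent_frames": duration_s * LATENT_RATE_HZ,
        "samples_per_channel": duration_s * SAMPLE_RATE_HZ,
    }


# Fixed ratios between stages.
FRAMES_PER_CODE = LATENT_RATE_HZ // LM_RATE_HZ        # latent frames per audio code
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // LATENT_RATE_HZ  # samples per latent frame

if __name__ == "__main__":
    print(pipeline_lengths(120))
    print(FRAMES_PER_CODE, SAMPLES_PER_FRAME)
```

For a two-minute track this gives 600 audio codes for the LM to generate and 3000 latent frames for the DiT, which is why the LM stage can plan global structure cheaply while the DiT handles the fine detail.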
Music guide
For tips on writing effective prompts, understanding inference parameters, and getting the best results:
- A Musician's Guide (non-technical, for music makers)
- Tutorial (design philosophy, architecture, hyperparameters)
Acknowledgements
Independent C++/GGML implementation based on ACE-Step 1.5 by ACE Studio and StepFun. All original model weights are theirs.
SFT/Turbo merge weights (standard 2B DiT) by Aryanne. SFT/Turbo merge weights (XL 4B DiT) by jeankassio.
Links
- acestep.cpp - source code
- ACE-Step 1.5 - original Python implementation
- ACE-Step model hub - original weights
Model tree for Serveurperso/ACE-Step-1.5-GGUF
Base model
ACE-Step/Ace-Step1.5