ACE-Step 1.5 GGUF

Pre-quantized GGUF models for acestep.cpp, a portable C++17 implementation of ACE-Step 1.5 AI music generation built on GGML.

Text + lyrics in, stereo 48kHz audio out. Runs on CPU, CUDA, Metal, Vulkan.

Quick start

git clone --recurse-submodules https://github.com/ServeurpersoCom/acestep.cpp
cd acestep.cpp

pip install huggingface_hub
./models.sh           # downloads Q8_0 turbo essentials (~7.7 GB)

mkdir build && cd build
cmake .. -DGGML_CUDA=ON
cmake --build . --config Release -j$(nproc)
cd ..

./build/ace-server --models ./models --host 0.0.0.0 --port 8085

Open http://localhost:8085 in your browser. The embedded WebUI handles everything: write a caption, set lyrics, generate, play, and download tracks.

Models are loaded on demand and swapped automatically from the UI.

CLI tools (without the server)

# LM: generate lyrics + audio codes
./build/ace-lm \
    --request /tmp/request.json \
    --lm models/acestep-5Hz-lm-4B-Q8_0.gguf

# DiT + VAE: synthesize audio
./build/ace-synth \
    --request /tmp/request0.json \
    --embedding models/Qwen3-Embedding-0.6B-Q8_0.gguf \
    --dit models/acestep-v15-turbo-Q8_0.gguf \
    --vae models/vae-BF16.gguf
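
To fetch only the four files referenced by the CLI commands above, a minimal sketch using `hf_hub_download` from `huggingface_hub` could look like the following. The file list is copied from those commands; whether it matches exactly what `models.sh` downloads is an assumption, and `fetch_models` is a hypothetical helper name.

```python
# Sketch: download the GGUF files used by the CLI examples into ./models.
# Assumption: this list mirrors the CLI commands above, not necessarily models.sh.
REPO_ID = "Serveurperso/ACE-Step-1.5-GGUF"

CLI_EXAMPLE_FILES = [
    "acestep-5Hz-lm-4B-Q8_0.gguf",     # LM
    "Qwen3-Embedding-0.6B-Q8_0.gguf",  # text encoder
    "acestep-v15-turbo-Q8_0.gguf",     # DiT (turbo)
    "vae-BF16.gguf",                   # VAE
]


def fetch_models(filenames=CLI_EXAMPLE_FILES, local_dir="models"):
    """Download each file from the Hub and return the local paths."""
    # Imported lazily so the file list can be inspected without the package installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in filenames
    ]


# Usage (several GB of network traffic):
# paths = fetch_models()
```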

Available models

Text encoder

File	Quant	Size
Qwen3-Embedding-0.6B-BF16.gguf	BF16	1.2 GB
Qwen3-Embedding-0.6B-Q8_0.gguf	Q8_0	748 MB

Frozen Qwen3 encoder (28 layers, 1024-dim). The DiT was trained end-to-end with this exact model. The DiT's CondEncoder projection weights (1024 to 2048 dimensions) are baked into every DiT checkpoint, so the text encoder is architecturally locked to the 0.6B model.
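
The shape constraint alone shows why the text encoder cannot be swapped. The sketch below assumes the CondEncoder input projection behaves like a single linear layer with a 2048 x 1024 weight matrix, which is a simplification of the real module. `project` and the 2560-wide embedding are hypothetical and exist only for illustration.

```python
# Sketch: a 1024 -> 2048 linear projection only accepts 1024-dim inputs.
# Assumption: the real CondEncoder is more than one matrix multiply; only the shape logic matters here.
TEXT_DIM, DIT_DIM = 1024, 2048


def project(weight, embedding):
    """Multiply a (DIT_DIM x TEXT_DIM) weight matrix by one embedding vector."""
    if len(embedding) != len(weight[0]):
        raise ValueError(
            f"embedding has {len(embedding)} dims, projection expects {len(weight[0])}"
        )
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


weight = [[0.0] * TEXT_DIM for _ in range(DIT_DIM)]  # placeholder weights
out = project(weight, [1.0] * TEXT_DIM)              # width of the 0.6B encoder output
print(len(out))

try:
    project(weight, [1.0] * 2560)                    # hypothetical wider encoder
except ValueError as err:
    print("rejected:", err)
```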

LM (Qwen3 causal, audio code generation)

File	Params	Quant	Size
acestep-5Hz-lm-4B-BF16.gguf	4B	BF16	7.9 GB
acestep-5Hz-lm-4B-Q8_0.gguf	4B	Q8_0	4.2 GB
acestep-5Hz-lm-4B-Q6_K.gguf	4B	Q6_K	3.3 GB
acestep-5Hz-lm-4B-Q5_K_M.gguf	4B	Q5_K_M	2.9 GB
acestep-5Hz-lm-1.7B-BF16.gguf	1.7B	BF16	3.5 GB
acestep-5Hz-lm-1.7B-Q8_0.gguf	1.7B	Q8_0	1.9 GB
acestep-5Hz-lm-0.6B-BF16.gguf	0.6B	BF16	1.3 GB
acestep-5Hz-lm-0.6B-Q8_0.gguf	0.6B	Q8_0	677 MB

The small LMs (0.6B and 1.7B) ship only in BF16 and Q8_0 because they are too small to tolerate aggressive quantization. The 4B LM has no Q4_K_M file because that quantization breaks audio code generation.

DiT (flow matching diffusion transformer)

Standard (2B)

Available in 7 variants: turbo, sft, sftturbo50, base, turbo-shift1, turbo-shift3, turbo-continuous.

Quant	Size per variant
BF16	4.5 GB
Q8_0	2.4 GB
Q6_K	1.9 GB
Q5_K_M	1.6 GB
Q4_K_M	1.4 GB

XL (4B)

Available in 4 variants: xl-turbo, xl-sft, xl-sftturbo50, xl-base.

Quant	Size per variant
BF16	9.3 GB
Q8_0	5.0 GB
Q6_K	3.9 GB
Q5_K_M	3.3 GB
Q4_K_M	2.8 GB
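
The sizes in both DiT tables track the nominal bits per weight of each GGUF type. As a rough check, the sketch below scales a BF16 size by bits per weight, using 8.5 for Q8_0 (32 int8 values plus one 16-bit scale per block) and 6.5625 for Q6_K (210 bytes per 256 weights). It assumes every tensor is quantized, which real files generally are not, so small deviations are expected. `estimate_size_gb` is a hypothetical helper.

```python
# Sketch: estimate a quantized file size from the BF16 size and bits per weight.
# Assumption: all tensors use the same type; mixed types (Q5_K_M, Q4_K_M) are left out.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625}


def estimate_size_gb(bf16_size_gb, quant):
    """Scale a BF16 file size by the ratio of bits per weight."""
    return bf16_size_gb * BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT["BF16"]


for name, bf16 in [("standard 2B", 4.5), ("XL 4B", 9.3)]:
    for quant in ("Q8_0", "Q6_K"):
        print(f"{name} {quant}: ~{estimate_size_gb(bf16, quant):.2f} GB")
```

Each estimate lands within about 0.1 GB of the size listed in the tables.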

Turbo: 8 steps. SFT/Base: 32-50 steps. SftTurbo50: a weight merge of SFT and Turbo that adds some of the SFT richness while keeping Turbo's low step count.
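
Step count maps directly to model evaluations: a fixed-step Euler sampler calls the network once per step. The sketch below is a toy illustration of that loop, not the acestep.cpp sampler. The velocity function is a stand-in for the DiT, and the time direction (0 to 1) and the names `euler_sample` and `make_toy_velocity` are assumptions.

```python
# Sketch: fixed-step Euler integration of dx/dt = velocity(x, t).
# Assumption: toy velocity field and t running from 0 to 1; the real DiT and schedule differ.
def euler_sample(velocity, x, steps):
    """Integrate from t=0 to t=1 with `steps` equal Euler steps."""
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)  # one model call per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy stand-in for the DiT: points straight at a fixed target and counts calls."""
    calls = {"n": 0}

    def velocity(x, t):
        calls["n"] += 1
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]

    return velocity, calls


target = [0.5, -0.25]
noise = [1.3, 0.7]
for steps in (8, 50):
    velocity, calls = make_toy_velocity(target)
    result = euler_sample(velocity, noise, steps)
    print(steps, "steps ->", calls["n"], "model calls")
```

In this toy field the path is a straight line, so both runs end on the target. The only difference shown is cost: 8 calls versus 50.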

VAE

File	Size
vae-BF16.gguf	322 MB

Always BF16 (small, bandwidth-bound, quality-critical).
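
As a worked example of reading these tables, the sketch below adds up the download size of two configurations: the Q8_0 turbo set used in the CLI commands, and the smallest combination the tables allow. The first total is consistent with the ~7.7 GB quoted for `models.sh`, although it is an assumption that the script downloads exactly that set. `footprint_gb` is a hypothetical helper, and MB values are converted at 1000 MB per GB.

```python
# Sketch: total download size for one LM + text encoder + DiT + VAE combination.
# Sizes are copied from the tables above (GB, decimal).
LM_GB = {"4B-Q8_0": 4.2, "0.6B-Q8_0": 0.677}
TEXT_ENC_GB = {"Q8_0": 0.748, "BF16": 1.2}
DIT_2B_GB = {"Q8_0": 2.4, "Q4_K_M": 1.4}
VAE_GB = 0.322


def footprint_gb(lm, text_enc, dit):
    """Sum the sizes of the four files a full pipeline needs."""
    return LM_GB[lm] + TEXT_ENC_GB[text_enc] + DIT_2B_GB[dit] + VAE_GB


print(f"Q8_0 turbo set: {footprint_gb('4B-Q8_0', 'Q8_0', 'Q8_0'):.2f} GB")
print(f"smallest set:   {footprint_gb('0.6B-Q8_0', 'Q8_0', 'Q4_K_M'):.2f} GB")
```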

Pipeline

Compose (ace-lm):

  Caption -> Qwen3 LM (0.6B/1.7B/4B) -> metadata + lyrics + audio codes (5Hz)

Synthesize (ace-synth / ace-server):

  Caption + Lyrics -> Text-Enc (Qwen3-Embedding-0.6B) -> CondEncoder
  Audio codes 5Hz -> FSQ detokenizer (neural net in DiT GGUF) -> latents 25Hz
  LoRA (optional) -> DiT (flow matching, Euler steps) -> latents 25Hz -> VAE decode -> WAV 48kHz

Cover modes (ace-server):

  Source audio -> VAE encode -> latents 25Hz -> DiT context
  Reference audio -> timbre conditioning

The LM and DiT were co-trained on the same music data. The LM operates at 5Hz (each token = 200ms of music, vocabulary of 64000 learned codes) and builds the global musical structure autoregressively with creative sampling. The DiT takes over at 25Hz (one frame every 40ms) and uses flow matching to render the high-frequency details: timbre, transients, vocal articulation, stereo imaging.
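
The rates above fix how many units each stage handles for a given track length. The sketch below works through the arithmetic for a 60-second track. It assumes exact alignment between stages with no padding frames, and `frame_counts` is a hypothetical helper.

```python
# Sketch: unit counts per pipeline stage for a given duration.
# Assumption: stages align exactly; any padding in the real pipeline is ignored.
LM_RATE_HZ = 5           # one audio code every 200 ms
LATENT_RATE_HZ = 25      # one latent frame every 40 ms
SAMPLE_RATE_HZ = 48_000  # output audio, per channel


def frame_counts(duration_s):
    """Return how many codes, latent frames, and samples cover `duration_s` seconds."""
    return {
        "lm_codes": duration_s * LM_RATE_HZ,
        "latent_frames": duration_s * LATENT_RATE_HZ,
        "samples_per_channel": duration_s * SAMPLE_RATE_HZ,
    }


counts = frame_counts(60)
print(counts)
print(LATENT_RATE_HZ // LM_RATE_HZ, "latent frames per LM code")
print(SAMPLE_RATE_HZ // LATENT_RATE_HZ, "audio samples per latent frame")
```

For 60 seconds this gives 300 LM codes, 1500 latent frames, and 2,880,000 samples per channel, so each LM code spans 5 latent frames and each latent frame spans 1920 samples.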

Both stages support batching for parallel generation.

Music guide

For tips on writing effective prompts, understanding inference parameters, and getting the best results:

A Musician's Guide (non-technical, for music makers)
Tutorial (design philosophy, architecture, hyperparameters)

Acknowledgements

Independent C++/GGML implementation based on ACE-Step 1.5 by ACE Studio and StepFun. All original model weights are theirs.

SFT/Turbo merge weights (standard 2B DiT) by Aryanne. SFT/Turbo merge weights (XL 4B DiT) by jeankassio.
