Instructions to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with llama-cpp-python:
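# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF",
    filename="sakura-14b-qwen2.5-v1.0-iq4xs.gguf",
)

# messages must be a list of role/content dicts, not a bare string.
# No input example is defined for this model task, so the content is a placeholder.
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "No input example has been defined for this model task."}
    ]
)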
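The generic call above can be adapted to the translation prompt that this model card documents in its prompt format section. The following is a minimal sketch rather than the authors' reference code: `build_messages` and `translate` are hypothetical helper names, the `temperature` and `max_tokens` values are illustrative assumptions, and it assumes the chat template embedded in the GGUF renders the messages as the ChatML layout the card describes.

```python
# System prompt copied verbatim from the prompt format section of this card.
SYSTEM_PROMPT = (
    "你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,"
    "并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。"
)


def build_messages(japanese: str) -> list[dict]:
    """Build chat messages for the glossary-free prompt variant."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "将下面的日文文本翻译成中文:" + japanese},
    ]


def translate(llm, japanese: str) -> str:
    """Run one translation with an already loaded llama_cpp.Llama instance."""
    result = llm.create_chat_completion(
        messages=build_messages(japanese),
        temperature=0.1,  # assumption: low temperature for stable translations
        max_tokens=512,   # assumption: adjust to the length of the input
    )
    return result["choices"][0]["message"]["content"]
```

With the `llm` object created above, `translate(llm, japanese_text)` should return the Chinese translation as a string.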
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
# Run inference directly in the terminal:
llama-cli -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
# Run inference directly in the terminal:
llama-cli -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
# Run inference directly in the terminal:
./llama-cli -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Use Docker
docker model run hf.co/SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
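Once `llama-server` is running, its OpenAI-compatible endpoint can be called from any HTTP client. The sketch below uses only the Python standard library. It assumes the server listens on `http://localhost:8080/v1` (the address used elsewhere on this page); `build_request` and `chat` are hypothetical helper names, and the `temperature` value is an illustrative assumption.

```python
import json
import urllib.request

# Assumption: llama-server is reachable on its default host and port.
BASE_URL = "http://localhost:8080/v1"


def build_request(user_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST request for the chat completions endpoint."""
    payload = {
        "model": "SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF",
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.1,  # assumption: illustrative sampling value
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(user_text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat(...)` requires the server to be up; `build_request(...)` can be inspected offline to check the payload before sending it.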
- LM Studio
- Jan
- Ollama
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Ollama:
ollama run hf.co/SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
- Unsloth Studio
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF to start chatting
- Pi
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Run Hermes
hermes
- Docker Model Runner
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Docker Model Runner:
docker model run hf.co/SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
- Lemonade
How to use SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull SakuraLLM/Sakura-14B-Qwen2.5-v1.0-GGUF
Run and chat with the model
lemonade run user.Sakura-14B-Qwen2.5-v1.0-GGUF-{{QUANT_TAG}}
List all available models
lemonade list
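Version 1.0, functional update:
- Improved translation quality and accuracy, especially for personal pronouns.
- Added support for a glossary (GPT dictionary) to keep proper nouns and personal pronouns consistent.
- Improved retention of some simple control characters, in particular preserving \n when it appears inside a single line. This lowers the probability that the output's line count differs from the source text.
- Because the base model uses GQA, inference speed and VRAM usage are significantly improved, which enables faster multi-threaded inference. For multi-threaded inference, see the Sakura Launcher GUI tutorial or SakuraLLMServer.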
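Since the release notes say version 1.0 lowers, but does not eliminate, the chance that the output's line count differs from the source, a caller may want to verify it after each translation. The following is a minimal sketch of such a check; `line_count_matches` and `mismatched_items` are hypothetical helper names and are not part of any Sakura tooling.

```python
def line_count_matches(source: str, translation: str) -> bool:
    """Return True when both texts contain the same number of \\n-separated lines."""
    return len(source.split("\n")) == len(translation.split("\n"))


def mismatched_items(pairs: list[tuple[str, str]]) -> list[int]:
    """Return the indices of (source, translation) pairs whose line counts differ."""
    return [
        index
        for index, (source, translation) in enumerate(pairs)
        if not line_count_matches(source, translation)
    ]
```

Items flagged by `mismatched_items` can then be retried or reviewed by hand.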
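Prompt format:
# `japanese` is the Japanese source text to translate (define it before running this).
# The Chinese prompt strings are the model's expected input and must be kept verbatim.
gpt_dict = [{
    "src": "原文1",  # source term
    "dst": "译文1",  # translated term
    "info": "注释信息1",  # optional note
},]
gpt_dict_text_list = []
for gpt in gpt_dict:
    src = gpt['src']
    dst = gpt['dst']
    info = gpt['info'] if "info" in gpt.keys() else None
    if info:
        single = f"{src}->{dst} #{info}"
    else:
        single = f"{src}->{dst}"
    gpt_dict_text_list.append(single)
gpt_dict_raw_text = "\n".join(gpt_dict_text_list)
# User prompt: "Based on the following glossary (may be empty): ... translate the Japanese text
# below into Chinese according to the mappings and notes:"
user_prompt = "根据以下术语表(可以为空):\n" + gpt_dict_raw_text + "\n" + "将下面的日文文本根据对应关系和备注翻译成中文:" + japanese
# System prompt: "You are a light novel translation model that fluently translates Japanese into
# Simplified Chinese in the style of Japanese light novels, uses personal pronouns correctly from
# context, and does not add pronouns that are absent from the source."
prompt = (
    "<|im_start|>system\n你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。<|im_end|>\n"  # system prompt
    + "<|im_start|>user\n" + user_prompt + "<|im_end|>\n"  # user prompt
    + "<|im_start|>assistant\n"  # assistant prompt start
)
# If the glossary is empty, the following prompt can be used instead (recommended in that case).
# User prompt: "Translate the Japanese text below into Chinese:"
user_prompt = "将下面的日文文本翻译成中文:" + japanese
prompt = (
    "<|im_start|>system\n你是一个轻小说翻译模型,可以流畅通顺地以日本轻小说的风格将日文翻译成简体中文,并联系上下文正确使用人称代词,不擅自添加原文中没有的代词。<|im_end|>\n"  # system prompt
    + "<|im_start|>user\n" + user_prompt + "<|im_end|>\n"  # user prompt
    + "<|im_start|>assistant\n"  # assistant prompt start
)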