Instructions to use Jinx-org/Jinx-gpt-oss-20b-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jinx-org/Jinx-gpt-oss-20b-GGUF",
    filename="jinx-gpt-oss-20b-Q2_K.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Use Docker
docker model run hf.co/Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jinx-org/Jinx-gpt-oss-20b-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jinx-org/Jinx-gpt-oss-20b-GGUF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
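The curl call to the vLLM server can also be made from Python with only the standard library. The sketch below assumes the server from `vllm serve` is listening on `localhost:8000`, as in the curl example. The helper names `build_chat_request`, `first_reply` and `chat` are hypothetical. Responses follow the OpenAI chat-completion shape, so `first_reply` should also work on the dict returned by `llm.create_chat_completion(...)` in the llama-cpp-python example.

```python
import json
import urllib.request

# Assumption: same host, port and model name as the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Jinx-org/Jinx-gpt-oss-20b-GGUF"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build the same POST /chat/completions request as the curl command."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt),
                                timeout=timeout) as resp:
        return first_reply(json.load(resp))
```

With the server running, `chat("What is the capital of France?")` returns the model's answer as a string.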
- Ollama
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Ollama:
ollama run hf.co/Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
- Unsloth Studio
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jinx-org/Jinx-gpt-oss-20b-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jinx-org/Jinx-gpt-oss-20b-GGUF to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jinx-org/Jinx-gpt-oss-20b-GGUF to start chatting
- Pi
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory: pi
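The Pi step says to *add* the provider to `~/.pi/agent/models.json`, so any providers already in that file should be kept. The sketch below does that merge. It assumes only the schema shown in the snippet above. `merge_llama_cpp_provider` and `update_models_json` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path

# Path and provider entry copied from the Pi instructions above.
MODELS_JSON = Path.home() / ".pi" / "agent" / "models.json"

LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M"}],
}


def merge_llama_cpp_provider(config: dict) -> dict:
    """Return a copy of config with the "llama-cpp" provider added or replaced.

    Other providers are left untouched; the input dict is not modified.
    """
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["llama-cpp"] = LLAMA_CPP_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_json(path: Path = MODELS_JSON) -> None:
    """Read models.json (if present), merge the provider, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_llama_cpp_provider(config), indent=2) + "\n")
```

Keeping the merge as a pure function makes it easy to check before anything is written to disk.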
- Hermes Agent
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Run Hermes
hermes
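Both Pi and Hermes assume that llama-server is already listening on port 8080, its default. Before starting either agent, you can confirm this with the OpenAI-style `GET /v1/models` endpoint. This is a minimal sketch: the helper names are hypothetical, and the model id the server reports may not exactly match the name used in the agent configs above.

```python
import json
import urllib.request
from typing import List

# Same address the Hermes config points at above.
BASE_URL = "http://127.0.0.1:8080/v1"


def model_ids(models_response: dict) -> List[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]


def served_models(base_url: str = BASE_URL, timeout: float = 5.0) -> List[str]:
    """Ask the running server which models it serves (raises if it is down)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

If `served_models()` raises a connection error, start `llama-server` first and then launch the agent.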
- Docker Model Runner
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Docker Model Runner:
docker model run hf.co/Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
- Lemonade
How to use Jinx-org/Jinx-gpt-oss-20b-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jinx-org/Jinx-gpt-oss-20b-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Jinx-gpt-oss-20b-GGUF-Q4_K_M
List all available models
lemonade list
MXFP4_MOE
Please add an MXFP4_MOE quant.
As far as I know, the modified GPT-OSS can no longer support MXFP4. If you know of any way to support MXFP4, please let us know.
Best,
Jinx Team.
Hi! It exists: someone made one and I downloaded it, but then deleted it due to lack of space.
I had decided to use it because it was the best option, but it is no longer on Hugging Face; it was probably deleted.
Thanks for the information. I found that it does support MXFP4_MOE, and I converted an MXFP4_MOE version. When I manually tested some queries, however, the safety filter came back out of nowhere, and I don't know why. So I'm skipping MXFP4_MOE for now. Maybe you can try it yourself.
https://github.com/ggml-org/llama.cpp/blob/618575c5825d7d4f170e686e772178d2aae148ae/tools/quantize/quantize.cpp#L25
Best,
Jinx Team
Try adding --tensor-type ".*ffn_gate_exps.weight=Q8_0" (or Q4_1 instead of Q8_0).
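The `--tensor-type` suggestion keeps the MoE gate-expert weights at a higher precision while the rest of the model is quantized to MXFP4_MOE. The sketch below builds such a `llama-quantize` command line. It assumes the `PATTERN=TYPE` form from the suggestion; the `.*` prefix suggests the pattern is matched as a regular expression against tensor names. The file names and the helper `llama_quantize_cmd` are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def llama_quantize_cmd(src_gguf: str, dst_gguf: str, qtype: str,
                       tensor_types: Optional[Dict[str, str]] = None,
                       binary: str = "llama-quantize") -> List[str]:
    """Build an argv list for llama-quantize with per-tensor type overrides.

    Each tensor_types entry becomes `--tensor-type PATTERN=TYPE`. Options are
    placed before the positional input, output and type arguments.
    """
    argv = [binary]
    for pattern, ttype in (tensor_types or {}).items():
        argv += ["--tensor-type", f"{pattern}={ttype}"]
    argv += [src_gguf, dst_gguf, qtype]
    return argv


# Hypothetical file names; Q4_1 could be used instead of Q8_0.
cmd = llama_quantize_cmd("jinx-f32.gguf", "jinx-mxfp4.gguf", "MXFP4_MOE",
                         {".*ffn_gate_exps.weight": "Q8_0"})
print(shlex.join(cmd))
# llama-quantize --tensor-type '.*ffn_gate_exps.weight=Q8_0' jinx-f32.gguf jinx-mxfp4.gguf MXFP4_MOE
```

`shlex.join` quotes the pattern so that the shell does not expand the `*`.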
It is notable that the "safety" filter returning with MXFP4 has also been reported for another uncensored model: https://huggingface.co/huizimao/gpt-oss-20b-uncensored-mxfp4 (its model card notes that the refusal rate increases many times over when MXFP4 is used). So this may be a general GPT-OSS problem rather than one specific to Jinx. My guess is that, because the model was fine-tuned in BF16, quantizing to MXFP4 loses part of the fine-tuning to rounding errors; perhaps the best MXFP4 results require further fine-tuning of the MXFP4 weights directly after quantization.
In any case, thanks to Jinx for sharing a great uncensored model. I think this is the first uncensored GPT-OSS model that properly preserves its intelligence. It would be great if a 120B version could be made too (even with just standard quantization, it would still be awesome).
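The rounding-error guess can be illustrated with a simplified MXFP4 round-trip. This is a sketch loosely following the OCP Microscaling (MX) format: each block of 32 values shares one power-of-two scale, and each value is stored as FP4 E2M1. It ignores scale-exponent clamping, and ties here resolve to the smaller magnitude. llama.cpp's real MXFP4 code may choose scales and round differently. The helper names are hypothetical.

```python
import math
from typing import List

# Non-negative values representable in FP4 E2M1 (a sign bit covers negatives).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32      # values sharing one scale
EMAX_ELEM = 2   # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)


def round_to_fp4(x: float) -> float:
    """Round to the nearest signed E2M1 value, saturating at +/-6.

    min() returns the first closest entry, so ties go to the smaller magnitude.
    """
    mag = min(abs(x), 6.0)
    best = min(FP4_E2M1, key=lambda v: abs(v - mag))
    return math.copysign(best, x)


def mxfp4_roundtrip(block: List[float]) -> List[float]:
    """Quantize one 32-value block to MXFP4 and dequantize it back."""
    assert len(block) == BLOCK
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * BLOCK
    # Shared power-of-two scale: 2 ** (floor(log2(amax)) - EMAX_ELEM)
    scale = 2.0 ** (math.floor(math.log2(amax)) - EMAX_ELEM)
    return [round_to_fp4(v / scale) * scale for v in block]


base = [1.0] + [0.1] * 31
small_update = [1.0, 0.12] + [0.1] * 30   # a +0.02 "fine-tuning" change
large_update = [1.0, 0.2] + [0.1] * 30    # a +0.1 change
print(mxfp4_roundtrip(small_update) == mxfp4_roundtrip(base))  # True
print(mxfp4_roundtrip(large_update) == mxfp4_roundtrip(base))  # False
```

In this toy block, the scale is 0.25, so 0.1 and 0.12 both round to 0.125. The small update disappears entirely, while the larger one survives (0.2 rounds to 0.25). This only shows that small weight deltas can vanish in MXFP4. It does not establish that this is why refusals return.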
Try upcasting the model weights to F32 (convert the model to F32 instead of BF16), and then quantize to GGUF MXFP4_MOE.
https://huggingface.co/Joseph717171/Jinx-gpt-OSS-20B-GGUF
Thanks, Jinx team, for your wonderful work on GPT-OSS-20B. I upcast the model weights to F32 and then quantized them to GGUF MXFP4_MOE.
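The F32 route described above takes two steps: convert the Hugging Face checkpoint to an F32 GGUF, then quantize that file to MXFP4_MOE. Converting BF16 to F32 is lossless, because every BF16 value is exactly representable in F32. The sketch below builds both commands. It assumes llama.cpp's `convert_hf_to_gguf.py` script and a CMake build under `build/bin` as in the build-from-source step. The directory and file names and the helper functions are hypothetical.

```python
import subprocess
from typing import List


def f32_mxfp4_pipeline(hf_model_dir: str, f32_gguf: str, out_gguf: str,
                       llama_cpp_dir: str = "llama.cpp") -> List[List[str]]:
    """Return [convert, quantize]: HF -> F32 GGUF, then F32 -> MXFP4_MOE."""
    convert = ["python", f"{llama_cpp_dir}/convert_hf_to_gguf.py", hf_model_dir,
               "--outtype", "f32", "--outfile", f32_gguf]
    quantize = [f"{llama_cpp_dir}/build/bin/llama-quantize",
                f32_gguf, out_gguf, "MXFP4_MOE"]
    return [convert, quantize]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_pipeline(f32_mxfp4_pipeline("Jinx-gpt-oss-20b", "jinx-f32.gguf", "jinx-mxfp4.gguf"))` would run both steps. Note that an F32 intermediate for a 20B model needs roughly twice the disk space of a BF16 one.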
Is the refusal problem with MXFP4_MOE still present?
Hi @ParthProLegend, you can try the MXFP4 model we provide. It works fine in our tests. Looking forward to hearing your feedback!
I tested both (Q5 and MXFP4). MXFP4 performs better than Q5, and the "safety" filter does NOT come back. MXFP4 also gives higher-quality outputs than Q5. Thanks for your excellent work, much appreciated.

