Medical Reasoning GPT-OSS-20B

Model Description

This model is a LoRA fine-tune of openai/gpt-oss-20b for medical reasoning and clinical decision support. It was trained on question-answer pairs with chain-of-thought reasoning so that it works through a medical query step by step before giving a final answer.

🏥 Key Features

Medical Expertise: Specialized in medical reasoning, diagnosis, and clinical decision-making
Complex Reasoning: Uses chain-of-thought reasoning for medical problems
Adapter-Only Training: Only the LoRA layers are trained; the base model remains frozen
Efficient: Lightweight fine-tuning with a small storage footprint for the adapter
Inference: Requires the base model plus this adapter (the adapter can be merged into the base weights)
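
As a rough illustration of the adapter idea, the sketch below applies the standard LoRA update W' = W + (alpha / r) * B A to a toy 2x3 weight matrix in plain Python. The matrix values, the `alpha` value, and the helper names `matmul` and `merge_lora` are made up for illustration; this card does not state the adapter's alpha.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into the frozen weight: W' = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy values (hypothetical): frozen W is 2x3, rank r = 1, alpha = 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.0, 1.0]]   # r x d_in
B = [[1.0], [2.0]]      # d_out x r
r, alpha = 1, 2.0

merged = merge_lora(W, A, B, alpha, r)

# Forward pass two ways: base + adapter path versus the merged matrix.
x = [[1.0], [1.0], [1.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
y_adapter = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]
y_merged = matmul(merged, x)

print(merged)
print(y_adapter, y_merged)
```

Both paths give the same output, which is why merging changes only how the weights are stored, not what the model computes.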

🚀 Quick Start

# pip install torch --index-url https://download.pytorch.org/whl/cu128
# pip install "trl>=0.20.0" "peft>=0.17.0" "transformers>=4.55.0"

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import re

base_model_name = "openai/gpt-oss-20b"
adapter_name = "dousery/medical-reasoning-gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base_model, adapter_name)
model = model.merge_and_unload()

messages = [
    {"role": "system", "content": "You are a medical reasoning assistant."},
    {"role": "user", "content": (
        "A 55-year-old man has chest pain and elevated troponin I without ST elevation. "
        "What is the diagnosis and what additional test would you order next?"
    )},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding: with do_sample=False the temperature setting is ignored, so it is not passed.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False
)

raw_output = tokenizer.decode(outputs[0], skip_special_tokens=False)

# Parse the analysis (reasoning) and final channels out of the raw output
thinking_pattern = r"<\|end\|><\|start\|>assistant<\|channel\|>analysis<\|message\|>(.*?)<\|end\|>"
final_pattern = r"<\|start\|>assistant<\|channel\|>final<\|message\|>(.*?)<\|return\|>"

thinking_match = re.search(thinking_pattern, raw_output, re.DOTALL)
final_match = re.search(final_pattern, raw_output, re.DOTALL)

thinking_text = thinking_match.group(1).strip() if thinking_match else "N/A"
final_text = final_match.group(1).strip() if final_match else "N/A"

print("Thinking:", thinking_text)
print("\nFinal:", final_text)
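
The parsing step above can be checked without loading the model. The following is a minimal sketch that wraps the same idea in a helper and runs it on a hand-written string; the helper name `split_channels` and the sample text are hypothetical, and unlike the Quick Start regex the final-channel pattern here also accepts output that was cut off before `<|return|>` (for example when `max_new_tokens` is reached).

```python
import re

# Reasoning text sits in the "analysis" channel and ends with <|end|>.
ANALYSIS_RE = re.compile(
    r"<\|channel\|>analysis<\|message\|>(.*?)<\|end\|>", re.DOTALL
)
# The answer sits in the "final" channel; tolerate a missing <|return|>.
FINAL_RE = re.compile(
    r"<\|channel\|>final<\|message\|>(.*?)(?:<\|return\|>|$)", re.DOTALL
)


def split_channels(raw_output: str) -> dict:
    """Return the reasoning and final answer found in a decoded generation."""
    thinking = ANALYSIS_RE.search(raw_output)
    final = FINAL_RE.search(raw_output)
    return {
        "thinking": thinking.group(1).strip() if thinking else None,
        "final": final.group(1).strip() if final else None,
    }


# Hand-written sample in the same token layout (not real model output).
sample = (
    "<|start|>user<|message|>Example question?<|end|>"
    "<|start|>assistant<|channel|>analysis<|message|>example reasoning<|end|>"
    "<|start|>assistant<|channel|>final<|message|>example answer<|return|>"
)
parsed = split_channels(sample)
truncated = split_channels(sample.replace("<|return|>", ""))

print(parsed["thinking"])
print(parsed["final"])
```

Returning `None` instead of the string "N/A" lets calling code distinguish a missing channel from a model that literally answered "N/A".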

📊 Training Details

Training Data

Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
Language: English
Size: 19,704 medical reasoning examples
Format: Question-Answer pairs with complex chain-of-thought reasoning

Training Configuration

Base Model: unsloth/gpt-oss-20b (20B parameters)
Training Method: LoRA (adapter-only fine-tuning)
LoRA Rank: 8
Learning Rate: 5e-5
Batch Size: 4 per device with gradient_accumulation_steps=4 (effective batch size of 16 per device)
Epochs: 2
Max Sequence Length: 2048
LR Scheduler: Cosine, warmup_ratio=0.05
Final Training Loss: 1.22
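
To make the configuration above concrete, the sketch below works out the approximate number of optimizer steps and the learning rate at a given step. It assumes a single GPU and a linear warmup followed by cosine decay to zero, which is a common form of this schedule; the card does not state the device count or the exact scheduler implementation, and the function name `lr_at` is hypothetical.

```python
import math

NUM_EXAMPLES = 19_704
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1          # assumption: not stated in the card
EPOCHS = 2
BASE_LR = 5e-5
WARMUP_RATIO = 0.05

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the run takes roughly 2,464 optimizer steps, with the learning rate peaking at 5e-5 after about 124 warmup steps. Actual step counts can differ slightly depending on how the trainer rounds partial batches.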

Model Architecture

Parameters: 20.9 billion
Architecture: GPT-OSS (Transformer-based)
Context Length: 2,048 tokens (maximum sequence length used during fine-tuning)
Trainable Parameters: 3.98M (0.02% of total)
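
The trainable-parameter figure can be sanity-checked with the usual LoRA count of r * (d_in + d_out) per adapted matrix. The sketch below assumes the adapter targets the four attention projections (q, k, v, o) in every layer and uses layer dimensions recalled from the public gpt-oss-20b configuration; neither the target modules nor these dimensions are stated in this card, so treat them as assumptions.

```python
LORA_RANK = 8
TOTAL_PARAMS = 20.9e9

# Assumed gpt-oss-20b dimensions (not taken from this card).
HIDDEN_SIZE = 2880
NUM_LAYERS = 24
NUM_Q_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 64

q_dim = NUM_Q_HEADS * HEAD_DIM     # 4096
kv_dim = NUM_KV_HEADS * HEAD_DIM   # 512

# (d_in, d_out) for each assumed target projection.
target_shapes = {
    "q_proj": (HIDDEN_SIZE, q_dim),
    "k_proj": (HIDDEN_SIZE, kv_dim),
    "v_proj": (HIDDEN_SIZE, kv_dim),
    "o_proj": (q_dim, HIDDEN_SIZE),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, LORA_RANK)
                for d_in, d_out in target_shapes.values())
total = per_layer * NUM_LAYERS
percent = 100 * total / TOTAL_PARAMS

print(per_layer, total, round(percent, 3))
```

With these assumptions the count comes to 3,981,312 parameters, about 0.019% of 20.9 billion, which is consistent with the 3.98M (0.02%) reported above.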

🎯 Intended Use

Primary Use Cases

Medical Education: Explaining medical concepts and procedures
Clinical Reasoning: Analyzing symptoms and differential diagnosis
Research Support: Assisting in medical research and literature review
Decision Support: Providing reasoning for clinical decisions (with human oversight)

⚠️ Important Disclaimers

Not a Medical Device: This model is for educational and research purposes only
Human Oversight Required: All medical decisions should involve qualified healthcare professionals
Accuracy Not Guaranteed: Model outputs should be verified against current medical literature
Regional Variations: Training data may not reflect all regional medical practices

🔍 Evaluation

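The model demonstrates strong performance in the areas below, though no benchmark scores are reported yet; the only reported metric is the final training loss (1.22):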

Medical concept explanation
Differential diagnosis reasoning
Treatment option analysis
Pathophysiology understanding

Note: Comprehensive clinical evaluation is ongoing. Always validate outputs with current medical guidelines.

🛠️ Technical Requirements

Minimum Requirements

GPU Memory: 16GB+ VRAM recommended
RAM: 32GB+ system memory
Storage: 40GB+ free space
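
A back-of-the-envelope estimate helps when planning hardware. The sketch below computes the memory taken by the weights alone at a given precision, using the parameter counts from this card; it ignores activations, the KV cache, and framework overhead, and the function name `weight_footprint_gb` is hypothetical.

```python
BASE_PARAMS = 20.9e9      # from the Model Architecture section
ADAPTER_PARAMS = 3.98e6   # trainable LoRA parameters

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_footprint_gb(n_params: float, dtype: str) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


base_bf16_gb = weight_footprint_gb(BASE_PARAMS, "bfloat16")
adapter_bf16_mb = weight_footprint_gb(ADAPTER_PARAMS, "bfloat16") * 1000

print(f"base model in bfloat16: {base_bf16_gb:.1f} GB")
print(f"adapter in bfloat16:    {adapter_bf16_mb:.2f} MB")
```

The bfloat16 weights alone come to about 41.8 GB, in line with the 40GB+ storage figure but well above 16 GB. The 16GB+ VRAM figure therefore presumably assumes the base model's quantized weights or offloading across devices with `device_map="auto"`; this is an inference, not something the card states.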

📜 License

This model is released under the Apache 2.0 license. Please review the license terms before commercial use.

🙏 Acknowledgments

Base Model: openai/gpt-oss-20b
Adapter/Training: dousery/medical-reasoning-gpt-oss-20b
Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
Infrastructure: Modal Labs for GPU compute
