Instructions to use invincible-jha/Orsta-32B-0326 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use invincible-jha/Orsta-32B-0326 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="invincible-jha/Orsta-32B-0326")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("invincible-jha/Orsta-32B-0326")
model = AutoModelForImageTextToText.from_pretrained("invincible-jha/Orsta-32B-0326")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use invincible-jha/Orsta-32B-0326 with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "invincible-jha/Orsta-32B-0326"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "invincible-jha/Orsta-32B-0326",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/invincible-jha/Orsta-32B-0326
```
- SGLang
How to use invincible-jha/Orsta-32B-0326 with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "invincible-jha/Orsta-32B-0326" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "invincible-jha/Orsta-32B-0326",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "invincible-jha/Orsta-32B-0326" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "invincible-jha/Orsta-32B-0326",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use invincible-jha/Orsta-32B-0326 with Docker Model Runner:
```shell
docker model run hf.co/invincible-jha/Orsta-32B-0326
```
One RL to See Them All: Visual Triple Unified Reinforcement Learning
- 🐙 GitHub Repo: MiniMax-AI/One-RL-to-See-Them-All
- 📜 Paper (arXiv): V-Triune: One RL to See Them All (arXiv:2505.18129)
- 💾 Dataset: Orsta-Data-47k on Hugging Face
Model Overview
Orsta-32B-0326 is a cutting-edge vision-language model (VLM) designed to achieve strong performance across a wide spectrum of both visual reasoning and visual perception tasks. This model is the result of post-training with V-Triune, our novel unified reinforcement learning (RL) system.
The V-Triune system enables VLMs to be jointly optimized on diverse multimodal tasks within a single, cohesive training pipeline. Orsta-32B-0326 was trained with V-Triune on a carefully curated set of eight challenging visual tasks, fostering robust generalization and enhanced capabilities.
Training with V-Triune
Orsta-32B-0326's advanced abilities stem from its training with the V-Triune system. Key aspects of its training include:
Unified RL Framework (V-Triune): a Visual Triple-Unified Reinforcement Learning system built from three complementary components:
- Sample-Level Data Formatting (to unify diverse task inputs)
- Verifier-Level Reward Computation (to deliver custom rewards via specialized verifiers)
- Source-Level Metric Monitoring (to diagnose problems at the data-source level)
V-Triune also incorporates an innovative Dynamic IoU reward mechanism, crucial for optimizing visual perception tasks. You can find more details in our paper: V-Triune
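To give a flavor of what a dynamic IoU reward does, here is a minimal sketch. The function names, box format, and the specific threshold schedule are illustrative assumptions, not the paper's exact implementation; the idea is that a detection prediction earns reward only when its IoU with the ground truth clears a threshold that tightens as training progresses.

```python
# Hypothetical sketch of a dynamic-IoU-style reward for detection outputs.
# Boxes are (x1, y1, x2, y2); the threshold schedule below is illustrative.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dynamic_iou_reward(pred_box, gt_box, progress):
    """Binary reward whose IoU threshold tightens with training progress
    (progress in [0, 1]): lenient early on, strict late in training."""
    threshold = 0.5 + 0.45 * progress  # e.g. 0.5 at the start, 0.95 at the end
    return 1.0 if iou(pred_box, gt_box) >= threshold else 0.0
```

The loose early threshold gives the policy a learning signal even from rough boxes, while the strict late threshold pushes it toward precise localization.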
Diverse Joint Task Optimization: Orsta-32B-0326 was jointly optimized on the following eight visual tasks:
- Visual Reasoning Tasks: Mathematics, Science Question Answering, Chart Understanding, and Puzzle Solving.
- Visual Perception Tasks: Object Detection, Visual Grounding, Optical Character Recognition (OCR), and Object Counting.
This comprehensive training allows Orsta-32B-0326 to develop a deeper understanding of visual content and its relation to textual prompts, excelling in tasks that require intricate reasoning and precise perception.
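To make verifier-level reward computation concrete, here is a toy sketch of routing samples from different tasks to task-specific verifiers. The verifier names and matching rules are illustrative assumptions, not the actual V-Triune code; they show only the dispatch pattern.

```python
# Toy sketch: each task is mapped to its own reward verifier,
# so diverse tasks share one training pipeline. Rules are illustrative.

def math_verifier(prediction, answer):
    # Exact match on the final answer string.
    return 1.0 if prediction.strip() == answer.strip() else 0.0

def counting_verifier(prediction, answer):
    # Reward correct integer counts; non-numeric output earns nothing.
    try:
        return 1.0 if int(prediction) == int(answer) else 0.0
    except ValueError:
        return 0.0

VERIFIERS = {
    "math": math_verifier,
    "counting": counting_verifier,
}

def compute_reward(sample):
    """Route a sample to the verifier registered for its task."""
    verifier = VERIFIERS[sample["task"]]
    return verifier(sample["prediction"], sample["answer"])
```

In this scheme, adding a new task means registering a new verifier, leaving the RL loop itself unchanged.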
Performance
| Model | Knowledge | Mathematics | Perception | Coding | Info. Ex. | Planning | Science | Metrics | MEGA-Bench Core |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3-27B | 49.43 | 42.20 | 45.46 | 40.18 | 49.30 | 24.96 | 47.08 | 58.99 | 41.82 † |
| Qwen2.5-VL-32B-0326 | 46.09 | 32.04 | 47.55 | 38.36 | 61.65 | 28.43 | 37.55 | 50.38 | 43.67 |
| InternVL-3-38B | 46.32 | 40.29 | 55.05 | 45.29 | 56.63 | 22.88 | 52.04 | 58.04 | 46.69 |
| Skywork-R1V-38B 💡 | 25.59 | 28.45 | 22.95 | 19.88 | 19.53 | 9.74 | 22.64 | 37.55 | 21.54 |
| Skywork-R1V2-38B 💡 | 17.08 | 12.38 | 15.65 | 7.14 | 9.90 | 17.60 | 14.29 | 0.00 | 15.39 |
| Orsta-32B-0326 (Ours) 💡 | 46.78 | 37.43 | 50.86 | 38.92 | 63.14 | 28.05 | 42.68 | 53.01 | 45.78 |
| - | - | - | - | - | - | - | - | - | - |
| Δ (Ours - Backbone) | +0.7 | +5.4 | +3.3 | +0.6 | +1.5 | -0.4 | +5.1 | +2.6 | +2.1 |
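The Δ row is the per-column difference between Orsta-32B-0326 and its backbone row, rounded to one decimal place. A quick check on a few columns:

```python
# Recompute part of the delta row from the table's Orsta and backbone scores.
orsta = {"Knowledge": 46.78, "Mathematics": 37.43,
         "Perception": 50.86, "MEGA-Bench Core": 45.78}
backbone = {"Knowledge": 46.09, "Mathematics": 32.04,
            "Perception": 47.55, "MEGA-Bench Core": 43.67}

delta = {k: round(orsta[k] - backbone[k], 1) for k in orsta}
# e.g. Knowledge: 46.78 - 46.09 -> +0.7
```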
How to Use
Orsta-32B-0326 is developed by post-training the latest Qwen2.5-VL-32B-Instruct model using our V-Triune reinforcement learning system. Consequently, its core usage, particularly regarding input formatting and model interaction, largely follows the established patterns of the Qwen2.5-VL series.
For comprehensive details on the base model's capabilities, multi-turn dialogue format, image input encoding specifics, and other functionalities, we recommend referring to the official Qwen2.5-VL documentation.
Citation 🏆
If you use Orsta-32B-0326 or the V-Triune system in your research, please cite our work:
```bibtex
@article{ma2025one,
  title={One RL to See Them All: Visual Triple Unified Reinforcement Learning},
  author={Ma, Yan and Du, Linge and Shen, Xuyang and Chen, Shaoxiang and Li, Pengfei and Ren, Qibing and Ma, Lizhuang and Dai, Yuchao and Liu, Pengfei and Yan, Junjie},
  journal={arXiv preprint arXiv:2505.18129},
  year={2025}
}
```