One RL to See Them All
- 🐙 GitHub Repo: MiniMax-AI/One-RL-to-See-Them-All
- 📜 Paper (arXiv): V-Triune: One RL to See Them All (arXiv:2505.18129)
- 💾 Dataset: Orsta-Data-47k on Hugging Face
Model Overview
Orsta-32B-0321 is a vision-language model (VLM) designed to perform well across a wide range of visual reasoning and visual perception tasks. It is the result of post-training with V-Triune, our unified reinforcement learning (RL) system.
The V-Triune system enables VLMs to be jointly optimized on diverse multimodal tasks within a single, cohesive training pipeline. Orsta-32B-0321 has been specifically trained using V-Triune on a carefully curated set of eight challenging visual tasks, fostering robust generalization and enhanced capabilities.
Training with V-Triune
Orsta-32B-0321's advanced abilities stem from its training with the V-Triune system. Key aspects of its training include:
Unified RL Framework (V-Triune): V-Triune is a Visual Triple-Unified Reinforcement Learning system featuring three core complementary components:
- Sample-Level Data Formatting (to unify diverse task inputs)
- Verifier-Level Reward Computation (to deliver custom rewards via specialized verifiers)
- Source-Level Metric Monitoring (to diagnose problems at the data-source level)
It also incorporates an innovative Dynamic IoU reward mechanism, crucial for optimizing visual perception tasks. You can find more details in our paper: V-Triune.
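To make the idea of an IoU-based reward with a moving threshold concrete, here is a minimal sketch. The IoU computation itself is standard. Everything else is an assumption for illustration: the function names (`iou`, `threshold_at`, `dynamic_iou_reward`), the threshold values, the schedule breakpoints, and the choice to return the IoU itself as the reward. The actual V-Triune settings are described in the paper.

```python
from typing import Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_at(progress: float) -> float:
    """Hypothetical schedule: the IoU threshold tightens as training advances.

    `progress` is the fraction of training completed, in [0, 1].
    The breakpoints and values are placeholders, not V-Triune's settings.
    """
    if progress < 0.10:
        return 0.50
    if progress < 0.25:
        return 0.75
    return 0.90


def dynamic_iou_reward(pred: Box, target: Box, progress: float) -> float:
    """Return the IoU as the reward if it clears the current threshold, else 0."""
    score = iou(pred, target)
    return score if score >= threshold_at(progress) else 0.0
```

Under this sketch, a prediction with a fixed IoU can earn a reward early in training and none later, once the threshold has risen above it.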
Diverse Joint Task Optimization: Orsta-32B-0321 was jointly optimized on the following eight visual tasks:
- Visual Reasoning Tasks: Mathematics, Science Question Answering, Chart Understanding, and Puzzle Solving.
- Visual Perception Tasks: Object Detection, Visual Grounding, Optical Character Recognition (OCR), and Object Counting.
This comprehensive training allows Orsta-32B-0321 to develop a deeper understanding of visual content and its relation to textual prompts, excelling in tasks that require intricate reasoning and precise perception.
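One way to picture how different tasks can share a single pipeline is a sample format that names its own verifier and data source, so that rewards are routed per sample and metrics are aggregated per source. The sketch below is hypothetical: the `Sample` fields, the verifier names, the source names, and the toy verifiers are assumptions for illustration and do not reproduce V-Triune's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """Hypothetical unified sample: every task uses the same fields."""
    source: str      # data source name (illustrative, e.g. "math_set")
    verifier: str    # key of the verifier that scores this sample
    prediction: str  # model output to be scored
    answer: str      # reference answer


def exact_match(pred: str, ans: str) -> float:
    """Toy verifier for reasoning-style answers (case-insensitive match)."""
    return 1.0 if pred.strip().lower() == ans.strip().lower() else 0.0


def count_match(pred: str, ans: str) -> float:
    """Toy verifier for counting: compares integers, 0 if unparsable."""
    try:
        return 1.0 if int(pred) == int(ans) else 0.0
    except ValueError:
        return 0.0


# Verifier-level reward computation: each sample picks its own scorer.
VERIFIERS: Dict[str, Callable[[str, str], float]] = {
    "exact_match": exact_match,
    "count_match": count_match,
}


def score_batch(batch: List[Sample]) -> Dict[str, float]:
    """Route each sample to its verifier, then average rewards per source."""
    rewards: Dict[str, List[float]] = defaultdict(list)
    for s in batch:
        rewards[s.source].append(VERIFIERS[s.verifier](s.prediction, s.answer))
    # Source-level monitoring: one mean reward per data source.
    return {src: sum(r) / len(r) for src, r in rewards.items()}
```

Keeping the verifier name on the sample means a mixed batch of reasoning and perception samples can be scored in one pass, and a drop in one source's mean reward can be spotted without it being averaged away by the others.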
Performance
| Model | Knowledge | Mathematics | Perception | Coding | Information Extraction | Planning | Science | Metrics | MEGA-Bench Core |
|---|---|---|---|---|---|---|---|---|---|
| QwenVL-2.5-32B-0321 | 8.48 | 12.62 | 11.99 | 13.59 | 15.44 | 8.61 | 16.78 | 14.91 | 11.87 |
| MM-Eureka-32B 💡 | 12.20 | 20.19 | 21.88 | 15.86 | 21.23 | 15.47 | 19.95 | 22.77 | 18.57 |
| VL-Rethinker-32B 💡 | 12.16 | 28.09 | 22.99 | 11.89 | 21.50 | 15.09 | 28.10 | 15.73 | 19.41 |
| Orsta-32B-0321 (Ours) 💡 | 21.33 | 28.55 | 32.23 | 19.44 | 26.38 | 17.78 | 33.20 | 24.18 | 25.94 |
| - | - | - | - | - | - | - | - | - | - |
| Δ (Ours - Backbone) | +12.9 | +15.9 | +20.2 | +5.9 | +10.9 | +9.2 | +16.4 | +9.3 | +14.1 |
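The Δ row can be checked directly from the table. The short script below copies the backbone (QwenVL-2.5-32B-0321) and Orsta-32B-0321 scores from the rows above and recomputes the differences. The helper names `compute_deltas` and `max_rounding_gap` are illustrative.

```python
# Scores copied from the table above, in column order.
COLUMNS = [
    "Knowledge", "Mathematics", "Perception", "Coding",
    "Information Extraction", "Planning", "Science", "Metrics",
    "MEGA-Bench Core",
]
BACKBONE = [8.48, 12.62, 11.99, 13.59, 15.44, 8.61, 16.78, 14.91, 11.87]
ORSTA = [21.33, 28.55, 32.23, 19.44, 26.38, 17.78, 33.20, 24.18, 25.94]
REPORTED_DELTA = [12.9, 15.9, 20.2, 5.9, 10.9, 9.2, 16.4, 9.3, 14.1]


def compute_deltas(ours, backbone):
    """Per-column improvement of the post-trained model over its backbone."""
    return [o - b for o, b in zip(ours, backbone)]


def max_rounding_gap(computed, reported):
    """Largest absolute gap between computed and reported deltas."""
    return max(abs(c - r) for c, r in zip(computed, reported))


for name, d, r in zip(COLUMNS, compute_deltas(ORSTA, BACKBONE), REPORTED_DELTA):
    print(f"{name:>22}: computed {d:+.2f}, reported {r:+.1f}")
```

Every reported value lies within 0.05 of the recomputed difference, which is consistent with rounding to one decimal place.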
How to Use
Orsta-32B-0321 is developed by post-training the Qwen2.5-VL-32B-Instruct (0321 checkpoint) model using our V-Triune reinforcement learning system. This checkpoint is a publicly available baseline known for its reliable core reasoning abilities, alongside recognized limitations in perception and output formatting (which have been addressed in subsequent Qwen releases). Applying V-Triune to this baseline shows that post-training can significantly raise the model's performance by refining and amplifying its existing strengths.
Consequently, the core usage of Orsta-32B-0321, particularly regarding input formatting and model interaction, largely follows the established patterns of the Qwen2.5-VL series. Users familiar with Qwen2.5-VL models should find the interface intuitive.
For comprehensive details on the general capabilities of Qwen2.5-VL models, including the multi-turn dialogue format and image input specifics, we recommend referring to the official Qwen2.5-VL series documentation (be sure to consult the information relevant to the 32B Instruct version).
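Since the interface follows the Qwen2.5-VL series, a minimal inference sketch with Hugging Face Transformers might look like the following. It assumes a recent `transformers` release that provides `AutoModelForImageTextToText` and supports this checkpoint, the `accelerate` package for `device_map="auto"`, enough GPU memory for a 32B model, and the repository ID `invincible-jha/Orsta-32B-0321`. The helper names `build_messages` and `describe_image`, the `max_new_tokens` default, and the dtype and device settings are illustrative choices rather than recommendations from the authors.

```python
from typing import Any, Dict, List

MODEL_ID = "invincible-jha/Orsta-32B-0321"


def build_messages(image_url: str, question: str) -> List[Dict[str, Any]]:
    """Build a single-turn chat with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(image_url: str, question: str, max_new_tokens: int = 128) -> str:
    """Load the model and answer one question about one image.

    Imports are local so the helpers above can be used without loading
    transformers; loading a 32B model needs substantial GPU memory.
    """
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # assumption: `accelerate` is installed
    )
    inputs = processor.apply_chat_template(
        build_messages(image_url, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

Calling `describe_image(url, "What is shown in this image?")` with a publicly reachable image URL returns the generated answer as a string. For multi-turn dialogue, the message list can be extended with further `user` and `assistant` turns in the same format.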
Citation 🏆
If you use Orsta-32B-0321 or the V-Triune system in your research, please cite our work:
@article{ma2025one,
title={One RL to See Them All: Visual Triple Unified Reinforcement Learning},
author={Ma, Yan and Du, Linge and Shen, Xuyang and Chen, Shaoxiang and Li, Pengfei and Ren, Qibing and Ma, Lizhuang and Dai, Yuchao and Liu, Pengfei and Yan, Junjie},
journal={arXiv preprint arXiv:2505.18129},
year={2025}
}