P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

πŸ“œ Paper | πŸ’» Code | 🌐 Project Page | πŸ† HiPhO Leaderboard

Flagship vision-language model, ranked No.3 on the HiPhO physics-reasoning leaderboard

Model Description

P1-VL-235B-A22B is the flagship variant of the P1-VL series of high-performance open-source vision-language models specialized in physics reasoning. It was introduced in P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads.

Built on Qwen3-VL-235B-A22B-Thinking and refined through multi-stage reinforcement learning on curated physics-competition data, P1-VL-235B-A22B is the first open-source Vision-Language Model (VLM) to secure 12 gold medals on HiPhO, ranking No.3 on the model leaderboard. The model excels at tasks that require precise diagram-to-logic alignment, delivering top-tier performance on physics Olympiad problems.

Key Highlights

  • πŸ₯‡ HiPhO Excellence: First open-source VLM to secure 12 gold medals, ranking No.3 globally; when augmented with PhysicsMinions, P1-VL-235B-A22B rises to No.2.
  • πŸ† IPhO 2025 Gold-tier Performance: Scores at the gold-medal level on the 2025 International Physics Olympiad.
  • πŸ“Š FrontierScience-Olympiad: Total score of 64.3/100, outperforming its text-only sibling (P1-235B-A22B) by 2.3 points; when augmented with PhysicsMinions, it achieves state-of-the-art performance among all evaluated open-source models.
  • 🎯 STEM Generalization: Consistent improvements over the base model across math and multimodal benchmarks.

Performance Benchmarks

HiPhO Results

Evaluated on HiPhO, a rigorous benchmark of 13 physics Olympiad exams from 2024–2025, P1-VL-235B-A22B demonstrates top-tier physics reasoning capabilities.

| Model | Ranking | Gold Medals | Performance |
|---|---|---|---|
| P1-VL-235B-A22B | No. 3 | 12 πŸ₯‡ | First open-source VLM with 12 gold medals |
| P1-VL-235B-A22B + PhysicsMinions | No. 2 | 12 πŸ₯‡ | Trailing only Gemini-3-Pro globally |

FrontierScience-Olympiad Benchmark

P1-VL-235B-A22B achieves significant gains over its base counterpart across all three scientific domains. Remarkably, even on this predominantly text-based benchmark, the multimodal P1-VL-235B-A22B outperforms its text-only sibling (P1-235B-A22B) by a margin of 2.3 points.

| Model | Biology (wt. 10) | Chemistry (wt. 40) | Physics (wt. 50) | Total (/100) |
|---|---|---|---|---|
| P1-VL-235B-A22B + PhysicsMinions | 26.3 | 77.2 | 67.3 | 67.1 |
| P1-VL-235B-A22B | 30.0 | 71.3 | 65.5 | 64.3 |
| P1-235B-A22B + PhysicsMinions | 30.0 | 71.0 | 68.0 | 65.4 |
| P1-235B-A22B | 22.5 | 67.2 | 65.8 | 62.0 |
| Qwen3-VL-235B-A22B-Thinking | 26.3 | 61.9 | 57.8 | 56.3 |
| Qwen3-235B-A22B-Thinking-2507 | 26.3 | 58.1 | 57.3 | 54.5 |

Per-domain scores are on a 0–100 scale; the total is the weighted average with weights 10/40/50 (e.g., 0.10 Γ— 30.0 + 0.40 Γ— 71.3 + 0.50 Γ— 65.5 β‰ˆ 64.3).

STEM Benchmarks

Beyond physics reasoning, P1-VL-235B-A22B demonstrates strong generalization across multiple domains, consistently outperforming its base model Qwen3-VL-235B-A22B-Thinking on both text-only and multimodal benchmarks.

| Benchmark | P1-VL-235B-A22B | Qwen3-VL-235B-A22B-Thinking |
|---|---|---|
| AIME24 | 93.8 | 93.3 |
| AIME25 | 92.1 | 90.8 |
| HMMT-Feb | 83.3 | 72.9 |
| HMMT-Nov | 88.3 | 84.2 |
| IMO-Answerbench | 70.6 | 62.3 |
| AMOBench | 47.5 | 39.0 |
| BeyondAIME | 70.6 | 68.5 |
| Brumo | 93.3 | 90.0 |
| CMICC | 83.1 | 81.6 |
| GPQA | 81.4 | 77.1 |
| LiveBench | 79.9 | 79.4 |
| HLE | 15.9 | 13.9 |
| MMMU | 78.0 | 77.2 |
| MMMU-Pro | 70.2 | 69.7 |
| EMMA-Mini | 71.3 | 69.6 |
| MathVista-Mini | 83.9 | 82.6 |

Usage

from transformers import Qwen3VLMoeForConditionalGeneration, AutoProcessor
from PIL import Image

model_name = "PRIME-RL/P1-VL-235B-A22B"

# Load model and processor
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    model_name, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Load diagram image
image = Image.open("physics_diagram.png")

# Physics problem with visual input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {
                "type": "text",
                "text": """Analyze this physics diagram and solve the problem:

A block of mass m is placed on an inclined plane with angle ΞΈ.
The coefficient of kinetic friction is ΞΌ.
Calculate the acceleration of the block down the incline.""",
            },
        ],
    }
]

# Prepare inputs and move them to the model's device
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Inference: generate the model's reasoning and final answer
generated_ids = model.generate(**inputs, max_new_tokens=8192)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
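
As a quick sanity check on the generated solution: for this standard setup the closed-form answer is a = g(sin ΞΈ βˆ’ ΞΌ cos ΞΈ), valid while the block is sliding down the incline (i.e., sin ΞΈ > ΞΌ cos ΞΈ), so the model's final expression can be compared directly against it.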

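For large-scale or multi-GPU deployment, the checkpoint can also be served through vLLM's OpenAI-compatible API. The following is a minimal sketch, not an official recipe: the tensor-parallel size is an assumption and must be adapted to your hardware.

# Launch an OpenAI-compatible server first (tensor-parallel size is illustrative):
#   vllm serve PRIME-RL/P1-VL-235B-A22B --tensor-parallel-size 8

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="PRIME-RL/P1-VL-235B-A22B",
    messages=[
        {
            "role": "user",
            "content": "A ball is thrown straight up at 20 m/s. "
            "How high does it rise? Take g = 10 m/s^2.",
        }
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)
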
πŸ™ Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

  • Qwen3-VL - for providing the foundational base models that powered our research
  • verl - for the versatile reinforcement learning framework that enabled our training pipeline
  • vLLM - for the efficient LLM serving and inference infrastructure
  • Megatron-LM - for the large-scale model training framework

Citation

@misc{p1vl2025,
  title={P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads},
  author={P1 Team},
  year={2026},
  url={https://arxiv.org/abs/2602.09443}
}