STELLA-VLM-32b

STELLA-VLM-32b is a version of Qwen2.5-VL-32B-Instruct fine-tuned with Group Relative Policy Optimization (GRPO) using LoRA adapters.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-32B-Instruct
  • Fine-tuning Method: GRPO (Group Relative Policy Optimization) with LoRA (rank=64)
  • Training Data: Scientific protocol datasets (jove_llamafactory, finebio)
  • Parameters: 34B total, of which 566M are trainable LoRA parameters (1.66%; see the sketch below)
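
The trainable-parameter fraction can be reproduced with peft. This is a minimal sketch, assuming bfloat16 loading; the exact target_modules set used in training is not published, so the list below is an assumption and the printed count may differ from the 566M reported above:

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

base = AutoModelForVision2Seq.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed set; not published
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # prints the trainable count and fraction for this config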

Training Configuration

  • LoRA rank: 64
  • LoRA alpha: 128
  • Training epochs: 3 (checkpoint saved at step 400)
  • Batch size: 4
  • Learning rate: 2e-4
  • Reward function: rule-based, with length and repetition penalties (illustrative sketch below)
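
The reward function itself is not released with this card. The following is an illustrative sketch of a length- and repetition-penalized rule-based reward, written against TRL's GRPO reward-function interface (a callable over plain-string completions); every threshold is an assumption, not a training value:

def rule_based_reward(completions, **kwargs):
    # Illustrative rule-based reward in the TRL GRPO reward-function style.
    # All thresholds are assumptions, not the values used in training.
    rewards = []
    for text in completions:
        reward = 1.0
        tokens = text.split()
        # Length penalty: discourage completions beyond an assumed token budget.
        if len(tokens) > 512:
            reward -= 0.5
        # Repetition penalty: a low unique-trigram ratio signals looping output.
        trigrams = [tuple(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 0))]
        if trigrams and len(set(trigrams)) / len(trigrams) < 0.5:
            reward -= 0.5
        rewards.append(reward)
    return rewards

A callable with this signature can be passed via the reward_funcs argument of trl's GRPOTrainer, together with a LoRA config like the one sketched under Model Details.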

Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load the fine-tuned weights in bfloat16; device_map="auto" shards the
# ~32B parameters across the available GPUs.
model = AutoModelForVision2Seq.from_pretrained(
    "Zaixi/STELLA-VLM-32b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained("Zaixi/STELLA-VLM-32b", trust_remote_code=True)
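
Once loaded, generation follows the standard Qwen2.5-VL chat-template flow. This is a minimal sketch; the image path and prompt are placeholders:

from PIL import Image

image = Image.open("protocol_step.png")  # placeholder image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Summarize the protocol step shown in this image."},
        ],
    }
]

# Render the chat template, batch the text with the image, and generate.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)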

Fine-tuning Details

This model was fine-tuned with GRPO on scientific protocol datasets to improve instruction following and response consistency when generating scientific content. Relative to the base model, it shows improved performance on:

  • Scientific protocol understanding
  • Consistent response generation
  • Following detailed instructions
  • Multimodal reasoning tasks

License

Apache 2.0
