Safetensors
English
qwen2_5_vl
GUI-Owl-7B / README.md
Mizukiluke's picture
Create README.md
6d89162 verified
|
raw
history blame
2.73 kB
metadata
license: mit
language:
  - en
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct

GUI-Owl

GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.

Performance

ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G

MMBench-GUI L1, L2 and Android Control

Android World and OSWorld-Verified

Usage

Please refer to our cookbook.

Deploy

We recommand deploy GUI-Owl-7B through vllm

This script has been validated on an A100 with 96 GB of VRAM.

PIXEL_ARGS='{"min_pixels":3136,"max_pixels":10035200}'
IMAGE_LIMIT_ARGS='image=2'
MP_SIZE=1
MM_KWARGS=(
    --mm-processor-kwargs $PIXEL_ARGS
    --limit-mm-per-prompt $IMAGE_LIMIT_ARGS
)

vllm serve $CKPT \
    --max-model-len 32768 ${MM_KWARGS[@]} \
    --tensor-parallel-size $MP_SIZE \
    --allowed-local-media-path '/' \
    --port 4243

If you want GUI-Owl to recieve more than two images, you could increase IMAGE_LIMIT_ARGS and reduce max_pixels.

For example:

PIXEL_ARGS='{"min_pixels":3136,"max_pixels":3211264}'
IMAGE_LIMIT_ARGS='image=5'
MP_SIZE=1
MM_KWARGS=(
    --mm-processor-kwargs $PIXEL_ARGS
    --limit-mm-per-prompt $IMAGE_LIMIT_ARGS
)

vllm serve $CKPT \
    --max-model-len 32768 ${MM_KWARGS[@]} \
    --tensor-parallel-size $MP_SIZE \
    --allowed-local-media-path '/' \
    --port 4243