metadata
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
GUI-Owl
GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.
- Paper:
- GitHub Repository: https://github.com/X-PLUG/MobileAgent
- Online Demo: Comming soon
Performance
ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
MMBench-GUI L1, L2 and Android Control
Android World and OSWorld-Verified
Usage
Please refer to our cookbook.
Deploy
We recommand deploy GUI-Owl-7B through vllm
This script has been validated on an A100 with 96 GB of VRAM.
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":10035200}'
IMAGE_LIMIT_ARGS='image=2'
MP_SIZE=1
MM_KWARGS=(
--mm-processor-kwargs $PIXEL_ARGS
--limit-mm-per-prompt $IMAGE_LIMIT_ARGS
)
vllm serve $CKPT \
--max-model-len 32768 ${MM_KWARGS[@]} \
--tensor-parallel-size $MP_SIZE \
--allowed-local-media-path '/' \
--port 4243
If you want GUI-Owl to recieve more than two images, you could increase IMAGE_LIMIT_ARGS and reduce max_pixels.
For example:
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":3211264}'
IMAGE_LIMIT_ARGS='image=5'
MP_SIZE=1
MM_KWARGS=(
--mm-processor-kwargs $PIXEL_ARGS
--limit-mm-per-prompt $IMAGE_LIMIT_ARGS
)
vllm serve $CKPT \
--max-model-len 32768 ${MM_KWARGS[@]} \
--tensor-parallel-size $MP_SIZE \
--allowed-local-media-path '/' \
--port 4243