# MedGemma-1.5-4b-it ExecuTorch
This repository contains the medgemma-1.5-4b-it model converted to the ExecuTorch format for on-device inference in Android applications.
## Conversion Details
The model was converted using a custom fork of optimum-executorch that includes critical fixes for:

- **Extended Context Window**: enables processing sequences of up to 128K tokens (vs. the default 2048).
- **Correct EOS Handling**: properly sets the End-of-Sequence token IDs `[1, 106]` for correct generation termination.
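The EOS fix matters because decoding must stop as soon as the model emits either terminator. A minimal sketch of the stop condition, using the two token IDs from the fix above (the sampling callback here is a hypothetical stand-in for the model's next-token step):

```python
# Gemma-style models use two terminators; both IDs must be registered.
EOS_TOKEN_IDS = {1, 106}

def decode(sample_next_token, max_new_tokens):
    """Run a decode loop, stopping on any EOS token.

    `sample_next_token` is a stand-in for the model's next-token step.
    """
    tokens = []
    for _ in range(max_new_tokens):
        token = sample_next_token()
        if token in EOS_TOKEN_IDS:  # correct termination needs BOTH ids
            break
        tokens.append(token)
    return tokens

# With only {1} registered, a stream ending in 106 would keep decoding
# until max_new_tokens instead of stopping here.
stream = iter([45, 872, 19, 106, 7, 7])
print(decode(lambda: next(stream), 6))  # -> [45, 872, 19]
```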
## Prerequisites
```bash
# Set up the environment
uv venv --python 3.12
source .venv/bin/activate

# Clone the custom optimum-executorch repository and check out the fix branch
git clone https://github.com/kamalkraj/optimum-executorch.git
cd optimum-executorch
git checkout merge-eos-and-max-seq

# Install dependencies (torch 2.9.0 and torchao 0.14.1 are required for correct tracing)
uv pip install '.[dev]' torch==2.9.0 torchao==0.14.1
```
## Export Command
The model is exported with `optimum-cli` using the XNNPACK recipe, applying 8-bit dynamic-activation, 4-bit-weight quantization (`8da4w`) to the linear layers.
```bash
optimum-cli export executorch \
  --model "google/medgemma-1.5-4b-it" \
  --task "multimodal-text-to-text" \
  --recipe "xnnpack" \
  --device cpu \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear 8da4w \
  --qlinear_group_size 32 \
  --qlinear_encoder "8da4w,8da8w" \
  --qlinear_encoder_group_size 32 \
  --qembedding "8w" \
  --qembedding_encoder "8w" \
  --max_seq_len 131072 \
  --output_dir="medgemma-1.5-4b-it-8da4w-executorch"
```
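As a rough illustration of what the weight side of `8da4w` does, here is a sketch of symmetric group-wise int4 quantization with group size 32, in plain Python. This mirrors the idea behind `--qlinear 8da4w` / `--qlinear_group_size 32`; it is not the torchao implementation:

```python
def quantize_int4_groupwise(weights, group_size=32):
    """Symmetric int4 quantization: one scale per group of `group_size` values.

    Returns (integer values in [-8, 7], per-group float scales).
    """
    assert len(weights) % group_size == 0
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group onto the int4 range.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize(qvals, scales, group_size=32):
    """Recover approximate float weights from int4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

# One group of 32 weights: the reconstruction error stays within one scale step.
weights = [i / 10 for i in range(-16, 16)]
qvals, scales = quantize_int4_groupwise(weights)
recon = dequantize(qvals, scales)
```

A smaller group size gives each scale fewer values to cover, improving accuracy at the cost of storing more scales.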
**Memory Usage Note:** The command above uses a maximum context length of 128K tokens, which requires roughly 10-11 GB of RAM on-device. To reduce memory usage, decrease `--max_seq_len` (e.g., to `4096` or `8192`) before exporting; this still allows effective inference while fitting within the constraints of lower-end devices.
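To see why `--max_seq_len` dominates on-device memory, a back-of-the-envelope KV-cache estimate helps. The layer and head dimensions below are illustrative placeholders (read the real values from the model's `config.json`), and the 1-byte-per-element cache is an assumption for illustration, not a confirmed property of `--use_custom_kv_cache`:

```python
def kv_cache_bytes(max_seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes):
    """Approximate KV-cache footprint: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * max_seq_len * dtype_bytes

# Placeholder dimensions for a ~4B decoder; substitute the real config values.
for seq_len in (4096, 8192, 131072):
    size = kv_cache_bytes(seq_len, n_layers=34, n_kv_heads=4,
                          head_dim=256, dtype_bytes=1)
    print(f"max_seq_len={seq_len:>6}: ~{size / 2**30:.2f} GiB of KV cache")
```

The cache grows linearly with `max_seq_len`, which is why dropping from 128K to 4K-8K tokens cuts the RAM requirement so dramatically.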