How do I convert the model weights? Are there any example tutorials?

#1
by ChangyuLiu - opened

First of all, thank you for your excellent work.
I'd also like to know how you obtained these weights. Could you please explain your processing steps in detail? Thank you very much.

Hey @ChangyuLiu, happy to explain how I got the weights from the Apple repo:
https://github.com/apple/ml-fastvlm

Here's some pseudo-ish code that probably won't run but should give you the gist. Note that you'll need to download one of their checkpoints. Basically I just load their model from a checkpoint, traverse the module tree and yank out the vision tower model.

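import os
import torch
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Placeholders: set these to your own paths before running.
model_path = "/path/to/downloaded/checkpoint"  # one of the ml-fastvlm checkpoints
fast_vit_path = "/path/to/output/dir"          # where fast_vit.pth gets written
model_base = None                              # stands in for args.model_base

model_name = get_model_name_from_path(model_path)
_, model, _, _ = load_pretrained_model(model_path, model_base, model_name, device="mps")  # "mps" = Apple Silicon
fast_vit = model.get_model().get_vision_tower().vision_tower.model  # walk down to the vision encoder
torch.save(fast_vit.state_dict(), os.path.join(fast_vit_path, "fast_vit.pth"))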
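
If you'd rather not walk the module tree, a different route is to filter the full model's state dict by key prefix, since PyTorch names state dict entries by the dotted path of the module that owns them. The following is only a sketch of that alternative, not what was actually done for these weights: the helper `extract_submodule_state`, the toy dict, and the `"vision_tower.model"` prefix are all made up for illustration, so print the real keys of your checkpoint to find the right prefix.

```python
from typing import Any, Dict, Mapping


def extract_submodule_state(state_dict: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Return the entries under `prefix`, with the prefix stripped from each key."""
    if not prefix.endswith("."):
        prefix += "."
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a full checkpoint's state dict; real values would be tensors.
full_state = {
    "lm_head.weight": [0.1],
    "vision_tower.model.stem.weight": [0.2],
    "vision_tower.model.stem.bias": [0.3],
}

# "vision_tower.model" is a hypothetical prefix, used for illustration only.
vision_state = extract_submodule_state(full_state, "vision_tower.model")
print(sorted(vision_state))
```

With a real model you would pass `model.state_dict()` instead of the toy dict and hand the result to `torch.save`. Assuming the prefix is the dotted path of the vision encoder, the stripped keys should line up with what `fast_vit.state_dict()` produces above.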

This is a very useful tutorial; I understand it now. Thanks.

ChangyuLiu changed discussion status to closed
